* [PATCH 1/1] Revert "genirq: Remove the second parameter from handle_irq_event_percpu()"
From: zyjzyj2000 @ 2016-01-13 10:31 UTC
  To: zyjzyj2000, tglx, linux-kernel

From: Zhu Yanjun <zyjzyj2000@gmail.com>

After commit 71f64340fc0e ("genirq: Remove the second parameter
from handle_irq_event_percpu()") was applied, the variable action is
no longer protected by raw_spin_lock, and the following call trace
appears:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0 #30
task: ffff88003d2ed040 ti: ffff88003d380000 task.ti: ffff88003d380000
RIP: 0010:[<ffffffff810a4991>]  [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
RSP: 0018:ffff88003eb03ed8  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000010003
RDX: 0000000080010003 RSI: 0000000000000000 RDI: ffff88003d02ac00
RBP: ffff88003eb03f10 R08: ffff88003d380000 R09: 0000000000000002
R10: 0000000000027e88 R11: 0000000000000282 R12: 0000000000000004
R13: ffff88003d02ac38 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88003eb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000001e0a000 CR4: 00000000000006e0
Stack:
 ffff88003d02ac00 0000000000000007 ffff88003d02ac00 ffff88003d02acb4
 ffff88003d02ac38 0000000000000034 0000000000000000 ffff88003eb03f38
 ffffffff810a4b5c ffff88003d02ac00 ffff88003d02acb4 ffff88003d02ac38
Call Trace:
 <IRQ>
 [<ffffffff810a4b5c>] handle_irq_event+0x3c/0x60
 [<ffffffff810a7c9f>] handle_edge_irq+0xcf/0x160
 [<ffffffff810067ba>] handle_irq+0x1a/0x30
 [<ffffffff819b0d37>] do_IRQ+0x57/0xf0
 [<ffffffff819af1ff>] common_interrupt+0x7f/0x7f
 <EOI>
 [<ffffffff819ae192>] ? _raw_write_unlock_irq+0x12/0x30
 [<ffffffff819ae1be>] _raw_spin_unlock_irq+0xe/0x10
 [<ffffffff8107703a>] finish_task_switch+0x9a/0x1f0
 [<ffffffff819aa375>] __schedule+0x3c5/0xb60
 [<ffffffff819aac8f>] schedule+0x3f/0x90
 [<ffffffff819aaf18>] schedule_preempt_disabled+0x18/0x30
 [<ffffffff8108f2ec>] cpu_startup_entry+0x13c/0x320
 [<ffffffff810379b1>] start_secondary+0xf1/0x100
RIP  [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
 RSP <ffff88003eb03ed8>
CR2: 0000000000000008
---[ end trace c62dc8f0b2aee0f5 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt
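The faulting address in the oops above (CR2: 0000000000000008) is what a load from a member at offset 8 through a NULL pointer produces on a 64-bit machine: the CPU adds the member offset to the base address 0. The sketch below only illustrates that arithmetic. `struct mock_irqaction` is a hypothetical layout, not the kernel's `struct irqaction`, and which real member was being read is not established here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two pointer-sized members first, so the second
 * one starts at offset sizeof(void *), i.e. 8 on a 64-bit target. */
struct mock_irqaction {
	void *first_member;          /* offset 0 */
	void *second_member;         /* offset sizeof(void *) */
	struct mock_irqaction *next;
};

/* Address that reading action->second_member would touch if action were
 * NULL: base 0 plus the member offset. Computed, never dereferenced. */
static uintptr_t null_member_address(void)
{
	return (uintptr_t)0 + offsetof(struct mock_irqaction, second_member);
}
```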

Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
---
 kernel/irq/chip.c      | 2 +-
 kernel/irq/handle.c    | 7 ++++---
 kernel/irq/internals.h | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 5797909..ce483ac 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,7 +692,7 @@ void handle_percpu_irq(struct irq_desc *desc)
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
 
-	handle_irq_event_percpu(desc);
+	handle_irq_event_percpu(desc, desc->action);
 
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a302cf9..e25a83b 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -132,11 +132,11 @@ void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
 	wake_up_process(action->thread);
 }
 
-irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
+irqreturn_t
+handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 {
 	irqreturn_t retval = IRQ_NONE;
 	unsigned int flags = 0, irq = desc->irq_data.irq;
-	struct irqaction *action = desc->action;
 
 	do {
 		irqreturn_t res;
@@ -184,13 +184,14 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 
 irqreturn_t handle_irq_event(struct irq_desc *desc)
 {
+	struct irqaction *action = desc->action;
 	irqreturn_t ret;
 
 	desc->istate &= ~IRQS_PENDING;
 	irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
 	raw_spin_unlock(&desc->lock);
 
-	ret = handle_irq_event_percpu(desc);
+	ret = handle_irq_event_percpu(desc, action);
 
 	raw_spin_lock(&desc->lock);
 	irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index fcab63c..25a2c9c 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -83,7 +83,7 @@ extern void irq_mark_irq(unsigned int irq);
 
 extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);
 
-irqreturn_t handle_irq_event_percpu(struct irq_desc *desc);
+irqreturn_t handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action);
 irqreturn_t handle_irq_event(struct irq_desc *desc);
 
 /* Resending of interrupts :*/
-- 
1.9.1


* Re: [PATCH 1/1] Revert "genirq: Remove the second parameter from handle_irq_event_percpu()"
From: Thomas Gleixner @ 2016-01-13 13:07 UTC
  To: zyjzyj2000; +Cc: LKML, Huang Shijie, Jiang Liu, Peter Zijlstra

On Wed, 13 Jan 2016, zyjzyj2000@gmail.com wrote:

> After this commit 71f64340fc0e ("genirq: Remove the second parameter
> from handle_irq_event_percpu()") is applied, the variable action is
> not protected by raw_spin_lock. The following calltrace will pop up.

Thanks for the report. I missed that detail when merging the patch!

Just for correctness' sake: you did not explain why this can happen.

It's not about the variable action; it's about desc->action no longer
being protected. The reason this oopses is that the action is being
removed concurrently.

CPU 0			CPU 1

free_irq()		lock(desc)
lock(desc)		handle_edge_irq()
			  handle_irq_event(desc)
			    unlock(desc)
desc->action = NULL	    handle_irq_event_percpu(desc)
	       		      action = desc->action

While the original code did:

free_irq()		lock(desc)
lock(desc)		handle_edge_irq()
			  handle_irq_event()
	       		    action = desc->action
			    unlock(desc)
desc->action = NULL	    handle_irq_event_percpu(desc, action)
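The two diagrams differ only in when desc->action is read relative to unlock(desc). The deterministic userspace sketch below replays that interleaving in a single thread: `concurrent_free_irq()` stands in for CPU 0 clearing desc->action once it holds the lock. All types and function names here are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct irqaction / struct irq_desc. */
struct mock_action {
	struct mock_action *next;
};

struct mock_desc {
	struct mock_action *action;
};

/* CPU 0: free_irq() has taken the lock and unlinks the action. */
static void concurrent_free_irq(struct mock_desc *desc)
{
	desc->action = NULL;
}

/* Original code: handle_irq_event() reads desc->action before
 * unlock(desc) and passes the snapshot down. */
static struct mock_action *read_before_unlock(struct mock_desc *desc)
{
	struct mock_action *action = desc->action; /* read before unlock */

	/* unlock(desc): CPU 0 can now run free_irq() */
	concurrent_free_irq(desc);
	return action;                             /* snapshot unaffected */
}

/* After 71f64340fc0e: handle_irq_event_percpu() reads desc->action
 * only after unlock(desc), so it can observe NULL. */
static struct mock_action *read_after_unlock(struct mock_desc *desc)
{
	/* unlock(desc): CPU 0 can now run free_irq() */
	concurrent_free_irq(desc);
	return desc->action;                       /* NULL in this replay */
}
```

In the replay, `read_before_unlock()` still returns the original action, which stays allocated because free_irq() waits in synchronize_irq() before freeing it, while `read_after_unlock()` returns NULL, which the do {} while loop then dereferenced.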
	       		    
So now the question is whether we revert that patch or simply change
handle_irq_event_percpu() to deal with that. Patch below.

That preserves the code size reduction of commit 71f64340fc0e. This is safe
because we either see a valid desc->action or NULL. If the action is about to
be removed, it is still valid, as free_irq() is blocked on synchronize_irq().

free_irq()		lock(desc)
lock(desc)		handle_edge_irq()
			  handle_irq_event(desc)
			    set(INPROGRESS)
			    unlock(desc)
			      handle_irq_event_percpu(desc)
	       		        action = desc->action
desc->action = NULL
synchronize_irq()
  while(INPROGRESS);	   lock(desc)
			   clr(INPROGRESS)
free(action)

That's basically the same mechanism as we have for shared
interrupts. action->next can become NULL while handle_irq_event_percpu()
runs. Either it sees the action or NULL. It does not matter, because action
itself cannot go away.
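The same reasoning applies along the chain of shared handlers: a concurrent removal may clear action->next while the loop runs, and the loop then sees either the next action or NULL. The sketch below replays both outcomes deterministically. `walk_actions()`, the `after_handler` hook and `unlink_second()` are hypothetical names; the hook only models where a concurrent unlink could land, and nothing is freed during the walk, mirroring the guarantee that free_irq() waits before freeing.

```c
#include <stddef.h>

struct mock_action {
	int calls;                 /* how often this "handler" ran */
	struct mock_action *next;
};

/* Called after each handler; models an unlink done by another CPU. */
typedef void (*after_handler_fn)(void *ctx);

/* Fixed loop shape: test first, then run, then follow ->next. */
static int walk_actions(struct mock_action *action,
			after_handler_fn after_handler, void *ctx)
{
	int ran = 0;

	while (action) {
		action->calls++;           /* stands in for action->handler() */
		ran++;
		if (after_handler)
			after_handler(ctx);
		action = action->next;     /* sees the next action or NULL */
	}
	return ran;
}

/* Unlinks the node after 'first'; the unlinked node stays allocated. */
static void unlink_second(void *ctx)
{
	struct mock_action *first = ctx;

	if (first->next)
		first->next = first->next->next;
}
```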

Thanks,

	tglx

8<-------------

--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -136,9 +136,15 @@ irqreturn_t handle_irq_event_percpu(stru
 {
 	irqreturn_t retval = IRQ_NONE;
 	unsigned int flags = 0, irq = desc->irq_data.irq;
-	struct irqaction *action = desc->action;
+	struct irqaction *action;
 
-	do {
+	/*
+	 * READ_ONCE is not required here. The compiler cannot reload action
+	 * because it'll be action->next for the second iteration of the loop.
+	 */
+	action = desc->action;
+
+	while (action) {
 		irqreturn_t res;
 
 		trace_irq_handler_entry(irq, action);
@@ -173,7 +179,7 @@ irqreturn_t handle_irq_event_percpu(stru
 
 		retval |= res;
 		action = action->next;
-	} while (action);
+	}
 
 	add_interrupt_randomness(irq, flags);
 

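The comment in the patch argues that READ_ONCE() is unnecessary because desc->action is loaded exactly once: every later iteration takes its pointer from action->next, not from desc. For comparison, the sketch below makes that single load explicit with a userspace approximation of READ_ONCE() built on a volatile access. `READ_ONCE_SKETCH` and the mock types are hypothetical; the kernel's own READ_ONCE() is more elaborate, and `__typeof__` assumes GCC or Clang.

```c
#include <stddef.h>

/* Approximation of READ_ONCE() for pointer-sized objects: the volatile
 * access forces exactly one load that the compiler may not repeat. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct mock_action {
	int calls;
	struct mock_action *next;
};

struct mock_desc {
	struct mock_action *action;
};

/* Same loop shape as the patch, with the one load of desc->action
 * made explicit. Returns the number of actions run. */
static int handle_all(struct mock_desc *desc)
{
	struct mock_action *action = READ_ONCE_SKETCH(desc->action);
	int ran = 0;

	while (action) {
		action->calls++;
		ran++;
		action = action->next;  /* later iterations never touch desc */
	}
	return ran;
}
```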

* Re: [PATCH 1/1] Revert "genirq: Remove the second parameter from handle_irq_event_percpu()"
From: Huang Shijie @ 2016-01-14  1:29 UTC
  To: Thomas Gleixner; +Cc: zyjzyj2000, LKML, Jiang Liu, Peter Zijlstra, nd

On Wed, Jan 13, 2016 at 02:07:25PM +0100, Thomas Gleixner wrote:
> [...]
> So now the question is whether we revert that patch or simply change
> handle_irq_event_percpu() to deal with that. Patch below.
> [...]

I prefer this patch; reverting the old patch is not a good solution.

thanks
Huang Shijie


* [tip:irq/urgent] genirq: Validate action before dereferencing it in handle_irq_event_percpu()
From: tip-bot for Thomas Gleixner @ 2016-01-14 19:15 UTC
  To: linux-tip-commits
  Cc: peterz, shijie.huang, tglx, mingo, linux-kernel, jiang.liu, hpa

Commit-ID:  570540d50710ed192e98e2f7f74578c9486b6b05
Gitweb:     http://git.kernel.org/tip/570540d50710ed192e98e2f7f74578c9486b6b05
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 13 Jan 2016 14:07:25 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 14 Jan 2016 20:09:49 +0100

genirq: Validate action before dereferencing it in handle_irq_event_percpu()

Commit 71f64340fc0e changed the handling of irq_desc->action from

CPU 0                   CPU 1
free_irq()              lock(desc)
  lock(desc)            handle_edge_irq()
                        if (desc->action) {
                          handle_irq_event()
                            action = desc->action
                            unlock(desc)
  desc->action = NULL       handle_irq_event_percpu(desc, action)
                              action->xxx
to

CPU 0                   CPU 1
free_irq()              lock(desc)
  lock(desc)            handle_edge_irq()
                        if (desc->action) {
                          handle_irq_event()
                            unlock(desc)
  desc->action = NULL       handle_irq_event_percpu(desc)
                              action = desc->action
                              action->xxx

So if free_irq() manages to set the action to NULL between the unlock and the
readout, we happily dereference a NULL pointer.

We could simply revert 71f64340fc0e, but we want to preserve the better code
generation. A simple solution is to change the action loop from a do {} while
to a while {} loop.
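The difference between the two loop shapes is whether the body runs before the first NULL test. The sketch below contrasts them using hypothetical stand-in types (`sketch_irqreturn_t`, `struct sketch_action`); the return values mimic the IRQ_NONE/IRQ_HANDLED convention but are not the kernel definitions.

```c
#include <stddef.h>

typedef unsigned int sketch_irqreturn_t;
#define SKETCH_IRQ_NONE    0u
#define SKETCH_IRQ_HANDLED 1u

struct sketch_action {
	sketch_irqreturn_t (*handler)(void *dev_id);
	void *dev_id;
	struct sketch_action *next;
};

static sketch_irqreturn_t always_handled(void *dev_id)
{
	(void)dev_id;
	return SKETCH_IRQ_HANDLED;
}

/* Old shape: the body executes before the test, so a NULL action is
 * dereferenced on the first iteration. The caller must pass non-NULL. */
static sketch_irqreturn_t run_do_while(struct sketch_action *action)
{
	sketch_irqreturn_t retval = SKETCH_IRQ_NONE;

	do {
		retval |= action->handler(action->dev_id);
		action = action->next;
	} while (action);
	return retval;
}

/* Fixed shape: test first, so an action that became NULL after the
 * lock was dropped simply yields "none handled". */
static sketch_irqreturn_t run_while(struct sketch_action *action)
{
	sketch_irqreturn_t retval = SKETCH_IRQ_NONE;

	while (action) {
		retval |= action->handler(action->dev_id);
		action = action->next;
	}
	return retval;
}
```

For a non-empty chain both loops run the same handlers in the same order; they differ only when the chain is empty.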

This is safe because we either see a valid desc->action or NULL. If the action
is about to be removed, it is still valid, as free_irq() is blocked on
synchronize_irq().

CPU 0                   CPU 1
free_irq()              lock(desc)
  lock(desc)            handle_edge_irq()
                          handle_irq_event(desc)
                            set(INPROGRESS)
                            unlock(desc)
                            handle_irq_event_percpu(desc)
                            action = desc->action
  desc->action = NULL           while (action) {
                                  action->xxx
                                  ...
                                  action = action->next;
  synchronize_irq()
    while(INPROGRESS);      lock(desc)
                            clr(INPROGRESS)
free(action)

That's basically the same mechanism as we have for shared
interrupts. action->next can become NULL while handle_irq_event_percpu()
runs. Either it sees the action or NULL. It does not matter, because action
itself cannot go away before the interrupt in progress flag has been cleared.
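The ordering that makes the fix safe is: free_irq() unlinks the action, waits in synchronize_irq() until the in-progress flag is clear, and only then frees the action. The deterministic sketch below replays the diagram's interleaving step by step. `struct mock_desc`, its `inprogress` field (standing in for IRQD_IRQ_INPROGRESS) and `may_free_action()` (standing in for the exit condition of synchronize_irq()'s wait) are hypothetical; the real synchronize_irq() does more, for example it also waits for threaded handlers.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_action {
	int calls;
	struct mock_action *next;
};

struct mock_desc {
	struct mock_action *action;
	bool inprogress;            /* stands in for IRQD_IRQ_INPROGRESS */
};

/* CPU 1, under lock(desc): set(INPROGRESS), then unlock(desc). */
static void irq_begin(struct mock_desc *desc)
{
	desc->inprogress = true;
}

/* CPU 1, lock dropped: action = desc->action (may be NULL). */
static struct mock_action *irq_read_action(const struct mock_desc *desc)
{
	return desc->action;
}

/* CPU 1, back under lock(desc): clr(INPROGRESS). */
static void irq_end(struct mock_desc *desc)
{
	desc->inprogress = false;
}

/* CPU 0: free_irq() unlinks the action under the lock. */
static void free_irq_unlink(struct mock_desc *desc)
{
	desc->action = NULL;
}

/* CPU 0: synchronize_irq() keeps waiting while this returns false. */
static bool may_free_action(const struct mock_desc *desc)
{
	return !desc->inprogress;
}
```

Replaying the other order, with free_irq_unlink() before irq_read_action(), yields NULL, which the fixed while loop treats as "no handlers to run".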

Fixes: 71f64340fc0e ("genirq: Remove the second parameter from handle_irq_event_percpu()")
Reported-by: zyjzyj2000@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601131224190.3575@nanos
---
 kernel/irq/handle.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a302cf9..57bff78 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -138,7 +138,8 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 	unsigned int flags = 0, irq = desc->irq_data.irq;
 	struct irqaction *action = desc->action;
 
-	do {
+	/* action might have become NULL since we dropped the lock */
+	while (action) {
 		irqreturn_t res;
 
 		trace_irq_handler_entry(irq, action);
@@ -173,7 +174,7 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 
 		retval |= res;
 		action = action->next;
-	} while (action);
+	}
 
 	add_interrupt_randomness(irq, flags);
 


* Re: [PATCH 1/1] Revert "genirq: Remove the second parameter from handle_irq_event_percpu()"
From: zhuyj @ 2016-01-18  8:00 UTC
  To: Huang Shijie, Thomas Gleixner; +Cc: LKML, Jiang Liu, Peter Zijlstra, nd

Hi all,

I have tested this patch and so far have not found any similar problem.

Best Regards!
Zhu Yanjun

On 01/14/2016 09:29 AM, Huang Shijie wrote:
> [...]


* [V2 PATCH 1/1] genirq: fix desc->action become NULL error
From: zyjzyj2000 @ 2016-01-21  7:52 UTC
  To: zyjzyj2000, linux-kernel, jiang.liu, peterz, nd, tglx,
	shijie.huang


Hi, all

Following Thomas Gleixner's suggestion, I have made a new patch
to fix this problem.

Changes:
Commit 71f64340fc0e is no longer reverted. Instead, action is now
checked for NULL before it is dereferenced.

Best Regards!
Zhu Yanjun


* [V2 PATCH 1/1] genirq: fix desc->action become NULL error
From: zyjzyj2000 @ 2016-01-21  7:52 UTC
  To: zyjzyj2000, linux-kernel, jiang.liu, peterz, nd, tglx,
	shijie.huang

From: Zhu Yanjun <zyjzyj2000@gmail.com>

After commit 71f64340fc0e ("genirq: Remove the second parameter
from handle_irq_event_percpu()") was applied, desc->action is no longer
protected by raw_spin_lock, and the following call trace appears:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
...
Call Trace:
<IRQ>
[<ffffffff810a4b5c>] handle_irq_event+0x3c/0x60
[<ffffffff810a7c9f>] handle_edge_irq+0xcf/0x160
[<ffffffff810067ba>] handle_irq+0x1a/0x30
[<ffffffff819b0d37>] do_IRQ+0x57/0xf0
[<ffffffff819af1ff>] common_interrupt+0x7f/0x7f
<EOI>
[<ffffffff819ae192>] ? _raw_write_unlock_irq+0x12/0x30
[<ffffffff819ae1be>] _raw_spin_unlock_irq+0xe/0x10
[<ffffffff8107703a>] finish_task_switch+0x9a/0x1f0
[<ffffffff819aa375>] __schedule+0x3c5/0xb60
[<ffffffff819aac8f>] schedule+0x3f/0x90
[<ffffffff819aaf18>] schedule_preempt_disabled+0x18/0x30
[<ffffffff8108f2ec>] cpu_startup_entry+0x13c/0x320
[<ffffffff810379b1>] start_secondary+0xf1/0x100
RIP [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
...
The reason is as follows:

desc->action is no longer protected by the lock, so it can be removed
concurrently:

CPU 0			CPU 1

free_irq()		lock(desc)
lock(desc)		handle_edge_irq()
			  handle_irq_event(desc)
			    unlock(desc)
desc->action = NULL	    handle_irq_event_percpu(desc)
	       		      action = desc->action

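The fix changes the do {} while loop into a while {} loop, so action is
checked before it is dereferenced. This is safe because we either see a
valid desc->action or NULL. If the action is about to be removed, it is
still valid, as free_irq() is blocked on synchronize_irq().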

free_irq()		lock(desc)
lock(desc)		handle_edge_irq()
			  handle_irq_event(desc)
			    set(INPROGRESS)
			    unlock(desc)
			      handle_irq_event_percpu(desc)
	       		        action = desc->action
desc->action = NULL
synchronize_irq()
  while(INPROGRESS);	   lock(desc)
			   clr(INPROGRESS)
free(action)

That's basically the same mechanism as we have for shared
interrupts. The variable action->next can become NULL while
handle_irq_event_percpu() runs. Either it sees the action or
NULL. It does not matter, because action itself cannot go away.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
---
 kernel/irq/handle.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a302cf9..7510b72 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -136,9 +136,14 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 {
 	irqreturn_t retval = IRQ_NONE;
 	unsigned int flags = 0, irq = desc->irq_data.irq;
-	struct irqaction *action = desc->action;
+	struct irqaction *action;
 
-	do {
+	/*
+	 * READ_ONCE is not required here. The compiler cannot reload action
+	 * because it'll be action->next for the second iteration of the loop.
+	 */
+	action = desc->action;
+	while (action) {
 		irqreturn_t res;
 
 		trace_irq_handler_entry(irq, action);
@@ -173,7 +179,7 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 
 		retval |= res;
 		action = action->next;
-	} while (action);
+	}
 
 	add_interrupt_randomness(irq, flags);
 
-- 
1.7.9.5
