All the mail mirrored from lore.kernel.org
* sched,numa: invalid memory access in account_entity_dequeue
From: Sasha Levin @ 2014-05-03 13:16 UTC
  To: Ingo Molnar, Peter Zijlstra, Mel Gorman; +Cc: LKML, Dave Jones

Hi all,

While fuzzing with trinity inside a KVM tools guest running latest -next
kernel I've stumbled on the following:


[ 1796.591361] BUG: unable to handle kernel paging request at fffffffedf97f040
[ 1796.592665] IP: __cpu_to_node (arch/x86/mm/numa.c:777)
[ 1796.593710] PGD 21e30067 PUD 0
[ 1796.594174] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1796.594937] Dumping ftrace buffer:
[ 1796.595678]    (ftrace buffer empty)
[ 1796.596329] Modules linked in:
[ 1796.596733] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W     3.15.0-rc3-next-20140502-sasha-00019-g5cb1c98 #431
[ 1796.598143] task: ffff8803345b8000 ti: ffff880035fc0000 task.ti: ffff880035fc0000
[ 1796.598975] RIP: __cpu_to_node (arch/x86/mm/numa.c:777)
[ 1796.600093] RSP: 0018:ffff8800a6c03b88  EFLAGS: 00010046
[ 1796.600197] RAX: ffff8806e791a000 RBX: ffffffffe791a028 RCX: 0000000000000000
[ 1796.600197] RDX: 0000000000000001 RSI: ffff8806cdc68068 RDI: 00000000e791a028
[ 1796.600197] RBP: ffff8800a6c03b98 R08: ffff880496183078 R09: 00000000000151c6
[ 1796.600197] R10: 000000000000b731 R11: 0000000000000001 R12: ffff8801b4dd7840
[ 1796.600197] R13: 0000000000000000 R14: 000000000000001e R15: ffff8801b34ac1a0
[ 1796.600197] FS:  0000000000000000(0000) GS:ffff8800a6c00000(0000) knlGS:0000000000000000
[ 1796.600197] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1796.600197] CR2: fffffffedf97f040 CR3: 0000000021e2d000 CR4: 00000000000006a0
[ 1796.610323] Stack:
[ 1796.610323]  0000000000000000 ffff8801b34ac1a0 ffff8800a6c03bd8 ffffffff9d1a9646
[ 1796.610323]  ffff8800a6c03bd8 ffff8806cdc68068 ffff8806cdc68068 ffff8801b34ac1a0
[ 1796.610323]  0000000000000000 000000000000b7db ffff8800a6c03c38 ffffffff9d1ae987
[ 1796.610323] Call Trace:
[ 1796.610323]  <IRQ>
[ 1796.610323] account_entity_dequeue (kernel/sched/fair.c:859 kernel/sched/fair.c:2009)
[ 1796.610323] dequeue_entity (kernel/sched/fair.c:2827)
[ 1796.610323] dequeue_task_fair (kernel/sched/fair.c:3907 include/linux/jump_label.h:105 kernel/sched/fair.c:3041 kernel/sched/fair.c:3217 kernel/sched/fair.c:3915)
[ 1796.610323] dequeue_task (kernel/sched/core.c:793)
[ 1796.610323] deactivate_task (kernel/sched/core.c:809)
[ 1796.610323] move_task (kernel/sched/fair.c:5032)
[ 1796.610323] load_balance (kernel/sched/fair.c:5305 kernel/sched/fair.c:6485)
[ 1796.610323] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[ 1796.610323] rebalance_domains (kernel/sched/fair.c:7032)
[ 1796.610323] ? rebalance_domains (kernel/sched/fair.c:6975)
[ 1796.610323] run_rebalance_domains (kernel/sched/fair.c:7105 kernel/sched/fair.c:7198)
[ 1796.610323] __do_softirq (kernel/softirq.c:269 include/linux/jump_label.h:105 include/trace/events/irq.h:126 kernel/softirq.c:270)
[ 1796.610323] ? irq_exit (include/linux/vtime.h:82 include/linux/vtime.h:121 kernel/softirq.c:384)
[ 1796.610323] irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
[ 1796.610323] scheduler_ipi (kernel/sched/core.c:1545)
[ 1796.610323] smp_reschedule_interrupt (arch/x86/kernel/smp.c:266)
[ 1796.610323] reschedule_interrupt (arch/x86/kernel/entry_64.S:1178)
[ 1796.610323]  <EOI>
[ 1796.610323] ? native_safe_halt (arch/x86/include/asm/irqflags.h:50)
[ 1796.610323] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[ 1796.637135] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
[ 1796.637135] arch_cpu_idle (arch/x86/kernel/process.c:302)
[ 1796.637135] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
[ 1796.637135] cpu_startup_entry (??:?)
[ 1796.637135] start_secondary (arch/x86/kernel/smpboot.c:267)
[ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 f4 00 00 8b 04 10 48 83 c4
[ 1796.637135] RIP __cpu_to_node (arch/x86/mm/numa.c:777)
[ 1796.637135]  RSP <ffff8800a6c03b88>
[ 1796.637135] CR2: fffffffedf97f040



Thanks,
Sasha

* Re: sched,numa: invalid memory access in account_entity_dequeue
From: Peter Zijlstra @ 2014-05-06 11:08 UTC
  To: Sasha Levin; +Cc: Ingo Molnar, Mel Gorman, LKML, Dave Jones

On Sat, May 03, 2014 at 09:16:00AM -0400, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running latest -next
> kernel I've stumbled on the following:
> 

Cute.. not making sense.. :-)

> [ 1796.591361] BUG: unable to handle kernel paging request at fffffffedf97f040
> [ 1796.592665] IP: __cpu_to_node (arch/x86/mm/numa.c:777)

I suppose you've scripted this addr2line -ie vmlinux for all addresses
in this splat?

> [ 1796.593710] PGD 21e30067 PUD 0
> [ 1796.594174] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 1796.594937] Dumping ftrace buffer:
> [ 1796.595678]    (ftrace buffer empty)
> [ 1796.596329] Modules linked in:
> [ 1796.596733] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W     3.15.0-rc3-next-20140502-sasha-00019-g5cb1c98 #431
> [ 1796.598143] task: ffff8803345b8000 ti: ffff880035fc0000 task.ti: ffff880035fc0000
> [ 1796.598975] RIP: __cpu_to_node (arch/x86/mm/numa.c:777)
> [ 1796.600093] RSP: 0018:ffff8800a6c03b88  EFLAGS: 00010046
> [ 1796.600197] RAX: ffff8806e791a000 RBX: ffffffffe791a028 RCX: 0000000000000000
> [ 1796.600197] RDX: 0000000000000001 RSI: ffff8806cdc68068 RDI: 00000000e791a028
> [ 1796.600197] RBP: ffff8800a6c03b98 R08: ffff880496183078 R09: 00000000000151c6
> [ 1796.600197] R10: 000000000000b731 R11: 0000000000000001 R12: ffff8801b4dd7840
> [ 1796.600197] R13: 0000000000000000 R14: 000000000000001e R15: ffff8801b34ac1a0
> [ 1796.600197] FS:  0000000000000000(0000) GS:ffff8800a6c00000(0000) knlGS:0000000000000000
> [ 1796.600197] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1796.600197] CR2: fffffffedf97f040 CR3: 0000000021e2d000 CR4: 00000000000006a0
> [ 1796.610323] Stack:
> [ 1796.610323]  0000000000000000 ffff8801b34ac1a0 ffff8800a6c03bd8 ffffffff9d1a9646
> [ 1796.610323]  ffff8800a6c03bd8 ffff8806cdc68068 ffff8806cdc68068 ffff8801b34ac1a0
> [ 1796.610323]  0000000000000000 000000000000b7db ffff8800a6c03c38 ffffffff9d1ae987
> [ 1796.610323] Call Trace:
> [ 1796.610323]  <IRQ>
> [ 1796.610323] account_entity_dequeue (kernel/sched/fair.c:859 kernel/sched/fair.c:2009)
> [ 1796.610323] dequeue_entity (kernel/sched/fair.c:2827)
> [ 1796.610323] dequeue_task_fair (kernel/sched/fair.c:3907 include/linux/jump_label.h:105 kernel/sched/fair.c:3041 kernel/sched/fair.c:3217 kernel/sched/fair.c:3915)
> [ 1796.610323] dequeue_task (kernel/sched/core.c:793)
> [ 1796.610323] deactivate_task (kernel/sched/core.c:809)
> [ 1796.610323] move_task (kernel/sched/fair.c:5032)
> [ 1796.610323] load_balance (kernel/sched/fair.c:5305 kernel/sched/fair.c:6485)
> [ 1796.610323] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
> [ 1796.610323] rebalance_domains (kernel/sched/fair.c:7032)
> [ 1796.610323] ? rebalance_domains (kernel/sched/fair.c:6975)
> [ 1796.610323] run_rebalance_domains (kernel/sched/fair.c:7105 kernel/sched/fair.c:7198)
> [ 1796.610323] __do_softirq (kernel/softirq.c:269 include/linux/jump_label.h:105 include/trace/events/irq.h:126 kernel/softirq.c:270)
> [ 1796.610323] ? irq_exit (include/linux/vtime.h:82 include/linux/vtime.h:121 kernel/softirq.c:384)
> [ 1796.610323] irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
> [ 1796.610323] scheduler_ipi (kernel/sched/core.c:1545)
> [ 1796.610323] smp_reschedule_interrupt (arch/x86/kernel/smp.c:266)
> [ 1796.610323] reschedule_interrupt (arch/x86/kernel/entry_64.S:1178)
> [ 1796.610323]  <EOI>
> [ 1796.610323] ? native_safe_halt (arch/x86/include/asm/irqflags.h:50)
> [ 1796.610323] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 1796.637135] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
> [ 1796.637135] arch_cpu_idle (arch/x86/kernel/process.c:302)
> [ 1796.637135] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
> [ 1796.637135] cpu_startup_entry (??:?)
> [ 1796.637135] start_secondary (arch/x86/kernel/smpboot.c:267)
> [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 f4 00 00 8b 04 10 48 83 c4


Could you maybe also do the same with the Code? -- that is, script an
auto-decode for it?

Obviously scripts/decodecode doesn't actually work right anymore:

# echo [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 00 00 8b 04 10 48 83 c4 | ./scripts/decodecode 
-bash: syntax error near unexpected token `48'

But if I remove the <> by hand I get:

# echo [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 48 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 00 00 8b 04 10 48 83 c4 | ./scripts/decodecode 
[ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 48 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 00 00 8b 04 10 48 83 c4
sed: -e expression #1, char 1: unknown command: `-'

Code starting with the faulting instruction
===========================================
   0:   3a ea                   cmp    %dl,%ch
   2:   05 00 74 25 89          add    $0x89257400,%eax
   7:   de 48 c7                fimul  -0x39(%rax)
   a:   c7                      (bad)  
   b:   08 b4 6c a1 31 c0 e8    or     %dh,-0x173fce5f(%rsp,%rbp,2)
  12:   99                      cltd   
  13:   6c                      insb   (%dx),%es:(%rdi)
  14:   45 03 e8                add    %r8d,%r13d
  17:   7c 39                   jl     0x52
  19:   46 03 48 8b             rex.RX add -0x75(%rax),%r9d
  1d:   05 71 3a ea 05          add    $0x5ea3a71,%eax
  22:   8b 04 98                mov    (%rax,%rbx,4),%eax
  25:   eb 16                   jmp    0x3d
  27:   0f 1f 40 00             nopl   0x0(%rax)
  2b:   48 8b 14 dd 00 ef 0a    mov    -0x5cf51100(,%rbx,8),%rdx
  32:   a3 
  33:   48 c7 c0 d8 00 00 8b    mov    $0xffffffff8b0000d8,%rax
  3a:   04 10                   add    $0x10,%al
  3c:   48                      rex.W
  3d:   83                      .byte 0x83
  3e:   c4                      .byte 0xc4

And 2b is the offset where the <> was.
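The first failure above comes from bash rather than from scripts/decodecode: left unquoted, <48> is parsed as redirection syntax, so wrapping the whole Code line in single quotes avoids that particular error. The offset of the marked byte does not have to be counted by hand either. A minimal Python sketch follows; parse_code_line is a hypothetical helper, not an existing kernel script, and it assumes the x86 oops convention that <..> wraps the first byte of the faulting instruction.

```python
def parse_code_line(line):
    """Parse the hex bytes of an x86 oops "Code:" line.

    Returns (code, fault_offset): the raw bytes with the <..> marker removed,
    and the index of the byte that was wrapped in <..> (None if absent).
    """
    _, _, hex_part = line.partition("Code:")
    code = bytearray()
    fault_offset = None
    for token in hex_part.split():
        if token.startswith("<") and token.endswith(">"):
            # The marked byte is assumed to start the instruction at RIP.
            fault_offset = len(code)
            token = token[1:-1]
        code.append(int(token, 16))
    return bytes(code), fault_offset


line = ("[ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 "
        "e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f "
        "40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 f4 00 00 8b 04 10 48 83 c4")
code, fault_offset = parse_code_line(line)
print(hex(fault_offset))                   # 0x2b
print(" ".join(f"{b:02x}" for b in code))  # same bytes without the <..> marker
```

For the Code line in this report the marked byte sits at offset 0x2b, the same offset found above.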

Anyway, the reason I did this was that I was hoping to find
the cpu argument in one of the registers, but looking at your RBX value
doesn't really help.


If I compile this function with a defconfig based .config, I get
something like:

00000000000000a0 <__cpu_to_node>:
  a0:   48 83 3d 00 00 00 00    cmpq   $0x0,0x0(%rip)        # a8 <__cpu_to_node+0x8>
  a7:   00 
  a8:   55                      push   %rbp
  a9:   48 89 e5                mov    %rsp,%rbp
  ac:   53                      push   %rbx
  ad:   48 63 df                movslq %edi,%rbx
  b0:   75 15                   jne    c7 <__cpu_to_node+0x27>
  b2:   48 8b 14 dd 00 00 00    mov    0x0(,%rbx,8),%rdx
  b9:   00 
  ba:   48 c7 c0 00 00 00 00    mov    $0x0,%rax
  c1:   8b 04 10                mov    (%rax,%rdx,1),%eax
  c4:   5b                      pop    %rbx
  c5:   5d                      pop    %rbp
  c6:   c3                      retq   
  c7:   89 de                   mov    %ebx,%esi
  c9:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
  d0:   31 c0                   xor    %eax,%eax
  d2:   e8 00 00 00 00          callq  d7 <__cpu_to_node+0x37>
  d7:   e8 00 00 00 00          callq  dc <__cpu_to_node+0x3c>
  dc:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # e3 <__cpu_to_node+0x43>
  e3:   8b 04 98                mov    (%rax,%rbx,4),%eax
  e6:   eb dc                   jmp    c4 <__cpu_to_node+0x24>
  e8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  ef:   00


And the b2 offset matches up fairly nicely, although the rest of the
decode appears to be crap. Still no hints though.
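The b2 match can also be checked mechanically. In an unlinked object file, objdump shows displacement and immediate fields that still await relocation as zeros, so a sketch (assuming the 00 00 00 00 fields at b2 and ba are such relocations) can treat those bytes as wildcards and search the crash's Code bytes for the b2..c3 sequence. find_with_wildcards is a hypothetical helper written for this illustration.

```python
ANY = None  # wildcard: matches any byte


def find_with_wildcards(haystack, pattern):
    """Return the first offset at which pattern matches haystack, or -1.

    Entries equal to ANY match any byte; all others must match exactly.
    """
    for start in range(len(haystack) - len(pattern) + 1):
        if all(p is ANY or haystack[start + i] == p
               for i, p in enumerate(pattern)):
            return start
    return -1


# Instructions at b2..c3 of the defconfig build; the zero-filled fields are
# assumed to be unrelocated displacements/immediates, hence wildcards.
pattern = [0x48, 0x8b, 0x14, 0xdd, ANY, ANY, ANY, ANY,  # mov disp(,%rbx,8),%rdx
           0x48, 0xc7, 0xc0, ANY, ANY, ANY, ANY,        # mov $imm,%rax
           0x8b, 0x04, 0x10]                            # mov (%rax,%rdx,1),%eax

# Code bytes from the oops, with the <..> marker removed by hand.
crash = bytes.fromhex(
    "3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 "
    "e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 "
    "48 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 f4 00 00 8b 04 10 48 83 c4")

print(hex(find_with_wildcards(crash, pattern)))  # 0x2b
```

The sequence is found exactly where the <> marker sat, and only the relocated fields differ between the two builds.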

However, the calling convention puts the first argument in EDI, and at b2
EDI should still contain the original value; yet your RDI value is
complete nonsense again :/

Of course, the cpu argument being complete crap is a good reason for
this to happen. That would mean thread_info::cpu of the task in
question is complete crap.. and I'm not sure I can explain that either.
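The arithmetic supports that. On x86-64 the first integer argument is passed in EDI/RDI, and the defconfig listing above copies it into RBX with movslq %edi,%rbx. Taking the crash's RDI (00000000e791a028) as the cpu argument, and the faulting instruction at offset 0x2b of the Code line (48 8b 14 dd 00 ef 0a a3, i.e. mov -0x5cf51100(,%rbx,8),%rdx), the effective address comes out exactly equal to CR2. A small Python check with values copied from the oops; reading those bytes as the instruction at RIP is an assumption based on the <> marker.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


# Values copied from the oops.
RDI = 0x00000000e791a028  # low 32 bits carry the int cpu argument
RBX = 0xffffffffe791a028
CR2 = 0xfffffffedf97f040  # faulting address
DISP32 = 0xa30aef00       # little-endian bytes "00 ef 0a a3" of the faulting mov

cpu = sign_extend(RDI, 32)          # what movslq %edi,%rbx computes
assert (cpu & MASK64) == RBX        # matches RBX in the register dump

# mov disp32(,%rbx,8),%rdx: effective address = sext(disp32) + rbx * 8 (mod 2^64)
addr = (sign_extend(DISP32, 32) + cpu * 8) & MASK64

print(cpu)        # -409886680, not a plausible CPU number
print(hex(addr))  # 0xfffffffedf97f040, identical to CR2
```

So the bad cpu value alone accounts for the fault address; the check says nothing about where that value came from.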

la-la-la.. 

* Re: sched,numa: invalid memory access in account_entity_dequeue
From: Sasha Levin @ 2014-05-06 12:23 UTC
  To: Peter Zijlstra; +Cc: Ingo Molnar, Mel Gorman, LKML, Dave Jones

On 05/06/2014 07:08 AM, Peter Zijlstra wrote:
> On Sat, May 03, 2014 at 09:16:00AM -0400, Sasha Levin wrote:
>> Hi all,
>> 
>> While fuzzing with trinity inside a KVM tools guest running latest -next kernel I've stumbled on the following:
>> 
> 
> Cute.. not making sense.. :-)
> 
>> [ 1796.591361] BUG: unable to handle kernel paging request at fffffffedf97f040
>> [ 1796.592665] IP: __cpu_to_node (arch/x86/mm/numa.c:777)
> 
> I suppose you've scripted this addr2line -ie vmlinux for all addresses in this splat?

Yeah, I'm trying to get that script upstream (https://lkml.org/lkml/2014/3/29/1)
since it seems to simplify looking at stack traces.

> Could you maybe also do the same with the Code? -- that is, script an auto-decode for it?

Sure, I can look into that.

> Anyway, the reason I did this was that I was hoping to find the cpu argument in one of the registers, but looking at your RBX value doesn't really help.
> 
> And the b2 offset matches up fairly nicely, although the rest of the decode appears to be crap. Still no hints though.
> 
> However, the calling convention puts the first argument in EDI, and at b2 EDI should still contain the original value; yet your RDI value is complete nonsense again :/
> 
> Of course, the cpu argument being complete crap is a good reason for this to happen. Which would make thread_info::cpu of the task in question be complete crap.. and I'm not sure I can explain that either.
> 
> la-la-la..
> 

I haven't seen it happening again, so maybe an unrelated memory corruption?


Thanks,
Sasha


* Re: sched,numa: invalid memory access in account_entity_dequeue
From: Peter Zijlstra @ 2014-05-06 13:40 UTC
  To: Sasha Levin; +Cc: Ingo Molnar, Mel Gorman, LKML, Dave Jones

On Tue, May 06, 2014 at 08:23:25AM -0400, Sasha Levin wrote:
> > I suppose you've scripted this addr2line -ie vmlinux for all addresses in this splat?
> 
> Yeah, I'm trying to get that script upstream (https://lkml.org/lkml/2014/3/29/1)
> since it seems to simplify looking at stack traces.

Seems like a nice addition to scripts/ indeed. Although Linus doesn't
seem to like it much:

  http://marc.info/?l=linux-kernel&m=139862933908922


> > Could you maybe also do the same with the Code? -- that is, script an auto-decode for it?

> Sure, I can look into that.

Thanks!

> > la-la-la..
> > 
> 
> I haven't seen it happening again, so maybe an unrelated memory corruption?

Yeah, just the thing we need :/ In any case, let me know if you do hit
it again this side of the Sun burning up.



* Re: sched,numa: invalid memory access in account_entity_dequeue
From: Sasha Levin @ 2014-05-06 14:20 UTC
  To: Peter Zijlstra; +Cc: Ingo Molnar, Mel Gorman, LKML, Dave Jones

On 05/06/2014 09:40 AM, Peter Zijlstra wrote:
> On Tue, May 06, 2014 at 08:23:25AM -0400, Sasha Levin wrote:
>>> I suppose you've scripted this addr2line -ie vmlinux for all addresses in this splat?
>>
>> Yeah, I'm trying to get that script upstream (https://lkml.org/lkml/2014/3/29/1) since it seems to simplify looking at stack traces.
> Seems like a nice addition to scripts/ indeed. Although Linus doesn't seem to like it much:
> 
> http://marc.info/?l=linux-kernel&m=139862933908922

Linus objected to the original version of the script because it used the
hex addresses he wants to throw out, instead of resolving the address from
'nm vmlinux'. That was fixed in later versions.
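Roughly, resolving a frame from 'nm vmlinux' means adding the printed offset to the symbol's address and handing the result to addr2line -ie vmlinux. A rough Python sketch of that first step; parse_nm and resolve_frame are hypothetical helpers, not the code of the script linked above, and the symbol addresses and frame offsets used below are made up.

```python
def parse_nm(nm_output):
    """Map symbol name -> address from 'nm vmlinux' lines ("<hex> <type> <name>").

    Lines without exactly three fields (e.g. undefined symbols) are skipped.
    If a (static) name appears more than once, the last entry wins; a real
    tool would need to disambiguate.
    """
    symbols = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            address, _sym_type, name = fields
            symbols[name] = int(address, 16)
    return symbols


def resolve_frame(frame, symbols):
    """Turn a 'function+0xoffset/0xsize' frame into an absolute address."""
    name, _, rest = frame.partition("+")
    offset, _, _size = rest.partition("/")
    return symbols[name] + int(offset, 16)


# Hypothetical nm output and frame, for illustration only.
nm_output = """\
ffffffff811a9600 T account_entity_dequeue
ffffffff811a9700 t dequeue_entity
"""
symbols = parse_nm(nm_output)
address = resolve_frame("account_entity_dequeue+0x46/0x90", symbols)
print(hex(address))  # 0xffffffff811a9646, the value addr2line would receive
```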

Since Linus raised no objections after all the requested fixes were made,
I'll try pinging him again next merge window. Until then I'll add some more
of the requested features, like caching and code decoding, and try again :)


Thanks,
Sasha