LKML Archive mirror
* [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events
@ 2015-04-28 13:20 Hou Pengyang
  2015-04-29 10:12 ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Hou Pengyang @ 2015-04-28 13:20 UTC
  To: a.p.zijlstra, paulus, mingo, acme
  Cc: wangnan0, catalin.marinas, will.deacon, linux-kernel,
	linux-arm-kernel

For ARM64, when tracing with tracepoint events, the IP and cpsr are set
to 0, preventing the perf code from parsing the callchain and resolving the
symbols correctly.

 ./perf record -e sched:sched_switch -g --call-graph dwarf ls
	[ perf record: Captured and wrote 0.146 MB perf.data ]
 ./perf report -f
    Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
    Children      Self    Command  Shared Object     Symbol
	100.00%       100.00%  ls       [unknown]         [.] 0000000000000000

The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
in the registers needed for callchain unwinding: pc, sp, fp and pstate.

With this patch, callchain can be parsed correctly as follows:

     ......
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
-    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
     pfkey_send_policy_notify
     pfkey_get
     v9fs_vfs_rename
     page_follow_link_light
     link_path_walk
     el0_svc_naked
    .......

For tracepoint events, stack parsing also doesn't work well on ARM. Jean Pihet
came up with a patch:
http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 arch/arm64/include/asm/perf_event.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index d26d1d5..16a074f 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -24,4 +24,20 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)	perf_misc_flags(regs)
 #endif
 
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+   unsigned long sp;   \
+   __asm__ ("mov %[sp], sp\n" : [sp] "=r" (sp)); \
+       (regs)->pc = (__ip);    \
+       __asm__ (      \
+               "str %[sp],  %[_arm64_sp]  \n\t"    \
+               "str x29, %[_arm64_fp]  \n\t"    \
+               "mrs %[_arm64_cpsr], spsr_el1 \n\t"     \
+               : [_arm64_sp] "=m" (regs->sp),      \
+                 [_arm64_fp] "=m" (regs->regs[29]),  \
+                 [_arm64_cpsr] "=r" (regs->pstate) \
+       : [sp] "r" (sp)     \
+       );      \
+}
+
+
 #endif
-- 
1.8.3.4



* Re: [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-04-28 13:20 [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events Hou Pengyang
@ 2015-04-29 10:12 ` Will Deacon
  2015-04-30  1:50   ` Hou Pengyang
  2015-04-30 13:55   ` Hou Pengyang
  0 siblings, 2 replies; 5+ messages in thread
From: Will Deacon @ 2015-04-29 10:12 UTC
  To: Hou Pengyang
  Cc: a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com,
	acme@kernel.org, wangnan0@huawei.com, Catalin Marinas,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, mark.rutland

Hello,

On Tue, Apr 28, 2015 at 02:20:48PM +0100, Hou Pengyang wrote:
> For ARM64, when tracing with tracepoint events, the IP and cpsr are set
> to 0, preventing the perf code parsing the callchain and resolving the
> symbols correctly.
> 
>  ./perf record -e sched:sched_switch -g --call-graph dwarf ls
> 	[ perf record: Captured and wrote 0.146 MB perf.data ]
>  ./perf report -f
>     Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
>     Children      Self    Command  Shared Object     Symbol
> 	100.00%       100.00%  ls       [unknown]         [.] 0000000000000000
> 
> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
> several necessary registers used for callchain unwinding, including pc,sp,
> fp and psr .
> 
> With this patch, callchain can be parsed correctly as follows:
> 
>      ......
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
> -    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
>      pfkey_send_policy_notify
>      pfkey_get
>      v9fs_vfs_rename
>      page_follow_link_light
>      link_path_walk
>      el0_svc_naked
>     .......
> 
> For tracepoint events, stack parsing also doesn't work well on ARM. Jean Pihet
> came up with a patch:
> http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280

Any chance you could revive that series too, please? I'd like to update both
arm and arm64 together, since we're currently working on merging the two
perf backends, and introducing discrepancies is going to delay that even
longer.

> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
> ---
>  arch/arm64/include/asm/perf_event.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
> index d26d1d5..16a074f 100644
> --- a/arch/arm64/include/asm/perf_event.h
> +++ b/arch/arm64/include/asm/perf_event.h
> @@ -24,4 +24,20 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>  #define perf_misc_flags(regs)	perf_misc_flags(regs)
>  #endif
>  
> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
> +   unsigned long sp;   \
> +   __asm__ ("mov %[sp], sp\n" : [sp] "=r" (sp)); \
> +       (regs)->pc = (__ip);    \
> +       __asm__ (      \
> +               "str %[sp],  %[_arm64_sp]  \n\t"    \
> +               "str x29, %[_arm64_fp]  \n\t"    \
> +               "mrs %[_arm64_cpsr], spsr_el1 \n\t"     \
> +               : [_arm64_sp] "=m" (regs->sp),      \
> +                 [_arm64_fp] "=m" (regs->regs[29]),  \
> +                 [_arm64_cpsr] "=r" (regs->pstate) \

Does this really all need to be in assembly code? Ideally we'd use something
like __builtin_stack_pointer and __builtin_frame_pointer. That just leaves
the CPSR, but given that it's (a) only used for user_mode_regs tests and (b)
this macro is only used by ftrace, we can just set it to a static value
indicating that we're at EL1.

So I *think* we should be able to write this as three lines of C.

Will


* Re: [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-04-29 10:12 ` Will Deacon
@ 2015-04-30  1:50   ` Hou Pengyang
  2015-04-30  8:59     ` Will Deacon
  2015-04-30 13:55   ` Hou Pengyang
  1 sibling, 1 reply; 5+ messages in thread
From: Hou Pengyang @ 2015-04-30  1:50 UTC
  To: Will Deacon
  Cc: a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com,
	acme@kernel.org, wangnan0@huawei.com, Catalin Marinas,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, mark.rutland

On 2015/4/29 18:12, Will Deacon wrote:
> Hello,
>
> On Tue, Apr 28, 2015 at 02:20:48PM +0100, Hou Pengyang wrote:
>> For ARM64, when tracing with tracepoint events, the IP and cpsr are set
>> to 0, preventing the perf code parsing the callchain and resolving the
>> symbols correctly.
>>
>>   ./perf record -e sched:sched_switch -g --call-graph dwarf ls
>> 	[ perf record: Captured and wrote 0.146 MB perf.data ]
>>   ./perf report -f
>>      Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
>>      Children      Self    Command  Shared Object     Symbol
>> 	100.00%       100.00%  ls       [unknown]         [.] 0000000000000000
>>
>> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
>> several necessary registers used for callchain unwinding, including pc,sp,
>> fp and psr .
>>
>> With this patch, callchain can be parsed correctly as follows:
>>
>>       ......
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
>> -    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
>>       pfkey_send_policy_notify
>>       pfkey_get
>>       v9fs_vfs_rename
>>       page_follow_link_light
>>       link_path_walk
>>       el0_svc_naked
>>      .......
>>
>> For tracepoint events, stack parsing also doesn't work well on ARM. Jean Pihet
>> came up with a patch:
>> http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280
>
> Any chance you could revive that series too, please? I'd like to update both
> arm and arm64 together, since we're currently working on merging the two
> perf backends, and introducing discrepancies is going to delay that even
> longer.
>
>> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
>> ---
>>   arch/arm64/include/asm/perf_event.h | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
>> index d26d1d5..16a074f 100644
>> --- a/arch/arm64/include/asm/perf_event.h
>> +++ b/arch/arm64/include/asm/perf_event.h
>> @@ -24,4 +24,20 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>>   #define perf_misc_flags(regs)	perf_misc_flags(regs)
>>   #endif
>>
>> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
>> +   unsigned long sp;   \
>> +   __asm__ ("mov %[sp], sp\n" : [sp] "=r" (sp)); \
>> +       (regs)->pc = (__ip);    \
>> +       __asm__ (      \
>> +               "str %[sp],  %[_arm64_sp]  \n\t"    \
>> +               "str x29, %[_arm64_fp]  \n\t"    \
>> +               "mrs %[_arm64_cpsr], spsr_el1 \n\t"     \
>> +               : [_arm64_sp] "=m" (regs->sp),      \
>> +                 [_arm64_fp] "=m" (regs->regs[29]),  \
>> +                 [_arm64_cpsr] "=r" (regs->pstate) \
>
> Does this really all need to be in assembly code? Ideally we'd use something
> like __builtin_stack_pointer and __builtin_frame_pointer. That just leaves
> the CPSR, but given that it's (a) only used for user_mode_regs tests and (b)
> this macro is only used by ftrace, then we just set it to a static value
> indicating that we're at EL1.
>
> So I *think* we should be able to write this as three lines of C.
>
Hi, Will, as you said, we can get the fp via __builtin_frame_address() and
set pstate to a static value. However, for the sp there is no GCC builtin
like __builtin_stack_pointer, so assembly code is still needed. What's more,
if CONFIG_FRAME_POINTER is disabled, can the fp still be obtained via
__builtin_frame_address()?

> Will
>
> .
>




* Re: [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-04-30  1:50   ` Hou Pengyang
@ 2015-04-30  8:59     ` Will Deacon
  0 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2015-04-30  8:59 UTC
  To: Hou Pengyang
  Cc: a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com,
	acme@kernel.org, wangnan0@huawei.com, Catalin Marinas,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Mark Rutland

On Thu, Apr 30, 2015 at 02:50:05AM +0100, Hou Pengyang wrote:
> On 2015/4/29 18:12, Will Deacon wrote:
> > On Tue, Apr 28, 2015 at 02:20:48PM +0100, Hou Pengyang wrote:
> >> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
> >> index d26d1d5..16a074f 100644
> >> --- a/arch/arm64/include/asm/perf_event.h
> >> +++ b/arch/arm64/include/asm/perf_event.h
> >> @@ -24,4 +24,20 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
> >>   #define perf_misc_flags(regs)	perf_misc_flags(regs)
> >>   #endif
> >>
> >> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
> >> +   unsigned long sp;   \
> >> +   __asm__ ("mov %[sp], sp\n" : [sp] "=r" (sp)); \
> >> +       (regs)->pc = (__ip);    \
> >> +       __asm__ (      \
> >> +               "str %[sp],  %[_arm64_sp]  \n\t"    \
> >> +               "str x29, %[_arm64_fp]  \n\t"    \
> >> +               "mrs %[_arm64_cpsr], spsr_el1 \n\t"     \
> >> +               : [_arm64_sp] "=m" (regs->sp),      \
> >> +                 [_arm64_fp] "=m" (regs->regs[29]),  \
> >> +                 [_arm64_cpsr] "=r" (regs->pstate) \
> >
> > Does this really all need to be in assembly code? Ideally we'd use something
> > like __builtin_stack_pointer and __builtin_frame_pointer. That just leaves
> > the CPSR, but given that it's (a) only used for user_mode_regs tests and (b)
> > this macro is only used by ftrace, then we just set it to a static value
> > indicating that we're at EL1.
> >
> > So I *think* we should be able to write this as three lines of C.
> >
> Hi, Will, as you said, we can get the fp via __builtin_frame_address() and
> set pstate to a static value. However, for the sp there is no GCC builtin
> like __builtin_stack_pointer, so assembly code is still needed. What's more,
> if CONFIG_FRAME_POINTER is disabled, can the fp still be obtained via
> __builtin_frame_address()?

Ah yes, I forgot the history of __builtin_stack_pointer (I think the LLVM
guys proposed it and it was rejected by GCC). Anyway, we can use
current_stack_pointer() instead.

I don't think CONFIG_FRAME_POINTER is relevant here; if you don't have
frame pointers then you won't be able to backtrace. The same issue happens
with your proposed patch.

Will


* Re: [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-04-29 10:12 ` Will Deacon
  2015-04-30  1:50   ` Hou Pengyang
@ 2015-04-30 13:55   ` Hou Pengyang
  1 sibling, 0 replies; 5+ messages in thread
From: Hou Pengyang @ 2015-04-30 13:55 UTC
  To: Will Deacon
  Cc: a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com,
	acme@kernel.org, wangnan0@huawei.com, Catalin Marinas,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, mark.rutland

On 2015/4/29 18:12, Will Deacon wrote:
> Hello,
>
> On Tue, Apr 28, 2015 at 02:20:48PM +0100, Hou Pengyang wrote:
>> For ARM64, when tracing with tracepoint events, the IP and cpsr are set
>> to 0, preventing the perf code parsing the callchain and resolving the
>> symbols correctly.
>>
>>   ./perf record -e sched:sched_switch -g --call-graph dwarf ls
>> 	[ perf record: Captured and wrote 0.146 MB perf.data ]
>>   ./perf report -f
>>      Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
>>      Children      Self    Command  Shared Object     Symbol
>> 	100.00%       100.00%  ls       [unknown]         [.] 0000000000000000
>>
>> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
>> several necessary registers used for callchain unwinding, including pc,sp,
>> fp and psr .
>>
>> With this patch, callchain can be parsed correctly as follows:
>>
>>       ......
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
>> -    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
>>       pfkey_send_policy_notify
>>       pfkey_get
>>       v9fs_vfs_rename
>>       page_follow_link_light
>>       link_path_walk
>>       el0_svc_naked
>>      .......
>>
>> For tracepoint events, stack parsing also doesn't work well on ARM. Jean Pihet
>> came up with a patch:
>> http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280
>
> Any chance you could revive that series too, please? I'd like to update both
> arm and arm64 together, since we're currently working on merging the two
> perf backends, and introducing discrepancies is going to delay that even
> longer.
>
Hi, Will, following your suggestion, I have rewritten the patch as four
lines of C, which will appear in patch v2. What's more, both arm and arm64
versions are provided; the code for the two is almost identical, so it
should be convenient to merge them together.

BTW, you mentioned you're working on merging the perf backends of arm and
arm64; which git tree can I follow to track that progress? Thanks.

Hou.

>> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
>> ---
>>   arch/arm64/include/asm/perf_event.h | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
>> index d26d1d5..16a074f 100644
>> --- a/arch/arm64/include/asm/perf_event.h
>> +++ b/arch/arm64/include/asm/perf_event.h
>> @@ -24,4 +24,20 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>>   #define perf_misc_flags(regs)	perf_misc_flags(regs)
>>   #endif
>>
>> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
>> +   unsigned long sp;   \
>> +   __asm__ ("mov %[sp], sp\n" : [sp] "=r" (sp)); \
>> +       (regs)->pc = (__ip);    \
>> +       __asm__ (      \
>> +               "str %[sp],  %[_arm64_sp]  \n\t"    \
>> +               "str x29, %[_arm64_fp]  \n\t"    \
>> +               "mrs %[_arm64_cpsr], spsr_el1 \n\t"     \
>> +               : [_arm64_sp] "=m" (regs->sp),      \
>> +                 [_arm64_fp] "=m" (regs->regs[29]),  \
>> +                 [_arm64_cpsr] "=r" (regs->pstate) \
>
> Does this really all need to be in assembly code? Ideally we'd use something
> like __builtin_stack_pointer and __builtin_frame_pointer. That just leaves
> the CPSR, but given that it's (a) only used for user_mode_regs tests and (b)
> this macro is only used by ftrace, then we just set it to a static value
> indicating that we're at EL1.
>
> So I *think* we should be able to write this as three lines of C.
>
> Will
>
> .
>




