perf: wrong event->count report (Was: perf basic-test-aarch64 failures)

LKML Archive mirror
 help / color / mirror / Atom feed

* perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
       [not found] ` <20160215070620.GA23850@dhcppc3.redhat.com>
@ 2016-02-17  3:34   ` Oleg Nesterov
  2016-02-17  9:35     ` Jiri Olsa
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-02-17  3:34 UTC (permalink / raw
  To: Pratyush Anand, Peter Zijlstra, Jiri Olsa
  Cc: Jeff Bastian, Michael Petlan, linux-kernel, arm-devel

Finally I reproduced... let me add CC's and reply to initial message. This has
nothing to do with arm/uprobes.

I simply can't understand how perf calculates ->total_time_enabled/running.
At all. But the problem is that

	1. perf_event_enable_on_exec() does enable first, then event_sched_in().

	   After that tstamp_enabled < tstamp_running

	2. This means that after the next update_event_times()
           total_time_running < total_time_enabled

           again, I fail to understand these calculations, but this is what
           perf_event_read_value() reports to user-space.

	3. /usr/bin/perf calls perf_counts_values__scale() which does

		else if (count->run < count->ena) {
			scaled = 1;
			count->val = (u64)((double) count->val * count->ena / count->run + 0.5);
		}

	   and this is why you see the wrong number. count->val was correct but
	   wrongly updated because  total_time_running < total_time_enabled.

I leave this to Peter and Jiri ;)

Oleg.

On 02/15, Pratyush Anand wrote:
>
> Hi Oleg/Jiri,
>
> Latest Linux upstream code is broken for a test case related to uprobe. I have
> tested ARM64 as well as x86 platform and both of them have been broken.
>
> Test case instruments uprobe at each function f_* in the following code and
> counts number of probe callback called for them.
> https://github.com/rfmvh/perftool-testsuite/blob/master/base_probe/examples/exact_counts.c
>
> Test case is:
>
> # add probes
> for i in 1 2 3 103 997 65535; do
>     perf probe -x examples/exact_counts --add f_${i}x
> done
>
> #perf_probe :: test_exact_counts :: using probes :: perf stat
>
> perf stat -x';' -e 'probe_exact:*' examples/exact_counts
>
> And we get something like following on x86.
>
> 65594;;probe_exact:f_65535x;216738260;99.91
> 998;;probe_exact:f_997x;216738260;99.91
> 103;;probe_exact:f_103x;216738260;99.91
> 3;;probe_exact:f_3x;216738260;99.91
> 2;;probe_exact:f_2x;216738260;99.91
> 1;;probe_exact:f_1x;216738260;99.91
>
> So, here you can see that f_65535x count is 65594, while it was expected as
> 65535.
>
> Some observation which might be helpful to resolve it:
>
> - if we use -a with perf stat, then count is correct.
>   perf stat -x';' -e 'probe_exact:*' -a examples/exact_counts
> - When count is wrong, I noticed that "run" and "ena" values for count has huge
>   difference, which causes count to be scaled up by perf tool.
> - Following upstream commit is the first bad commit which is causing this issue.
> # first bad commit: [3e349507d12de93b08b0aa814fc2aa0dee91c5ba] perf: Fix perf_enable_on_exec() event scheduling
>
> ~Pratyush
>
> On 11/02/2016:02:37:56 PM, Michael Petlan wrote:
> > On Thu, 2016-02-11 at 15:00 +0530, Pratyush Anand wrote:
> > > Hi Michael,
> > >
> >
> > Hi.
> >
> > > Can you please replace perf tools with following and try to see if result
> > > improves (I expect test 1 and 5 should pass now).
> > >
> > > http://download.devel.redhat.com/brewroot/work/tasks/9732/10469732/perf-4.5.0-0.rc2.26.el7.test.aarch64.rpm
> >
> > Yes, this perf passes the first and fifth testcase. The rest is the
> > same.
> >
> > >
> > > I will need to look into the test instructions for failing test 2,3,4 (described
> > > in your mail).  I have kernel test package
> > > (git://pkgs.devel.redhat.com/tests/kernel) cloned on my system, but I do not see
> > > directory "perf/Sanity/basic-test-aarch64" in that test package. So can you
> > > please share correct path for kernel-aarch64 test package repository.
> > >
> >
> > This test is in a separate perf repository:
> >
> > git://pkgs.devel.redhat.com/tests/perf
> >
> > It is just a beaker wrapper for the testsuite, which is upstream here:
> >
> > https://github.com/rfmvh/perftool-testsuite
> >
> > Basically, the aarch64 version of the beaker test bundle contains
> > the upstream version. The non-aarch64 test has some changes and hacks in
> > order to run on RHEL.
> >
> > > ~Pratyush
> > >
> >
> > In order to reproduce the failures, do the following:
> >
> > cd base_probe
> > export CMD_PERF=/path/to/your/perf   # not necessary for default
> > export TESTMODE_QUIET=n
> > ./setup.sh
> > ./test_exact_counts.sh
> > ./test_advanced.sh
> >
> > After running the setup.sh script, you may reproduce it manually too:
> >
> > (2) perf_probe :: test_exact_counts :: using probes :: perf report
> >
> > # add probes
> > for i in 1 2 3 103 997 65535; do
> >     perf probe -x examples/exact_counts --add f_${i}x
> > done
> >
> > # record
> > perf record -e 'probe_exact:*' -- examples/exact_counts
> > # there should be 66641 samples
> >
> > # report
> > perf report -n --stdio
> >
> > ###################################################################
> >
> > (3) perf_probe :: test_exact_counts :: using probes :: perf stat
> >
> > After running setup.sh, run:
> >
> > perf stat -x';' -e 'probe_exact:*' examples/exact_counts
> >
> > ## The number of event occurrences should be the same as the names
> > ## of the functions.
> >
> > ###################################################################
> >
> >
> > (4) perf_probe :: test_advanced :: function argument probing :: script
> >
> > After running setup.sh, run:
> >
> > perf probe -x examples/advanced --add 'isprime a'
> > perf record -e 'probe_advanced:isprime' examples/advanced
> > perf script
> >
> >
> >
> >
> > Michael

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17  3:34   ` perf: wrong event->count report (Was: perf basic-test-aarch64 failures) Oleg Nesterov
@ 2016-02-17  9:35     ` Jiri Olsa
  2016-02-17 10:25       ` Peter Zijlstra
  2016-02-17 10:28       ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Jiri Olsa @ 2016-02-17  9:35 UTC (permalink / raw
  To: Oleg Nesterov
  Cc: Pratyush Anand, Peter Zijlstra, Jeff Bastian, Michael Petlan,
	linux-kernel, arm-devel

On Wed, Feb 17, 2016 at 04:34:16AM +0100, Oleg Nesterov wrote:
> Finally I reproduced... let me add CC's and reply to initial message. This has
> nothing to do with arm/uprobes.
> 
> I simply can't understand how perf calculates ->total_time_enabled/running.
> At all. But the problem is that
> 
> 	1. perf_event_enable_on_exec() does enable first, then event_sched_in().
> 
> 	   After that tstamp_enabled < tstamp_running
> 
> 	2. This means that after the next update_event_times()
>            total_time_running < total_time_enabled
> 
>            again, I fail to understand these calculations, but this is what
>            perf_event_read_value() reports to user-space.
> 
> 	3. /usr/bin/perf calls perf_counts_values__scale() which does
> 
> 		else if (count->run < count->ena) {
> 			scaled = 1;
> 			count->val = (u64)((double) count->val * count->ena / count->run + 0.5);
> 		}
> 
> 	   and this is why you see the wrong number. count->val was correct but
> 	   wrongly updated because  total_time_running < total_time_enabled.
> 
> I leave this to Peter and Jiri ;)

I did not notice the ther conversation wasn't public, reposting ;-)

jirka

---
ouch, I tested with fedora kernel.. I can reproduce with 4.5

Pratyush bisected this into following commit:
[3e349507d12de93b08b0aa814fc2aa0dee91c5ba] perf: Fix perf_enable_on_exec() event scheduling

it seems the commit above introduced unwanted difference between
counter's enabled and running times.. I'm checking on that ;-)

as a workaround you could use --no-scale option to disable
scaling:

  # perf stat --no-scale -x';' -e 'probe_exact:*' ./exact_counts
  65535;;probe_exact:f_65535x;0;100.00
  997;;probe_exact:f_997x;0;100.00
  103;;probe_exact:f_103x;0;100.00
  3;;probe_exact:f_3x;0;100.00
  2;;probe_exact:f_2x;0;100.00
  1;;probe_exact:f_1x;0;100.00

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17  9:35     ` Jiri Olsa
@ 2016-02-17 10:25       ` Peter Zijlstra
  2016-02-17 14:44         ` Jiri Olsa
  2016-02-17 19:34         ` Oleg Nesterov
  2016-02-17 10:28       ` Peter Zijlstra
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2016-02-17 10:25 UTC (permalink / raw
  To: Jiri Olsa
  Cc: Oleg Nesterov, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel, arm-devel

On Wed, Feb 17, 2016 at 10:35:39AM +0100, Jiri Olsa wrote:
> On Wed, Feb 17, 2016 at 04:34:16AM +0100, Oleg Nesterov wrote:
> > Finally I reproduced... let me add CC's and reply to initial message. This has
> > nothing to do with arm/uprobes.
> > 
> > I simply can't understand how perf calculates ->total_time_enabled/running.
> > At all. But the problem is that
> > 
> > 	1. perf_event_enable_on_exec() does enable first, then event_sched_in().
> > 
> > 	   After that tstamp_enabled < tstamp_running
> > 
> > 	2. This means that after the next update_event_times()
> >            total_time_running < total_time_enabled
> > 
> >            again, I fail to understand these calculations, but this is what
> >            perf_event_read_value() reports to user-space.
> > 
> > 	3. /usr/bin/perf calls perf_counts_values__scale() which does
> > 
> > 		else if (count->run < count->ena) {
> > 			scaled = 1;
> > 			count->val = (u64)((double) count->val * count->ena / count->run + 0.5);
> > 		}
> > 
> > 	   and this is why you see the wrong number. count->val was correct but
> > 	   wrongly updated because  total_time_running < total_time_enabled.
> > 
> > I leave this to Peter and Jiri ;)
> 
> I did not notice the ther conversation wasn't public, reposting ;-)
> 
> jirka
> 
> ---
> ouch, I tested with fedora kernel.. I can reproduce with 4.5
> 
> Pratyush bisected this into following commit:
> [3e349507d12de93b08b0aa814fc2aa0dee91c5ba] perf: Fix perf_enable_on_exec() event scheduling
> 
> it seems the commit above introduced unwanted difference between
> counter's enabled and running times.. I'm checking on that ;-)

Does something like so work?

---

So prior to 3e349507d12d ("perf: Fix perf_enable_on_exec() event
scheduling") we used to call task_ctx_sched_out() before
event_enable_on_exec().

ctx_sched_out() will call update_context_time(), therefore
__perf_event_mark_enabled() would have an up-to-date ctx->time.

Now, not so much. So explicitly update the ctx time before calling
event_enable_on_exec().

ctx_resched() will again call update_context_time(), resulting in a
slight difference the other way (running > enabled), which doesn't make
any sense either, but that we can (and should) clip.


--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
 
 	cpuctx = __get_cpu_context(ctx);
 	perf_ctx_lock(cpuctx, ctx);
+
+	update_context_time(ctx);
+	update_cgrp_time_from_cpuctx(cpuctx);
+
 	list_for_each_entry(event, &ctx->event_list, event_entry)
 		enabled |= event_enable_on_exec(event, ctx);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17  9:35     ` Jiri Olsa
  2016-02-17 10:25       ` Peter Zijlstra
@ 2016-02-17 10:28       ` Peter Zijlstra
  2016-02-17 10:31         ` Jiri Olsa
  2016-02-17 19:18         ` Oleg Nesterov
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2016-02-17 10:28 UTC (permalink / raw
  To: Jiri Olsa
  Cc: Oleg Nesterov, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel



Also, you _know_ you should not cross-post to closed lists, that's rude.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17 10:28       ` Peter Zijlstra
@ 2016-02-17 10:31         ` Jiri Olsa
  2016-02-17 19:18         ` Oleg Nesterov
  1 sibling, 0 replies; 10+ messages in thread
From: Jiri Olsa @ 2016-02-17 10:31 UTC (permalink / raw
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel

On Wed, Feb 17, 2016 at 11:28:29AM +0100, Peter Zijlstra wrote:
> 
> 
> Also, you _know_ you should not cross-post to closed lists, that's rude.

hum.. nope ;-) also did not realized it was closed one, sry

jirka

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17 10:25       ` Peter Zijlstra
@ 2016-02-17 14:44         ` Jiri Olsa
  2016-02-17 14:56           ` Peter Zijlstra
  2016-02-17 19:34         ` Oleg Nesterov
  1 sibling, 1 reply; 10+ messages in thread
From: Jiri Olsa @ 2016-02-17 14:44 UTC (permalink / raw
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel, arm-devel

On Wed, Feb 17, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:

SNIP

> 
> ---
> 
> So prior to 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> scheduling") we used to call task_ctx_sched_out() before
> event_enable_on_exec().
> 
> ctx_sched_out() will call update_context_time(), therefore
> __perf_event_mark_enabled() would have an up-to-date ctx->time.
> 
> Now, not so much. So explicitly update the ctx time before calling
> event_enable_on_exec().
> 
> ctx_resched() will again call update_context_time(), resulting in a
> slight difference the other way (running > enabled), which doesn't make
> any sense either, but that we can (and should) clip.
> 
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
>  
>  	cpuctx = __get_cpu_context(ctx);
>  	perf_ctx_lock(cpuctx, ctx);
> +
> +	update_context_time(ctx);
> +	update_cgrp_time_from_cpuctx(cpuctx);
> +
>  	list_for_each_entry(event, &ctx->event_list, event_entry)
>  		enabled |= event_enable_on_exec(event, ctx);
>  

that gives me attached fault

jirka


---
ibm-x3650m4-01 login: [  670.771477] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[  670.779831] BUG: unable to handle kernel paging request at ffff88047fad6e30
[  670.787618] IP: [<ffff88047fad6e30>] 0xffff88047fad6e30
[  670.793455] PGD 2033067 PUD 275e8f063 PMD 800000047fa001e3
[  670.799699] Oops: 0011 [#1] SMP
[  670.803312] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ipmi_ssif ablk_helper iTCO_wdt cryptd cdc_ether ipmi_devintf usbnet sb_edac iTCO_vendor_support edac_core ipmi_si mii shpchp lpc_ich pcspkr sg ipmi_msghandler wmi ioatdma nfsd mfd_core i2c_i801 auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ptp ahci libahci pps_core libata crc32c_intel dca megaraid_sas i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
[  670.870508] CPU: 13 PID: 11185 Comm: exact_counts Not tainted 4.5.0-rc2peterz+ #12
[  670.878955] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
[  670.889535] task: ffff8802750795c0 ti: ffff88003532c000 task.ti: ffff88003532c000
[  670.897884] RIP: 0010:[<ffff88047fad6e30>]  [<ffff88047fad6e30>] 0xffff88047fad6e30
[  670.906437] RSP: 0000:ffff88003532fd90  EFLAGS: 00010286
[  670.912362] RAX: ffff880474e009e8 RBX: ffff88003532fe58 RCX: ffff88047fad6e30
[  670.920323] RDX: ffff88003532fdb0 RSI: ffff88047148e180 RDI: ffff880474e009e8
[  670.928284] RBP: ffff88003532fd98 R08: 0000000000000000 R09: ffff88003532fe58
[  670.936245] R10: 00003fffffe00000 R11: 00003ffffffff000 R12: ffff880272dfd540
[  670.944206] R13: 0000000000000002 R14: ffff8800350f0ff8 R15: ffff88047148e180
[  670.952168] FS:  00007fe8e4e57740(0000) GS:ffff880277bc0000(0000) knlGS:0000000000000000
[  670.961197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  670.967608] CR2: ffff88047fad6e30 CR3: 0000000035527000 CR4: 00000000000406e0
[  670.975569] Stack:
[  670.977811]  ffffffff811bf1a1 ffff88003532fdf8 ffffffff811b65ed ffffea0000d3ab40
[  670.986096]  024000c000000054 0000000000000000 00007fffffffe000 0000000000000000
[  670.994383]  0000000000000000 00007fffffffe080 ffff880272dfd540 000000006be5d191
[  671.002671] Call Trace:
[  671.005403]  [<ffffffff811bf1a1>] ? special_mapping_fault+0x31/0x90
[  671.012396]  [<ffffffff811b65ed>] __do_fault+0x6d/0xe0
[  671.018127]  [<ffffffff811bb076>] handle_mm_fault+0xd96/0x1ab0
[  671.024637]  [<ffffffff81182e40>] ? uprobe_notify_resume+0x700/0xa10
[  671.031729]  [<ffffffff81211a53>] ? __fput+0x193/0x220
[  671.037462]  [<ffffffff8106908b>] __do_page_fault+0x18b/0x400
[  671.043872]  [<ffffffff81069330>] do_page_fault+0x30/0x80
[  671.049898]  [<ffffffff816ab608>] page_fault+0x28/0x30
[  671.055628] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 6e ad 7f 04 88 ff ff 20 6e ad 7f 04 88 ff ff <00> 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 40 6e ad 7f 04
[  671.077296] RIP  [<ffff88047fad6e30>] 0xffff88047fad6e30
[  671.083231]  RSP <ffff88003532fd90>
[  671.087119] CR2: ffff88047fad6e30
[  671.094220] ---[ end trace 17a74a9c13f887c1 ]---
[  671.099369] Kernel panic - not syncing: Fatal exception
[  671.105249] Kernel Offset: disabled
[  671.109138] ---[ end Kernel panic - not syncing: Fatal exception
[  671.115850] ------------[ cut here ]------------
[  671.121004] WARNING: CPU: 13 PID: 11185 at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x3e/0x40()
[  671.131874] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ipmi_ssif ablk_helper iTCO_wdt cryptd cdc_ether ipmi_devintf usbnet sb_edac iTCO_vendor_support edac_core ipmi_si mii shpchp lpc_ich pcspkr sg ipmi_msghandler wmi ioatdma nfsd mfd_core i2c_i801 auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ptp ahci libahci pps_core libata crc32c_intel dca megaraid_sas i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
[  671.199064] CPU: 13 PID: 11185 Comm: exact_counts Tainted: G      D         4.5.0-rc2peterz+ #12
[  671.208869] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
[  671.219448]  0000000000000000 000000006be5d191 ffff880277bc3da8 ffffffff81338380
[  671.227737]  0000000000000000 ffff880277bc3de0 ffffffff810869b6 0000000000000000
[  671.236027]  ffff880277a16c80 000000000000000d 000000000000000d 000000000000de68
[  671.244317] Call Trace:
[  671.247042]  <IRQ>  [<ffffffff81338380>] dump_stack+0x44/0x64
[  671.253467]  [<ffffffff810869b6>] warn_slowpath_common+0x86/0xc0
[  671.260168]  [<ffffffff81086afa>] warn_slowpath_null+0x1a/0x20
[  671.266675]  [<ffffffff8104fb2e>] native_smp_send_reschedule+0x3e/0x40
[  671.273959]  [<ffffffff810c3d85>] trigger_load_balance+0x145/0x1f0
[  671.280855]  [<ffffffff810b2d76>] scheduler_tick+0xa6/0xe0
[  671.286976]  [<ffffffff810ffc70>] ? tick_sched_do_timer+0x50/0x50
[  671.293775]  [<ffffffff810f0871>] update_process_times+0x51/0x60
[  671.300476]  [<ffffffff810ffa25>] tick_sched_handle.isra.17+0x25/0x60
[  671.307661]  [<ffffffff810ffcad>] tick_sched_timer+0x3d/0x70
[  671.313975]  [<ffffffff810f13d3>] __hrtimer_run_queues+0xf3/0x220
[  671.320773]  [<ffffffff810f1838>] hrtimer_interrupt+0xa8/0x1a0
[  671.327281]  [<ffffffff810522e5>] local_apic_timer_interrupt+0x35/0x60
[  671.334566]  [<ffffffff816abf1d>] smp_apic_timer_interrupt+0x3d/0x60
[  671.341655]  [<ffffffff816aa05c>] apic_timer_interrupt+0x8c/0xa0
[  671.348354]  <EOI>  [<ffffffff811859d4>] ? panic+0x1e5/0x229
[  671.354681]  [<ffffffff8101a9e0>] oops_end+0xc0/0xd0
[  671.360218]  [<ffffffff81068579>] no_context+0x139/0x390
[  671.366142]  [<ffffffff810688d9>] __bad_area_nosemaphore+0x109/0x210
[  671.373231]  [<ffffffff810689f3>] bad_area_nosemaphore+0x13/0x20
[  671.379931]  [<ffffffff81068f84>] __do_page_fault+0x84/0x400
[  671.386244]  [<ffffffff81069330>] do_page_fault+0x30/0x80
[  671.392267]  [<ffffffff816ab608>] page_fault+0x28/0x30
[  671.398001]  [<ffffffff811bf1a1>] ? special_mapping_fault+0x31/0x90
[  671.404992]  [<ffffffff811b65ed>] __do_fault+0x6d/0xe0
[  671.410724]  [<ffffffff811bb076>] handle_mm_fault+0xd96/0x1ab0
[  671.417232]  [<ffffffff81182e40>] ? uprobe_notify_resume+0x700/0xa10
[  671.424322]  [<ffffffff81211a53>] ? __fput+0x193/0x220
[  671.430054]  [<ffffffff8106908b>] __do_page_fault+0x18b/0x400
[  671.436464]  [<ffffffff81069330>] do_page_fault+0x30/0x80
[  671.442486]  [<ffffffff816ab608>] page_fault+0x28/0x30
[  671.448217] ---[ end trace 17a74a9c13f887c2 ]---

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17 14:44         ` Jiri Olsa
@ 2016-02-17 14:56           ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2016-02-17 14:56 UTC (permalink / raw
  To: Jiri Olsa
  Cc: Oleg Nesterov, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel, arm-devel

On Wed, Feb 17, 2016 at 03:44:56PM +0100, Jiri Olsa wrote:
> > +++ b/kernel/events/core.c
> > @@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
> >  
> >  	cpuctx = __get_cpu_context(ctx);
> >  	perf_ctx_lock(cpuctx, ctx);
> > +
> > +	update_context_time(ctx);
> > +	update_cgrp_time_from_cpuctx(cpuctx);
> > +
> >  	list_for_each_entry(event, &ctx->event_list, event_entry)
> >  		enabled |= event_enable_on_exec(event, ctx);
> >  
> 
> that gives me attached fault
> 
> jirka
> 
> 
> ---
> ibm-x3650m4-01 login: [  670.771477] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [  670.779831] BUG: unable to handle kernel paging request at ffff88047fad6e30
> [  670.787618] IP: [<ffff88047fad6e30>] 0xffff88047fad6e30
> [  670.793455] PGD 2033067 PUD 275e8f063 PMD 800000047fa001e3
> [  670.799699] Oops: 0011 [#1] SMP
> [  670.803312] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ipmi_ssif ablk_helper iTCO_wdt cryptd cdc_ether ipmi_devintf usbnet sb_edac iTCO_vendor_support edac_core ipmi_si mii shpchp lpc_ich pcspkr sg ipmi_msghandler wmi ioatdma nfsd mfd_core i2c_i801 auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ptp ahci libahci pps_core libata crc32c_intel dca megaraid_sas i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
> [  670.870508] CPU: 13 PID: 11185 Comm: exact_counts Not tainted 4.5.0-rc2peterz+ #12
> [  670.878955] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
> [  670.889535] task: ffff8802750795c0 ti: ffff88003532c000 task.ti: ffff88003532c000
> [  670.897884] RIP: 0010:[<ffff88047fad6e30>]  [<ffff88047fad6e30>] 0xffff88047fad6e30
> [  670.906437] RSP: 0000:ffff88003532fd90  EFLAGS: 00010286
> [  670.912362] RAX: ffff880474e009e8 RBX: ffff88003532fe58 RCX: ffff88047fad6e30
> [  670.920323] RDX: ffff88003532fdb0 RSI: ffff88047148e180 RDI: ffff880474e009e8
> [  670.928284] RBP: ffff88003532fd98 R08: 0000000000000000 R09: ffff88003532fe58
> [  670.936245] R10: 00003fffffe00000 R11: 00003ffffffff000 R12: ffff880272dfd540
> [  670.944206] R13: 0000000000000002 R14: ffff8800350f0ff8 R15: ffff88047148e180
> [  670.952168] FS:  00007fe8e4e57740(0000) GS:ffff880277bc0000(0000) knlGS:0000000000000000
> [  670.961197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  670.967608] CR2: ffff88047fad6e30 CR3: 0000000035527000 CR4: 00000000000406e0
> [  670.975569] Stack:
> [  670.977811]  ffffffff811bf1a1 ffff88003532fdf8 ffffffff811b65ed ffffea0000d3ab40
> [  670.986096]  024000c000000054 0000000000000000 00007fffffffe000 0000000000000000
> [  670.994383]  0000000000000000 00007fffffffe080 ffff880272dfd540 000000006be5d191
> [  671.002671] Call Trace:
> [  671.005403]  [<ffffffff811bf1a1>] ? special_mapping_fault+0x31/0x90
> [  671.012396]  [<ffffffff811b65ed>] __do_fault+0x6d/0xe0
> [  671.018127]  [<ffffffff811bb076>] handle_mm_fault+0xd96/0x1ab0
> [  671.024637]  [<ffffffff81182e40>] ? uprobe_notify_resume+0x700/0xa10
> [  671.031729]  [<ffffffff81211a53>] ? __fput+0x193/0x220
> [  671.037462]  [<ffffffff8106908b>] __do_page_fault+0x18b/0x400
> [  671.043872]  [<ffffffff81069330>] do_page_fault+0x30/0x80
> [  671.049898]  [<ffffffff816ab608>] page_fault+0x28/0x30
> [  671.055628] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 6e ad 7f 04 88 ff ff 20 6e ad 7f 04 88 ff ff <00> 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 40 6e ad 7f 04
> [  671.077296] RIP  [<ffff88047fad6e30>] 0xffff88047fad6e30
> [  671.083231]  RSP <ffff88003532fd90>
> [  671.087119] CR2: ffff88047fad6e30
> [  671.094220] ---[ end trace 17a74a9c13f887c1 ]---

That's a particularly useless splat; how can it not show what code
triggered the pagefault!?

The RIP not getting a symbolic name also doesn't inspire confidence.

Too weird.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17 10:28       ` Peter Zijlstra
  2016-02-17 10:31         ` Jiri Olsa
@ 2016-02-17 19:18         ` Oleg Nesterov
  1 sibling, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2016-02-17 19:18 UTC (permalink / raw
  To: Peter Zijlstra
  Cc: Jiri Olsa, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel

On 02/17, Peter Zijlstra wrote:
>
>
> Also, you _know_ you should not cross-post to closed lists, that's rude.

It was me, not Jiri, sorry. Forgot to remove arm-devel.

Oleg.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17 10:25       ` Peter Zijlstra
  2016-02-17 14:44         ` Jiri Olsa
@ 2016-02-17 19:34         ` Oleg Nesterov
  2016-02-17 19:43           ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-02-17 19:34 UTC (permalink / raw
  To: Peter Zijlstra
  Cc: Jiri Olsa, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel, arm-devel

On 02/17, Peter Zijlstra wrote:
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
>
>  	cpuctx = __get_cpu_context(ctx);
>  	perf_ctx_lock(cpuctx, ctx);
> +
> +	update_context_time(ctx);
> +	update_cgrp_time_from_cpuctx(cpuctx);
> +

Even if I don't really understand this change I agree, probably we need to update
the counters for enable_on_exec events somehow. But I don't see how this change can
make total_time_running == total_time_enabled.

And probably this is fine. In fact I do not understand why they should be equal.
I inserted printf() into perf_counts_values__scale() to verify that /usr/bin/perf
always sees count->run < count->ena even if I do, say,

	# perf stat -x- -e module:module_load -a sleep 1

	0--module:module_load-1007397333-100.00

the kernel reports run=1007397333 ena=1007397573. Close but not equal. Again, I
do not think this is wrong, I am only saying that I don't understand this logic
because this always means "scaled" for perf.

Nevermind, I never looked into tools/perf before, most probably I simply do not
understand it at all.

Oleg.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)
  2016-02-17 19:34         ` Oleg Nesterov
@ 2016-02-17 19:43           ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2016-02-17 19:43 UTC (permalink / raw
  To: Oleg Nesterov
  Cc: Jiri Olsa, Pratyush Anand, Jeff Bastian, Michael Petlan,
	linux-kernel

On Wed, Feb 17, 2016 at 08:34:42PM +0100, Oleg Nesterov wrote:
> On 02/17, Peter Zijlstra wrote:
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
> >
> >  	cpuctx = __get_cpu_context(ctx);
> >  	perf_ctx_lock(cpuctx, ctx);
> > +
> > +	update_context_time(ctx);
> > +	update_cgrp_time_from_cpuctx(cpuctx);
> > +
> 
> Even if I don't really understand this change I agree, probably we need to update
> the counters for enable_on_exec events somehow. But I don't see how this change can
> make total_time_running == total_time_enabled.

Yes, to get them exactly equal more is needed. But I suspect the very
small difference generated here is practically irrelevant.

Making them equal requires some very careful auditing, but is otherwise
entirely possible.

The bigger problem seems to be that this (seemingly) simple change makes
Jiri's machine explode, I yet have to look at reproducing that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-02-17 19:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20160211093047.GA12115@dhcp-0-82.del.redhat.com>
     [not found] ` <20160215070620.GA23850@dhcppc3.redhat.com>
2016-02-17  3:34   ` perf: wrong event->count report (Was: perf basic-test-aarch64 failures) Oleg Nesterov
2016-02-17  9:35     ` Jiri Olsa
2016-02-17 10:25       ` Peter Zijlstra
2016-02-17 14:44         ` Jiri Olsa
2016-02-17 14:56           ` Peter Zijlstra
2016-02-17 19:34         ` Oleg Nesterov
2016-02-17 19:43           ` Peter Zijlstra
2016-02-17 10:28       ` Peter Zijlstra
2016-02-17 10:31         ` Jiri Olsa
2016-02-17 19:18         ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).