Linux-perf-users Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] perf test: "object code reading" test fixes
@ 2024-04-09  8:47 James Clark
  2024-04-09  8:47 ` [PATCH 1/3] perf tests: Apply attributes to all events in object code reading test James Clark
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: James Clark @ 2024-04-09  8:47 UTC (permalink / raw
  To: linux-perf-users, irogers
  Cc: James Clark, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Athira Rajeev, Leo Yan, linux-kernel

A few fixes around the object code reading test. The third patch
appears to be unrelated, but in this case the data symbol test was
broken on Arm N1 by the first commit.

James Clark (3):
  perf tests: Apply attributes to all events in object code reading test
  perf map: Remove kernel map before updating start and end addresses
  perf tests: Skip "test data symbol" on Neoverse N1

 tools/perf/tests/code-reading.c            | 10 +++++-----
 tools/perf/tests/shell/test_data_symbol.sh |  6 ++++++
 tools/perf/util/machine.c                  |  2 +-
 3 files changed, 12 insertions(+), 6 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] perf tests: Apply attributes to all events in object code reading test
  2024-04-09  8:47 [PATCH 0/3] perf test: "object code reading" test fixes James Clark
@ 2024-04-09  8:47 ` James Clark
  2024-04-09  8:47 ` [PATCH 2/3] perf map: Remove kernel map before updating start and end addresses James Clark
  2024-04-09  8:47 ` [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1 James Clark
  2 siblings, 0 replies; 8+ messages in thread
From: James Clark @ 2024-04-09  8:47 UTC (permalink / raw
  To: linux-perf-users, irogers
  Cc: James Clark, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Athira Rajeev, Leo Yan, linux-kernel

PERF_PMU_CAP_EXTENDED_HW_TYPE results in multiple events being opened on
heterogeneous systems. Currently this test only sets its required
attributes on the first event. Not disabling enable_on_exec on the other
events causes the test to fail because the forked objdump processes are
sampled. No tracking event is opened so Perf only knows about its own
mappings causing the objdump samples to give the following error:

  $ perf test -vvv "object code reading"

  Reading object code for memory address: 0xffff9aaa55ec
  thread__find_map failed
  ---- end(-1) ----
  24: Object code reading              : FAILED!

Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events")
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/tests/code-reading.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 7a3a7bbbec71..29d2f3ee4e10 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -637,11 +637,11 @@ static int do_test_code_reading(bool try_kcore)
 
 		evlist__config(evlist, &opts, NULL);
 
-		evsel = evlist__first(evlist);
-
-		evsel->core.attr.comm = 1;
-		evsel->core.attr.disabled = 1;
-		evsel->core.attr.enable_on_exec = 0;
+		evlist__for_each_entry(evlist, evsel) {
+			evsel->core.attr.comm = 1;
+			evsel->core.attr.disabled = 1;
+			evsel->core.attr.enable_on_exec = 0;
+		}
 
 		ret = evlist__open(evlist);
 		if (ret < 0) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] perf map: Remove kernel map before updating start and end addresses
  2024-04-09  8:47 [PATCH 0/3] perf test: "object code reading" test fixes James Clark
  2024-04-09  8:47 ` [PATCH 1/3] perf tests: Apply attributes to all events in object code reading test James Clark
@ 2024-04-09  8:47 ` James Clark
  2024-04-09  8:47 ` [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1 James Clark
  2 siblings, 0 replies; 8+ messages in thread
From: James Clark @ 2024-04-09  8:47 UTC (permalink / raw
  To: linux-perf-users, irogers
  Cc: James Clark, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Athira Rajeev, Leo Yan, linux-kernel

In a debug build there is validation that mmap lists are sorted when
taking a lock. In machine__update_kernel_mmap() the start and end
addresses are updated resulting in an unsorted list before the map is
removed from the list. When the map is removed, the lock is taken which
triggers the validation and the failure:

  $ perf test "object code reading"
  --- start ---
  perf: util/maps.c:88: check_invariants: Assertion `map__start(prev) <= map__start(map)' failed.
  Aborted

Fix it by updating the addresses after removal, but before insertion.
The bug depends on the ordering and type of debug info on the system and
doesn't reproduce everywhere.

Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 5eb9044bc223..a26c8bea58d0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1549,8 +1549,8 @@ static int machine__update_kernel_mmap(struct machine *machine,
 	updated = map__get(orig);
 
 	machine->vmlinux_map = updated;
-	machine__set_kernel_mmap(machine, start, end);
 	maps__remove(machine__kernel_maps(machine), orig);
+	machine__set_kernel_mmap(machine, start, end);
 	err = maps__insert(machine__kernel_maps(machine), updated);
 	map__put(orig);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1
  2024-04-09  8:47 [PATCH 0/3] perf test: "object code reading" test fixes James Clark
  2024-04-09  8:47 ` [PATCH 1/3] perf tests: Apply attributes to all events in object code reading test James Clark
  2024-04-09  8:47 ` [PATCH 2/3] perf map: Remove kernel map before updating start and end addresses James Clark
@ 2024-04-09  8:47 ` James Clark
  2024-04-09 15:39   ` Ian Rogers
  2 siblings, 1 reply; 8+ messages in thread
From: James Clark @ 2024-04-09  8:47 UTC (permalink / raw
  To: linux-perf-users, irogers
  Cc: James Clark, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Athira Rajeev, Leo Yan, linux-kernel

To prevent anyone from seeing a test failure appear as a regression and
thinking that it was caused by their code change, just skip the test on
N1.

It can be caused by any unrelated change that shifts the loop into an
unfortunate position in the Perf binary which is almost impossible to
debug as the root cause of the test failure. Ultimately it's caused by
the referenced errata.

Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/tests/shell/test_data_symbol.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
index 3dfa91832aa8..ffc641d00aa4 100755
--- a/tools/perf/tests/shell/test_data_symbol.sh
+++ b/tools/perf/tests/shell/test_data_symbol.sh
@@ -16,6 +16,12 @@ skip_if_no_mem_event() {
 	return 2
 }
 
+# Skip on Arm N1 due to errata 1694299. Bias exists in SPE sampling
+# which can cause the load and store instructions to be skipped
+# entirely. This comes and goes randomly depending on the offset the
+# linker places the datasym loop at in the Perf binary.
+lscpu | grep -q "Neoverse-N1" && exit 2
+
 skip_if_no_mem_event || exit 2
 
 skip_test_missing_symbol buf1
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1
  2024-04-09  8:47 ` [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1 James Clark
@ 2024-04-09 15:39   ` Ian Rogers
  2024-04-09 23:17     ` Namhyung Kim
  2024-04-10  9:08     ` James Clark
  0 siblings, 2 replies; 8+ messages in thread
From: Ian Rogers @ 2024-04-09 15:39 UTC (permalink / raw
  To: James Clark
  Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Athira Rajeev, Leo Yan, linux-kernel

On Tue, Apr 9, 2024 at 1:48 AM James Clark <james.clark@arm.com> wrote:
>
> To prevent anyone from seeing a test failure appear as a regression and
> thinking that it was caused by their code change, just skip the test on
> N1.
>
> It can be caused by any unrelated change that shifts the loop into an
> unfortunate position in the Perf binary which is almost impossible to
> debug as the root cause of the test failure. Ultimately it's caused by
> the referenced errata.
>
> Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
> Signed-off-by: James Clark <james.clark@arm.com>

This change makes me sad :-( Is there no hope of aligning the loop? We
have little enough testing coverage for memory events and even precise
events on ARM that anything take away testing coverage feels like we
should try to do better.

Which models are we losing coverage for, presumably neoverse-n1 but
what about neoverse-v1 and neoverse-n2-v2?

If aligning the loop doesn't work, could we use objdump and check its
alignment skipping when it is off? Or run the test and turn fails to
skip on neoverse-n1 - so we get some coverage testing.

It would also be nice if the change didn't add a dependency on lscpu
for the sake of parsing /proc/cpuinfo, I see another arm test already
did this test_arm_callgraph_fp.sh - that case looks like it should be
using uname.

Thanks,
Ian

> ---
>  tools/perf/tests/shell/test_data_symbol.sh | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
> index 3dfa91832aa8..ffc641d00aa4 100755
> --- a/tools/perf/tests/shell/test_data_symbol.sh
> +++ b/tools/perf/tests/shell/test_data_symbol.sh
> @@ -16,6 +16,12 @@ skip_if_no_mem_event() {
>         return 2
>  }
>
> +# Skip on Arm N1 due to errata 1694299. Bias exists in SPE sampling
> +# which can cause the load and store instructions to be skipped
> +# entirely. This comes and goes randomly depending on the offset the
> +# linker places the datasym loop at in the Perf binary.
> +lscpu | grep -q "Neoverse-N1" && exit 2
> +
>  skip_if_no_mem_event || exit 2
>
>  skip_test_missing_symbol buf1
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1
  2024-04-09 15:39   ` Ian Rogers
@ 2024-04-09 23:17     ` Namhyung Kim
  2024-04-10  9:07       ` James Clark
  2024-04-10  9:08     ` James Clark
  1 sibling, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2024-04-09 23:17 UTC (permalink / raw
  To: Ian Rogers
  Cc: James Clark, linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Liang, Kan, Athira Rajeev, Leo Yan,
	linux-kernel

Hi Ian and James,

On Tue, Apr 9, 2024 at 8:39 AM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Apr 9, 2024 at 1:48 AM James Clark <james.clark@arm.com> wrote:
> >
> > To prevent anyone from seeing a test failure appear as a regression and
> > thinking that it was caused by their code change, just skip the test on
> > N1.
> >
> > It can be caused by any unrelated change that shifts the loop into an
> > unfortunate position in the Perf binary which is almost impossible to
> > debug as the root cause of the test failure. Ultimately it's caused by
> > the referenced errata.
> >
> > Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
> > Signed-off-by: James Clark <james.clark@arm.com>
>
> This change makes me sad :-( Is there no hope of aligning the loop? We
> have little enough testing coverage for memory events and even precise
> events on ARM that anything take away testing coverage feels like we
> should try to do better.

Can we just add some noise in the loop?

Thanks,
Namhyung


diff --git a/tools/perf/tests/workloads/datasym.c
b/tools/perf/tests/workloads/datasym.c
index ddd40bc63448..e2514bf393cd 100644
--- a/tools/perf/tests/workloads/datasym.c
+++ b/tools/perf/tests/workloads/datasym.c
@@ -16,6 +16,8 @@ static int datasym(int argc __maybe_unused, const
char **argv __maybe_unused)
 {
        for (;;) {
                buf1.data1++;
+               if ((buf1.data1 % 100129) == 0)
+                       buf1.data1++;
                buf1.data2 += buf1.data1;
        }
        return 0;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1
  2024-04-09 23:17     ` Namhyung Kim
@ 2024-04-10  9:07       ` James Clark
  0 siblings, 0 replies; 8+ messages in thread
From: James Clark @ 2024-04-10  9:07 UTC (permalink / raw
  To: Namhyung Kim, Ian Rogers
  Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Liang, Kan, Athira Rajeev, Leo Yan,
	linux-kernel



On 10/04/2024 00:17, Namhyung Kim wrote:
> Hi Ian and James,
> 
> On Tue, Apr 9, 2024 at 8:39 AM Ian Rogers <irogers@google.com> wrote:
>>
>> On Tue, Apr 9, 2024 at 1:48 AM James Clark <james.clark@arm.com> wrote:
>>>
>>> To prevent anyone from seeing a test failure appear as a regression and
>>> thinking that it was caused by their code change, just skip the test on
>>> N1.
>>>
>>> It can be caused by any unrelated change that shifts the loop into an
>>> unfortunate position in the Perf binary which is almost impossible to
>>> debug as the root cause of the test failure. Ultimately it's caused by
>>> the referenced errata.
>>>
>>> Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
>>> Signed-off-by: James Clark <james.clark@arm.com>
>>
>> This change makes me sad :-( Is there no hope of aligning the loop? We
>> have little enough testing coverage for memory events and even precise
>> events on ARM that anything take away testing coverage feels like we
>> should try to do better.
> 
> Can we just add some noise in the loop?
> 
> Thanks,
> Namhyung
> 

Yes that would probably work. I decided to skip rather than touch the
test because I didn't want the errata on one specific core to affect
testing everywhere else.

But if we don't think that it's too hacky to include that in the test
then I can add it. To be honest maybe it makes the test more "realistic"
because a very short infinite loop doesn't really represent a real workload.

> 
> diff --git a/tools/perf/tests/workloads/datasym.c
> b/tools/perf/tests/workloads/datasym.c
> index ddd40bc63448..e2514bf393cd 100644
> --- a/tools/perf/tests/workloads/datasym.c
> +++ b/tools/perf/tests/workloads/datasym.c
> @@ -16,6 +16,8 @@ static int datasym(int argc __maybe_unused, const
> char **argv __maybe_unused)
>  {
>         for (;;) {
>                 buf1.data1++;
> +               if ((buf1.data1 % 100129) == 0)
> +                       buf1.data1++;
>                 buf1.data2 += buf1.data1;
>         }
>         return 0;
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1
  2024-04-09 15:39   ` Ian Rogers
  2024-04-09 23:17     ` Namhyung Kim
@ 2024-04-10  9:08     ` James Clark
  1 sibling, 0 replies; 8+ messages in thread
From: James Clark @ 2024-04-10  9:08 UTC (permalink / raw
  To: Ian Rogers
  Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Athira Rajeev, Leo Yan, linux-kernel



On 09/04/2024 16:39, Ian Rogers wrote:
> On Tue, Apr 9, 2024 at 1:48 AM James Clark <james.clark@arm.com> wrote:
>>
>> To prevent anyone from seeing a test failure appear as a regression and
>> thinking that it was caused by their code change, just skip the test on
>> N1.
>>
>> It can be caused by any unrelated change that shifts the loop into an
>> unfortunate position in the Perf binary which is almost impossible to
>> debug as the root cause of the test failure. Ultimately it's caused by
>> the referenced errata.
>>
>> Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
>> Signed-off-by: James Clark <james.clark@arm.com>
> 
> This change makes me sad :-( Is there no hope of aligning the loop? We
> have little enough testing coverage for memory events and even precise
> events on ARM that anything take away testing coverage feels like we
> should try to do better.
> 
> Which models are we losing coverage for, presumably neoverse-n1 but
> what about neoverse-v1 and neoverse-n2-v2?
> 
> If aligning the loop doesn't work, could we use objdump and check its
> alignment skipping when it is off? Or run the test and turn fails to
> skip on neoverse-n1 - so we get some coverage testing.
> 
> It would also be nice if the change didn't add a dependency on lscpu
> for the sake of parsing /proc/cpuinfo, I see another arm test already
> did this test_arm_callgraph_fp.sh - that case looks like it should be
> using uname.
> 

I'll make the change to add the noise to the loop, which will drop this
lscpu addition. And I'll fix up test_arm_callgraph_fp.sh while I'm at it.

> Thanks,
> Ian
> 
>> ---
>>  tools/perf/tests/shell/test_data_symbol.sh | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
>> index 3dfa91832aa8..ffc641d00aa4 100755
>> --- a/tools/perf/tests/shell/test_data_symbol.sh
>> +++ b/tools/perf/tests/shell/test_data_symbol.sh
>> @@ -16,6 +16,12 @@ skip_if_no_mem_event() {
>>         return 2
>>  }
>>
>> +# Skip on Arm N1 due to errata 1694299. Bias exists in SPE sampling
>> +# which can cause the load and store instructions to be skipped
>> +# entirely. This comes and goes randomly depending on the offset the
>> +# linker places the datasym loop at in the Perf binary.
>> +lscpu | grep -q "Neoverse-N1" && exit 2
>> +
>>  skip_if_no_mem_event || exit 2
>>
>>  skip_test_missing_symbol buf1
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-04-10  9:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-09  8:47 [PATCH 0/3] perf test: "object code reading" test fixes James Clark
2024-04-09  8:47 ` [PATCH 1/3] perf tests: Apply attributes to all events in object code reading test James Clark
2024-04-09  8:47 ` [PATCH 2/3] perf map: Remove kernel map before updating start and end addresses James Clark
2024-04-09  8:47 ` [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1 James Clark
2024-04-09 15:39   ` Ian Rogers
2024-04-09 23:17     ` Namhyung Kim
2024-04-10  9:07       ` James Clark
2024-04-10  9:08     ` James Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).