* [Questions] perf c2c: What's the current status of perf c2c?
@ 2015-12-09  4:06 Yunlong Song
  2015-12-09  8:04 ` Jiri Olsa
  0 siblings, 1 reply; 14+ messages in thread
From: Yunlong Song @ 2015-12-09  4:06 UTC (permalink / raw)
  To: dzickus
  Cc: dsahern, fweisbec, jmario, efault, paulus, Peter Zijlstra,
	rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

Hi, Don,
    I am interested in the perf c2c tool, which was introduced in: http://lwn.net/Articles/588866/
However, I found that this tool has not been merged into the mainline perf tree. Why? It was first
introduced in Feb. 2014. What's its current status? Does it have a new version or a repository
somewhere else? And does it support Haswell?

-- 
Thanks,
Yunlong Song



* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  4:06 [Questions] perf c2c: What's the current status of perf c2c? Yunlong Song
@ 2015-12-09  8:04 ` Jiri Olsa
  2015-12-09  8:12   ` Wangnan (F)
  2015-12-09  9:34   ` Peter Zijlstra
  0 siblings, 2 replies; 14+ messages in thread
From: Jiri Olsa @ 2015-12-09  8:04 UTC (permalink / raw)
  To: Yunlong Song
  Cc: dzickus, dsahern, fweisbec, jmario, efault, paulus,
	Peter Zijlstra, rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

On Wed, Dec 09, 2015 at 12:06:44PM +0800, Yunlong Song wrote:
> Hi, Don,
>     I am interested in the perf c2c tool, which was introduced in: http://lwn.net/Articles/588866/
> However, I found that this tool has not been merged into the mainline perf tree. Why? It was first
> introduced in Feb. 2014. What's its current status? Does it have a new version or a repository
> somewhere else? And does it support Haswell?

hi,
not sure Don made any progress in this field, but I'm having
his c2c sources rebased on top of current perf sources ATM.

I changed the tool a little to run over new DATALA events
added in Haswell (in addition to ldlat events) and it seems
to work.

the plan for me is to use it some more to prove it's useful
and kick it to be merged with perf at some point

jirka


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  8:04 ` Jiri Olsa
@ 2015-12-09  8:12   ` Wangnan (F)
  2015-12-09  9:11     ` Jiri Olsa
  2015-12-09  9:34   ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Wangnan (F) @ 2015-12-09  8:12 UTC (permalink / raw)
  To: Jiri Olsa, Yunlong Song
  Cc: dzickus, dsahern, fweisbec, jmario, efault, paulus,
	Peter Zijlstra, rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa, fowles, Namhyung Kim, andi



On 2015/12/9 16:04, Jiri Olsa wrote:
> On Wed, Dec 09, 2015 at 12:06:44PM +0800, Yunlong Song wrote:
>> Hi, Don,
>>      I am interested in the perf c2c tool, which was introduced in: http://lwn.net/Articles/588866/
>> However, I found that this tool has not been merged into the mainline perf tree. Why? It was first
>> introduced in Feb. 2014. What's its current status? Does it have a new version or a repository
>> somewhere else? And does it support Haswell?
> hi,
> not sure Don made any progress in this field, but I'm having
> his c2c sources rebased on top of current perf sources ATM.

Do you have a git repository from which we can
fetch the code?

Thank you.



* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  8:12   ` Wangnan (F)
@ 2015-12-09  9:11     ` Jiri Olsa
  0 siblings, 0 replies; 14+ messages in thread
From: Jiri Olsa @ 2015-12-09  9:11 UTC (permalink / raw)
  To: Wangnan (F)
  Cc: Yunlong Song, dzickus, dsahern, fweisbec, jmario, efault, paulus,
	Peter Zijlstra, rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa, fowles, Namhyung Kim, andi

On Wed, Dec 09, 2015 at 04:12:36PM +0800, Wangnan (F) wrote:
> 
> 
> On 2015/12/9 16:04, Jiri Olsa wrote:
> >On Wed, Dec 09, 2015 at 12:06:44PM +0800, Yunlong Song wrote:
> >>Hi, Don,
> >>     I am interested in the perf c2c tool, which was introduced in: http://lwn.net/Articles/588866/
> >>However, I found that this tool has not been merged into the mainline perf tree. Why? It was first
> >>introduced in Feb. 2014. What's its current status? Does it have a new version or a repository
> >>somewhere else? And does it support Haswell?
> >hi,
> >not sure Don made any progress in this field, but I'm having
> >his c2c sources rebased on top of current perf sources ATM.
> 
> Do you have a git repository from which we can
> fetch the code?

yes, but it makes my eyes bleed ;-) I have some hacks on
top of Don's changes which I'm ashamed to share ATM

let me kick it into some reasonable shape first

jirka


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  8:04 ` Jiri Olsa
  2015-12-09  8:12   ` Wangnan (F)
@ 2015-12-09  9:34   ` Peter Zijlstra
  2015-12-09 10:58     ` Peter Zijlstra
                       ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-12-09  9:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Yunlong Song, dzickus, dsahern, fweisbec, jmario, efault, paulus,
	rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

On Wed, Dec 09, 2015 at 09:04:40AM +0100, Jiri Olsa wrote:
> On Wed, Dec 09, 2015 at 12:06:44PM +0800, Yunlong Song wrote:
> > Hi, Don,
> >     I am interested in the perf c2c tool, which was introduced in: http://lwn.net/Articles/588866/
> > However, I found that this tool has not been merged into the mainline perf tree. Why? It was first
> > introduced in Feb. 2014. What's its current status? Does it have a new version or a repository
> > somewhere else? And does it support Haswell?
> 
> hi,
> not sure Don made any progress in this field, but I'm having
> his c2c sources rebased on top of current perf sources ATM.
> 
> I changed the tool a little to run over new DATALA events
> added in Haswell (in addition to ldlat events) and it seems
> to work.
> 
> the plan for me is to use it some more to prove it's useful
> and kick it to be merged with perf at some point

So I never really liked the c2c tool because it was so narrowly
focussed; it only works on NUMA thingies IIRC.

I would much rather see a tool that uses PEBS events and does a DWARF
decode of the exact instruction's data reference -- without relying on
data address bits.

That is, suppose we measure LLC_MISS: even if we have a
data address, as soon as it's inside a dynamically allocated object,
you're lost.

However, since we have the exact instruction we can simply look at that.
Imagine something like:

struct foo {
	int blah;
	int val;
	int array[];
};

struct bar {
	struct foo *foo;
};

int foobar(struct bar *bar)
{
	return bar->foo->val;
}

Which we can imagine could result in code like:

foobar:
	mov (%rax), %rax	# load bar::foo
	mov 0x4(%rax), %eax	# load foo::val


And the DWARF info should know this, so by knowing the instruction we
can know which load missed the cache.

Once you have this information, you can use pahole-like structure output
and heat-colour it or whatnot. Bright red if you miss a lot, etc.
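
Something like this imagined pahole-style output (offsets/sizes as
pahole prints them; the miss percentages are made up for illustration):

struct foo {
	int  blah;	/*  0  4 */	/* miss:  3% */
	int  val;	/*  4  4 */	/* miss: 94% */  <-- bright red
	int  array[];	/*  8  0 */	/* miss:  3% */
};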

Now, currently this is possible but a bit of work, because the DWARF
annotations do not exactly follow these data types; that is, you
might need to decode previous instructions and infer some bits.

I think Stephane was working with GCC people to allow more/better DWARF
annotations and allow easier retrieval of this information.


Note: the proposed scheme still has some holes in it; imagine trying to
load an array[] member like:

	mov 8(%rax, %rcx, 4), %rcx

This would load the array element indexed by RCX into RCX, thereby
destroying the index. In this case, knowing the data address, you can
still compute the index if you also know RAX (which you get from the
PEBS register dump).
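
E.g. a rough sketch in C (names made up; assumes the addressing mode
above, i.e. base in RAX, scale 4, displacement 8):

/*
 * Recover the destroyed index from the PEBS data address plus the
 * PEBS register dump, for:  mov 8(%rax, %rcx, 4), %rcx
 *
 *   data_addr = rax + rcx * 4 + 8   =>   rcx = (data_addr - rax - 8) / 4
 */
static unsigned long recover_index(unsigned long data_addr, unsigned long rax)
{
	return (data_addr - rax - 8) / 4;
}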




* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  9:34   ` Peter Zijlstra
@ 2015-12-09 10:58     ` Peter Zijlstra
  2015-12-09 11:09     ` Joe Mario
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-12-09 10:58 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Yunlong Song, dzickus, dsahern, fweisbec, jmario, efault, paulus,
	rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

On Wed, Dec 09, 2015 at 10:34:02AM +0100, Peter Zijlstra wrote:
> Which we can imagine could result in code like:
> 
> foobar:
> 	mov (%rax), %rax	# load bar::foo
> 	mov 0x4(%rax), %eax	# load foo::val
> 
> 
> And the DWARF info should know this, so by knowing the instruction we
> can know which load missed the cache.
> 
> Once you have this information, you can use pahole-like structure output
> and heat-colour it or whatnot. Bright red if you miss a lot, etc.
> 
> Now, currently this is possible but a bit of work, because the DWARF
> annotations do not exactly follow these data types; that is, you
> might need to decode previous instructions and infer some bits.

To clarify, current DWARF info might only know that the argument to
foobar is of type struct bar *, and we'll have to infer the rest.
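
I.e. a hand-worked inference for the example above (assuming the bar
pointer is already in %rax, as in the earlier snippet):

	mov (%rax), %rax	# DWARF: %rax holds a 'struct bar *';
				# offset 0 is bar::foo, so %rax becomes
				# a 'struct foo *'
	mov 0x4(%rax), %eax	# offset 4 in 'struct foo' is foo::val,
				# so this load's miss is on foo::val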

> I think Stephane was working with GCC people to allow more/better DWARF
> annotations and allow easier retrieval of this information.

And even if that gets sorted, it might still make sense to implement the
hard case as per the above, because it'll take a long time before
everything is built with the fancy new GCC/DWARF output.


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  9:34   ` Peter Zijlstra
  2015-12-09 10:58     ` Peter Zijlstra
@ 2015-12-09 11:09     ` Joe Mario
       [not found]     ` <1891297138.17374838.1449658964938.JavaMail.zimbra@redhat.com>
  2015-12-09 16:58     ` Andi Kleen
  3 siblings, 0 replies; 14+ messages in thread
From: Joe Mario @ 2015-12-09 11:09 UTC (permalink / raw)
  To: Peter Zijlstra, Jiri Olsa
  Cc: Yunlong Song, dzickus, dsahern, fweisbec, efault, paulus, rfowles,
	eranian, acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

[RESEND - this time w/o html junk]

On 12/09/2015 04:34 AM, Peter Zijlstra wrote:
> On Wed, Dec 09, 2015 at 09:04:40AM +0100, Jiri Olsa wrote:
>> On Wed, Dec 09, 2015 at 12:06:44PM +0800, Yunlong Song wrote:
>>> Hi, Don,
>>>      I am interested in the perf c2c tool, which was introduced in: http://lwn.net/Articles/588866/
>>> However, I found that this tool has not been merged into the mainline perf tree. Why? It was first
>>> introduced in Feb. 2014. What's its current status? Does it have a new version or a repository
>>> somewhere else? And does it support Haswell?
>>
>> hi,
>> not sure Don made any progress in this field, but I'm having
>> his c2c sources rebased on top of current perf sources ATM.
>>

<snip>

> So I never really liked the c2c tool because it was so narrowly
> focussed; it only works on NUMA thingies IIRC.
>
> I would much rather see a tool that uses PEBS events and does a DWARF
> decode of the exact instruction's data reference -- without relying on
> data address bits.

Peter:
Yes, that would be a great enhancement, but is that any reason to hold up the current implementation?

I've been using "perf c2c" heavily with customers over the past two years, and after they see what it can do, their first question is why it hasn't been checked in upstream yet.

Joe




* Re: [Questions] perf c2c: What's the current status of perf c2c?
       [not found]     ` <1891297138.17374838.1449658964938.JavaMail.zimbra@redhat.com>
@ 2015-12-09 14:03       ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-12-09 14:03 UTC (permalink / raw)
  To: Joe Mario
  Cc: Jiri Olsa, Yunlong Song, dzickus, dsahern, fweisbec, efault,
	paulus, rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

On Wed, Dec 09, 2015 at 06:02:44AM -0500, Joe Mario wrote:
> Yes, that would be a great enhancement,

This is hardly new though; I outlined the very same approach the first
time the c2c tool got mentioned.

> but is that any reason to hold up the current implementation?

I just wonder how much of c2c is still useful once we get it done
properly. And once such a tool is out there, it's hard to kill, leaving
us with a maintenance burden we could do without.


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09  9:34   ` Peter Zijlstra
                       ` (2 preceding siblings ...)
       [not found]     ` <1891297138.17374838.1449658964938.JavaMail.zimbra@redhat.com>
@ 2015-12-09 16:58     ` Andi Kleen
  2015-12-09 17:15       ` Stephane Eranian
  3 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2015-12-09 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Yunlong Song, dzickus, dsahern, fweisbec, jmario,
	efault, paulus, rfowles, eranian,
	acme@kernel.org >> Arnaldo Carvalho de Melo, mingo,
	Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, fowles, Namhyung Kim, andi

> > the plan for me is to use it some more to prove it's useful
> > and kick it to be merged with perf at some point
> 
> So I never really liked the c2c tool because it was so narrowly
> focussed; it only works on NUMA thingies IIRC.

It should work on all systems with an Intel Core (not Atom).

However, it was never clear to me whether the tool was any better
than simply sampling for 

mem_load_uops_l3_hit_retired.xsnp_hitm:pp    (local socket)
mem_load_uops_l3_miss_retired.remote_hitm:pp (remote socket)

which gives you instructions that reference bouncing cache lines.

-Andi


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09 16:58     ` Andi Kleen
@ 2015-12-09 17:15       ` Stephane Eranian
  2015-12-09 17:21         ` Andi Kleen
  2015-12-09 20:41         ` Joe Mario
  0 siblings, 2 replies; 14+ messages in thread
From: Stephane Eranian @ 2015-12-09 17:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Jiri Olsa, Yunlong Song, Don Zickus, David Ahern,
	Frédéric Weisbecker, Joe Mario, Mike Galbraith,
	Paul Mackerras, Richard Fowles,
	acme@kernel.org >> Arnaldo Carvalho de Melo,
	mingo@redhat.com, Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, Richard Fowles,
	Namhyung Kim

Hi,

On Wed, Dec 9, 2015 at 8:58 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> > the plan for me is to use it some more to prove it's useful
>> > and kick it to be merged with perf at some point
>>
>> So I never really liked the c2c tool because it was so narrowly
>> focussed; it only works on NUMA thingies IIRC.
>
> It should work on all systems with an Intel Core (not Atom).
>
> However, it was never clear to me whether the tool was any better
> than simply sampling for
>
> mem_load_uops_l3_hit_retired.xsnp_hitm:pp    (local socket)
> mem_load_uops_l3_miss_retired.remote_hitm:pp (remote socket)
>
> which gives you instructions that reference bouncing cache lines.
>
If I recall correctly, the c2c tool gives you more than the bouncing line.
It shows you the offset inside the line and the participating CPUs.


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09 17:15       ` Stephane Eranian
@ 2015-12-09 17:21         ` Andi Kleen
  2015-12-09 19:48           ` Stephane Eranian
  2015-12-09 20:41         ` Joe Mario
  1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2015-12-09 17:21 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, Peter Zijlstra, Jiri Olsa, Yunlong Song, Don Zickus,
	David Ahern, Frédéric Weisbecker, Joe Mario,
	Mike Galbraith, Paul Mackerras, Richard Fowles,
	acme@kernel.org >> Arnaldo Carvalho de Melo,
	mingo@redhat.com, Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, Richard Fowles,
	Namhyung Kim

> If I recall correctly, the c2c tool gives you more than the bouncing line.
> It shows you the offset inside the line and the participating CPUs.

On Haswell and later you could get the same with the normal address
reporting. The events above support DLA.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09 17:21         ` Andi Kleen
@ 2015-12-09 19:48           ` Stephane Eranian
  0 siblings, 0 replies; 14+ messages in thread
From: Stephane Eranian @ 2015-12-09 19:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Jiri Olsa, Yunlong Song, Don Zickus, David Ahern,
	Frédéric Weisbecker, Joe Mario, Mike Galbraith,
	Paul Mackerras, Richard Fowles,
	acme@kernel.org >> Arnaldo Carvalho de Melo,
	mingo@redhat.com, Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, Richard Fowles,
	Namhyung Kim

On Wed, Dec 9, 2015 at 9:21 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> If I recall correctly, the c2c tool gives you more than the bouncing line.
>> It shows you the offset inside the line and the participating CPUs.
>
> On Haswell and later you could get the same with the normal address
> reporting. The events above support DLA.
>
I know the events track the condition better than just with the
latency threshold.
I think what it boils down to is not so much the PMU side but rather
the tool side.


* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09 17:15       ` Stephane Eranian
  2015-12-09 17:21         ` Andi Kleen
@ 2015-12-09 20:41         ` Joe Mario
  2015-12-10  2:36           ` Yunlong Song
  1 sibling, 1 reply; 14+ messages in thread
From: Joe Mario @ 2015-12-09 20:41 UTC (permalink / raw)
  To: Stephane Eranian, Andi Kleen
  Cc: Peter Zijlstra, Jiri Olsa, Yunlong Song, Don Zickus, David Ahern,
	Frédéric Weisbecker, Mike Galbraith, Paul Mackerras,
	acme@kernel.org >> Arnaldo Carvalho de Melo,
	mingo@redhat.com, Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, Richard Fowles,
	Namhyung Kim

On 12/09/2015 12:15 PM, Stephane Eranian wrote:
> If I recall correctly, the c2c tool gives you more than the bouncing line.
> It shows you the offset inside the line and the participating CPUs.

Correct.  It shows much more than the bouncing line.

Appended below is the output from running "perf c2c" on a 4-node system
running a multi-threaded version of linpack.  I've annotated it to describe
what some of the fields mean.

Note, your screen output has to be set pretty wide to read it.  For those
not wanting to read it in their mailer, grab it from:
    http://people.redhat.com/jmario/perf_c2c/perf_c2c_annotated_output.txt

Let me know of any questions.  My annotations begin with "// ".

Joe
--------------------------------------------------------------------
// Perf c2c output from a linpack run.

// Set screen wide to view this.

// Here's a breakdown of all loads and stores sampled.
// It shows where they hit and missed.

=================================================
             Trace Event Information
=================================================
   Total records                     :    3229269
   Locked Load/Store Operations      :      64420
   Load Operations                   :    1153827
   Loads - uncacheable               :         11
   Loads - IO                        :          0
   Loads - Miss                      :      11002
   Loads - no mapping                :        200
   Load Fill Buffer Hit              :     355942
   Load L1D hit                      :     361303
   Load L2D hit                      :      46792
   Load LLC hit                      :     274265
   Load Local HITM                   :      18647
   Load Remote HITM                  :      55225
   Load Remote HIT                   :       8917
   Load Local DRAM                   :      11895
   Load Remote DRAM                  :      28275
   Load MESI State Exclusive         :      40170
   Load MESI State Shared            :          0
   Load LLC Misses                   :     104312
   LLC Misses to Local DRAM          :       11.4%
   LLC Misses to Remote DRAM         :       27.1%
   LLC Misses to Remote cache (HIT)  :        8.5%
   LLC Misses to Remote cache (HITM) :       52.9%
   Store Operations                  :    2069610
   Store - uncacheable               :          0
   Store - no mapping                :       3538
   Store L1D Hit                     :    1889168
   Store L1D Miss                    :     176904
   No Page Map Rejects               :     102146
   Unable to parse data source       :       5832


// Table showing activity on the shared cache lines.

=================================================
     Global Shared Cache Line Event Information
=================================================
   Total Shared Cache Lines          :      14213
   Load HITs on shared lines         :     426101
   Fill Buffer Hits on shared lines  :     159230
   L1D hits on shared lines          :      77377
   L2D hits on shared lines          :      28897
   LLC hits on shared lines          :      94709
   Locked Access on shared lines     :      45704
   Store HITs on shared lines        :     198050
   Store L1D hits on shared lines    :     188136
   Total Merged records              :     271778


// In the next table, for each of the 10 hottest cache lines, break out all the activity for that line.
// The sorting is done by remote hitm percentage, defined as loads that hit in a remote node's modified cacheline.

================================================================================================================================================================================================================
                                                                                           Shared Data Cache Line Table

                                  Total     %All                Total       ---- Core Load Hit ----  -- LLC Load Hit --     ----- LLC Load Hitm -----     -- Load Dram --     LLC       ---- Store Reference ----
    Index           Phys Adrs   Records   Ld Miss     %hitm     Loads        FB       L1D       L2D       Lcl       Rmt     Total       Lcl       Rmt       Lcl       Rmt   Ld Miss     Total     L1Hit    L1Miss
================================================================================================================================================================================================================
        0      0x7f9d8833bf80     12402     1.87%     3.54%     11748      6582      2259        72        24        20      2788       835      1953         2         1      1976       654       648         6
        1       0x85fd0f0c5c0      5593     0.35%     0.65%      3813       797      2131       153       171        32       457        97       360        20        52       464      1780      1780         0
        2       0x81fd0ddfd00      2174     0.21%     0.40%      1352       318       558        50        75        13       315        94       221         9        14       257       822       822         0
        3       0x85fd0f0c8c0      1854     0.21%     0.39%      1436       184       655       153        92        28       279        65       214        13        32       287       418       418         0
        4       0x83fffb17280       735     0.16%     0.31%       543         3       287        12        15         2       216        44       172         2         6       182       192       192         0
        5      0x7fff81db1e00      3641     0.16%     0.30%      3331      1111       849       624       384        83       228        62       166        15        37       301       310       310         0
        6       0x83fff857280       755     0.15%     0.29%       563         3       316        13        24         0       201        43       158         3         3       164       192       192         0
        7       0x85fff917280       801     0.15%     0.28%       648         1       409        19        19         1       198        44       154         1         0       156       153       153         0
        8       0x85fffc17280       767     0.14%     0.27%       664         4       403        27        15         2       205        54       151         2         6       161       103       103         0
        9       0x85fffbf7280       718     0.14%     0.27%       495         5       262        15        14         0       193        42       151         1         5       157       223       223         0


======================================================================================================================================================================================================================================================================

                                                                                                                  Shared Cache Line Distribution Pareto

      ---- All ----  -- Shared --    ---- HITM ----                                                                        Load Inst Execute Latency                                                            Shared Data Participants
        Data Misses   Data Misses   Remote    Local  -- Store Refs --
                                                                                                                           ---- cycles ----             cpu
  Num  %dist  %cumm  %dist  %cumm  LLCmiss   LLChit   L1 hit  L1 Miss       Data Address    Pid    Tid       Inst Address   median     mean     CV      cnt Symbol                         Object               Node{cpus %hitms %stores} Node{cpus %hitms %stores} ...
======================================================================================================================================================================================================================================================================
-----------------------------------------------------------------------------------------------
    0   1.9%   1.9%   3.5%   3.5%     1953      835      648        6     0x7f9d8833bf80 146560
-----------------------------------------------------------------------------------------------
                                      0.3%     0.4%    97.2%     0.0%               0x10 146560    ***           0x9969fa     2036     2265    12.4%     80 __kmp_acquire_queuing_lock_wi  xlinpack_xeon64       0{20  16.7%  25.2%}  1{21  33.3%  24.0%}  2{20  33.3%  25.7%}  3{19  16.7%  25.1%}
                                     88.9%    89.1%     0.0%     0.0%               0x10 146560    ***           0x996dfc      304      349     0.4%     79 __kmp_release_queuing_lock_wi  xlinpack_xeon64       0{20  23.7%    n/a}  1{21  24.5%    n/a}  2{19  26.8%    n/a}  3{19  25.0%    n/a}
                                      0.1%     0.0%     0.0%     0.0%               0x14 146560 146608           0x9969ac      410      410     0.0%      1 __kmp_acquire_queuing_lock_wi  xlinpack_xeon64                                                 2{ 1 100.0%    n/a}
                                     10.7%    10.5%     0.0%     0.0%               0x14 146560    ***           0x996def      303      352     1.2%     69 __kmp_release_queuing_lock_wi  xlinpack_xeon64       0{18  21.5%    n/a}  1{18  27.8%    n/a}  2{18  31.6%    n/a}  3{15  19.1%    n/a}
                                      0.0%     0.0%     2.8%   100.0%               0x14 146560    ***           0x996e54      n/a      n/a      n/a     19 __kmp_release_queuing_lock_wi  xlinpack_xeon64       0{ 4    n/a  25.0%}  1{ 6    n/a  29.2%}  2{ 6    n/a  33.3%}  3{ 3    n/a  12.5%}

// Here's where the data gets interesting.  Look at cacheline 0 above.  The cacheline at data address 0x7f9d8833bf80 had the most contention.
//
//  There were 1953 loads to that cacheline that hit in "remote-node modified cachelines".  That's why this is the hottest cacheline.
//  There were 835 loads that hit in a local modified cacheline.  Also noted are the number of stores that both hit and missed the L1 cache.
//  All accesses to that cacheline occurred at offsets 0x10 and 0x14.
//  All accesses were done by one pid (146560), which was the Pid for linpack.
//  I chose to display "***" when multiple thread ids (Tids) were involved in the same entry.  Individual Tids can be displayed, but it makes for a long output.
//  The instruction address of the load/store is displayed, along with the function and object name.
//  Where loads are involved, the median and mean load latency cycles are displayed.  Dumping the c2c raw records shows you individual worst offenders.  It's not uncommon to see loads taking tens of thousands of cycles to complete when heavy contention is involved.
//  The "cpu cnt" column shows how many individual cpus had samples contributing to a row of data.  In the first row above, there were samples from 80 cpus for that row.
//  The "Shared Data Participants" columns show the nodes the samples occurred on, the number of cpus in each node that samples came from for that row, and for each node, the percentage of hitms and stores that came from that node.
//
//  The above data shows how that hot cacheline is being concurrently read and written from cpus across all four nodes on the system.  It's then easy to disassemble the binary (with line info) to identify the source code line numbers causing the false sharing.  (A sketch of the usual fix appears after the object table below.)

-----------------------------------------------------------------------------------------------
    1   0.3%   2.2%   0.7%   4.2%      360       97     1780        0 0xffff885fd0f0a5c0    ***
-----------------------------------------------------------------------------------------------
                                     21.1%    29.9%     0.0%     0.0%               0x18 146560    *** 0xffffffff81196f40     1193     2404    13.9%     44 handle_mm_fault                [kernel.kallsyms]     0{ 4   9.2%    n/a}  1{12  27.6%    n/a}  2{17  40.8%    n/a}  3{11  22.4%    n/a}
                                     25.0%    21.6%     0.0%     0.0%               0x18 146560    *** 0xffffffff811a16d9      377      754    16.8%     43 mm_find_pmd                    [kernel.kallsyms]     0{11  23.3%    n/a}  1{12  16.7%    n/a}  2{11  41.1%    n/a}  3{ 9  18.9%    n/a}
                                     17.2%    12.4%     0.0%     0.0%               0x18      0      0 0xffffffff8163accb     1994     2467    11.0%     58 __schedule                     [kernel.kallsyms]     0{16  24.2%    n/a}  1{13  21.0%    n/a}  2{13  21.0%    n/a}  3{16  33.9%    n/a}
                                      3.9%     2.1%    24.1%     0.0%               0x24    ***    *** 0xffffffff810b41ce     2066     3398    30.0%    136 finish_task_switch             [kernel.kallsyms]     0{32  35.7%  23.5%}  1{36  14.3%  26.8%}  2{36  42.9%  26.1%}  3{32   7.1%  23.5%}
                                     13.9%    11.3%    29.6%     0.0%               0x24 146560    *** 0xffffffff8163aa37     7230     7026     7.8%    139 __schedule                     [kernel.kallsyms]     0{34  26.0%  18.8%}  1{36  24.0%  28.5%}  2{36  34.0%  27.9%}  3{33  16.0%  24.7%}
                                      0.0%     0.0%     0.1%     0.0%               0x24 146560 146665 0xffffffff8163aa3c      n/a      n/a      n/a      1 __schedule                     [kernel.kallsyms]                                               2{ 1    n/a 100.0%}
                                      5.3%     3.1%     0.0%     0.0%               0x38 146560    *** 0xffffffff810aa959      406      971    22.2%     20 down_read_trylock              [kernel.kallsyms]     0{ 3  15.8%    n/a}  1{ 9  52.6%    n/a}  2{ 4  15.8%    n/a}  3{ 4  15.8%    n/a}
                                     12.5%    15.5%    18.0%     0.0%               0x38 146560    *** 0xffffffff810aa965     2342     3845    13.9%     82 down_read_trylock              [kernel.kallsyms]     0{21  15.6%  16.6%}  1{22  24.4%  31.6%}  2{19  35.6%  27.2%}  3{20  24.4%  24.7%}
                                      1.1%     4.1%    28.3%     0.0%               0x38 146560    *** 0xffffffff810aa9c3     1376     1462    20.1%     87 up_read                        [kernel.kallsyms]     0{21  50.0%  16.7%}  1{23  50.0%  31.3%}  2{23   0.0%  28.6%}  3{20   0.0%  23.4%}

-----------------------------------------------------------------------------------------------
    2   0.2%   2.4%   0.4%   4.6%      221       94      822        0 0xffff881fd0dddd00 146560
-----------------------------------------------------------------------------------------------
                                      2.7%     2.1%     0.0%     0.0%               0x00 146560    *** 0xffffffff811a31ff      328      550    25.9%      8 page_lock_anon_vma_read        [kernel.kallsyms]     0{ 2  16.7%    n/a}  1{ 1  16.7%    n/a}  2{ 3  33.3%    n/a}  3{ 2  33.3%    n/a}
                                      9.5%     9.6%     0.0%     0.0%               0x00 146560    *** 0xffffffff811a36f5      429      814    19.2%     24 try_to_unmap_anon              [kernel.kallsyms]     0{ 5  19.0%    n/a}  1{ 9  42.9%    n/a}  2{ 5  23.8%    n/a}  3{ 5  14.3%    n/a}
                                     30.8%    27.7%     0.0%     0.0%               0x00 146560    *** 0xffffffff811a39f0      373      541     9.4%     37 rmap_walk                      [kernel.kallsyms]     0{12  32.4%    n/a}  1{ 9  20.6%    n/a}  2{ 8  25.0%    n/a}  3{ 8  22.1%    n/a}
                                     24.0%    20.2%     0.0%     0.0%               0x00 146560    *** 0xffffffff811a3a6a      424      692    11.0%     35 rmap_walk                      [kernel.kallsyms]     0{ 9  26.4%    n/a}  1{10  24.5%    n/a}  2{ 9  32.1%    n/a}  3{ 7  17.0%    n/a}
                                      0.9%     0.0%     0.0%     0.0%               0x08 146560    *** 0xffffffff810aa959      295      295     4.4%      2 down_read_trylock              [kernel.kallsyms]     0{ 2 100.0%    n/a}
                                      2.3%     2.1%    38.3%     0.0%               0x08 146560    *** 0xffffffff810aa965      537      821    25.9%     48 down_read_trylock              [kernel.kallsyms]     0{15  40.0%  25.4%}  1{12   0.0%  28.6%}  2{11   0.0%  26.7%}  3{10  60.0%  19.4%}
                                      3.2%     4.3%    15.0%     0.0%               0x08 146560    *** 0xffffffff810aa9c3      746      834    17.1%     44 up_read                        [kernel.kallsyms]     0{13  14.3%  23.6%}  1{12  14.3%  23.6%}  2{10  28.6%  26.0%}  3{ 9  42.9%  26.8%}
                                      4.1%     3.2%    22.9%     0.0%               0x08 146560    *** 0xffffffff8163a0a5      950      934    15.7%     46 down_read                      [kernel.kallsyms]     0{14  44.4%  31.9%}  1{13  22.2%  22.3%}  2{11  22.2%  25.5%}  3{ 8  11.1%  20.2%}
                                     12.7%    17.0%     0.0%     0.0%               0x28 146560    *** 0xffffffff811a3140      337      827    26.1%     28 page_get_anon_vma              [kernel.kallsyms]     0{ 7  21.4%    n/a}  1{ 9  42.9%    n/a}  2{ 6  10.7%    n/a}  3{ 6  25.0%    n/a}
                                      2.3%     2.1%    17.0%     0.0%               0x28 146560    *** 0xffffffff811a3151     1288     1870    29.9%     43 page_get_anon_vma              [kernel.kallsyms]     0{13  40.0%  33.6%}  1{11  20.0%  19.3%}  2{10   0.0%  27.1%}  3{ 9  40.0%  20.0%}
                                      0.0%     0.0%     0.1%     0.0%               0x28 146560 146602 0xffffffff811a31aa      n/a      n/a      n/a      1 page_get_anon_vma              [kernel.kallsyms]                          1{ 1    n/a 100.0%}
                                      2.7%     3.2%     6.7%     0.0%               0x28 146560    *** 0xffffffff811c77b4      843     1950    43.3%     35 migrate_pages                  [kernel.kallsyms]     0{10  16.7%  36.4%}  1{ 6  16.7%  18.2%}  2{10  16.7%  21.8%}  3{ 9  50.0%  23.6%}
                                      5.0%     8.5%     0.0%     0.0%               0x30 146560    *** 0xffffffff81190b15      336      561    19.6%     15 anon_vma_interval_tree_iter_f  [kernel.kallsyms]     0{ 5  36.4%    n/a}  1{ 4   9.1%    n/a}  2{ 3  36.4%    n/a}  3{ 3  18.2%    n/a}

-----------------------------------------------------------------------------------------------
    3   0.2%   2.6%   0.4%   5.0%      214       65      418        0 0xffff885fd0f0a8c0    ***
-----------------------------------------------------------------------------------------------
                                      7.9%    13.8%     0.0%     0.0%               0x08      0      0 0xffffffff81065c26      430      724    17.2%     20 leave_mm                       [kernel.kallsyms]     0{10  58.8%    n/a}  1{10  41.2%    n/a}
                                      0.5%     0.0%    12.7%     0.0%               0x08      0      0 0xffffffff81065c38     1343     1343     0.0%     25 leave_mm                       [kernel.kallsyms]     0{15   0.0%  54.7%}  1{10 100.0%  45.3%}
                                      1.9%     1.5%     5.5%     0.0%               0x08      0      0 0xffffffff810b0c74     1902     3074    32.9%     18 cpumask_set_cpu                [kernel.kallsyms]     0{ 9  25.0%  60.9%}  1{ 9  75.0%  39.1%}
                                     11.2%     9.2%     0.0%     0.0%               0x08    ***    *** 0xffffffff8163acb5      401      554    15.9%     20 __schedule                     [kernel.kallsyms]     0{13  58.3%    n/a}  1{ 7  41.7%    n/a}
                                      6.5%     6.2%     0.0%     0.0%               0x0c      0      0 0xffffffff81065c26      310      540    22.2%     14 leave_mm                       [kernel.kallsyms]                          1{ 2  14.3%    n/a}  2{10  78.6%    n/a}  3{ 2   7.1%    n/a}
                                      0.5%     0.0%    11.2%     0.0%               0x0c      0      0 0xffffffff81065c38     3638     3638     0.0%     25 leave_mm                       [kernel.kallsyms]                          1{ 1   0.0%   4.3%}  2{16 100.0%  68.1%}  3{ 8   0.0%  27.7%}
                                      2.8%     4.6%     3.8%     0.0%               0x0c      0      0 0xffffffff810b0c74     1774     2290    25.7%     20 cpumask_set_cpu                [kernel.kallsyms]                          1{ 3   0.0%  18.8%}  2{12 100.0%  50.0%}  3{ 5   0.0%  31.2%}
                                     15.4%     9.2%     0.0%     0.0%               0x0c    ***    *** 0xffffffff8163acb5      415      768    23.1%     21 __schedule                     [kernel.kallsyms]                          1{ 2   3.0%    n/a}  2{12  63.6%    n/a}  3{ 7  33.3%    n/a}
                                     10.7%    15.4%     0.0%     0.0%               0x10      0      0 0xffffffff81065c26      428      619    14.2%     18 leave_mm                       [kernel.kallsyms]     0{13  65.2%    n/a}  1{ 3  13.0%    n/a}                       3{ 2  21.7%    n/a}
                                      0.0%     0.0%    14.1%     0.0%               0x10      0      0 0xffffffff81065c38      n/a      n/a      n/a     27 leave_mm                       [kernel.kallsyms]     0{15    n/a  55.9%}  1{ 5    n/a  18.6%}                       3{ 7    n/a  25.4%}
                                      0.0%     4.6%     6.9%     0.0%               0x10      0      0 0xffffffff810b0c74      n/a      n/a      n/a     18 cpumask_set_cpu                [kernel.kallsyms]     0{10    n/a  41.4%}  1{ 5    n/a  41.4%}                       3{ 3    n/a  17.2%}
                                      9.3%     7.7%     0.0%     0.0%               0x10    ***    *** 0xffffffff8163acb5      404      640    25.5%     18 __schedule                     [kernel.kallsyms]     0{ 6  35.0%    n/a}  1{ 5  25.0%    n/a}                       3{ 7  40.0%    n/a}
                                     13.1%     9.2%     0.0%     0.0%               0x14      0      0 0xffffffff81065c26      340      458     9.8%     21 leave_mm                       [kernel.kallsyms]                          1{ 7  32.1%    n/a}  2{13  64.3%    n/a}  3{ 1   3.6%    n/a}
                                      0.5%     0.0%    20.3%     0.0%               0x14      0      0 0xffffffff81065c38     2662     2662     0.0%     27 leave_mm                       [kernel.kallsyms]                          1{11   0.0%  32.9%}  2{14 100.0%  57.6%}  3{ 2   0.0%   9.4%}
                                      1.9%     0.0%     8.9%     0.0%               0x14      0      0 0xffffffff810b0c74     2892     3086    28.5%     20 cpumask_set_cpu                [kernel.kallsyms]                          1{ 6   0.0%  32.4%}  2{13 100.0%  64.9%}  3{ 1   0.0%   2.7%}
                                      0.0%     0.0%     0.2%     0.0%               0x14      0      0 0xffffffff810b0c78      n/a      n/a      n/a      1 cpumask_set_cpu                [kernel.kallsyms]                                               2{ 1    n/a 100.0%}
                                      8.9%     4.6%     0.0%     0.0%               0x14      0      0 0xffffffff8163acb5      437     1041    20.5%     16 __schedule                     [kernel.kallsyms]                          1{ 8  42.1%    n/a}  2{ 7  52.6%    n/a}  3{ 1   5.3%    n/a}
                                      3.3%     7.7%     0.0%     0.0%               0x18      0      0 0xffffffff81065c26      407      450    15.8%      8 leave_mm                       [kernel.kallsyms]                                                                    3{ 8 100.0%    n/a}
                                      0.0%     0.0%    12.4%     0.0%               0x18      0      0 0xffffffff81065c38      n/a      n/a      n/a     16 leave_mm                       [kernel.kallsyms]                                                                    3{16    n/a 100.0%}
                                      0.9%     1.5%     3.8%     0.0%               0x18      0      0 0xffffffff810b0c74     1007     1230    18.2%     12 cpumask_set_cpu                [kernel.kallsyms]                                                                    3{12 100.0% 100.0%}
                                      4.7%     4.6%     0.0%     0.0%               0x18    ***    *** 0xffffffff8163acb5      342      629    25.1%     10 __schedule                     [kernel.kallsyms]                                                                    3{10 100.0%    n/a}

-----------------------------------------------------------------------------------------------
    4   0.2%   2.8%   0.3%   5.3%      172       44      192        0 0xffff883fffb15280 146560
-----------------------------------------------------------------------------------------------
                                      0.6%     0.0%    85.4%     0.0%               0x00 146560    *** 0xffffffff810e6e13      298      298     0.0%      1 flush_smp_call_function_queue  [kernel.kallsyms]                          1{ 1 100.0% 100.0%}
                                     98.8%    97.7%     0.0%     0.0%               0x00 146560    *** 0xffffffff813067d8      291      363     4.9%     44 llist_add_batch                [kernel.kallsyms]     0{13  27.1%    n/a}  1{10  13.5%    n/a}  2{12  29.4%    n/a}  3{ 9  30.0%    n/a}
                                      0.6%     2.3%    14.6%     0.0%               0x00 146560    *** 0xffffffff813067e1      892     1255     0.0%     21 llist_add_batch                [kernel.kallsyms]     0{ 7   0.0%  35.7%}  1{ 2   0.0%   7.1%}  2{ 6 100.0%  28.6%}  3{ 6   0.0%  28.6%}

-----------------------------------------------------------------------------------------------
    5   0.2%   3.0%   0.3%   5.6%      166       62      310        0 0xffffffff81dafe00    ***
-----------------------------------------------------------------------------------------------
                                      0.6%     0.0%     0.0%     0.0%               0x00 146560 146696 0xffffffff810bcc6c     1363     1363     0.0%      1 update_cfs_rq_h_load           [kernel.kallsyms]                          1{ 1 100.0%    n/a}
                                     22.3%    21.0%     0.0%     0.0%               0x00 146560    *** 0xffffffff810bdf56     1421     2209    15.8%     38 select_task_rq_fair            [kernel.kallsyms]     0{13  32.4%    n/a}  1{ 7  16.2%    n/a}  2{12  35.1%    n/a}  3{ 6  16.2%    n/a}
                                      4.2%     6.5%     0.0%     0.0%               0x00 146560    *** 0xffffffff810bdfaf     1500     5357    64.4%     10 select_task_rq_fair            [kernel.kallsyms]     0{ 1   0.0%    n/a}  1{ 4  71.4%    n/a}  2{ 2   0.0%    n/a}  3{ 3  28.6%    n/a}
                                      6.0%    11.3%     0.0%     0.0%               0x00    ***    *** 0xffffffff810befcf     3172     2506    25.8%     15 update_blocked_averages        [kernel.kallsyms]     0{ 2  10.0%    n/a}  1{ 4  20.0%    n/a}  2{ 4  30.0%    n/a}  3{ 5  40.0%    n/a}
                                     54.8%    54.8%     0.0%     0.0%               0x00    ***    *** 0xffffffff810c1963     1884     2815    13.1%     81 update_cfs_shares              [kernel.kallsyms]     0{16  23.1%    n/a}  1{21  24.2%    n/a}  2{23  27.5%    n/a}  3{21  25.3%    n/a}
                                      1.2%     0.0%     0.0%     0.0%               0x08 146560    *** 0xffffffff810b5fb7     1365     1365    51.4%      2 set_task_cpu                   [kernel.kallsyms]     0{ 1  50.0%    n/a}  1{ 1  50.0%    n/a}
                                      3.0%     3.2%     0.0%     0.0%               0x08 146560    *** 0xffffffff810bf901      815     1186    45.9%      6 can_migrate_task               [kernel.kallsyms]     0{ 2  20.0%    n/a}  1{ 2  40.0%    n/a}  2{ 2  40.0%    n/a}
                                      3.6%     0.0%    44.8%     0.0%               0x20    ***    *** 0xffffffff810bcf6f     5364     6967    41.5%     94 update_cfs_rq_blocked_load     [kernel.kallsyms]     0{22  33.3%  17.3%}  1{27   0.0%  36.0%}  2{23  16.7%  23.0%}  3{22  50.0%  23.7%}
                                      0.0%     0.0%     1.0%     0.0%               0x28    ***    *** 0xffffffff810bf3a9      n/a      n/a      n/a      3 update_blocked_averages        [kernel.kallsyms]     0{ 1    n/a  33.3%}                                            3{ 2    n/a  66.7%}
                                      0.0%     0.0%     0.3%     0.0%               0x28      0      0 0xffffffff810c102a      n/a      n/a      n/a      1 idle_enter_fair                [kernel.kallsyms]                          1{ 1    n/a 100.0%}
                                      0.6%     3.2%    29.7%     0.0%               0x28 146560    *** 0xffffffff810c2455     1647     1647     0.0%     68 dequeue_task_fair              [kernel.kallsyms]     0{23 100.0%  35.9%}  1{18   0.0%  26.1%}  2{19   0.0%  25.0%}  3{ 8   0.0%  13.0%}
                                      0.0%     0.0%     0.6%     0.0%               0x28 146560    *** 0xffffffff810c2ba0      n/a      n/a      n/a      2 task_tick_fair                 [kernel.kallsyms]                                               2{ 1    n/a  50.0%}  3{ 1    n/a  50.0%}
                                      3.0%     0.0%    23.5%     0.0%               0x28    ***    *** 0xffffffff810c4203     2085     2958    31.8%     61 enqueue_task_fair              [kernel.kallsyms]     0{14  20.0%  30.1%}  1{14   0.0%  21.9%}  2{13  20.0%  21.9%}  3{20  60.0%  26.0%}
                                      0.6%     0.0%     0.0%     0.0%               0x30 146560 146593 0xffffffff810b5fe7     3070     3070     0.0%      1 set_task_cpu                   [kernel.kallsyms]                          1{ 1 100.0%    n/a}

-----------------------------------------------------------------------------------------------
    6   0.2%   3.1%   0.3%   5.9%      158       43      192        0 0xffff883fff855280 146560
-----------------------------------------------------------------------------------------------
                                      0.0%     0.0%    87.5%     0.0%               0x00 146560 146569 0xffffffff810e6e13      n/a      n/a      n/a      1 flush_smp_call_function_queue  [kernel.kallsyms]                          1{ 1    n/a 100.0%}
                                     98.7%   100.0%     0.0%     0.0%               0x00 146560    *** 0xffffffff813067d8      317      406     5.9%     44 llist_add_batch                [kernel.kallsyms]     0{13  32.1%    n/a}  1{10  11.5%    n/a}  2{11  30.1%    n/a}  3{10  26.3%    n/a}
                                      1.3%     0.0%    12.5%     0.0%               0x00 146560    *** 0xffffffff813067e1     1040     1040    23.1%     20 llist_add_batch                [kernel.kallsyms]     0{ 6  50.0%  29.2%}  1{ 5   0.0%  20.8%}  2{ 6  50.0%  25.0%}  3{ 3   0.0%  25.0%}

-----------------------------------------------------------------------------------------------
    7   0.1%   3.3%   0.3%   6.2%      154       44      153        0 0xffff885fff915280 146560
-----------------------------------------------------------------------------------------------
                                      0.6%     0.0%    86.9%     0.0%               0x00 146560 146624 0xffffffff810e6e13     1341     1341     0.0%      1 flush_smp_call_function_queue  [kernel.kallsyms]                                               2{ 1 100.0% 100.0%}
                                     98.7%   100.0%     0.0%     0.0%               0x00 146560    *** 0xffffffff813067d8      287      377     9.2%     43 llist_add_batch                [kernel.kallsyms]     0{12  36.8%    n/a}  1{11  24.3%    n/a}  2{10  11.8%    n/a}  3{10  27.0%    n/a}
                                      0.6%     0.0%    13.1%     0.0%               0x00 146560    *** 0xffffffff813067e1      539      539     0.0%     17 llist_add_batch                [kernel.kallsyms]     0{ 6   0.0%  30.0%}  1{ 6   0.0%  40.0%}  2{ 1   0.0%   5.0%}  3{ 4 100.0%  25.0%}

-----------------------------------------------------------------------------------------------
    8   0.1%   3.4%   0.3%   6.4%      151       54      103        0 0xffff885fffc15280 146560
-----------------------------------------------------------------------------------------------
                                      0.7%     0.0%    69.9%     0.0%               0x00 146560 146607 0xffffffff810e6e13      902      902     0.0%      1 flush_smp_call_function_queue  [kernel.kallsyms]                                               2{ 1 100.0% 100.0%}
                                     99.3%    98.1%     0.0%     0.0%               0x00 146560    *** 0xffffffff813067d8      278      377     5.3%     45 llist_add_batch                [kernel.kallsyms]     0{13  34.7%    n/a}  1{11  24.7%    n/a}  2{11  10.7%    n/a}  3{10  30.0%    n/a}
                                      0.0%     1.9%    30.1%     0.0%               0x00 146560    *** 0xffffffff813067e1      n/a      n/a      n/a     23 llist_add_batch                [kernel.kallsyms]     0{ 6    n/a  22.6%}  1{ 7    n/a  32.3%}  2{ 6    n/a  25.8%}  3{ 4    n/a  19.4%}

-----------------------------------------------------------------------------------------------
    9   0.1%   3.5%   0.3%   6.7%      151       42      223        0 0xffff885fffbf5280 146560
-----------------------------------------------------------------------------------------------
                                      0.0%     0.0%    87.4%     0.0%               0x00 146560    *** 0xffffffff810e6e13      n/a      n/a      n/a      1 flush_smp_call_function_queue  [kernel.kallsyms]                                               2{ 1    n/a 100.0%}
                                     99.3%    97.6%     0.0%     0.0%               0x00 146560    *** 0xffffffff813067d8      309      346     2.5%     44 llist_add_batch                [kernel.kallsyms]     0{11  28.0%    n/a}  1{11  23.3%    n/a}  2{11  13.3%    n/a}  3{11  35.3%    n/a}
                                      0.7%     2.4%    12.6%     0.0%               0x00 146560    *** 0xffffffff813067e1     1028     1696     0.0%     20 llist_add_batch                [kernel.kallsyms]     0{ 5   0.0%  25.0%}  1{ 2   0.0%   7.1%}  2{ 6 100.0%  35.7%}  3{ 7   0.0%  32.1%}

                                     
                                      
=====================================================================================================================================
                                                    Object Name, Path & Reference Counts

Index    Records   Object Name                       Object Path
=====================================================================================================================================
     0    2032703   xlinpack_xeon64                   /home/joe/linpack/xlinpack_xeon64
     1    1059580   [kernel.kallsyms]                 /proc/kcore
     2      21352   libpthread-2.17.so                /usr/lib64/libpthread-2.17.so
     3       3882   perf                              /home/root/git/rhel7/tools/perf/perf
     4         26   libc-2.17.so                      /usr/lib64/libc-2.17.so
     5          5   libpython2.7.so.1.0               /usr/lib64/libpython2.7.so.1.0
     6          3   ld-2.17.so                        /usr/lib64/ld-2.17.so
     7          1   sendmail.sendmail                 /usr/sbin/sendmail.sendmail
     8          1   irqbalance                        /usr/sbin/irqbalance






* Re: [Questions] perf c2c: What's the current status of perf c2c?
  2015-12-09 20:41         ` Joe Mario
@ 2015-12-10  2:36           ` Yunlong Song
  0 siblings, 0 replies; 14+ messages in thread
From: Yunlong Song @ 2015-12-10  2:36 UTC (permalink / raw)
  To: Joe Mario, Stephane Eranian, Andi Kleen
  Cc: Peter Zijlstra, Jiri Olsa, Don Zickus, David Ahern,
	Frédéric Weisbecker, Mike Galbraith, Paul Mackerras,
	acme@kernel.org >> Arnaldo Carvalho de Melo,
	mingo@redhat.com, Linux Kernel Mailing List, Jiri Olsa,
	wangnan0@huawei.com >> Wang Nan, Richard Fowles,
	Namhyung Kim

On 2015/12/10 4:41, Joe Mario wrote:
> Appended below is the output from running "perf c2c" on a 4-node system
> running a multi-threaded version of linpack.  I've annotated it to describe
> what some of the fields mean.
> 
> Note, your screen output has to be set pretty wide to read it.  For those
> not wanting to read it in their mailer, grab it from:
>    http://people.redhat.com/jmario/perf_c2c/perf_c2c_annotated_output.txt
> 
> Let me know of any questions.  My annotations begin with "// ".
> 
> Joe
> --------------------------------------------------------------------
> // Perf c2c output from a linpack run.

Hi, Joe,
    Got the details; thanks a lot for the explanation. -_-

-- 
Thanks,
Yunlong Song


