* atomic operations bottleneck in the IPv6 stack
From: cristian.bercaru @ 2014-12-10 16:56 UTC
To: netdev@vger.kernel.org
Cc: R89243@freescale.com, Madalin-Cristian Bucur,
Razvan.Ungureanu@freescale.com
Hello!
I am running IPv6 forwarding test cases, and I get worse performance with 24 cores than with 16 cores.
Test scenario:
10G --->[T4240]---> 10G
- platform: Freescale T4240, powerpc, 24 x e6500 64-bit cores (I can disable 8 of them from uboot)
- input type: raw IPv6 78-byte packets
- input rate: 10Gbps
- forwarding/output rate: 16 cores - 3.3 Gbps; 24 cores - 2.4 Gbps
Running perf with "perf record -C 1 -c 10000000 -a sleep 120", I observe:
- on 16 cores
# Overhead Command Shared Object Symbol
19.59% ksoftirqd/1 [kernel.kallsyms] [k] .ip6_pol_route
18.07% ksoftirqd/1 [kernel.kallsyms] [k] .dst_release
5.09% ksoftirqd/1 [kernel.kallsyms] [k] .__netif_receive_skb_core
- on 24 cores
34.98% ksoftirqd/1 [kernel.kallsyms] [k] .ip6_pol_route
31.86% ksoftirqd/1 [kernel.kallsyms] [k] .dst_release
3.76% ksoftirqd/1 [kernel.kallsyms] [k] .ip6_finish_output2
2.72% ksoftirqd/1 [kernel.kallsyms] [k] .__netif_receive_skb_core
I de-inlined the 'atomic_dec_return' and 'atomic_inc' operations used by 'ip6_pol_route' and 'dst_release', and now I see:
- on 16 cores
17.26% ksoftirqd/1 [kernel.kallsyms] [k] .atomic_dec_return_noinline
13.45% ksoftirqd/1 [kernel.kallsyms] [k] .atomic_inc_noinline
5.53% ksoftirqd/1 [kernel.kallsyms] [k] .ip6_pol_route
5.02% ksoftirqd/1 [kernel.kallsyms] [k] .__netif_receive_skb_core
- on 24 cores
32.45% ksoftirqd/1 [kernel.kallsyms] [k] .atomic_dec_return_noinline
30.56% ksoftirqd/1 [kernel.kallsyms] [k] .atomic_inc_noinline
4.71% ksoftirqd/1 [kernel.kallsyms] [k] .ip6_pol_route
3.57% ksoftirqd/1 [kernel.kallsyms] [k] .ip6_finish_output2
It seems to me that the atomic operations on the IPv6 forwarding path are a bottleneck and do not scale with the number of cores. Am I right? What improvements could be made to the IPv6 kernel code to make it less dependent on atomic operations/variables?
Thank you,
Cristian Bercaru
* Re: atomic operations bottleneck in the IPv6 stack
From: Hannes Frederic Sowa @ 2014-12-10 17:16 UTC
To: cristian.bercaru@freescale.com
Cc: netdev@vger.kernel.org, R89243@freescale.com,
Madalin-Cristian Bucur, Razvan.Ungureanu@freescale.com
On Wed, 2014-12-10 at 16:56 +0000, cristian.bercaru@freescale.com wrote:
>
> It seems to me that the atomic operations on the IPv6 forwarding path
> are a bottleneck and they are not scalable with the number of cores.
> Am I right? What improvements can be brought to the IPv6 kernel code
> to make it less dependent of atomic operations/variables?
For a starter, something like the following commit:
commit d26b3a7c4b3b26319f18bb645de93eba8f4bdcd5
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Jul 31 05:45:30 2012 +0000
ipv4: percpu nh_rth_output cache
Bye,
Hannes
* Re: atomic operations bottleneck in the IPv6 stack
From: Hannes Frederic Sowa @ 2014-12-10 17:58 UTC
To: cristian.bercaru@freescale.com
Cc: netdev@vger.kernel.org, R89243@freescale.com,
Madalin-Cristian Bucur, Razvan.Ungureanu@freescale.com
On Wed, 2014-12-10 at 18:16 +0100, Hannes Frederic Sowa wrote:
> On Wed, 2014-12-10 at 16:56 +0000, cristian.bercaru@freescale.com wrote:
> >
> > It seems to me that the atomic operations on the IPv6 forwarding path
> > are a bottleneck and they are not scalable with the number of cores.
> > Am I right? What improvements can be brought to the IPv6 kernel code
> > to make it less dependent of atomic operations/variables?
>
> For a starter, something like the following commit:
>
> commit d26b3a7c4b3b26319f18bb645de93eba8f4bdcd5
> Author: Eric Dumazet <edumazet@google.com>
> Date: Tue Jul 31 05:45:30 2012 +0000
>
> ipv4: percpu nh_rth_output cache
Actually, we should be able to remove the atomics in input and
forwarding path by just relying on RCU. I'll have a look.
Bye,
Hannes