Netdev Archive mirror
 help / color / mirror / Atom feed
* MPLS outbound packets being dropped
@ 2015-12-06 10:20 Sam Russell
  2015-12-07  5:56 ` Sam Russell
  2015-12-07 10:40 ` Robert Shearman
  0 siblings, 2 replies; 6+ messages in thread
From: Sam Russell @ 2015-12-06 10:20 UTC (permalink / raw
  To: netdev

tl;dr mpls_output expects skb->protocol to be set to correct
ethertype, but it isn't

https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64

Problem:

I set up two interfaces pointed at each other, and added a static arp
entry to minimise complexity

ifconfig enp0s8 10.0.0.1/24 up
ifconfig enp0s9 up
arp -s 10.0.0.5 00:12:34:56:78:90

I then added an MPLS route

./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
10.0.0.5 dev enp0s8

I then tried to ping an IP in this route but got errors back

ping 192.168.2.1
* PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
* ping: sendmsg: Invalid argument

I then recorded calls to skb_kfree

./tools/perf/perf record -e skb:kfree_skb -g -a

This gave me the following packet trace:

   100.00%   100.00%  ping     [kernel.kallsyms]  [k] kfree_skb
               |
               ---kfree_skb
                  mpls_output
                  lwtunnel_output
                  ip_local_out_sk
                  ip_send_skb
                  ip_push_pending_frames
                  raw_sendmsg
                  inet_sendmsg
                  sock_sendmsg
                  ___sys_sendmsg
                  __sys_sendmsg
                  sys_sendmsg
                  entry_SYSCALL_64_fastpath
                  sendmsg
                  0

I then went through mpls_output.c and put printk() at every call to
"goto drop" and found that this was being hit after failing to match
skb->protocol

https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64

My understanding is that skb->protocol is normally set after
dst_output. For example, a ping packet hitting a normal IPv4 route
should follow something like:

raw_sendmsg
ip_push_pending_frames
ip_send_skb
ip_local_out_sk
dst_output
ip_output

ip_output() is the first place where skb->protocol gets set

https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356

The path that a packet follows when hitting an MPLS route is as follows:

raw_sendmsg
ip_push_pending_frames
ip_send_skb
ip_local_out_sk
dst_output
lwtunnel_output
mpls_output

lwtunnel_output merely routes to the correct output function (mpls_output)
mpls_output expects skb->protocol to be set, but nothing has set it
yet, so it drops the packet!

Any suggestions on how mpls_output should detect the protocol?

Setup:

Ubuntu 15.10

iproute2 built from head

Kernel 4.3, both ubuntu mainline and home-built:
make menuconfig, add lwtunnel, mpls-router, mpls-gso and mpls-iptunnel

Running virtualbox on OSX

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: MPLS outbound packets being dropped
  2015-12-06 10:20 MPLS outbound packets being dropped Sam Russell
@ 2015-12-07  5:56 ` Sam Russell
  2015-12-07 10:40 ` Robert Shearman
  1 sibling, 0 replies; 6+ messages in thread
From: Sam Russell @ 2015-12-07  5:56 UTC (permalink / raw
  To: netdev

I can confirm that MPLS packets are sent correctly if the conditional
at mpls_iptunnel.c:57-65 is removed and just replaced with lines
58-59:

ttl = ip_hdr(skb)->ttl;
rt = (struct rtable *)dst;

Obviously this isn't sufficient for IPv6, but is a workaround that
enables MPLS-encapsulated packets to be sent from Linux in the
meantime

On 6 December 2015 at 23:20, Sam Russell <sam.h.russell@gmail.com> wrote:
> tl;dr mpls_output expects skb->protocol to be set to correct
> ethertype, but it isn't
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> Problem:
>
> I set up two interfaces pointed at each other, and added a static arp
> entry to minimise complexity
>
> ifconfig enp0s8 10.0.0.1/24 up
> ifconfig enp0s9 up
> arp -s 10.0.0.5 00:12:34:56:78:90
>
> I then added an MPLS route
>
> ./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
> 10.0.0.5 dev enp0s8
>
> I then tried to ping an IP in this route but got errors back
>
> ping 192.168.2.1
> * PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
> * ping: sendmsg: Invalid argument
>
> I then recorded calls to skb_kfree
>
> ./tools/perf/perf record -e skb:kfree_skb -g -a
>
> This gave me the following packet trace:
>
>    100.00%   100.00%  ping     [kernel.kallsyms]  [k] kfree_skb
>                |
>                ---kfree_skb
>                   mpls_output
>                   lwtunnel_output
>                   ip_local_out_sk
>                   ip_send_skb
>                   ip_push_pending_frames
>                   raw_sendmsg
>                   inet_sendmsg
>                   sock_sendmsg
>                   ___sys_sendmsg
>                   __sys_sendmsg
>                   sys_sendmsg
>                   entry_SYSCALL_64_fastpath
>                   sendmsg
>                   0
>
> I then went through mpls_output.c and put printk() at every call to
> "goto drop" and found that this was being hit after failing to match
> skb->protocol
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> My understanding is that skb->protocol is normally set after
> dst_output. For example, a ping packet hitting a normal IPv4 route
> should follow something like:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> ip_output
>
> ip_output() is the first place where skb->protocol gets set
>
> https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356
>
> The path that a packet follows when hitting an MPLS route is as follows:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> lwtunnel_output
> mpls_output
>
> lwtunnel_output merely routes to the correct output function (mpls_output)
> mpls_output expects skb->protocol to be set, but nothing has set it
> yet, so it drops the packet!
>
> Any suggestions on how mpls_output should detect the protocol?
>
> Setup:
>
> Ubuntu 15.10
>
> iproute2 built from head
>
> Kernel 4.3, both ubuntu mainline and home-built:
> make menuconfig, add lwtunnel, mpls-router, mpls-gso and mpls-iptunnel
>
> Running virtualbox on OSX

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: MPLS outbound packets being dropped
  2015-12-06 10:20 MPLS outbound packets being dropped Sam Russell
  2015-12-07  5:56 ` Sam Russell
@ 2015-12-07 10:40 ` Robert Shearman
  2015-12-07 12:53   ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
  1 sibling, 1 reply; 6+ messages in thread
From: Robert Shearman @ 2015-12-07 10:40 UTC (permalink / raw
  To: Sam Russell, netdev; +Cc: roopa

On 06/12/15 10:20, Sam Russell wrote:
> tl;dr mpls_output expects skb->protocol to be set to correct
> ethertype, but it isn't
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> Problem:
>
> I set up two interfaces pointed at each other, and added a static arp
> entry to minimise complexity
>
> ifconfig enp0s8 10.0.0.1/24 up
> ifconfig enp0s9 up
> arp -s 10.0.0.5 00:12:34:56:78:90
>
> I then added an MPLS route
>
> ./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
> 10.0.0.5 dev enp0s8
>
> I then tried to ping an IP in this route but got errors back
>
> ping 192.168.2.1
> * PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
> * ping: sendmsg: Invalid argument
>
> I then recorded calls to skb_kfree
>
> ./tools/perf/perf record -e skb:kfree_skb -g -a
>
> This gave me the following packet trace:
>
>     100.00%   100.00%  ping     [kernel.kallsyms]  [k] kfree_skb
>                 |
>                 ---kfree_skb
>                    mpls_output
>                    lwtunnel_output
>                    ip_local_out_sk
>                    ip_send_skb
>                    ip_push_pending_frames
>                    raw_sendmsg
>                    inet_sendmsg
>                    sock_sendmsg
>                    ___sys_sendmsg
>                    __sys_sendmsg
>                    sys_sendmsg
>                    entry_SYSCALL_64_fastpath
>                    sendmsg
>                    0
>
> I then went through mpls_output.c and put printk() at every call to
> "goto drop" and found that this was being hit after failing to match
> skb->protocol
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> My understanding is that skb->protocol is normally set after
> dst_output. For example, a ping packet hitting a normal IPv4 route
> should follow something like:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> ip_output
>
> ip_output() is the first place where skb->protocol gets set
>
> https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356
>
> The path that a packet follows when hitting an MPLS route is as follows:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> lwtunnel_output
> mpls_output
>
> lwtunnel_output merely routes to the correct output function (mpls_output)
> mpls_output expects skb->protocol to be set, but nothing has set it
> yet, so it drops the packet!
>
> Any suggestions on how mpls_output should detect the protocol?

Thanks for reporting this and for your analysis.

We could write wrappers to lwtunnel_output for the v4 and v6 cases that 
set the protocol accordingly and then call lwtunnel_output, but since 
mpls_output relies on the AF-specific type of dst I think the simpler 
fix is to just test the type of the dst in mpls_output rather than 
skb->protocol.

Thanks,
Rob

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net] mpls: fix sending of local encapped packets
  2015-12-07 10:40 ` Robert Shearman
@ 2015-12-07 12:53   ` Robert Shearman
  2015-12-07 17:53     ` Sam Russell
  2015-12-07 21:33     ` David Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Robert Shearman @ 2015-12-07 12:53 UTC (permalink / raw
  To: davem; +Cc: netdev, roopa, ebiederm, Sam Russell, Robert Shearman

Locally generated IPv4 and (probably) IPv6 packets are dropped because
skb->protocol isn't set. We could write wrappers to lwtunnel_output
for IPv4 and IPv6 that set the protocol accordingly and then call
lwtunnel_output, but mpls_output relies on the AF-specific type of dst
anyway to get the via address.

Therefore, make use of dst->dst_ops->family in mpls_output to
determine the type of nexthop and thus protocol of the packet instead
of checking skb->protocol.

Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Reported-by: Sam Russell <sam.h.russell@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/mpls_iptunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index 67591aef9cae..64afd3d0b144 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -54,10 +54,10 @@ int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	unsigned int ttl;
 
 	/* Obtain the ttl */
-	if (skb->protocol == htons(ETH_P_IP)) {
+	if (dst->ops->family == AF_INET) {
 		ttl = ip_hdr(skb)->ttl;
 		rt = (struct rtable *)dst;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+	} else if (dst->ops->family == AF_INET6) {
 		ttl = ipv6_hdr(skb)->hop_limit;
 		rt6 = (struct rt6_info *)dst;
 	} else {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] mpls: fix sending of local encapped packets
  2015-12-07 12:53   ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
@ 2015-12-07 17:53     ` Sam Russell
  2015-12-07 21:33     ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Sam Russell @ 2015-12-07 17:53 UTC (permalink / raw
  To: Robert Shearman; +Cc: davem, netdev, roopa, ebiederm

I can confirm that this patch works for IPv4 and IPv6

modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel
ifconfig enp0s8 192.168.99.2/24 up
ifconfig enp0s9 up
arp -s 192.168.99.5  00:12:34:56:78:90
ip -6 neigh add fe80::a00:27ff:1234:5678 lladdr 00:12:34:56:78:90 dev enp0s8
./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
192.168.99.5 dev enp0s8
./dev/iproute2/ip/ip -6 route add 2404:1234:5678::/48 encap mpls 150
via inet6 fe80::a00:27ff:1234:5678 dev enp0s8

ping 192.168.2.1

tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture
size 262144 bytes
06:46:16.780466 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 24189, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 1, length 64
06:46:17.790616 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 24297, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 2, length 64
06:46:18.791064 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 24338, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 3, length 64

ping6 2404:1234:5678::1

tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture
size 262144 bytes
06:51:15.235286 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 122: MPLS (label 150, exp 0, [S], ttl 64)
(flowlabel 0xc4081, hlim 64, next-header ICMPv6 (58) payload length:
64) fe80::a00:27ff:fe39:3ce0 > 2404:1234:5678::1: [icmp6 sum ok]
ICMP6, echo request, seq 1
06:51:16.236016 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 122: MPLS (label 150, exp 0, [S], ttl 64)
(flowlabel 0xc4081, hlim 64, next-header ICMPv6 (58) payload length:
64) fe80::a00:27ff:fe39:3ce0 > 2404:1234:5678::1: [icmp6 sum ok]
ICMP6, echo request, seq 2

Thanks for the quick response!

Cheers
Sam

On 8 December 2015 at 01:53, Robert Shearman <rshearma@brocade.com> wrote:
> Locally generated IPv4 and (probably) IPv6 packets are dropped because
> skb->protocol isn't set. We could write wrappers to lwtunnel_output
> for IPv4 and IPv6 that set the protocol accordingly and then call
> lwtunnel_output, but mpls_output relies on the AF-specific type of dst
> anyway to get the via address.
>
> Therefore, make use of dst->dst_ops->family in mpls_output to
> determine the type of nexthop and thus protocol of the packet instead
> of checking skb->protocol.
>
> Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
> Reported-by: Sam Russell <sam.h.russell@gmail.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/mpls_iptunnel.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
> index 67591aef9cae..64afd3d0b144 100644
> --- a/net/mpls/mpls_iptunnel.c
> +++ b/net/mpls/mpls_iptunnel.c
> @@ -54,10 +54,10 @@ int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>         unsigned int ttl;
>
>         /* Obtain the ttl */
> -       if (skb->protocol == htons(ETH_P_IP)) {
> +       if (dst->ops->family == AF_INET) {
>                 ttl = ip_hdr(skb)->ttl;
>                 rt = (struct rtable *)dst;
> -       } else if (skb->protocol == htons(ETH_P_IPV6)) {
> +       } else if (dst->ops->family == AF_INET6) {
>                 ttl = ipv6_hdr(skb)->hop_limit;
>                 rt6 = (struct rt6_info *)dst;
>         } else {
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] mpls: fix sending of local encapped packets
  2015-12-07 12:53   ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
  2015-12-07 17:53     ` Sam Russell
@ 2015-12-07 21:33     ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2015-12-07 21:33 UTC (permalink / raw
  To: rshearma; +Cc: netdev, roopa, ebiederm, sam.h.russell

From: Robert Shearman <rshearma@brocade.com>
Date: Mon, 7 Dec 2015 12:53:15 +0000

> Locally generated IPv4 and (probably) IPv6 packets are dropped because
> skb->protocol isn't set. We could write wrappers to lwtunnel_output
> for IPv4 and IPv6 that set the protocol accordingly and then call
> lwtunnel_output, but mpls_output relies on the AF-specific type of dst
> anyway to get the via address.
> 
> Therefore, make use of dst->dst_ops->family in mpls_output to
> determine the type of nexthop and thus protocol of the packet instead
> of checking skb->protocol.
> 
> Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
> Reported-by: Sam Russell <sam.h.russell@gmail.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-12-07 21:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-06 10:20 MPLS outbound packets being dropped Sam Russell
2015-12-07  5:56 ` Sam Russell
2015-12-07 10:40 ` Robert Shearman
2015-12-07 12:53   ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
2015-12-07 17:53     ` Sam Russell
2015-12-07 21:33     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).