* MPLS outbound packets being dropped
@ 2015-12-06 10:20 Sam Russell
2015-12-07 5:56 ` Sam Russell
2015-12-07 10:40 ` Robert Shearman
0 siblings, 2 replies; 6+ messages in thread
From: Sam Russell @ 2015-12-06 10:20 UTC (permalink / raw
To: netdev
tl;dr mpls_output expects skb->protocol to be set to correct
ethertype, but it isn't
https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
Problem:
I set up two interfaces pointed at each other, and added a static arp
entry to minimise complexity
ifconfig enp0s8 10.0.0.1/24 up
ifconfig enp0s9 up
arp -s 10.0.0.5 00:12:34:56:78:90
I then added an MPLS route
./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
10.0.0.5 dev enp0s8
I then tried to ping an IP in this route but got errors back
ping 192.168.2.1
* PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
* ping: sendmsg: Invalid argument
I then recorded calls to skb_kfree
./tools/perf/perf record -e skb:kfree_skb -g -a
This gave me the following packet trace:
100.00% 100.00% ping [kernel.kallsyms] [k] kfree_skb
|
---kfree_skb
mpls_output
lwtunnel_output
ip_local_out_sk
ip_send_skb
ip_push_pending_frames
raw_sendmsg
inet_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
sys_sendmsg
entry_SYSCALL_64_fastpath
sendmsg
0
I then went through mpls_output.c and put printk() at every call to
"goto drop" and found that this was being hit after failing to match
skb->protocol
https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
My understanding is that skb->protocol is normally set after
dst_output. For example, a ping packet hitting a normal IPv4 route
should follow something like:
raw_sendmsg
ip_push_pending_frames
ip_send_skb
ip_local_out_sk
dst_output
ip_output
ip_output() is the first place where skb->protocol gets set
https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356
The path that a packet follows when hitting an MPLS route is as follows:
raw_sendmsg
ip_push_pending_frames
ip_send_skb
ip_local_out_sk
dst_output
lwtunnel_output
mpls_output
lwtunnel_output merely routes to the correct output function (mpls_output)
mpls_output expects skb->protocol to be set, but nothing has set it
yet, so it drops the packet!
Any suggestions on how mpls_output should detect the protocol?
Setup:
Ubuntu 15.10
iproute2 built from head
Kernel 4.3, both ubuntu mainline and home-built:
make menuconfig, add lwtunnel, mpls-router, mpls-gso and mpls-iptunnel
Running virtualbox on OSX
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: MPLS outbound packets being dropped
2015-12-06 10:20 MPLS outbound packets being dropped Sam Russell
@ 2015-12-07 5:56 ` Sam Russell
2015-12-07 10:40 ` Robert Shearman
1 sibling, 0 replies; 6+ messages in thread
From: Sam Russell @ 2015-12-07 5:56 UTC (permalink / raw
To: netdev
I can confirm that MPLS packets are sent correctly if the conditional
at mpls_iptunnel.c:57-65 is removed and just replaced with lines
58-59:
ttl = ip_hdr(skb)->ttl;
rt = (struct rtable *)dst;
Obviously this isn't sufficient for IPv6, but is a workaround that
enables MPLS-encapsulated packets to be sent from Linux in the
meantime
On 6 December 2015 at 23:20, Sam Russell <sam.h.russell@gmail.com> wrote:
> tl;dr mpls_output expects skb->protocol to be set to correct
> ethertype, but it isn't
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> Problem:
>
> I set up two interfaces pointed at each other, and added a static arp
> entry to minimise complexity
>
> ifconfig enp0s8 10.0.0.1/24 up
> ifconfig enp0s9 up
> arp -s 10.0.0.5 00:12:34:56:78:90
>
> I then added an MPLS route
>
> ./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
> 10.0.0.5 dev enp0s8
>
> I then tried to ping an IP in this route but got errors back
>
> ping 192.168.2.1
> * PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
> * ping: sendmsg: Invalid argument
>
> I then recorded calls to skb_kfree
>
> ./tools/perf/perf record -e skb:kfree_skb -g -a
>
> This gave me the following packet trace:
>
> 100.00% 100.00% ping [kernel.kallsyms] [k] kfree_skb
> |
> ---kfree_skb
> mpls_output
> lwtunnel_output
> ip_local_out_sk
> ip_send_skb
> ip_push_pending_frames
> raw_sendmsg
> inet_sendmsg
> sock_sendmsg
> ___sys_sendmsg
> __sys_sendmsg
> sys_sendmsg
> entry_SYSCALL_64_fastpath
> sendmsg
> 0
>
> I then went through mpls_output.c and put printk() at every call to
> "goto drop" and found that this was being hit after failing to match
> skb->protocol
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> My understanding is that skb->protocol is normally set after
> dst_output. For example, a ping packet hitting a normal IPv4 route
> should follow something like:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> ip_output
>
> ip_output() is the first place where skb->protocol gets set
>
> https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356
>
> The path that a packet follows when hitting an MPLS route is as follows:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> lwtunnel_output
> mpls_output
>
> lwtunnel_output merely routes to the correct output function (mpls_output)
> mpls_output expects skb->protocol to be set, but nothing has set it
> yet, so it drops the packet!
>
> Any suggestions on how mpls_output should detect the protocol?
>
> Setup:
>
> Ubuntu 15.10
>
> iproute2 built from head
>
> Kernel 4.3, both ubuntu mainline and home-built:
> make menuconfig, add lwtunnel, mpls-router, mpls-gso and mpls-iptunnel
>
> Running virtualbox on OSX
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: MPLS outbound packets being dropped
2015-12-06 10:20 MPLS outbound packets being dropped Sam Russell
2015-12-07 5:56 ` Sam Russell
@ 2015-12-07 10:40 ` Robert Shearman
2015-12-07 12:53 ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
1 sibling, 1 reply; 6+ messages in thread
From: Robert Shearman @ 2015-12-07 10:40 UTC (permalink / raw
To: Sam Russell, netdev; +Cc: roopa
On 06/12/15 10:20, Sam Russell wrote:
> tl;dr mpls_output expects skb->protocol to be set to correct
> ethertype, but it isn't
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> Problem:
>
> I set up two interfaces pointed at each other, and added a static arp
> entry to minimise complexity
>
> ifconfig enp0s8 10.0.0.1/24 up
> ifconfig enp0s9 up
> arp -s 10.0.0.5 00:12:34:56:78:90
>
> I then added an MPLS route
>
> ./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
> 10.0.0.5 dev enp0s8
>
> I then tried to ping an IP in this route but got errors back
>
> ping 192.168.2.1
> * PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
> * ping: sendmsg: Invalid argument
>
> I then recorded calls to skb_kfree
>
> ./tools/perf/perf record -e skb:kfree_skb -g -a
>
> This gave me the following packet trace:
>
> 100.00% 100.00% ping [kernel.kallsyms] [k] kfree_skb
> |
> ---kfree_skb
> mpls_output
> lwtunnel_output
> ip_local_out_sk
> ip_send_skb
> ip_push_pending_frames
> raw_sendmsg
> inet_sendmsg
> sock_sendmsg
> ___sys_sendmsg
> __sys_sendmsg
> sys_sendmsg
> entry_SYSCALL_64_fastpath
> sendmsg
> 0
>
> I then went through mpls_output.c and put printk() at every call to
> "goto drop" and found that this was being hit after failing to match
> skb->protocol
>
> https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
>
> My understanding is that skb->protocol is normally set after
> dst_output. For example, a ping packet hitting a normal IPv4 route
> should follow something like:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> ip_output
>
> ip_output() is the first place where skb->protocol gets set
>
> https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356
>
> The path that a packet follows when hitting an MPLS route is as follows:
>
> raw_sendmsg
> ip_push_pending_frames
> ip_send_skb
> ip_local_out_sk
> dst_output
> lwtunnel_output
> mpls_output
>
> lwtunnel_output merely routes to the correct output function (mpls_output)
> mpls_output expects skb->protocol to be set, but nothing has set it
> yet, so it drops the packet!
>
> Any suggestions on how mpls_output should detect the protocol?
Thanks for reporting this and for your analysis.
We could write wrappers to lwtunnel_output for the v4 and v6 cases that
set the protocol accordingly and then call lwtunnel_output, but since
mpls_output relies on the AF-specific type of dst I think the simpler
fix is to just test the type of the dst in mpls_output rather than
skb->protocol.
Thanks,
Rob
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net] mpls: fix sending of local encapped packets
2015-12-07 10:40 ` Robert Shearman
@ 2015-12-07 12:53 ` Robert Shearman
2015-12-07 17:53 ` Sam Russell
2015-12-07 21:33 ` David Miller
0 siblings, 2 replies; 6+ messages in thread
From: Robert Shearman @ 2015-12-07 12:53 UTC (permalink / raw
To: davem; +Cc: netdev, roopa, ebiederm, Sam Russell, Robert Shearman
Locally generated IPv4 and (probably) IPv6 packets are dropped because
skb->protocol isn't set. We could write wrappers to lwtunnel_output
for IPv4 and IPv6 that set the protocol accordingly and then call
lwtunnel_output, but mpls_output relies on the AF-specific type of dst
anyway to get the via address.
Therefore, make use of dst->dst_ops->family in mpls_output to
determine the type of nexthop and thus protocol of the packet instead
of checking skb->protocol.
Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Reported-by: Sam Russell <sam.h.russell@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
net/mpls/mpls_iptunnel.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index 67591aef9cae..64afd3d0b144 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -54,10 +54,10 @@ int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
unsigned int ttl;
/* Obtain the ttl */
- if (skb->protocol == htons(ETH_P_IP)) {
+ if (dst->ops->family == AF_INET) {
ttl = ip_hdr(skb)->ttl;
rt = (struct rtable *)dst;
- } else if (skb->protocol == htons(ETH_P_IPV6)) {
+ } else if (dst->ops->family == AF_INET6) {
ttl = ipv6_hdr(skb)->hop_limit;
rt6 = (struct rt6_info *)dst;
} else {
--
2.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net] mpls: fix sending of local encapped packets
2015-12-07 12:53 ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
@ 2015-12-07 17:53 ` Sam Russell
2015-12-07 21:33 ` David Miller
1 sibling, 0 replies; 6+ messages in thread
From: Sam Russell @ 2015-12-07 17:53 UTC (permalink / raw
To: Robert Shearman; +Cc: davem, netdev, roopa, ebiederm
I can confirm that this patch works for IPv4 and IPv6
modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel
ifconfig enp0s8 192.168.99.2/24 up
ifconfig enp0s9 up
arp -s 192.168.99.5 00:12:34:56:78:90
ip -6 neigh add fe80::a00:27ff:1234:5678 lladdr 00:12:34:56:78:90 dev enp0s8
./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet
192.168.99.5 dev enp0s8
./dev/iproute2/ip/ip -6 route add 2404:1234:5678::/48 encap mpls 150
via inet6 fe80::a00:27ff:1234:5678 dev enp0s8
ping 192.168.2.1
tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture
size 262144 bytes
06:46:16.780466 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 24189, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 1, length 64
06:46:17.790616 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 24297, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 2, length 64
06:46:18.791064 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 24338, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 3, length 64
ping6 2404:1234:5678::1
tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture
size 262144 bytes
06:51:15.235286 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 122: MPLS (label 150, exp 0, [S], ttl 64)
(flowlabel 0xc4081, hlim 64, next-header ICMPv6 (58) payload length:
64) fe80::a00:27ff:fe39:3ce0 > 2404:1234:5678::1: [icmp6 sum ok]
ICMP6, echo request, seq 1
06:51:16.236016 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS
unicast (0x8847), length 122: MPLS (label 150, exp 0, [S], ttl 64)
(flowlabel 0xc4081, hlim 64, next-header ICMPv6 (58) payload length:
64) fe80::a00:27ff:fe39:3ce0 > 2404:1234:5678::1: [icmp6 sum ok]
ICMP6, echo request, seq 2
Thanks for the quick response!
Cheers
Sam
On 8 December 2015 at 01:53, Robert Shearman <rshearma@brocade.com> wrote:
> Locally generated IPv4 and (probably) IPv6 packets are dropped because
> skb->protocol isn't set. We could write wrappers to lwtunnel_output
> for IPv4 and IPv6 that set the protocol accordingly and then call
> lwtunnel_output, but mpls_output relies on the AF-specific type of dst
> anyway to get the via address.
>
> Therefore, make use of dst->dst_ops->family in mpls_output to
> determine the type of nexthop and thus protocol of the packet instead
> of checking skb->protocol.
>
> Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
> Reported-by: Sam Russell <sam.h.russell@gmail.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
> net/mpls/mpls_iptunnel.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
> index 67591aef9cae..64afd3d0b144 100644
> --- a/net/mpls/mpls_iptunnel.c
> +++ b/net/mpls/mpls_iptunnel.c
> @@ -54,10 +54,10 @@ int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> unsigned int ttl;
>
> /* Obtain the ttl */
> - if (skb->protocol == htons(ETH_P_IP)) {
> + if (dst->ops->family == AF_INET) {
> ttl = ip_hdr(skb)->ttl;
> rt = (struct rtable *)dst;
> - } else if (skb->protocol == htons(ETH_P_IPV6)) {
> + } else if (dst->ops->family == AF_INET6) {
> ttl = ipv6_hdr(skb)->hop_limit;
> rt6 = (struct rt6_info *)dst;
> } else {
> --
> 2.1.4
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] mpls: fix sending of local encapped packets
2015-12-07 12:53 ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
2015-12-07 17:53 ` Sam Russell
@ 2015-12-07 21:33 ` David Miller
1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2015-12-07 21:33 UTC (permalink / raw
To: rshearma; +Cc: netdev, roopa, ebiederm, sam.h.russell
From: Robert Shearman <rshearma@brocade.com>
Date: Mon, 7 Dec 2015 12:53:15 +0000
> Locally generated IPv4 and (probably) IPv6 packets are dropped because
> skb->protocol isn't set. We could write wrappers to lwtunnel_output
> for IPv4 and IPv6 that set the protocol accordingly and then call
> lwtunnel_output, but mpls_output relies on the AF-specific type of dst
> anyway to get the via address.
>
> Therefore, make use of dst->dst_ops->family in mpls_output to
> determine the type of nexthop and thus protocol of the packet instead
> of checking skb->protocol.
>
> Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
> Reported-by: Sam Russell <sam.h.russell@gmail.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
Applied, thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-12-07 21:33 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-06 10:20 MPLS outbound packets being dropped Sam Russell
2015-12-07 5:56 ` Sam Russell
2015-12-07 10:40 ` Robert Shearman
2015-12-07 12:53 ` [PATCH net] mpls: fix sending of local encapped packets Robert Shearman
2015-12-07 17:53 ` Sam Russell
2015-12-07 21:33 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).