All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] tun: defer skb_orphan() to tun_do_read()
@ 2019-03-05 21:02 Arthur Kepner
  2019-03-05 21:33 ` Willem de Bruijn
  0 siblings, 1 reply; 3+ messages in thread
From: Arthur Kepner @ 2019-03-05 21:02 UTC (permalink / raw
  To: netdev@vger.kernel.org; +Cc: Abhik Sarkar

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]


The attachment contains an UNTESTED patch (though a similar patch was tested with a 
3.10 kernel). 

We've been chasing a bug where packet corruption is seen on a tap device. We have a 
PACKET_MMAP socket which is bound to a tap interface. When throughput goes above a 
threshold, we begin to see that packets received on the tap device are truncated, or 
otherwise corrupted. We found that when packets are enqueued to the tap device, they 
are fine, but by the time they are read, they can be corrupted. 

And we found that simply deferring the call to skb_orphan() (where the destructor, 
tpacket_destruct_skb() marks the frame as TP_STATUS_AVAILABLE) fixes the problem. 

Maybe there's a better fix, but this worked for us. Thoughts? (Please CC me on replies - I'm not 
subscribed.) 

-- 
Arthur 

[-- Attachment #2: tun_defer_skb_orphan.patch --]
[-- Type: application/octet-stream, Size: 934 bytes --]

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1d68921..0a94231 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1111,11 +1111,6 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 
-	/* Orphan the skb - required as we might hang on to it
-	 * for indefinite time.
-	 */
-	skb_orphan(skb);
-
 	nf_reset(skb);
 
 	if (ptr_ring_produce(&tfile->tx_ring, skb))
@@ -1132,6 +1127,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 drop:
 	this_cpu_inc(tun->pcpu_stats->tx_dropped);
 	skb_tx_error(skb);
+	skb_orphan(skb);
 	kfree_skb(skb);
 	rcu_read_unlock();
 	return NET_XMIT_DROP;
@@ -2223,6 +2219,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 		struct sk_buff *skb = ptr;
 
 		ret = tun_put_user(tun, tfile, skb, to);
+		skb_orphan(skb);
 		if (unlikely(ret < 0))
 			kfree_skb(skb);
 		else

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH/RFC] tun: defer skb_orphan() to tun_do_read()
  2019-03-05 21:02 [PATCH/RFC] tun: defer skb_orphan() to tun_do_read() Arthur Kepner
@ 2019-03-05 21:33 ` Willem de Bruijn
  2019-03-06  3:30   ` Jason Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Willem de Bruijn @ 2019-03-05 21:33 UTC (permalink / raw
  To: Arthur Kepner; +Cc: netdev@vger.kernel.org, Abhik Sarkar

On Tue, Mar 5, 2019 at 4:03 PM Arthur Kepner <Arthur.Kepner@riverbed.com> wrote:
>
>
> The attachment contains an UNTESTED patch (though a similar patch was tested with a
> 3.10 kernel).
>
> We've been chasing a bug where packet corruption is seen on a tap device. We have a
> PACKET_MMAP socket which is bound to a tap interface. When throughput goes above a
> threshold, we begin to see that packets received on the tap device are truncated, or
> otherwise corrupted. We found that when packets are enqueued to the tap device, they
> are fine, but by the time they are read, they can be corrupted.
>
> And we found that simply deferring the call to skb_orphan() (where the destructor,
> tpacket_destruct_skb() marks the frame as TP_STATUS_AVAILABLE) fixes the problem.
>
> Maybe there's a better fix, but this worked for us. Thoughts? (Please CC me on replies - I'm not
> subscribed.)

The skb_orphan calls tpacket_destruct_skb, which updates the entry in
the packet ring to TP_STATUS_AVAILABLE.

Thanks for the report and suggested fix. Delaying the call to
skb_orphan reduces the race condition between release and read, but
does not fully remove it.

As of commit  5cd8d46ea156 ("packet: copy user buffers before orphan
or clone") in 4.20 this should no longer be an issue.

That reuses the msg_zerocopy infrastructure also for packet ring
packets with shared memory. And creates a private copy whenever these
may be looped to a local destination that may queue indefinitely, like
tun.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH/RFC] tun: defer skb_orphan() to tun_do_read()
  2019-03-05 21:33 ` Willem de Bruijn
@ 2019-03-06  3:30   ` Jason Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Jason Wang @ 2019-03-06  3:30 UTC (permalink / raw
  To: Willem de Bruijn, Arthur Kepner; +Cc: netdev@vger.kernel.org, Abhik Sarkar


On 2019/3/6 上午5:33, Willem de Bruijn wrote:
> On Tue, Mar 5, 2019 at 4:03 PM Arthur Kepner <Arthur.Kepner@riverbed.com> wrote:
>>
>> The attachment contains an UNTESTED patch (though a similar patch was tested with a
>> 3.10 kernel).
>>
>> We've been chasing a bug where packet corruption is seen on a tap device. We have a
>> PACKET_MMAP socket which is bound to a tap interface. When throughput goes above a
>> threshold, we begin to see that packets received on the tap device are truncated, or
>> otherwise corrupted. We found that when packets are enqueued to the tap device, they
>> are fine, but by the time they are read, they can be corrupted.
>>
>> And we found that simply deferring the call to skb_orphan() (where the destructor,
>> tpacket_destruct_skb() marks the frame as TP_STATUS_AVAILABLE) fixes the problem.
>>
>> Maybe there's a better fix, but this worked for us. Thoughts? (Please CC me on replies - I'm not
>> subscribed.)
> The skb_orphan calls tpacket_destruct_skb, which updates the entry in
> the packet ring to TP_STATUS_AVAILABLE.
>
> Thanks for the report and suggested fix. Delaying the call to
> skb_orphan reduces the race condition between release and read, but
> does not fully remove it.
>
> As of commit  5cd8d46ea156 ("packet: copy user buffers before orphan
> or clone") in 4.20 this should no longer be an issue.
>
> That reuses the msg_zerocopy infrastructure also for packet ring
> packets with shared memory. And creates a private copy whenever these
> may be looped to a local destination that may queue indefinitely, like
> tun.


+1 and if possible the commit should be backported to 3.10. (Or try 
vhost + TAP instead of packet mmap).

Thanks


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-03-06  3:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-05 21:02 [PATCH/RFC] tun: defer skb_orphan() to tun_do_read() Arthur Kepner
2019-03-05 21:33 ` Willem de Bruijn
2019-03-06  3:30   ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.