From: Florian Westphal <fw@strlen.de>
To: mptcp@lists.01.org
Subject: [MPTCP] [RFC mptcp-next] mptcp: add ooo prune support
Date: Fri, 02 Oct 2020 17:45:35 +0200
Message-ID: <20201002154535.28412-1-fw@strlen.de> (raw)

It might be possible that the entire receive buffer is occupied by skbs in the OOO queue. In this case we can't pull more skbs from subflows and the holes will never be filled.

If this happens, schedule the work queue and prune ~12% of skbs to make space available. Also add a MIB counter for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
Paolo, this does relate a bit to our discussion wrt. oow tracking.
I thought we might need to add some sort of cushion to account for window discrepancies, but that might then get us in a state where wmem might be full... What do you think?

I did NOT see such a problem in practice; this is a theoretical "fix". TCP has similar code to deal with corner cases of small-oow packets.
 net/mptcp/mib.c      |  1 +
 net/mptcp/mib.h      |  1 +
 net/mptcp/protocol.c | 48 ++++++++++++++++++++++++++++++++++++++++++--
 net/mptcp/protocol.h |  1 +
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 84d119436b22..65c575e3af60 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -25,6 +25,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("OFOQueueTail", MPTCP_MIB_OFOQUEUETAIL),
 	SNMP_MIB_ITEM("OFOQueue", MPTCP_MIB_OFOQUEUE),
 	SNMP_MIB_ITEM("OFOMerge", MPTCP_MIB_OFOMERGE),
+	SNMP_MIB_ITEM("OFOPrune", MPTCP_MIB_OFOPRUNE),
 	SNMP_MIB_ITEM("NoDSSInWindow", MPTCP_MIB_NODSSWINDOW),
 	SNMP_MIB_ITEM("DuplicateData", MPTCP_MIB_DUPDATA),
 	SNMP_MIB_ITEM("AddAddr", MPTCP_MIB_ADDADDR),
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 47bcecce1106..75a7fb3a87db 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -18,6 +18,7 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_OFOQUEUETAIL,	/* Segments inserted into OoO queue tail */
 	MPTCP_MIB_OFOQUEUE,	/* Segments inserted into OoO queue */
 	MPTCP_MIB_OFOMERGE,	/* Segments merged in OoO queue */
+	MPTCP_MIB_OFOPRUNE,	/* Segments pruned from OoO queue */
 	MPTCP_MIB_NODSSWINDOW,	/* Segments not in MPTCP windows */
 	MPTCP_MIB_DUPDATA,	/* Segments discarded due to duplicate DSS */
 	MPTCP_MIB_ADDADDR,	/* Received ADD_ADDR with echo-flag=0 */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 79cd8e879c10..4cc30a3d426c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -658,8 +658,17 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		sk_rbuf = ssk_rbuf;

 	/* over limit? can't append more skbs to msk */
-	if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
-		goto wake;
+	if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf) {
+		if (likely(!skb_queue_empty(&sk->sk_receive_queue)))
+			goto wake;
+
+		/* Entire recvbuf occupied by OOO skbs? Prune time. */
+		if (!test_and_set_bit(MPTCP_WORK_PRUNE_OFO, &msk->flags) &&
+		    schedule_work(&msk->work))
+			sock_hold(sk);
+
+		return;
+	}

 	if (move_skbs_to_msk(msk, ssk))
 		goto wake;
@@ -1797,6 +1806,38 @@ static bool mptcp_check_close_timeout(const struct sock *sk)
 	return true;
 }

+static void mptcp_prune_ofo(struct mptcp_sock *msk)
+{
+	struct sock *sk = &msk->sk.icsk_inet.sk;
+	struct sk_buff *skb, *prev = NULL;
+	int goal;
+
+	if (!skb_queue_empty(&sk->sk_receive_queue) ||
+	    atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
+		return;
+
+	if (WARN_ON_ONCE(RB_EMPTY_ROOT(&msk->out_of_order_queue)))
+		return;
+
+	MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_OFOPRUNE);
+
+	goal = READ_ONCE(sk->sk_rcvbuf) >> 3;
+	skb = msk->ooo_last_skb;
+
+	while (skb) {
+		prev = skb_rb_prev(skb);
+		rb_erase(&skb->rbnode, &msk->out_of_order_queue);
+		goal -= skb->truesize;
+		mptcp_drop(sk, skb);
+
+		if (goal <= 0)
+			break;
+		skb = prev;
+	}
+
+	msk->ooo_last_skb = prev;
+}
+
 static void mptcp_worker(struct work_struct *work)
 {
 	struct mptcp_sock *msk = container_of(work, struct mptcp_sock, work);
@@ -1819,6 +1860,9 @@ static void mptcp_worker(struct work_struct *work)
 	if (mptcp_send_head(sk))
 		mptcp_push_pending(sk, 0);

+	if (test_and_clear_bit(MPTCP_WORK_PRUNE_OFO, &msk->flags))
+		mptcp_prune_ofo(msk);
+
 	if (msk->pm.status)
 		pm_work(msk);

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d33e9676a1a3..360441fdaa93 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -91,6 +91,7 @@
 #define MPTCP_WORK_EOF		3
 #define MPTCP_FALLBACK_DONE	4
 #define MPTCP_WORKER_RUNNING	5
+#define MPTCP_WORK_PRUNE_OFO	6

 static inline bool before64(__u64 seq1, __u64 seq2)
 {
--
2.26.2
WARNING: multiple messages have this Message-ID

From: Matthieu Baerts <matthieu.baerts@tessares.net>
To: mptcp@lists.linux.dev
Cc: Florian Westphal <fw@strlen.de>
Subject: [RESEND] [MPTCP] [RFC mptcp-next] mptcp: add ooo prune support
Date: Wed, 26 May 2021 18:08:08 +0200
Message-ID: <20201002154535.28412-1-fw@strlen.de> (raw)
Message-ID: <20210526160808.UOg3X-SCnjVAmC0qouTk8IltwdlX4Nxma8bMylqBzOs@z> (raw)
In-Reply-To: <20210526160813.4160315-1-matthieu.baerts@tessares.net>
From: Florian Westphal <fw@strlen.de>
Thread overview: 20+ messages

2020-10-02 15:45 Florian Westphal [this message]
2021-05-26 16:08 ` [RESEND] [MPTCP] [RFC mptcp-next] mptcp: add ooo prune support Matthieu Baerts

-- strict thread matches above, loose matches on Subject: below --
2021-05-26 16:08 [RESEND] [PATCH 0/8] Please ignore: resending some patches for patchwork.kernel.org Matthieu Baerts
2021-05-06  6:39 [MPTCP][PATCH mptcp-next 0/3] MP_FAIL support Geliang Tang
2021-05-06  6:39 ` [MPTCP][PATCH mptcp-next 1/3] mptcp: MP_FAIL suboption sending Geliang Tang
2021-05-26 16:08   ` [RESEND] " Matthieu Baerts
2021-05-06  6:39 ` [MPTCP][PATCH mptcp-next 2/3] mptcp: MP_FAIL suboption receiving Geliang Tang
2021-05-26 16:08   ` [RESEND] " Matthieu Baerts
2021-05-06  6:39 ` [MPTCP][PATCH mptcp-next 3/3] mptcp: send out MP_FAIL when data checksum fail Geliang Tang
2021-05-26 16:08   ` [RESEND] " Matthieu Baerts
2021-05-08  0:54   ` Mat Martineau
2021-05-08  0:44 ` [MPTCP][PATCH mptcp-next 2/3] mptcp: MP_FAIL suboption receiving Mat Martineau
2020-11-05 17:01 [MPTCP] [PATCH MPTCP 5/5] mptcp: send fastclose if userspace closes socket with unread data Florian Westphal
2021-05-26 16:08 ` [RESEND] " Matthieu Baerts
2020-11-05 17:01 [MPTCP] [PATCH MPTCP 1/5] tcp: make two mptcp helpers available to tcp stack Florian Westphal
2021-05-26 16:08 ` [RESEND] " Matthieu Baerts
2020-09-24 14:35 [MPTCP] [RFC PATCH 4/4] tcp: parse tcp options contained in reset packets Florian Westphal
2021-05-26 16:08 ` [RESEND] " Matthieu Baerts
2020-09-24 14:35 [MPTCP] [RFC PATCH 2/4] tcp: move selected mptcp helpers to tcp.h/mptcp.h Florian Westphal
2021-05-26 16:08 ` [RESEND] " Matthieu Baerts