LKML Archive mirror
From: Abel Wu <wuyun.abel@bytedance.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Tejun Heo <tj@kernel.org>, Christian Warloe <cwarloe@google.com>,
	Wei Wang <weiwan@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Ahern <dsahern@kernel.org>,
	Yosry Ahmed <yosryahmed@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Yu Zhao <yuzhao@google.com>,
	Vasily Averin <vasily.averin@linux.dev>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Xin Long <lucien.xin@gmail.com>,
	Jason Xing <kernelxing@tencent.com>,
	Michal Hocko <mhocko@suse.com>,
	Alexei Starovoitov <ast@kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" 
	<cgroups@vger.kernel.org>,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" 
	<linux-mm@kvack.org>
Subject: Re: Re: [RFC PATCH net-next] sock: Propose socket.urgent for sockmem isolation
Date: Tue, 13 Jun 2023 14:46:32 +0800	[thread overview]
Message-ID: <b879d810-132b-38ab-c13d-30fabdc8954a@bytedance.com> (raw)
In-Reply-To: <CANn89i+Qqq5nV0oRLh_KEHRV6VmSbS5PsSvayVHBi52FbB=sKA@mail.gmail.com>

On 6/9/23 5:07 PM, Eric Dumazet wrote:
> On Fri, Jun 9, 2023 at 10:28 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>>
>> This is just a PoC patch intended to resume the discussion about
>> tcpmem isolation opened by Google in LPC'22 [1].
>>
>> We are facing the same problem that the global shared threshold can
>> cause isolation issues. Low priority jobs can hog TCP memory and
>> adversely impact higher priority jobs. What's worse is that these
>> low priority jobs usually have smaller cpu weights leading to poor
>> ability to consume rx data.
>>
>> To tackle this problem, an interface for non-root cgroup memory
>> controller named 'socket.urgent' is proposed. It determines whether
>> the sockets of this cgroup and its descendants can escape from the
>> constraints or not under global socket memory pressure.
>>
>> The 'urgent' semantics will not take effect under memcg pressure in
>> order to protect against worse memstalls, thus will be the same as
>> before without this patch.
>>
>> This proposal doesn't remove the protocol's threshold as we found it
>> useful in restraining memory defragment. As aforementioned the low
>> priority jobs can hog lots of memory, which is unreclaimable and
>> unmovable, for some time due to small cpu weight.
>>
>> So in practice we allow high priority jobs with net-memcg accounting
>> enabled to escape the global constraints if the net-memcg itself is
>> not under pressure. While for lower priority jobs, the budget will
>> be tightened as the memory usage of 'urgent' jobs increases. In this
>> way we can finally achieve:
>>
>>    - Important jobs won't be priority inversed by the background
>>      jobs in terms of socket memory pressure/limit.
>>
>>    - Global constraints are still effective, but only on non-urgent
>>      jobs, useful for admins on policy decision on defrag.
>>
>> Comments/Ideas are welcomed, thanks!
>>
> 
> This seems to go in a complete opposite direction than memcg promises.
> 
> Can we fix memcg, so that :
> 
> Each group can use the memory it was provisioned (this includes TCP buffers)

Yes, but that might not be easy once memory gets over-committed (which
is common in modern data-centers). So as a tradeoff, we intend to put a
harder constraint on memory allocation for low priority jobs. Otherwise,
if every job can use its provisioned memory, then there will be more
memstalls blocking random jobs, which could be the important ones.
Either way hurts performance, but the difference is whose performance
gets hurt.

Memory protection (memory.{min,low}) helps keep the important jobs less
affected by memstalls. But once low priority jobs use lots of kernel
memory like sockmem, the protection might become much less effective.
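
To make the semantics from the RFC description concrete, here is a rough
userspace sketch of the decision it describes. This is not the patch
itself: struct cg, cg_is_urgent(), cg_under_pressure() and
sk_under_pressure() are made-up names for illustration only, and the
real kernel structures and checks differ. It only assumes what the
description states: 'urgent' covers a cgroup and its descendants, lets
them escape global socket memory pressure, and has no effect once the
memcg itself is under pressure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory cgroup. */
struct cg {
	const struct cg *parent;
	bool urgent;		/* 'socket.urgent' set on this cgroup */
	bool memcg_pressure;	/* this cgroup is under memcg pressure */
};

/* 'urgent' applies to the cgroup and all of its descendants. */
static bool cg_is_urgent(const struct cg *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->urgent)
			return true;
	return false;
}

/* Pressure on any ancestor also counts as pressure on the cgroup. */
static bool cg_under_pressure(const struct cg *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->memcg_pressure)
			return true;
	return false;
}

/* Should a socket charged to @cg be treated as under memory pressure? */
static bool sk_under_pressure(const struct cg *cg, bool global_pressure)
{
	/* memcg pressure always wins, 'urgent' does not help here */
	if (cg_under_pressure(cg))
		return true;

	/* only non-urgent cgroups obey the global threshold */
	return global_pressure && !cg_is_urgent(cg);
}
```

With this, a descendant of an urgent cgroup ignores global pressure
until its own memcg (or an ancestor) gets under pressure, while a
non-urgent cgroup behaves the same as before.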

> 
> Global tcp_memory can disappear (set tcp_mem to infinity)
