From: Shakeel Butt <shakeel.butt@linux.dev>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Waiman Long <longman@redhat.com>,
	tj@kernel.org, hannes@cmpxchg.org,  lizefan.x@bytedance.com,
	cgroups@vger.kernel.org, yosryahmed@google.com,
	 netdev@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@cloudflare.com,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	 Daniel Dao <dqminh@cloudflare.com>,
	Ivan Babrou <ivan@cloudflare.com>,
	jr@cloudflare.com
Subject: Re: [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
Date: Mon, 6 May 2024 09:22:11 -0700
Message-ID: <mnakwztmiskni3k6ia5mynqfllb3dw5kicuv4wp4e4ituaezwt@2pzkuuqg6r3e>
In-Reply-To: <55854a94-681e-4142-9160-98b22fa64d61@kernel.org>

On Mon, May 06, 2024 at 02:03:47PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 03/05/2024 21.18, Shakeel Butt wrote:
[...]
> > 
> > Hmm 128 usec is actually unexpectedly high.
> 
> > What does the cgroup hierarchy on your system look like?
> I didn't design this, so hopefully my co-workers can help me out here? (To
> @Daniel or @Jon)
> 
> My low-level view is that there are 17 top-level directories in
> /sys/fs/cgroup/.
> There are 649 cgroups (counting occurrences of memory.stat).
> There are two directories that contain the major part.
>  - /sys/fs/cgroup/system.slice = 379
>  - /sys/fs/cgroup/production.slice = 233
>  - (production.slice has two levels of directories)
>  - remaining 37
> 
> We are open to changing this if you have any advice?
> (@Daniel and @Jon are actually working on restructuring this)
> 
> > How many cgroups have actual workloads running?
> Do you have a command line trick to determine this?
> 

The rstat infra maintains a per-cpu cgroup update tree to only flush
stats of cgroups which have seen updates. So, even if you have a large
number of cgroups, as long as the workload is active in only a small
number of them, the update tree should be much smaller. That is the
reason I asked these questions. I don't have any advice yet; at the
moment I am trying to understand the usage and then hopefully work on
optimizing those cases.

> 
> > Can the network softirqs run on any CPU, or only on a smaller
> > set of CPUs? I am assuming these softirqs are processing packets from
> > any or all cgroups and thus have a larger cgroup update tree.
> 
> Softirqs, and specifically NET_RX, run on half of the cores (e.g. 64).
> (I'm looking at restructuring this allocation)
> 
> > I wonder if
> > you could comment out the MEMCG_SOCK stat update and check whether you
> > still see the same holding time.
> > 
> 
> It doesn't look like MEMCG_SOCK is used.
> 
> I deduce you are asking:
>  - What is the update count for different types of mod_memcg_state() calls?
> 
> // Dumped via BTF info
> enum memcg_stat_item {
>         MEMCG_SWAP = 43,
>         MEMCG_SOCK = 44,
>         MEMCG_PERCPU_B = 45,
>         MEMCG_VMALLOC = 46,
>         MEMCG_KMEM = 47,
>         MEMCG_ZSWAP_B = 48,
>         MEMCG_ZSWAPPED = 49,
>         MEMCG_NR_STAT = 50,
> };
> 
> sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state{@[args->idx]=count()}
> END{printf("\nEND time elapsed: %d sec\n", elapsed / 1000000000);}'
> Attaching 2 probes...
> ^C
> END time elapsed: 99 sec
> 
> @[45]: 17996
> @[46]: 18603
> @[43]: 61858
> @[47]: 21398919
> 
> It seems clear that MEMCG_KMEM = 47 is the main "user".
>  - 21398919/99 = 216150 calls per sec
> 
> Could someone explain to me what this MEMCG_KMEM is used for?
> 

MEMCG_KMEM is the kernel memory charged to a cgroup. It also includes
the untyped kernel memory that is not accounted under kernel_stack,
pagetables, percpu, vmalloc, slab, etc.
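
If it helps, one (untested) way to eyeball this per cgroup on a recent
kernel: MEMCG_KMEM should surface as the "kernel" row of memory.stat on
cgroup v2, with the typed pieces reported as separate rows, e.g.

# "kernel" minus the typed rows below is roughly the untyped part
grep -E '^(kernel|kernel_stack|pagetables|percpu|sock|vmalloc|slab) ' \
    /sys/fs/cgroup/system.slice/memory.stat

(The system.slice path is just an example.)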

The reason I asked about MEMCG_SOCK was that it might be causing larger
update trees (more cgroups) on the CPUs processing NET_RX.
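
If you want to double check that, a per-cpu variant of your earlier
one-liner (untested; MEMCG_SOCK = 44 per the enum you dumped) would show
whether the NET_RX cpus see any sock updates at all:

sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state /args->idx == 44/ { @[cpu] = count(); }'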

Anyway, did the mutex change help your production workload's latencies?

