All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	vdavydov@parallels.com, kernel-team@fb.com
Subject: Re: [PATCH 2/3] memcg: ratify and consolidate over-charge handling
Date: Mon, 14 Sep 2015 15:56:08 -0400	[thread overview]
Message-ID: <20150914195608.GF25369@htj.duckdns.org> (raw)
In-Reply-To: <20150914193225.GA26273@dhcp22.suse.cz>

Hello, Michal.

On Mon, Sep 14, 2015 at 09:32:25PM +0200, Michal Hocko wrote:
> >   mem_cgroup_try_charge() needs to switch
> >   the returned cgroup to the root one.
> > 
> > The reality is that in memcg there are cases where we are forced
> > and/or willing to go over the limit.  Each such case needs to be
> > scrutinized and justified but there definitely are situations where
> > that is the right thing to do.  We alredy do this but with a
> > superficial and inconsistent disguise which leads to unnecessary
> > complications.
> >
> > This patch updates try_charge() so that it over-charges and returns 0
> > when deemed necessary.  -EINTR return is removed along with all
> > special case handling in the callers.
> 
> OK the code is easier in the end, although I would argue that try_charge
> could return ENOMEM for GFP_NOWAIT instead of overcharging (this would
> e.g. allow precharge to bail out earlier). Something for a separate patch I
> guess.

Hmm... GFP_NOWAIT is failed unless it also has __GFP_NOFAIL.

> Anyway I still do not like usage > max/hard limit presented to userspace
> because it looks like a clear breaking of max/hard limit semantic. I
> realize that we cannot solve the underlying problem easily or it might
> be unfeasible but we should consider how to present this state to the
> userspace.
> We have basically 2 options AFAICS. We can either document that a
> _temporal_ breach of the max/hard limit is allowed or we can hide this
> fact and always present max(current,max).
> The first one might be better for an easier debugging and it is also
> more honest about the current state but the definition of the hard limit
> is a bit weird. It also exposes implementation details to the userspace.
> The other choice is clearly lying but users shouldn't care about the
> implementation details and if the state is really temporal then the
> userspace shouldn't even notice. There is also a risk that somebody is
> already depending on current < max which happened to work without kmem
> until now.
> This is something to be solved in a separate patch I guess but we
> should think about that. I am not entirely clear on that myself but I am
> more inclined to the first option and simply document the potential
> corner case and temporal breach.

I'm pretty sure we don't wanna lie.  Just document that temporal
small-scale breaches may happen.  I don't even think this is an
implementation detail.  The fact that we have separate high and max
limits is already admitting that this is inherently different from
global case and that memcg is consciously and actively making
trade-offs regarding handling of global and local memory pressure and
I think that's the right thing to do and something inherent to what
memcg is doing here.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org,
	kernel-team-b10kYP2dOMg@public.gmane.org
Subject: Re: [PATCH 2/3] memcg: ratify and consolidate over-charge handling
Date: Mon, 14 Sep 2015 15:56:08 -0400	[thread overview]
Message-ID: <20150914195608.GF25369@htj.duckdns.org> (raw)
In-Reply-To: <20150914193225.GA26273-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>

Hello, Michal.

On Mon, Sep 14, 2015 at 09:32:25PM +0200, Michal Hocko wrote:
> >   mem_cgroup_try_charge() needs to switch
> >   the returned cgroup to the root one.
> > 
> > The reality is that in memcg there are cases where we are forced
> > and/or willing to go over the limit.  Each such case needs to be
> > scrutinized and justified but there definitely are situations where
> > that is the right thing to do.  We alredy do this but with a
> > superficial and inconsistent disguise which leads to unnecessary
> > complications.
> >
> > This patch updates try_charge() so that it over-charges and returns 0
> > when deemed necessary.  -EINTR return is removed along with all
> > special case handling in the callers.
> 
> OK the code is easier in the end, although I would argue that try_charge
> could return ENOMEM for GFP_NOWAIT instead of overcharging (this would
> e.g. allow precharge to bail out earlier). Something for a separate patch I
> guess.

Hmm... GFP_NOWAIT is failed unless it also has __GFP_NOFAIL.

> Anyway I still do not like usage > max/hard limit presented to userspace
> because it looks like a clear breaking of max/hard limit semantic. I
> realize that we cannot solve the underlying problem easily or it might
> be unfeasible but we should consider how to present this state to the
> userspace.
> We have basically 2 options AFAICS. We can either document that a
> _temporal_ breach of the max/hard limit is allowed or we can hide this
> fact and always present max(current,max).
> The first one might be better for an easier debugging and it is also
> more honest about the current state but the definition of the hard limit
> is a bit weird. It also exposes implementation details to the userspace.
> The other choice is clearly lying but users shouldn't care about the
> implementation details and if the state is really temporal then the
> userspace shouldn't even notice. There is also a risk that somebody is
> already depending on current < max which happened to work without kmem
> until now.
> This is something to be solved in a separate patch I guess but we
> should think about that. I am not entirely clear on that myself but I am
> more inclined to the first option and simply document the potential
> corner case and temporal breach.

I'm pretty sure we don't wanna lie.  Just document that temporal
small-scale breaches may happen.  I don't even think this is an
implementation detail.  The fact that we have separate high and max
limits is already admitting that this is inherently different from
global case and that memcg is consciously and actively making
trade-offs regarding handling of global and local memory pressure and
I think that's the right thing to do and something inherent to what
memcg is doing here.

Thanks.

-- 
tejun

  reply	other threads:[~2015-09-14 19:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-13 20:14 [PATCH 1/3] memcg: collect kmem bypass conditions into __memcg_kmem_bypass() Tejun Heo
2015-09-13 20:14 ` [PATCH 2/3] memcg: ratify and consolidate over-charge handling Tejun Heo
2015-09-13 20:14   ` Tejun Heo
2015-09-13 20:15   ` [PATCH 3/3] memcg: drop unnecessary cold-path tests from __memcg_kmem_bypass() Tejun Heo
2015-09-13 20:15     ` Tejun Heo
2015-09-14 12:51     ` Vladimir Davydov
2015-09-14 12:51       ` Vladimir Davydov
2015-09-14 19:40     ` Michal Hocko
2015-09-14 19:40       ` Michal Hocko
2015-09-14 12:44   ` [PATCH 2/3] memcg: ratify and consolidate over-charge handling Vladimir Davydov
2015-09-14 12:44     ` Vladimir Davydov
2015-09-14 15:51     ` Tejun Heo
2015-09-14 15:51       ` Tejun Heo
2015-09-14 19:32   ` Michal Hocko
2015-09-14 19:56     ` Tejun Heo [this message]
2015-09-14 19:56       ` Tejun Heo
2015-09-15  8:01       ` Michal Hocko
2015-09-15 15:50         ` Tejun Heo
2015-09-14 20:07   ` [PATCH v2 " Tejun Heo
2015-09-15 16:18     ` Johannes Weiner
2015-09-15 16:18       ` Johannes Weiner
2015-09-14  9:03 ` [PATCH 1/3] memcg: collect kmem bypass conditions into __memcg_kmem_bypass() Vladimir Davydov
2015-09-14  9:03   ` Vladimir Davydov
2015-09-14 15:21 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150914195608.GF25369@htj.duckdns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.