Re: [RFC V3] net: don't wait for order-3 page allocation

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

From: Michal Hocko <mhocko@suse.cz>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric Dumazet <edumazet@google.com>,
	David Rientjes <rientjes@google.com>, Shaohua Li <shli@fb.com>,
	netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	kernel-team <Kernel-team@fb.com>,
	clm@fb.com, linux-mm@kvack.org, dbavatar@gmail.com
Subject: Re: [RFC V3] net: don't wait for order-3 page allocation
Date: Thu, 18 Jun 2015 17:47:16 +0200	[thread overview]
Message-ID: <20150618154716.GH5858@dhcp22.suse.cz> (raw)
In-Reply-To: <5582E240.8080704@suse.cz>

On Thu 18-06-15 17:22:40, Vlastimil Babka wrote:
> On 06/18/2015 04:43 PM, Michal Hocko wrote:
> >On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
> >>On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >>
> >>>Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
> >>>_current_ implementation of the allocator has this nasty and very subtle
> >>>side effect but that doesn't mean it should be abused outside of the mm
> >>>proper. Why shouldn't this path wake the kswapd and let it compact
> >>>memory on the background to increase the success rate for the later
> >>>high order allocations?
> >>
> >>I kind of agree.
> >>
> >>If kswapd is a problem (is it ???) we should fix it, instead of adding
> >>yet another flag to some random locations attempting
> >>memory allocations.
> >
> >No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
> >access some portion of the memory reserves (see gfp_to_alloc_flags resp.
> >__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
> >hack to not give that access which was introduced for THP AFAIR.
> >
> >The implicit access to memory reserves for non sleeping allocation has
> >been there for ages and it might be not suitable for this particular
> >path but that doesn't mean another gfp flag with a different side effect
> >should be hijacked. We should either stop doing that implicit access to
> >memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
> >that is a problem to be solved in the mm proper. Spreading subtle
> >dependencies outside of mm will just make situation worse.
> 
> So you are not proposing to use these __GFP_RESERVE/NORESERVE flag outside
> of mm, right? (besides, we distinguish several kinds of reserves, so what
> exactly would the flag do?)

That is to be discussed. Most allocations already express their interest
in memory reserves by __GFP_HIGH directly or by GFP_ATOMIC indirectly.
So maybe we do not need any additional flag here. There are not that
many ~__GFP_WAIT and most of them seem to require it _only_ because the
context doesn't allow for sleeping (e.g. to prevent from deadlocks).

> As that would be also subtle dependency. The
> general problem I think is that we should want the mm users to specify
> higher-level intentions (such as GFP_KERNEL) which would map to specific
> directions (__GFP_*) for the allocator, and currently it's rather a mess of
> both kinds of flags.

I agree. So I think that maybe we should drop that implicit access to
memory reserves for ~__GFP_WAIT allocations and let it do what it is
documented to do.

> Clearly the intention here is "opportunistic allocation that should
> not reclaim/compact, use reserves, wake up kswapd (?) because it's
> better to fall back to smaller pages than wait") and we don't seem to
> have a GFP_OPPORTUNISTIC flag for that. The allocation has to then
> mask out __GFP_WAIT which however looks like an atomic allocation to
> the allocator and give access to reserves, etc...

I think simply dropping GFP_WAIT is a good way to express that. The
fact that the current implementation gives access to memory reserves
implicitly is just a detail and the user of the allocator shouldn't care
about that.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Michal Hocko <mhocko@suse.cz>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric Dumazet <edumazet@google.com>,
	David Rientjes <rientjes@google.com>, Shaohua Li <shli@fb.com>,
	netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	kernel-team <Kernel-team@fb.com>,
	clm@fb.com, linux-mm@kvack.org, dbavatar@gmail.com
Subject: Re: [RFC V3] net: don't wait for order-3 page allocation
Date: Thu, 18 Jun 2015 17:47:16 +0200	[thread overview]
Message-ID: <20150618154716.GH5858@dhcp22.suse.cz> (raw)
In-Reply-To: <5582E240.8080704@suse.cz>

On Thu 18-06-15 17:22:40, Vlastimil Babka wrote:
> On 06/18/2015 04:43 PM, Michal Hocko wrote:
> >On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
> >>On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >>
> >>>Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
> >>>_current_ implementation of the allocator has this nasty and very subtle
> >>>side effect but that doesn't mean it should be abused outside of the mm
> >>>proper. Why shouldn't this path wake the kswapd and let it compact
> >>>memory on the background to increase the success rate for the later
> >>>high order allocations?
> >>
> >>I kind of agree.
> >>
> >>If kswapd is a problem (is it ???) we should fix it, instead of adding
> >>yet another flag to some random locations attempting
> >>memory allocations.
> >
> >No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
> >access some portion of the memory reserves (see gfp_to_alloc_flags resp.
> >__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
> >hack to not give that access which was introduced for THP AFAIR.
> >
> >The implicit access to memory reserves for non sleeping allocation has
> >been there for ages and it might be not suitable for this particular
> >path but that doesn't mean another gfp flag with a different side effect
> >should be hijacked. We should either stop doing that implicit access to
> >memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
> >that is a problem to be solved in the mm proper. Spreading subtle
> >dependencies outside of mm will just make situation worse.
> 
> So you are not proposing to use these __GFP_RESERVE/NORESERVE flag outside
> of mm, right? (besides, we distinguish several kinds of reserves, so what
> exactly would the flag do?)

That is to be discussed. Most allocations already express their interest
in memory reserves by __GFP_HIGH directly or by GFP_ATOMIC indirectly.
So maybe we do not need any additional flag here. There are not that
many ~__GFP_WAIT and most of them seem to require it _only_ because the
context doesn't allow for sleeping (e.g. to prevent from deadlocks).

> As that would be also subtle dependency. The
> general problem I think is that we should want the mm users to specify
> higher-level intentions (such as GFP_KERNEL) which would map to specific
> directions (__GFP_*) for the allocator, and currently it's rather a mess of
> both kinds of flags.

I agree. So I think that maybe we should drop that implicit access to
memory reserves for ~__GFP_WAIT allocations and let it do what it is
documented to do.

> Clearly the intention here is "opportunistic allocation that should
> not reclaim/compact, use reserves, wake up kswapd (?) because it's
> better to fall back to smaller pages than wait") and we don't seem to
> have a GFP_OPPORTUNISTIC flag for that. The allocation has to then
> mask out __GFP_WAIT which however looks like an atomic allocation to
> the allocator and give access to reserves, etc...

I think simply dropping GFP_WAIT is a good way to express that. The
fact that the current implementation gives access to memory reserves
implicitly is just a detail and the user of the allocator shouldn't care
about that.
-- 
Michal Hocko
SUSE Labs

next prev parent reply	other threads:[~2015-06-18 15:47 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-11 23:50 [RFC V3] net: don't wait for order-3 page allocation Shaohua Li
2015-06-11 23:50 ` Shaohua Li
2015-06-12  0:02 ` Eric Dumazet
2015-06-12  0:02   ` Eric Dumazet
2015-06-12  0:34 ` David Miller
2015-06-12  0:34   ` David Miller
2015-06-12  9:36 ` Vlastimil Babka
2015-06-12  9:36   ` Vlastimil Babka
2015-06-17 23:02   ` David Rientjes
2015-06-17 23:02     ` David Rientjes
2015-06-18 14:30     ` Michal Hocko
2015-06-18 14:30       ` Michal Hocko
2015-06-18 14:35       ` Eric Dumazet
2015-06-18 14:43         ` Michal Hocko
2015-06-18 14:43           ` Michal Hocko
2015-06-18 15:22           ` Vlastimil Babka
2015-06-18 15:47             ` Michal Hocko [this message]
2015-06-18 15:47               ` Michal Hocko
2015-06-30 23:49               ` David Rientjes
2015-06-30 23:49                 ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150618154716.GH5858@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=Kernel-team@fb.com \
    --cc=clm@fb.com \
    --cc=davem@davemloft.net \
    --cc=dbavatar@gmail.com \
    --cc=edumazet@google.com \
    --cc=linux-mm@kvack.org \
    --cc=netdev@vger.kernel.org \
    --cc=rientjes@google.com \
    --cc=shli@fb.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.