All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Mina Almasry <almasrymina@google.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Dragos Tatulea <dtatulea@nvidia.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	 "pabeni@redhat.com" <pabeni@redhat.com>,
	 "ilias.apalodimas@linaro.org" <ilias.apalodimas@linaro.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "jacob.e.keller@intel.com" <jacob.e.keller@intel.com>,
	 "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Jianbo Liu <jianbol@nvidia.com>,
	 "edumazet@google.com" <edumazet@google.com>
Subject: Re: [RFC PATCH] net: Fix one page_pool page leak from skb_frag_unref
Date: Thu, 2 May 2024 13:08:23 -0700	[thread overview]
Message-ID: <CAHS8izPBhZ1USvSoDdR-Xwg3jNL5ppRXTzc3FEM9gjx6B1fJAw@mail.gmail.com> (raw)
In-Reply-To: <20240429080022.214a9df9@kernel.org>

On Mon, Apr 29, 2024 at 8:00 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 26 Apr 2024 21:24:09 -0700 Mina Almasry wrote:
> > On Fri, Apr 26, 2024 at 4:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 25 Apr 2024 12:20:59 -0700 Mina Almasry wrote:
> > > > -       if (recycle && napi_pp_get_page(page))
> > > > +       if (napi_pp_get_page(page))
> > >
> > > Pretty sure you can't do that. The "recycle" here is a concurrency
> > > guarantee. A guarantee someone is holding a pp ref on that page,
> > > a ref which will not go away while napi_pp_get_page() is executing.
> >
> > I don't mean to argue, but I think the get_page()/put_page() pair we
> > do in the page ref path is susceptible to the same issue. AFAIU it's
> > not safe to get_page() if another CPU can be dropping the last ref,
> > get_page_unless_zero() should be used instead.
>

I uploaded a revert for review, but to reland I perhaps need to
understand a bit more the concern here. AFAICT that diff you're
responding to is safe and it works very well with devmem so it would
be my preferred approach to reland (but there are other options if you
are convinced it's bad). FWIW my thoughts:

> Whoever gave us the pointer to operate on has a reference, so the page
> can't disappear. get_page() is safe.

Agreed.

> The problem with pp is that we
> don't know whether the caller has a pp ref or a page ref. IOW the pp
> ref may not be owned by whoever called us.
>

OK, this is where I'm not sure anymore. The diff you're replying to
attempts to enforce the invariant: "if anyone wants a reference on an
skb_frag, skb_frag_ref will be a pp ref on pp frags
(is_pp_page==true), and page refs on non-pp frags
(is_pp_page==false)".

Additionally the page doesn't transition from pp to non-pp and vice
versa while anyone is holding a pp ref, because
page_pool_set_pp_info() is called right after the page is obtained
from the buddy allocator (before released from the page pool) and
page_pool_clear_pp_info() is called only after all the pp refs are
dropped.

So:

1. We know the caller has a ref (otherwise get_page() wouldn't be safe
in the non-pp case).
2. We know that the page has not transitioned from pp to non-pp or
vice versa since the caller obtained the ref (from code inspection, pp
info is not changed until all the refs are dropped for pp pages).
3. AFAICT, it follows that if the page is pp, then the caller has a pp
ref, and if the page is non-pp, then the caller has a page ref.
4. So, if is_pp_page==true, then the caller has a pp ref, then
napi_pp_get_page() should be concurrently safe.

AFAICT the only way my mental model is broken is if there is code
doing a raw get_page() rather than a skb_frag_ref() in core net stack.
I would like to get rid of these call sites if they exist. They would
not interact well with devmem I think (but could be made to work with
some effort).

--
Thanks,
Mina

  reply	other threads:[~2024-05-02 20:08 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-24 16:56 [RFC PATCH] net: Fix one page_pool page leak from skb_frag_unref Dragos Tatulea
2024-04-24 17:25 ` Dragos Tatulea
2024-04-24 22:08   ` Mina Almasry
2024-04-25  8:17     ` Dragos Tatulea
2024-04-25 19:20       ` Mina Almasry
2024-04-25 19:47         ` Dragos Tatulea
2024-04-25 20:42           ` Mina Almasry
2024-04-26 14:59             ` Dragos Tatulea
2024-04-26 19:03               ` Mina Almasry
2024-04-26 23:08         ` Jakub Kicinski
2024-04-27  4:24           ` Mina Almasry
2024-04-29 15:00             ` Jakub Kicinski
2024-05-02 20:08               ` Mina Almasry [this message]
2024-05-03  0:46                 ` Jakub Kicinski
2024-04-26 23:05       ` Jakub Kicinski
2024-04-29  7:39         ` Dragos Tatulea
2024-05-01  6:20           ` Dragos Tatulea
2024-05-01  7:48             ` Mina Almasry
2024-05-01  7:58               ` Dragos Tatulea
2024-05-01 14:24               ` Jakub Kicinski
2024-05-01 14:28                 ` Jakub Kicinski
2024-05-01 16:23                 ` Mina Almasry
2024-04-26 23:07     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHS8izPBhZ1USvSoDdR-Xwg3jNL5ppRXTzc3FEM9gjx6B1fJAw@mail.gmail.com \
    --to=almasrymina@google.com \
    --cc=davem@davemloft.net \
    --cc=dtatulea@nvidia.com \
    --cc=edumazet@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jianbol@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.