From: Mina Almasry <almasrymina@google.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: "David Wei" <dw@davidwei.uk>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-arch@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org,
"Donald Hunter" <donald.hunter@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Paolo Abeni" <pabeni@redhat.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
"Matt Turner" <mattst88@gmail.com>,
"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
"Helge Deller" <deller@gmx.de>,
"Andreas Larsson" <andreas@gaisler.com>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Arnd Bergmann" <arnd@arndb.de>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
"Steffen Klassert" <steffen.klassert@secunet.com>,
"Herbert Xu" <herbert@gondor.apana.org.au>,
"David Ahern" <dsahern@kernel.org>,
"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
"Shuah Khan" <shuah@kernel.org>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Yunsheng Lin" <linyunsheng@huawei.com>,
"Shailend Chand" <shailend@google.com>,
"Harshitha Ramamurthy" <hramamurthy@google.com>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Jeroen de Borst" <jeroendb@google.com>,
"Praveen Kaligineedi" <pkaligineedi@google.com>,
"Willem de Bruijn" <willemb@google.com>,
"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [PATCH net-next v9 11/14] tcp: RX path for devmem TCP
Date: Wed, 29 May 2024 10:20:03 -0700
Message-ID: <CAHS8izOnD3J3i+z1nxg=AZQW9dm0w2JBtbg2=oouiER8xqeRPA@mail.gmail.com>
In-Reply-To: <29464e46-e196-47aa-9ff5-23173099c95e@gmail.com>
On Tue, May 28, 2024 at 7:42 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 5/28/24 18:36, Mina Almasry wrote:
> > On Wed, May 22, 2024 at 11:02 PM David Wei <dw@davidwei.uk> wrote:
> ...
> >>> + */
> >>> + if (!skb_frag_net_iov(frag)) {
> >>> + net_err_ratelimited("Found non-dmabuf skb with net_iov");
> >>> + err = -ENODEV;
> >>> + goto out;
> >>> + }
> >>> +
> >>> + niov = skb_frag_net_iov(frag);
> >>
> >> Sorry if we've already discussed this.
> >>
> >> We have this additional hunk:
> >>
> >> + if (niov->pp->mp_ops != &dmabuf_devmem_ops) {
> >> + err = -ENODEV;
> >> + goto out;
> >> + }
> >>
> >> In case one of our skbs end up here, skb_frag_is_net_iov() and
> >> !skb_frags_readable(). Does this even matter? And if so then is there a
> >> better way to distinguish between our two types of net_iovs?
> >
> > Thanks for bringing this up, yes, maybe we do need a way to
> > distinguish, but it's not 100% critical, no? It's mostly for debug
> > checking?
>
> Not really. io_uring definitely wouldn't want the devmem completion path
> taking an iov and basically stashing it into a socket (via refcount),
> that's a lifetime problem. Nor we'd have all the binding/chunk_owner
> parts you have and probably use there.
>
> Same the other way around, you don't want io_uring grabbing your iov
> and locking it up, it won't even be possible to return it back. We
> also may want to have access to backing pages for different fallback
> purposes, for which we need to know the iov came from this particular
> ring.
>
> It shouldn't happen for a behaving user, but most of it would likely
> be exploitable one way or another.
>
> > I would say add a helper, like net_iov_is_dmabuf() or net_iov_is_io_uring().
>
> We're verifying that the context the iov is bound to is the current
> context (e.g. io_uring instance) we're executing from. If we can
> agree that mp_priv should be a valid pointer, the check would look
> like:
>
> if (pp->mp_priv == io_uring_ifq)
>
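For illustration only, here is a minimal userspace sketch of the
ownership check described above: the iov is accepted only when the
context stored in its pool is the one we are executing from. The
trimmed struct page_pool and struct net_iov below and the helper name
net_iov_owned_by() are hypothetical stand-ins, and the sketch assumes
mp_priv always holds a valid pointer, which the thread has not settled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct page_pool {
	void *mp_priv;		/* provider-private context, e.g. an io_uring ifq */
};

struct net_iov {
	struct page_pool *pp;	/* pool the iov was allocated from */
};

/*
 * Return true only if the iov is bound to the context we are currently
 * executing from. A NULL iov, a NULL pp or a NULL ctx never matches.
 */
static inline bool net_iov_owned_by(const struct net_iov *niov,
				    const void *ctx)
{
	if (!niov || !niov->pp || !ctx)
		return false;
	return niov->pp->mp_priv == ctx;
}
```

A completion path could bail out with an error such as -ENODEV when
the helper returns false, much like the mp_ops hunk quoted earlier.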
> > Checking for niov->pp->mp_ops seems a bit hacky to me, and may be
> > outright broken. IIRC niov's can be disconnected from the page_pool
> > via page_pool_clear_pp_info(), and niov->pp may be null. Abstractly
>
> It's called in the release path like page_pool_return_page(),
> I can't imagine someone can sanely clear it while inflight ...
>
Ah, yes, I wasn't sure what happens to the inflight pages when the pp
gets destroyed. I thought the pp might return the inflight pages, but
it looks to me like the pp only returns the free pages in the alloc
cache and the ptr_ring, and then stays alive until all the inflight
pages are freed. So niov->pp should indeed always be valid while the
niov is inflight. I would still prefer the memory type to be part of
the niov itself, but I don't feel strongly at this point; up to you.
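
To make the lifetime argument concrete, a rough userspace model
follows. It is not page_pool code: struct pool_model and the pool_*()
helpers are invented for this sketch, and it only mirrors the
behaviour as described above, where destroy drops the cached free
buffers and the pool is released once the last inflight buffer comes
back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
struct pool_model {
	int cached;		/* free buffers sitting in the pool's caches */
	int inflight;		/* buffers currently held by users */
	bool destroy_requested;
	bool released;		/* set once the pool itself may be freed */
};

/* The pool may go away only after destroy and with nothing inflight. */
static inline void pool_try_release(struct pool_model *p)
{
	if (p->destroy_requested && p->inflight == 0)
		p->released = true;
}

/* Hand one cached buffer to a user; fails after destroy or when empty. */
static inline bool pool_get(struct pool_model *p)
{
	if (p->destroy_requested || p->cached == 0)
		return false;
	p->cached--;
	p->inflight++;
	return true;
}

/* A user returns a buffer; after destroy it is dropped, not re-cached. */
static inline void pool_put(struct pool_model *p)
{
	assert(p->inflight > 0);
	p->inflight--;
	if (p->destroy_requested)
		pool_try_release(p);
	else
		p->cached++;
}

/* Destroy drops only the cached buffers; inflight ones are left alone. */
static inline void pool_destroy(struct pool_model *p)
{
	p->destroy_requested = true;
	p->cached = 0;
	pool_try_release(p);
}
```

In this model a buffer taken before pool_destroy() keeps the pool
unreleased until the matching pool_put(), which is the property that
makes a back-pointer from an inflight buffer to its pool safe to
dereference.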
> > speaking the niov type maybe should be a property of the niov itself,
> > and not the pp the niov is attached to.
>
> ... but I can just stash all that in niov->owner,
> struct dmabuf_genpool_chunk_owner you have. That might be even
> cleaner. And regardless of it I'll be making some minor changes
> to the structure to make it generic.
>
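A minimal sketch of the alternative floated here, assuming the
provider type is carried by a generic owner structure hanging off the
niov rather than derived from niov->pp->mp_ops. enum net_iov_type,
struct net_iov_owner and its fields are hypothetical; the helper names
follow the net_iov_is_dmabuf() / net_iov_is_io_uring() suggestion
quoted earlier in this mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical provider tag; the thread leaves the real encoding open. */
enum net_iov_type {
	NET_IOV_DMABUF,
	NET_IOV_IO_URING,
};

/* Hypothetical generic owner, standing in for a generalized chunk owner. */
struct net_iov_owner {
	enum net_iov_type type;
	void *priv;		/* provider context, e.g. a binding or an ifq */
};

/* Trimmed-down stand-in: only the owner back-pointer is modelled. */
struct net_iov {
	struct net_iov_owner *owner;
};

/* The type is read from the niov's owner, never through niov->pp. */
static inline bool net_iov_is_dmabuf(const struct net_iov *niov)
{
	return niov && niov->owner && niov->owner->type == NET_IOV_DMABUF;
}

static inline bool net_iov_is_io_uring(const struct net_iov *niov)
{
	return niov && niov->owner && niov->owner->type == NET_IOV_IO_URING;
}
```

With a tag like this the type check does not depend on the page_pool
pointer at all, and comparing owner->priv against the current context
would cover the per-instance check as well.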
> > It is not immediately obvious to me what the best thing to do here is,
> > maybe it's best to add a flag to niov or to use niov->pp_magic for
> > this.
> >
> > I would humbly ask that your follow up patchset takes care of this
> > bit, if possible. I think mine is doing quite a bit of heavy lifting
> > as is (and I think may be close to ready?), when it comes to concerns
> > of devmem + io_uring coexisting if you're able to take care, awesome,
> > if not, I can look into squashing some fix.
>
> Let it be this way then. It's not a problem while there is
> only one such provider.
>
Thank you!
--
Thanks,
Mina