From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Christoph Hellwig" <hch@infradead.org>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-arch@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
"Matt Turner" <mattst88@gmail.com>,
"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
"Helge Deller" <deller@gmx.de>,
"Andreas Larsson" <andreas@gaisler.com>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Arnd Bergmann" <arnd@arndb.de>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
"Steffen Klassert" <steffen.klassert@secunet.com>,
"Herbert Xu" <herbert@gondor.apana.org.au>,
"David Ahern" <dsahern@kernel.org>,
"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
"Shuah Khan" <shuah@kernel.org>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>,
"Amritha Nambiar" <amritha.nambiar@intel.com>,
"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
"Alexander Mikhalitsyn" <alexander@mihalicyn.com>,
"Kaiyuan Zhang" <kaiyuanz@google.com>,
"Christian Brauner" <brauner@kernel.org>,
"Simon Horman" <horms@kernel.org>,
"David Howells" <dhowells@redhat.com>,
"Florian Westphal" <fw@strlen.de>,
"Yunsheng Lin" <linyunsheng@huawei.com>,
"Kuniyuki Iwashima" <kuniyu@amazon.com>,
"Jens Axboe" <axboe@kernel.dk>,
"Arseniy Krasnov" <avkrasnov@salutedevices.com>,
"Aleksander Lobakin" <aleksander.lobakin@intel.com>,
"Michael Lass" <bevan@bi-co.net>, "Jiri Pirko" <jiri@resnulli.us>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Lorenzo Bianconi" <lorenzo@kernel.org>,
"Richard Gobert" <richardbgobert@gmail.com>,
"Sridhar Samudrala" <sridhar.samudrala@intel.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Johannes Berg" <johannes.berg@intel.com>,
"Abel Wu" <wuyun.abel@bytedance.com>,
"Breno Leitao" <leitao@debian.org>, "David Wei" <dw@davidwei.uk>,
"Shailend Chand" <shailend@google.com>,
"Harshitha Ramamurthy" <hramamurthy@google.com>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Jeroen de Borst" <jeroendb@google.com>,
"Praveen Kaligineedi" <pkaligineedi@google.com>
Subject: Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
Date: Tue, 7 May 2024 18:34:39 +0100
Message-ID: <d0610fa1-562f-4d4e-ae84-2a0267316c32@gmail.com>
In-Reply-To: <CAHS8izNL-phg3y9xiQbx7A2wQE3ZZKXiQA0oFW9mgj4ONk7GSw@mail.gmail.com>

On 5/7/24 18:15, Mina Almasry wrote:
> On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 5/7/24 17:23, Christoph Hellwig wrote:
>>> On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
>>>> On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
>>>>>> even in tree if you give them enough rope, and they should not have
>>>>>> that rope when the only sensible options are page/folio based kernel
>>>>>> memory (including large/huge folios) and dmabuf.
>>>>>
>>>>> I believe there is at least one deep confusion here, considering you
>>>>> previously mentioned Keith's pre-mapping patches. The "hooks" are not
>>>>> that about in what format you pass memory, it's arguably the least
>>>>> interesting part for page pool, more or less it'd circulate whatever
>>>>> is given. It's more of how to have a better control over buffer lifetime
>>>>> and implement a buffer pool passing data to users and empty buffers
>>>>> back.
>>>>
>>>> Isn't that more or less exactly what dmabuf is? Why do you need
>>>> another almost dma-buf thing for another project?
>>>
>>> That's the exact point I've been making since the last round of
>>> the series. We don't need to reinvent dmabuf poorly in every
>>> subsystem, but instead fix the odd parts in it and make it suitable
>>> for everyone.
>>
>> Someone would need to elaborate how dma-buf is like that addition
>> to page pool infra.
>
> I think I understand what Jason is requesting here, and I'll take a
> shot at elaborating. AFAICT what he's saying is technically feasible
> and addresses the nack while giving you the uapi you want. It just
> requires a bit (a lot?) of work on your end unfortunately.
>
> CONFIG_UDMABUF takes in a memfd, converts it to a dmabuf, and returns
> it to userspace. See udmabuf_create().
>
> I think what Jason is saying here, is that you can write similar code
> to udmabuf_create() that takes in an io_uring memory region, and
> converts it to a dmabuf inside the kernel.
>
> I haven't looked at your series yet too closely (sorry!), but I assume
> you currently have a netlink API that binds an io_uring memory region
> to the NIC rx-queue page_pool, right? That netlink API would need to
> be changed to:

No, it's different. I'll skip the details, but the main problem is
that those callbacks are used to implement the user API for returning
buffers via a ring: the callback grabs them (in NAPI context)
and feeds them into the page pool. That replaces SO_DEVMEM_DONTNEED
and the need for ioctl/setsockopt.

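To make the shape of that concrete, here is a rough userspace sketch of
such a return ring. It is not code from the series: the struct, the
function names and the ring size are all hypothetical, and the "callback"
is only a stand-in for what a provider hook would do in NAPI context. It
assumes a single producer (the user) and a single consumer (the refill
path).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring size; must be a power of two for the index masking. */
#define RING_ENTRIES 8u
#define RING_MASK (RING_ENTRIES - 1u)
static_assert((RING_ENTRIES & RING_MASK) == 0, "ring size must be a power of two");

/* Hypothetical ring shared by the user (producer) and the refill path (consumer). */
struct refill_ring {
	_Atomic uint32_t head;           /* next slot the consumer reads */
	_Atomic uint32_t tail;           /* next slot the producer writes */
	uint32_t buf_ids[RING_ENTRIES];  /* ids of buffers the user is done with */
};

/* User side: hand an empty buffer back. Returns 0, or -1 if the ring is full. */
static int ring_return_buf(struct refill_ring *r, uint32_t id)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_ENTRIES)
		return -1;
	r->buf_ids[tail & RING_MASK] = id;
	/* Publish the entry only after the id has been written. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 0;
}

/*
 * Pool side: stand-in for the provider callback. Drains up to max returned
 * buffer ids into the pool's cache and reports how many it took.
 */
static size_t pool_refill_from_ring(struct refill_ring *r, uint32_t *cache,
				    size_t max)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	size_t n = 0;

	while (head != tail && n < max) {
		cache[n++] = r->buf_ids[head & RING_MASK];
		head++;
	}
	/* Release the consumed slots back to the producer. */
	atomic_store_explicit(&r->head, head, memory_order_release);
	return n;
}
```

The point of the sketch is only that returning a buffer is a plain store
into shared memory, with no syscall on the user side, and that the pool
pulls the returned buffers when it needs to refill.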
> 1. Take in the io_uring memory.
> 2. Convert it to a dmabuf like udmabuf_create() does.
> 3. Bind the resulting dmabuf to the rx-queue page_pool.
>
> There would be more changes needed vis-a-vis the clean up path and
> lifetime management, but I think this is the general idea.
>
> This would give you the uapi you want, while the page_pool never sees
> non-dmabuf memory (addresses the nack, I think).
--
Pavel Begunkov