On 28/02/2020 23:30, Jens Axboe wrote:
> IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
> support passing in an addr/len that is associated with a buffer ID and
> buffer group ID. The group ID is used to index and look up the buffers,
> while the buffer ID can be used to notify the application which buffer
> in the group was used. The addr passed in is the starting buffer address,
> and len is the length of each buffer. A number of buffers to add can be
> specified, in which case addr is incremented by len for each addition,
> and each added buffer increments the buffer ID specified.
>
> No validation is done of the buffer ID. If the application provides
> buffers within the same group with identical buffer IDs, then it'll have
> a hard time telling which buffer ID was used. The only restriction is
> that the buffer ID can be a max of 16 bits in size, so USHRT_MAX is the
> maximum ID that can be used.
>
> Signed-off-by: Jens Axboe
> +
> +static int io_add_buffers(struct io_provide_buf *pbuf, struct list_head *list)
> +{
> +        struct io_buffer *buf;
> +        u64 addr = pbuf->addr;
> +        int i, bid = pbuf->bid;
> +
> +        for (i = 0; i < pbuf->nbufs; i++) {
> +                buf = kmalloc(sizeof(*buf), GFP_KERNEL);
> +                if (!buf)
> +                        break;
> +
> +                buf->addr = addr;
> +                buf->len = pbuf->len;
> +                buf->bid = bid;
> +                list_add(&buf->list, list);
> +                addr += pbuf->len;

So, it chops a linear buffer into pbuf->nbufs chunks of size pbuf->len.
Did you consider iovec?

I'll try to think about it after getting some sleep.

> +                bid++;
> +        }
> +
> +        return i;
> +}
> +
> +static int io_provide_buffers(struct io_kiocb *req, struct io_kiocb **nxt,
> +                              bool force_nonblock)
> +{
> +        struct io_provide_buf *p = &req->pbuf;
> +        struct io_ring_ctx *ctx = req->ctx;
> +        struct list_head *list;
> +        int ret = 0;
> +
> +        /*
> +         * "Normal" inline submissions always hold the uring_lock, since we
> +         * grab it from the system call. Same is true for the SQPOLL offload.
> +         * The only exception is when we've detached the request and issue it
> +         * from an async worker thread, grab the lock for that case.
> +         */
> +        if (!force_nonblock)
> +                mutex_lock(&ctx->uring_lock);

io_poll_task_handler() calls it with force_nonblock==true, but it doesn't
hold the mutex AFAIK.

> +
> +        lockdep_assert_held(&ctx->uring_lock);
> +
> +        list = idr_find(&ctx->io_buffer_idr, p->gid);
> +        if (!list) {
> +                list = kmalloc(sizeof(*list), GFP_KERNEL);
> +                if (!list) {
> +                        ret = -ENOMEM;
> +                        goto out;
> +                }
> +                INIT_LIST_HEAD(list);
> +                ret = idr_alloc(&ctx->io_buffer_idr, list, p->gid, p->gid + 1,
> +                                GFP_KERNEL);
> +                if (ret < 0) {
> +                        kfree(list);
> +                        goto out;
> +                }
> +        }
> +
> +        ret = io_add_buffers(p, list);

Isn't it better to not do partial registration? I.e., it may return
ret < pbuf->nbufs. (A rough all-or-nothing sketch is appended below.)

> +        if (!ret) {
> +                /* no buffers added and list empty, remove entry */
> +                if (list_empty(list)) {
> +                        idr_remove(&ctx->io_buffer_idr, p->gid);
> +                        kfree(list);
> +                }
> +                ret = -ENOMEM;
> +        }
> +out:
> +        if (!force_nonblock)
> +                mutex_unlock(&ctx->uring_lock);
> +        if (ret < 0)
> +                req_set_fail_links(req);
> +        io_cqring_add_event(req, ret);
> +        io_put_req_find_next(req, nxt);
> +        return 0;
> +}
> +

--
Pavel Begunkov
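
To make the partial-registration point concrete, below is a rough, untested
sketch of what an all-or-nothing io_add_buffers() could look like: allocate
every entry onto a private list first, and only splice the batch into the
group's list once all allocations have succeeded. This is not part of the
patch, just an illustration of the review comment; it reuses the patch's
struct io_provide_buf and struct io_buffer.

static int io_add_buffers_atomic(struct io_provide_buf *pbuf,
                                 struct list_head *list)
{
        LIST_HEAD(tmp);
        struct io_buffer *buf, *nxt;
        u64 addr = pbuf->addr;
        int i, bid = pbuf->bid;

        for (i = 0; i < pbuf->nbufs; i++) {
                buf = kmalloc(sizeof(*buf), GFP_KERNEL);
                if (!buf)
                        goto fail;

                buf->addr = addr;
                buf->len = pbuf->len;
                buf->bid = bid;
                /* keep new entries private until the whole batch succeeds */
                list_add_tail(&buf->list, &tmp);
                addr += pbuf->len;
                bid++;
        }

        /* every allocation succeeded, publish the batch in one go */
        list_splice_tail(&tmp, list);
        return pbuf->nbufs;

fail:
        /* roll back: nothing was visible in the group's list yet */
        list_for_each_entry_safe(buf, nxt, &tmp, list)
                kfree(buf);
        return -ENOMEM;
}

With something like that, io_provide_buffers() would either report the full
pbuf->nbufs or -ENOMEM, and userspace would never have to reason about a
half-registered group.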
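
For anyone reading along, here is a rough userspace sketch of how a buffer
group would be provided with this opcode. It assumes a liburing helper along
the lines of io_uring_prep_provide_buffers(), which is not part of this
series; the group ID, buffer count, and buffer size below are made up for
the example.

#include <errno.h>
#include <liburing.h>

#define NR_BUFS         8
#define BUF_LEN         4096
#define GROUP_ID        1       /* hypothetical buffer group ID */

/* base must point at NR_BUFS * BUF_LEN bytes of memory */
static int provide_buffer_group(struct io_uring *ring, void *base)
{
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int ret;

        sqe = io_uring_get_sqe(ring);
        if (!sqe)
                return -EBUSY;

        /*
         * The kernel chops [base, base + NR_BUFS * BUF_LEN) into NR_BUFS
         * chunks of BUF_LEN bytes, with buffer IDs 0..NR_BUFS-1 in group
         * GROUP_ID, per the commit message above.
         */
        io_uring_prep_provide_buffers(sqe, base, BUF_LEN, NR_BUFS, GROUP_ID, 0);

        ret = io_uring_submit(ring);
        if (ret < 0)
                return ret;

        ret = io_uring_wait_cqe(ring, &cqe);
        if (ret < 0)
                return ret;
        /* with this patch, res is the number of buffers added, or -errno */
        ret = cqe->res;
        io_uring_cqe_seen(ring, cqe);
        return ret;
}

A read or recv submitted afterwards would then pick one of these buffers via
the buffer-select support the rest of this series builds on top of this
opcode, with the chosen buffer ID reported back in the completion.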