From: Kanchan Joshi <joshi.k@samsung.com>
To: Jens Axboe <axboe@kernel.dk>,
	martin.petersen@oracle.com, kbusch@kernel.org, hch@lst.de,
	brauner@kernel.org
Cc: asml.silence@gmail.com, dw@davidwei.uk, io-uring@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	gost.dev@samsung.com, Anuj Gupta <anuj20.g@samsung.com>,
	Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [PATCH 08/10] io_uring/rw: add support to send meta along with read/write
Date: Tue, 30 Apr 2024 01:41:32 +0530
Message-ID: <2e8eb4e8-beb2-51cd-67b5-75e920c9fff4@samsung.com>
In-Reply-To: <f3489d0c-2d27-4e27-ae49-df2e9dad2e00@kernel.dk>

On 4/26/2024 7:55 PM, Jens Axboe wrote:
>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>> index 3134a6ece1be..b2c9ac91d5e5 100644
>> --- a/io_uring/rw.c
>> +++ b/io_uring/rw.c
>> @@ -587,6 +623,8 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret,
>>   
>>   		req->flags &= ~REQ_F_REISSUE;
>>   		iov_iter_restore(&io->iter, &io->iter_state);
>> +		if (unlikely(rw->kiocb.ki_flags & IOCB_USE_META))
>> +			iov_iter_restore(&io->meta.iter, &io->iter_meta_state);
>>   		return -EAGAIN;
>>   	}
>>   	return IOU_ISSUE_SKIP_COMPLETE;
> This puzzles me a bit, why is the restore now dependent on
> IOCB_USE_META?

Both the save and the restore for meta are under this condition, so it
seemed natural.
Also, to avoid growing "struct io_async_rw" too much, this patch keeps
meta/iter_meta_state in the same memory as wpq. So doing the restore
unconditionally can corrupt wpq for buffered io.
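
To illustrate the aliasing concern, here is a minimal userspace sketch. It
is not the kernel code: every fake_* type and field is a simplified,
hypothetical stand-in, and the union layout is only an assumption drawn
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct fake_iter { size_t count; size_t iov_offset; };
struct fake_wpq { void *wait_head; int bit_nr; };

struct fake_async_rw {
	bool use_meta;	/* models IOCB_USE_META being set */
	union {		/* shared storage: only one member is live at a time */
		struct fake_wpq wpq;		/* used by buffered io */
		struct {
			struct fake_iter iter;	/* models io->meta.iter */
			struct fake_iter state;	/* models io->iter_meta_state */
		} meta;				/* used by direct io + meta */
	} u;
};

/* Restore the meta iterator only if meta is in use; otherwise u holds wpq. */
static void fake_restore_meta(struct fake_async_rw *io)
{
	if (io->use_meta)
		io->u.meta.iter = io->u.meta.state;
}

/* Buffered request: the guarded restore must leave wpq untouched. */
static bool fake_buffered_wpq_survives(void)
{
	static int marker;
	struct fake_async_rw io = { .use_meta = false };

	io.u.wpq.wait_head = &marker;
	io.u.wpq.bit_nr = 7;
	fake_restore_meta(&io);
	return io.u.wpq.wait_head == &marker && io.u.wpq.bit_nr == 7;
}

/* Meta request: the restore rewinds the iterator to the saved state. */
static bool fake_meta_restore_rewinds(void)
{
	struct fake_async_rw io = { .use_meta = true };

	io.u.meta.state = (struct fake_iter){ .count = 64, .iov_offset = 0 };
	io.u.meta.iter = (struct fake_iter){ .count = 16, .iov_offset = 48 };
	fake_restore_meta(&io);
	return io.u.meta.iter.count == 64 && io.u.meta.iter.iov_offset == 0;
}

/* Small self-check tying the two cases together. */
static void fake_union_selfcheck(void)
{
	assert(fake_buffered_wpq_survives());
	assert(fake_meta_restore_rewinds());
}
```

Dropping the use_meta test in fake_restore_meta() would write over the
bytes that wpq occupies, which is the corruption being avoided.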

>> @@ -768,7 +806,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
>>   	if (!(req->flags & REQ_F_FIXED_FILE))
>>   		req->flags |= io_file_get_flags(file);
>>   
>> -	kiocb->ki_flags = file->f_iocb_flags;
>> +	kiocb->ki_flags |= file->f_iocb_flags;
>>   	ret = kiocb_set_rw_flags(kiocb, rw->flags);
>>   	if (unlikely(ret))
>>   		return ret;
>> @@ -787,7 +825,8 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
>>   		if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
>>   			return -EOPNOTSUPP;
>>   
>> -		kiocb->private = NULL;
>> +		if (likely(!(kiocb->ki_flags & IOCB_USE_META)))
>> +			kiocb->private = NULL;
>>   		kiocb->ki_flags |= IOCB_HIPRI;
>>   		kiocb->ki_complete = io_complete_rw_iopoll;
>>   		req->iopoll_completed = 0;
> 
> Why don't we just set ->private generically earlier, eg like we do for
> the ki_flags, rather than have it be a branch in here?

Not sure if I am missing what you have in mind.
But kiocb->private was already set before we reach this point (in
io_rw_meta), so we must not overwrite it here.

>> @@ -853,7 +892,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>   	} else if (ret == -EIOCBQUEUED) {
>>   		return IOU_ISSUE_SKIP_COMPLETE;
>>   	} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
>> -		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
>> +		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req) ||
>> +		   (kiocb->ki_flags & IOCB_USE_META)) {
>>   		/* read all, failed, already did sync or don't want to retry */
>>   		goto done;
>>   	}
> 
> Would it be cleaner to stuff that IOCB_USE_META check in
> need_complete_io(), as that would closer seem to describe why that check
> is there in the first place? With a comment.

Yes, will do.

>> @@ -864,6 +904,12 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>   	 * manually if we need to.
>>   	 */
>>   	iov_iter_restore(&io->iter, &io->iter_state);
>> +	if (unlikely(kiocb->ki_flags & IOCB_USE_META)) {
>> +		/* don't handle partial completion for read + meta */
>> +		if (ret > 0)
>> +			goto done;
>> +		iov_iter_restore(&io->meta.iter, &io->iter_meta_state);
>> +	}
> 
> Also seems a bit odd why we need this check here, surely if this is
> needed other "don't do retry IOs" conditions would be the same?

Yes, will revisit.

>> @@ -1053,7 +1099,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>   		if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
>>   			goto ret_eagain;
>>   
>> -		if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)) {
>> +		if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)
>> +				&& !(kiocb->ki_flags & IOCB_USE_META)) {
>>   			trace_io_uring_short_write(req->ctx, kiocb->ki_pos - ret2,
>>   						req->cqe.res, ret2);
> 
> Same here. Would be nice to integrate this a bit nicer rather than have
> a bunch of "oh we also need this extra check here" conditions.

Will look into this too.

>> @@ -1074,12 +1121,33 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>   	} else {
>>   ret_eagain:
>>   		iov_iter_restore(&io->iter, &io->iter_state);
>> +		if (unlikely(kiocb->ki_flags & IOCB_USE_META))
>> +			iov_iter_restore(&io->meta.iter, &io->iter_meta_state);
>>   		if (kiocb->ki_flags & IOCB_WRITE)
>>   			io_req_end_write(req);
>>   		return -EAGAIN;
>>   	}
>>   }
> 
> Same question here on the (now) conditional restore.

Did not get the concern. Do you prefer it unconditional?

>> +int io_rw_meta(struct io_kiocb *req, unsigned int issue_flags)
>> +{
>> +	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>> +	struct io_async_rw *io = req->async_data;
>> +	struct kiocb *kiocb = &rw->kiocb;
>> +	int ret;
>> +
>> +	if (!(req->file->f_flags & O_DIRECT))
>> +		return -EOPNOTSUPP;
> 
> Why isn't this just caught at init time when IOCB_DIRECT is checked?

io_rw_init_file() gets invoked after this, and the IOCB_DIRECT check there
is only for the IOPOLL case. We want to check/fail it regardless of IOPOLL.
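
A rough userspace sketch of that ordering, under stated assumptions: the
fake_* names are hypothetical, and the two functions only model the two
checks discussed here (the IOPOLL-only direct-io check at init time, and
the unconditional O_DIRECT check at the meta entry point).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request model; not the kernel's struct io_kiocb. */
struct fake_req {
	bool o_direct;		/* file opened with O_DIRECT */
	bool iopoll;		/* ring set up for polled io */
	bool has_iopoll_op;	/* file provides an iopoll handler */
};

/* Models the init-time check: direct io is only enforced under IOPOLL. */
static int fake_rw_init_file(const struct fake_req *req)
{
	if (req->iopoll) {
		if (!req->o_direct || !req->has_iopoll_op)
			return -EOPNOTSUPP;
	}
	return 0;
}

/* Models the meta entry point: reject non-direct io first, then init. */
static int fake_rw_meta(const struct fake_req *req)
{
	if (!req->o_direct)
		return -EOPNOTSUPP;
	return fake_rw_init_file(req);
}

/* Buffered, non-IOPOLL: init alone would accept it, the meta path must not. */
static void fake_order_selfcheck(void)
{
	const struct fake_req buffered = {
		.o_direct = false, .iopoll = false, .has_iopoll_op = true,
	};

	assert(fake_rw_init_file(&buffered) == 0);
	assert(fake_rw_meta(&buffered) == -EOPNOTSUPP);
}
```

The buffered, non-IOPOLL case is the one that the init-time check alone
would let through.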

> 
>> +	kiocb->private = &io->meta;
>> +	if (req->opcode == IORING_OP_READ_META)
>> +		ret = io_read(req, issue_flags);
>> +	else
>> +		ret = io_write(req, issue_flags);
>> +
>> +	return ret;
>> +}
> 
> kiocb->private is a bit of an odd beast, and ownership isn't clear at
> all. It would make the most sense if the owner of the kiocb (eg io_uring
> in this case) owned it, but take a look at eg ocfs2 and see what they do
> with it... I think this would blow up as a result.

Yes, ocfs2 is making use of kiocb->private. But that seems fine. In
io_uring we use the field only to send the information down. ocfs2 (or
anything else unaware of this interface) may just overwrite
kiocb->private.
If the lower layer wants to support meta exchange, it is supposed to
extract the meta-descriptor from kiocb->private before altering it.

The same holds for the block direct path when we are doing polled io.
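
The contract described above could be sketched like this in userspace. All
fake_* names and the FAKE_IOCB_USE_META value are hypothetical; this only
models "read the descriptor first, then reuse the field", not any real
lower-layer code.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_IOCB_USE_META 0x1u	/* hypothetical flag value */

/* Hypothetical stand-ins for the meta descriptor and the kiocb. */
struct fake_meta { void *buf; size_t len; };
struct fake_kiocb { unsigned int ki_flags; void *private; };

/*
 * A meta-aware lower layer: pick up the descriptor before touching
 * ->private, after which the field is free for the layer's own use.
 * Returns NULL when the submitter did not pass meta.
 */
static struct fake_meta *fake_lower_submit(struct fake_kiocb *kiocb,
					   void *own_cookie)
{
	struct fake_meta *meta = NULL;

	if (kiocb->ki_flags & FAKE_IOCB_USE_META)
		meta = kiocb->private;
	kiocb->private = own_cookie;	/* lower layer now owns the field */
	return meta;
}

/* The descriptor is recovered even though ->private gets overwritten. */
static void fake_private_selfcheck(void)
{
	struct fake_meta m = { .buf = NULL, .len = 8 };
	int cookie = 0;
	struct fake_kiocb kiocb = {
		.ki_flags = FAKE_IOCB_USE_META, .private = &m,
	};

	assert(fake_lower_submit(&kiocb, &cookie) == &m);
	assert(kiocb.private == &cookie);
}
```

A layer that is unaware of the flag would simply skip the first step and
overwrite the field, which matches the ocfs2 behaviour mentioned above.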
