LKML Archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Nitesh Shetty <nj.shetty@samsung.com>,
	Jens Axboe <axboe@kernel.dk>, Jonathan Corbet <corbet@lwn.net>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>
Cc: martin.petersen@oracle.com, bvanassche@acm.org,
	david@fromorbit.com, hare@suse.de,
	damien.lemoal@opensource.wdc.com, anuj20.g@samsung.com,
	joshi.k@samsung.com, nitheshshetty@gmail.com,
	gost.dev@samsung.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	dm-devel@lists.linux.dev, linux-nvme@lists.infradead.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v20 02/12] Add infrastructure for copy offload in block and request layer.
Date: Mon, 20 May 2024 17:00:03 +0200	[thread overview]
Message-ID: <8f60ed88-1978-4d7c-9149-aee672aa1b09@kernel.org> (raw)
In-Reply-To: <20240520102033.9361-3-nj.shetty@samsung.com>

On 2024/05/20 12:20, Nitesh Shetty wrote:
> We add two new opcode REQ_OP_COPY_DST, REQ_OP_COPY_SRC.
> Since copy is a composite operation involving src and dst sectors/lba,
> each needs to be represented by a separate bio to make it compatible
> with device mapper.

Why ? The beginning of the sentence isn't justification enough for the two new
operation codes ? The 2 sentences should be reversed for easier reading:
justification first naturally leads to the reader understanding why the codes
are needed.

Also: s/opcode/operations


> We expect caller to take a plug and send bio with destination information,
> followed by bio with source information.

expect ? Plugging is optional. Does copy offload require it ? Please clarify this.

> Once the dst bio arrives we form a request and wait for source

arrives ? You mean "is submitted" ?

s/and wait for/and wait for the

> bio. Upon arrival of source bio we merge these two bio's and send

s/arrival/submission ?

s/of/of the
s/bio's/BIOs
s/and send/and send the
s/down to/down to the

> corresponding request down to device driver.
> Merging non copy offload bio is avoided by checking for copy specific
> opcodes in merge function.

Super unclear... What are you trying to say here ? That merging copy offload
BIOs with other BIOs is not allowed ? That is already handled. Only BIOs &
requests with the same operation can be merged. The code below also suggests
that you allow merging copy offloads... So I really do not understand this.

> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> ---
>  block/blk-core.c          |  7 +++++++
>  block/blk-merge.c         | 41 +++++++++++++++++++++++++++++++++++++++
>  block/blk.h               | 16 +++++++++++++++
>  block/elevator.h          |  1 +
>  include/linux/bio.h       |  6 +-----
>  include/linux/blk_types.h | 10 ++++++++++
>  6 files changed, 76 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ea44b13af9af..f18ee5f709c0 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -122,6 +122,8 @@ static const char *const blk_op_name[] = {
>  	REQ_OP_NAME(ZONE_FINISH),
>  	REQ_OP_NAME(ZONE_APPEND),
>  	REQ_OP_NAME(WRITE_ZEROES),
> +	REQ_OP_NAME(COPY_SRC),
> +	REQ_OP_NAME(COPY_DST),
>  	REQ_OP_NAME(DRV_IN),
>  	REQ_OP_NAME(DRV_OUT),
>  };
> @@ -838,6 +840,11 @@ void submit_bio_noacct(struct bio *bio)
>  		 * requests.
>  		 */
>  		fallthrough;
> +	case REQ_OP_COPY_SRC:
> +	case REQ_OP_COPY_DST:
> +		if (!q->limits.max_copy_sectors)
> +			goto not_supported;
> +		break;
>  	default:
>  		goto not_supported;
>  	}
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8534c35e0497..f8dc48a03379 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -154,6 +154,20 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
>  	return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
>  }
>  
> +static struct bio *bio_split_copy(struct bio *bio,
> +				  const struct queue_limits *lim,
> +				  unsigned int *nsegs)
> +{
> +	*nsegs = 1;
> +	if (bio_sectors(bio) <= lim->max_copy_sectors)
> +		return NULL;
> +	/*
> +	 * We don't support splitting for a copy bio. End it with EIO if
> +	 * splitting is required and return an error pointer.
> +	 */
> +	return ERR_PTR(-EIO);
> +}

Hmm... Why not check that the copy request is small enough and will not be split
when it is submitted ? Something like blk_check_zone_append() does with
REQ_OP_ZONE_APPEND ? So adding a blk_check_copy_offload(). That would also
include the limits check from the previous hunk.

> +
>  /*
>   * Return the maximum number of sectors from the start of a bio that may be
>   * submitted as a single request to a block device. If enough sectors remain,
> @@ -362,6 +376,12 @@ struct bio *__bio_split_to_limits(struct bio *bio,
>  	case REQ_OP_WRITE_ZEROES:
>  		split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
>  		break;
> +	case REQ_OP_COPY_SRC:
> +	case REQ_OP_COPY_DST:
> +		split = bio_split_copy(bio, lim, nr_segs);
> +		if (IS_ERR(split))
> +			return NULL;
> +		break;

See above.

>  	default:
>  		split = bio_split_rw(bio, lim, nr_segs, bs,
>  				get_max_io_size(bio, lim) << SECTOR_SHIFT);
> @@ -925,6 +945,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
>  	if (!rq_mergeable(rq) || !bio_mergeable(bio))
>  		return false;
>  
> +	if (blk_copy_offload_mergable(rq, bio))
> +		return true;
> +
>  	if (req_op(rq) != bio_op(bio))
>  		return false;
>  
> @@ -958,6 +981,8 @@ enum elv_merge blk_try_merge(struct request *rq, struct bio *bio)
>  {
>  	if (blk_discard_mergable(rq))
>  		return ELEVATOR_DISCARD_MERGE;
> +	else if (blk_copy_offload_mergable(rq, bio))
> +		return ELEVATOR_COPY_OFFLOAD_MERGE;
>  	else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
>  		return ELEVATOR_BACK_MERGE;
>  	else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector)
> @@ -1065,6 +1090,20 @@ static enum bio_merge_status bio_attempt_discard_merge(struct request_queue *q,
>  	return BIO_MERGE_FAILED;
>  }
>  
> +static enum bio_merge_status bio_attempt_copy_offload_merge(struct request *req,
> +							    struct bio *bio)
> +{
> +	if (req->__data_len != bio->bi_iter.bi_size)
> +		return BIO_MERGE_FAILED;
> +
> +	req->biotail->bi_next = bio;
> +	req->biotail = bio;
> +	req->nr_phys_segments++;
> +	req->__data_len += bio->bi_iter.bi_size;

Arg... You seem to be assuming that the source BIO always comes right after the
destination request... What if copy offloads are being concurrently issued ?
Shouldn't you check somehow that the pair is a match ? Or are you relying on the
per-context plugging which prevents that from happening in the first place ? But
that would assumes that you never ever sleep trying to allocate the source BIO
after the destination BIO/request are prepared and plugged.

> +
> +	return BIO_MERGE_OK;
> +}
> +
>  static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
>  						   struct request *rq,
>  						   struct bio *bio,
> @@ -1085,6 +1124,8 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
>  		break;
>  	case ELEVATOR_DISCARD_MERGE:
>  		return bio_attempt_discard_merge(q, rq, bio);
> +	case ELEVATOR_COPY_OFFLOAD_MERGE:
> +		return bio_attempt_copy_offload_merge(rq, bio);
>  	default:
>  		return BIO_MERGE_NONE;
>  	}
> diff --git a/block/blk.h b/block/blk.h
> index 189bc25beb50..6528a2779b84 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -174,6 +174,20 @@ static inline bool blk_discard_mergable(struct request *req)
>  	return false;
>  }
>  
> +/*
> + * Copy offload sends a pair of bio with REQ_OP_COPY_DST and REQ_OP_COPY_SRC
> + * operation by taking a plug.
> + * Initially DST bio is sent which forms a request and
> + * waits for SRC bio to arrive. Once SRC bio arrives
> + * we merge it and send request down to driver.
> + */
> +static inline bool blk_copy_offload_mergable(struct request *req,
> +					     struct bio *bio)
> +{
> +	return (req_op(req) == REQ_OP_COPY_DST &&
> +		bio_op(bio) == REQ_OP_COPY_SRC);
> +}

This function is really not needed at all (used in one place only).

> +
>  static inline unsigned int blk_rq_get_max_segments(struct request *rq)
>  {
>  	if (req_op(rq) == REQ_OP_DISCARD)
> @@ -323,6 +337,8 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
>  	case REQ_OP_DISCARD:
>  	case REQ_OP_SECURE_ERASE:
>  	case REQ_OP_WRITE_ZEROES:
> +	case REQ_OP_COPY_SRC:
> +	case REQ_OP_COPY_DST:
>  		return true; /* non-trivial splitting decisions */

See above. Limits should be checked on submission.

>  	default:
>  		break;
> diff --git a/block/elevator.h b/block/elevator.h
> index e9a050a96e53..c7a45c1f4156 100644
> --- a/block/elevator.h
> +++ b/block/elevator.h
> @@ -18,6 +18,7 @@ enum elv_merge {
>  	ELEVATOR_FRONT_MERGE	= 1,
>  	ELEVATOR_BACK_MERGE	= 2,
>  	ELEVATOR_DISCARD_MERGE	= 3,
> +	ELEVATOR_COPY_OFFLOAD_MERGE	= 4,
>  };
>  
>  struct blk_mq_alloc_data;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d5379548d684..528ef22dd65b 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -53,11 +53,7 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
>   */
>  static inline bool bio_has_data(struct bio *bio)
>  {
> -	if (bio &&
> -	    bio->bi_iter.bi_size &&
> -	    bio_op(bio) != REQ_OP_DISCARD &&
> -	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
> -	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
> +	if (bio && (bio_op(bio) == REQ_OP_READ || bio_op(bio) == REQ_OP_WRITE))
>  		return true;

This change seems completely broken and out of place. This would cause a return
of false for zone append operations.

>  
>  	return false;
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 781c4500491b..7f692bade271 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -342,6 +342,10 @@ enum req_op {
>  	/* reset all the zone present on the device */
>  	REQ_OP_ZONE_RESET_ALL	= (__force blk_opf_t)15,
>  
> +	/* copy offload src and dst operation */

s/src/source
s/dst/destination
s/operation/operations

> +	REQ_OP_COPY_SRC		= (__force blk_opf_t)18,
> +	REQ_OP_COPY_DST		= (__force blk_opf_t)19,
> +
>  	/* Driver private requests */
>  	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
>  	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
> @@ -430,6 +434,12 @@ static inline bool op_is_write(blk_opf_t op)
>  	return !!(op & (__force blk_opf_t)1);
>  }
>  
> +static inline bool op_is_copy(blk_opf_t op)
> +{
> +	return ((op & REQ_OP_MASK) == REQ_OP_COPY_SRC ||
> +		(op & REQ_OP_MASK) == REQ_OP_COPY_DST);
> +}

May be use a switch here to avoid the double masking of op ?

> +
>  /*
>   * Check if the bio or request is one that needs special treatment in the
>   * flush state machine.

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2024-05-20 15:00 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20240520102747epcas5p33497a911ca70c991e5da8e22c5d1336b@epcas5p3.samsung.com>
2024-05-20 10:20 ` [PATCH v20 00/12] Implement copy offload support Nitesh Shetty
     [not found]   ` <CGME20240520102830epcas5p27274901f3d0c2738c515709890b1dec4@epcas5p2.samsung.com>
2024-05-20 10:20     ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
2024-05-20 14:33       ` Damien Le Moal
2024-05-21  8:15         ` Nitesh Shetty
2024-05-20 22:42       ` Bart Van Assche
2024-05-21 14:25         ` Nitesh Shetty
2024-05-22 17:49           ` Bart Van Assche
2024-05-23  7:05             ` Nitesh Shetty
2024-05-21  6:57       ` Hannes Reinecke
2024-06-01  5:53       ` Christoph Hellwig
     [not found]   ` <CGME20240520102842epcas5p4949334c2587a15b8adab2c913daa622f@epcas5p4.samsung.com>
2024-05-20 10:20     ` [PATCH v20 02/12] Add infrastructure for copy offload in block and request layer Nitesh Shetty
2024-05-20 15:00       ` Damien Le Moal [this message]
2024-05-21 10:50         ` Nitesh Shetty
2024-05-20 23:00       ` Bart Van Assche
2024-05-21 11:17         ` Nitesh Shetty
2024-05-22 17:58           ` Bart Van Assche
2024-05-21  7:01       ` Hannes Reinecke
2024-05-24  6:54         ` Nitesh Shetty
     [not found]         ` <66503bc7.630a0220.56c85.8b9dSMTPIN_ADDED_BROKEN@mx.google.com>
2024-05-24 13:52           ` Bart Van Assche
2024-05-27  8:27             ` Nitesh Shetty
2024-05-28 14:07         ` Bart Van Assche
2024-05-22 18:05       ` Bart Van Assche
2024-05-23 11:34         ` Nitesh Shetty
2024-05-24 20:33       ` Bart Van Assche
2024-05-29  6:17         ` Nitesh Shetty
2024-05-29  7:48           ` Damien Le Moal
2024-05-29 22:41             ` Bart Van Assche
2024-05-30  7:16               ` Nitesh Shetty
     [not found]               ` <665850bd.050a0220.a5e6b.5b72SMTPIN_ADDED_BROKEN@mx.google.com>
2024-05-30 17:11                 ` Bart Van Assche
2024-05-31 10:17                   ` Nitesh Shetty
     [not found]                   ` <6659b691.630a0220.90195.d0ebSMTPIN_ADDED_BROKEN@mx.google.com>
2024-05-31 23:45                     ` Bart Van Assche
2024-06-01  5:59                   ` Christoph Hellwig
2024-06-01  5:57             ` Christoph Hellwig
     [not found]   ` <CGME20240520102853epcas5p42d635d6712b8876ea22a45d730cb1378@epcas5p4.samsung.com>
2024-05-20 10:20     ` [PATCH v20 03/12] block: add copy offload support Nitesh Shetty
2024-06-01  6:16       ` Christoph Hellwig
     [not found]   ` <CGME20240520102906epcas5p15b5a0b3c8edd0bf3073030a792a328bb@epcas5p1.samsung.com>
2024-05-20 10:20     ` [PATCH v20 04/12] block: add emulation for copy Nitesh Shetty
2024-05-21  7:06       ` Hannes Reinecke
2024-05-21 11:29         ` Nitesh Shetty
2024-06-01  6:18       ` Christoph Hellwig
     [not found]   ` <CGME20240520102917epcas5p1bda532309b9174bf2702081f6f58daf7@epcas5p1.samsung.com>
2024-05-20 10:20     ` [PATCH v20 05/12] fs/read_write: Enable copy_file_range for block device Nitesh Shetty
2024-05-21  7:07       ` Hannes Reinecke
2024-05-25 23:02       ` Dave Chinner
2024-05-28  5:57         ` Nitesh Shetty
     [not found]   ` <CGME20240520102929epcas5p2f4456f6fa0005d90769615eb2c2bf273@epcas5p2.samsung.com>
2024-05-20 10:20     ` [PATCH v20 06/12] fs, block: copy_file_range for def_blk_ops for direct " Nitesh Shetty
2024-05-25 23:09       ` Dave Chinner
2024-05-27  8:43         ` Nitesh Shetty
     [not found]   ` <CGME20240520102940epcas5p2b5f38ceabe94bed3905fb386a0d65ec7@epcas5p2.samsung.com>
2024-05-20 10:20     ` [PATCH v20 07/12] nvme: add copy offload support Nitesh Shetty
2024-06-01  6:22       ` Christoph Hellwig
     [not found]   ` <CGME20240520102952epcas5p18716d203d1810c38397e7fcc9a26922a@epcas5p1.samsung.com>
2024-05-20 10:20     ` [PATCH v20 08/12] nvmet: add copy command support for bdev and file ns Nitesh Shetty
     [not found]   ` <CGME20240520103004epcas5p4a18f3f6ba0f218d57b0ab4bb84c6ff18@epcas5p4.samsung.com>
2024-05-20 10:20     ` [PATCH v20 09/12] dm: Add support for copy offload Nitesh Shetty
2024-05-21  7:11       ` Hannes Reinecke
2024-05-21 14:08         ` Nitesh Shetty
2024-05-22  6:22           ` Hannes Reinecke
2024-05-22  7:10             ` Nitesh Shetty
     [not found]   ` <CGME20240520103016epcas5p31b9a0f3637959626d49763609ebda6ef@epcas5p3.samsung.com>
2024-05-20 10:20     ` [PATCH v20 10/12] dm: Enable copy offload for dm-linear target Nitesh Shetty
2024-05-20 23:25       ` Bart Van Assche
2024-05-21 14:48         ` Nitesh Shetty
     [not found]   ` <CGME20240520103027epcas5p4789defe8ab3bff23bd2abcf019689fa2@epcas5p4.samsung.com>
2024-05-20 10:20     ` [PATCH v20 11/12] null: Enable trace capability for null block Nitesh Shetty
2024-05-20 23:27       ` Bart Van Assche
2024-06-01  6:23       ` Christoph Hellwig
     [not found]   ` <CGME20240520103039epcas5p4373f7234162a32222ac225b976ae30ce@epcas5p4.samsung.com>
2024-05-20 10:20     ` [PATCH v20 12/12] null_blk: add support for copy offload Nitesh Shetty
2024-05-20 23:42       ` Bart Van Assche
2024-05-21 14:46         ` Nitesh Shetty
2024-05-22 17:52           ` Bart Van Assche
2024-05-23  6:55             ` Nitesh Shetty
2024-05-20 22:54   ` [PATCH v20 00/12] Implement copy offload support Bart Van Assche
2024-06-01  5:47   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f60ed88-1978-4d7c-9149-aee672aa1b09@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=agk@redhat.com \
    --cc=anuj20.g@samsung.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=corbet@lwn.net \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=david@fromorbit.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=gost.dev@samsung.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=joshi.k@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=kch@nvidia.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=martin.petersen@oracle.com \
    --cc=mpatocka@redhat.com \
    --cc=nitheshshetty@gmail.com \
    --cc=nj.shetty@samsung.com \
    --cc=sagi@grimberg.me \
    --cc=snitzer@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).