From: Damien Le Moal <dlemoal@kernel.org>
To: Nitesh Shetty <nj.shetty@samsung.com>,
Jens Axboe <axboe@kernel.dk>, Jonathan Corbet <corbet@lwn.net>,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
Mikulas Patocka <mpatocka@redhat.com>,
Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
Sagi Grimberg <sagi@grimberg.me>,
Chaitanya Kulkarni <kch@nvidia.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>
Cc: martin.petersen@oracle.com, bvanassche@acm.org,
david@fromorbit.com, hare@suse.de,
damien.lemoal@opensource.wdc.com, anuj20.g@samsung.com,
joshi.k@samsung.com, nitheshshetty@gmail.com,
gost.dev@samsung.com, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
dm-devel@lists.linux.dev, linux-nvme@lists.infradead.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v20 02/12] Add infrastructure for copy offload in block and request layer.
Date: Mon, 20 May 2024 17:00:03 +0200 [thread overview]
Message-ID: <8f60ed88-1978-4d7c-9149-aee672aa1b09@kernel.org> (raw)
In-Reply-To: <20240520102033.9361-3-nj.shetty@samsung.com>
On 2024/05/20 12:20, Nitesh Shetty wrote:
> We add two new opcode REQ_OP_COPY_DST, REQ_OP_COPY_SRC.
> Since copy is a composite operation involving src and dst sectors/lba,
> each needs to be represented by a separate bio to make it compatible
> with device mapper.
Why ? The beginning of the sentence isn't justification enough for the two new
operation codes ? The 2 sentences should be reversed for easier reading:
justification first naturally leads to the reader understanding why the codes
are needed.
Also: s/opcode/operations
> We expect caller to take a plug and send bio with destination information,
> followed by bio with source information.
expect ? Plugging is optional. Does copy offload require it ? Please clarify this.
> Once the dst bio arrives we form a request and wait for source
arrives ? You mean "is submitted" ?
s/and wait for/and wait for the
> bio. Upon arrival of source bio we merge these two bio's and send
s/arrival/submission ?
s/of/of the
s/bio's/BIOs
s/and send/and send the
s/down to/down to the
> corresponding request down to device driver.
> Merging non copy offload bio is avoided by checking for copy specific
> opcodes in merge function.
Super unclear... What are you trying to say here ? That merging copy offload
BIOs with other BIOs is not allowed ? That is already handled. Only BIOs &
requests with the same operation can be merged. The code below also suggests
that you allow merging copy offloads... So I really do not understand this.
>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> ---
> block/blk-core.c | 7 +++++++
> block/blk-merge.c | 41 +++++++++++++++++++++++++++++++++++++++
> block/blk.h | 16 +++++++++++++++
> block/elevator.h | 1 +
> include/linux/bio.h | 6 +-----
> include/linux/blk_types.h | 10 ++++++++++
> 6 files changed, 76 insertions(+), 5 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ea44b13af9af..f18ee5f709c0 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -122,6 +122,8 @@ static const char *const blk_op_name[] = {
> REQ_OP_NAME(ZONE_FINISH),
> REQ_OP_NAME(ZONE_APPEND),
> REQ_OP_NAME(WRITE_ZEROES),
> + REQ_OP_NAME(COPY_SRC),
> + REQ_OP_NAME(COPY_DST),
> REQ_OP_NAME(DRV_IN),
> REQ_OP_NAME(DRV_OUT),
> };
> @@ -838,6 +840,11 @@ void submit_bio_noacct(struct bio *bio)
> * requests.
> */
> fallthrough;
> + case REQ_OP_COPY_SRC:
> + case REQ_OP_COPY_DST:
> + if (!q->limits.max_copy_sectors)
> + goto not_supported;
> + break;
> default:
> goto not_supported;
> }
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8534c35e0497..f8dc48a03379 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -154,6 +154,20 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
> return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
> }
>
> +static struct bio *bio_split_copy(struct bio *bio,
> + const struct queue_limits *lim,
> + unsigned int *nsegs)
> +{
> + *nsegs = 1;
> + if (bio_sectors(bio) <= lim->max_copy_sectors)
> + return NULL;
> + /*
> + * We don't support splitting for a copy bio. End it with EIO if
> + * splitting is required and return an error pointer.
> + */
> + return ERR_PTR(-EIO);
> +}
Hmm... Why not check that the copy request is small enough and will not be split
when it is submitted ? Something like blk_check_zone_append() does with
REQ_OP_ZONE_APPEND ? So adding a blk_check_copy_offload(). That would also
include the limits check from the previous hunk.
> +
> /*
> * Return the maximum number of sectors from the start of a bio that may be
> * submitted as a single request to a block device. If enough sectors remain,
> @@ -362,6 +376,12 @@ struct bio *__bio_split_to_limits(struct bio *bio,
> case REQ_OP_WRITE_ZEROES:
> split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
> break;
> + case REQ_OP_COPY_SRC:
> + case REQ_OP_COPY_DST:
> + split = bio_split_copy(bio, lim, nr_segs);
> + if (IS_ERR(split))
> + return NULL;
> + break;
See above.
> default:
> split = bio_split_rw(bio, lim, nr_segs, bs,
> get_max_io_size(bio, lim) << SECTOR_SHIFT);
> @@ -925,6 +945,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
> if (!rq_mergeable(rq) || !bio_mergeable(bio))
> return false;
>
> + if (blk_copy_offload_mergable(rq, bio))
> + return true;
> +
> if (req_op(rq) != bio_op(bio))
> return false;
>
> @@ -958,6 +981,8 @@ enum elv_merge blk_try_merge(struct request *rq, struct bio *bio)
> {
> if (blk_discard_mergable(rq))
> return ELEVATOR_DISCARD_MERGE;
> + else if (blk_copy_offload_mergable(rq, bio))
> + return ELEVATOR_COPY_OFFLOAD_MERGE;
> else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
> return ELEVATOR_BACK_MERGE;
> else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector)
> @@ -1065,6 +1090,20 @@ static enum bio_merge_status bio_attempt_discard_merge(struct request_queue *q,
> return BIO_MERGE_FAILED;
> }
>
> +static enum bio_merge_status bio_attempt_copy_offload_merge(struct request *req,
> + struct bio *bio)
> +{
> + if (req->__data_len != bio->bi_iter.bi_size)
> + return BIO_MERGE_FAILED;
> +
> + req->biotail->bi_next = bio;
> + req->biotail = bio;
> + req->nr_phys_segments++;
> + req->__data_len += bio->bi_iter.bi_size;
Arg... You seem to be assuming that the source BIO always comes right after the
destination request... What if copy offloads are being concurrently issued ?
Shouldn't you check somehow that the pair is a match ? Or are you relying on the
per-context plugging which prevents that from happening in the first place ? But
that would assumes that you never ever sleep trying to allocate the source BIO
after the destination BIO/request are prepared and plugged.
> +
> + return BIO_MERGE_OK;
> +}
> +
> static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
> struct request *rq,
> struct bio *bio,
> @@ -1085,6 +1124,8 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
> break;
> case ELEVATOR_DISCARD_MERGE:
> return bio_attempt_discard_merge(q, rq, bio);
> + case ELEVATOR_COPY_OFFLOAD_MERGE:
> + return bio_attempt_copy_offload_merge(rq, bio);
> default:
> return BIO_MERGE_NONE;
> }
> diff --git a/block/blk.h b/block/blk.h
> index 189bc25beb50..6528a2779b84 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -174,6 +174,20 @@ static inline bool blk_discard_mergable(struct request *req)
> return false;
> }
>
> +/*
> + * Copy offload sends a pair of bio with REQ_OP_COPY_DST and REQ_OP_COPY_SRC
> + * operation by taking a plug.
> + * Initially DST bio is sent which forms a request and
> + * waits for SRC bio to arrive. Once SRC bio arrives
> + * we merge it and send request down to driver.
> + */
> +static inline bool blk_copy_offload_mergable(struct request *req,
> + struct bio *bio)
> +{
> + return (req_op(req) == REQ_OP_COPY_DST &&
> + bio_op(bio) == REQ_OP_COPY_SRC);
> +}
This function is really not needed at all (used in one place only).
> +
> static inline unsigned int blk_rq_get_max_segments(struct request *rq)
> {
> if (req_op(rq) == REQ_OP_DISCARD)
> @@ -323,6 +337,8 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
> case REQ_OP_DISCARD:
> case REQ_OP_SECURE_ERASE:
> case REQ_OP_WRITE_ZEROES:
> + case REQ_OP_COPY_SRC:
> + case REQ_OP_COPY_DST:
> return true; /* non-trivial splitting decisions */
See above. Limits should be checked on submission.
> default:
> break;
> diff --git a/block/elevator.h b/block/elevator.h
> index e9a050a96e53..c7a45c1f4156 100644
> --- a/block/elevator.h
> +++ b/block/elevator.h
> @@ -18,6 +18,7 @@ enum elv_merge {
> ELEVATOR_FRONT_MERGE = 1,
> ELEVATOR_BACK_MERGE = 2,
> ELEVATOR_DISCARD_MERGE = 3,
> + ELEVATOR_COPY_OFFLOAD_MERGE = 4,
> };
>
> struct blk_mq_alloc_data;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d5379548d684..528ef22dd65b 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -53,11 +53,7 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
> */
> static inline bool bio_has_data(struct bio *bio)
> {
> - if (bio &&
> - bio->bi_iter.bi_size &&
> - bio_op(bio) != REQ_OP_DISCARD &&
> - bio_op(bio) != REQ_OP_SECURE_ERASE &&
> - bio_op(bio) != REQ_OP_WRITE_ZEROES)
> + if (bio && (bio_op(bio) == REQ_OP_READ || bio_op(bio) == REQ_OP_WRITE))
> return true;
This change seems completely broken and out of place. This would cause a return
of false for zone append operations.
>
> return false;
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 781c4500491b..7f692bade271 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -342,6 +342,10 @@ enum req_op {
> /* reset all the zone present on the device */
> REQ_OP_ZONE_RESET_ALL = (__force blk_opf_t)15,
>
> + /* copy offload src and dst operation */
s/src/source
s/dst/destination
s/operation/operations
> + REQ_OP_COPY_SRC = (__force blk_opf_t)18,
> + REQ_OP_COPY_DST = (__force blk_opf_t)19,
> +
> /* Driver private requests */
> REQ_OP_DRV_IN = (__force blk_opf_t)34,
> REQ_OP_DRV_OUT = (__force blk_opf_t)35,
> @@ -430,6 +434,12 @@ static inline bool op_is_write(blk_opf_t op)
> return !!(op & (__force blk_opf_t)1);
> }
>
> +static inline bool op_is_copy(blk_opf_t op)
> +{
> + return ((op & REQ_OP_MASK) == REQ_OP_COPY_SRC ||
> + (op & REQ_OP_MASK) == REQ_OP_COPY_DST);
> +}
May be use a switch here to avoid the double masking of op ?
> +
> /*
> * Check if the bio or request is one that needs special treatment in the
> * flush state machine.
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2024-05-20 15:00 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20240520102747epcas5p33497a911ca70c991e5da8e22c5d1336b@epcas5p3.samsung.com>
2024-05-20 10:20 ` [PATCH v20 00/12] Implement copy offload support Nitesh Shetty
[not found] ` <CGME20240520102830epcas5p27274901f3d0c2738c515709890b1dec4@epcas5p2.samsung.com>
2024-05-20 10:20 ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
2024-05-20 14:33 ` Damien Le Moal
2024-05-21 8:15 ` Nitesh Shetty
2024-05-20 22:42 ` Bart Van Assche
2024-05-21 14:25 ` Nitesh Shetty
2024-05-22 17:49 ` Bart Van Assche
2024-05-23 7:05 ` Nitesh Shetty
2024-05-21 6:57 ` Hannes Reinecke
2024-06-01 5:53 ` Christoph Hellwig
[not found] ` <CGME20240520102842epcas5p4949334c2587a15b8adab2c913daa622f@epcas5p4.samsung.com>
2024-05-20 10:20 ` [PATCH v20 02/12] Add infrastructure for copy offload in block and request layer Nitesh Shetty
2024-05-20 15:00 ` Damien Le Moal [this message]
2024-05-21 10:50 ` Nitesh Shetty
2024-05-20 23:00 ` Bart Van Assche
2024-05-21 11:17 ` Nitesh Shetty
2024-05-22 17:58 ` Bart Van Assche
2024-05-21 7:01 ` Hannes Reinecke
2024-05-24 6:54 ` Nitesh Shetty
[not found] ` <66503bc7.630a0220.56c85.8b9dSMTPIN_ADDED_BROKEN@mx.google.com>
2024-05-24 13:52 ` Bart Van Assche
2024-05-27 8:27 ` Nitesh Shetty
2024-05-28 14:07 ` Bart Van Assche
2024-05-22 18:05 ` Bart Van Assche
2024-05-23 11:34 ` Nitesh Shetty
2024-05-24 20:33 ` Bart Van Assche
2024-05-29 6:17 ` Nitesh Shetty
2024-05-29 7:48 ` Damien Le Moal
2024-05-29 22:41 ` Bart Van Assche
2024-05-30 7:16 ` Nitesh Shetty
[not found] ` <665850bd.050a0220.a5e6b.5b72SMTPIN_ADDED_BROKEN@mx.google.com>
2024-05-30 17:11 ` Bart Van Assche
2024-05-31 10:17 ` Nitesh Shetty
[not found] ` <6659b691.630a0220.90195.d0ebSMTPIN_ADDED_BROKEN@mx.google.com>
2024-05-31 23:45 ` Bart Van Assche
2024-06-01 5:59 ` Christoph Hellwig
2024-06-01 5:57 ` Christoph Hellwig
[not found] ` <CGME20240520102853epcas5p42d635d6712b8876ea22a45d730cb1378@epcas5p4.samsung.com>
2024-05-20 10:20 ` [PATCH v20 03/12] block: add copy offload support Nitesh Shetty
2024-06-01 6:16 ` Christoph Hellwig
[not found] ` <CGME20240520102906epcas5p15b5a0b3c8edd0bf3073030a792a328bb@epcas5p1.samsung.com>
2024-05-20 10:20 ` [PATCH v20 04/12] block: add emulation for copy Nitesh Shetty
2024-05-21 7:06 ` Hannes Reinecke
2024-05-21 11:29 ` Nitesh Shetty
2024-06-01 6:18 ` Christoph Hellwig
[not found] ` <CGME20240520102917epcas5p1bda532309b9174bf2702081f6f58daf7@epcas5p1.samsung.com>
2024-05-20 10:20 ` [PATCH v20 05/12] fs/read_write: Enable copy_file_range for block device Nitesh Shetty
2024-05-21 7:07 ` Hannes Reinecke
2024-05-25 23:02 ` Dave Chinner
2024-05-28 5:57 ` Nitesh Shetty
[not found] ` <CGME20240520102929epcas5p2f4456f6fa0005d90769615eb2c2bf273@epcas5p2.samsung.com>
2024-05-20 10:20 ` [PATCH v20 06/12] fs, block: copy_file_range for def_blk_ops for direct " Nitesh Shetty
2024-05-25 23:09 ` Dave Chinner
2024-05-27 8:43 ` Nitesh Shetty
[not found] ` <CGME20240520102940epcas5p2b5f38ceabe94bed3905fb386a0d65ec7@epcas5p2.samsung.com>
2024-05-20 10:20 ` [PATCH v20 07/12] nvme: add copy offload support Nitesh Shetty
2024-06-01 6:22 ` Christoph Hellwig
[not found] ` <CGME20240520102952epcas5p18716d203d1810c38397e7fcc9a26922a@epcas5p1.samsung.com>
2024-05-20 10:20 ` [PATCH v20 08/12] nvmet: add copy command support for bdev and file ns Nitesh Shetty
[not found] ` <CGME20240520103004epcas5p4a18f3f6ba0f218d57b0ab4bb84c6ff18@epcas5p4.samsung.com>
2024-05-20 10:20 ` [PATCH v20 09/12] dm: Add support for copy offload Nitesh Shetty
2024-05-21 7:11 ` Hannes Reinecke
2024-05-21 14:08 ` Nitesh Shetty
2024-05-22 6:22 ` Hannes Reinecke
2024-05-22 7:10 ` Nitesh Shetty
[not found] ` <CGME20240520103016epcas5p31b9a0f3637959626d49763609ebda6ef@epcas5p3.samsung.com>
2024-05-20 10:20 ` [PATCH v20 10/12] dm: Enable copy offload for dm-linear target Nitesh Shetty
2024-05-20 23:25 ` Bart Van Assche
2024-05-21 14:48 ` Nitesh Shetty
[not found] ` <CGME20240520103027epcas5p4789defe8ab3bff23bd2abcf019689fa2@epcas5p4.samsung.com>
2024-05-20 10:20 ` [PATCH v20 11/12] null: Enable trace capability for null block Nitesh Shetty
2024-05-20 23:27 ` Bart Van Assche
2024-06-01 6:23 ` Christoph Hellwig
[not found] ` <CGME20240520103039epcas5p4373f7234162a32222ac225b976ae30ce@epcas5p4.samsung.com>
2024-05-20 10:20 ` [PATCH v20 12/12] null_blk: add support for copy offload Nitesh Shetty
2024-05-20 23:42 ` Bart Van Assche
2024-05-21 14:46 ` Nitesh Shetty
2024-05-22 17:52 ` Bart Van Assche
2024-05-23 6:55 ` Nitesh Shetty
2024-05-20 22:54 ` [PATCH v20 00/12] Implement copy offload support Bart Van Assche
2024-06-01 5:47 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f60ed88-1978-4d7c-9149-aee672aa1b09@kernel.org \
--to=dlemoal@kernel.org \
--cc=agk@redhat.com \
--cc=anuj20.g@samsung.com \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=bvanassche@acm.org \
--cc=corbet@lwn.net \
--cc=damien.lemoal@opensource.wdc.com \
--cc=david@fromorbit.com \
--cc=dm-devel@lists.linux.dev \
--cc=gost.dev@samsung.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=joshi.k@samsung.com \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=martin.petersen@oracle.com \
--cc=mpatocka@redhat.com \
--cc=nitheshshetty@gmail.com \
--cc=nj.shetty@samsung.com \
--cc=sagi@grimberg.me \
--cc=snitzer@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).