DM-Devel Archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, eblake@redhat.com,
	Alasdair Kergon <agk@redhat.com>,
	Mikulas Patocka <mpatocka@redhat.com>,
	dm-devel@lists.linux.dev, David Teigland <teigland@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>, Joe Thornber <ejt@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
Date: Thu, 28 Mar 2024 16:39:01 -0400
Message-ID: <20240328203910.2370087-1-stefanha@redhat.com>

cp(1) and backup tools use llseek(SEEK_HOLE/SEEK_DATA) to skip holes in files.
This speeds up copying by reducing the amount of data read, and it preserves
sparseness in the output file.
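For illustration, here is a minimal Python sketch of how a userspace copier
might walk the data regions of a file or device with lseek(2). It is not code
from this series, and the function name map_data_extents() is hypothetical.
It takes the size from lseek(SEEK_END) because fstat() reports st_size as 0
for block device nodes.

```python
import errno
import os


def map_data_extents(fd: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for the data regions of an open fd.

    Everything between the returned regions is a hole. Note that this
    moves the file offset of fd.
    """
    # fstat() reports st_size == 0 for block devices, so ask lseek() instead.
    size = os.lseek(fd, 0, os.SEEK_END)
    extents = []
    offset = 0
    while offset < size:
        try:
            data_start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # no data at or after offset: the rest is a hole
            raise
        # The end of the file/device is an implicit hole, so this always succeeds.
        data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
        extents.append((data_start, data_end - data_start))
        offset = data_end
    return extents
```

A copier would then read and write only those ranges; for a regular-file
destination, ftruncate() to the full size keeps the skipped ranges sparse.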

This patch series is an initial attempt at implementing
llseek(SEEK_HOLE/SEEK_DATA) for block devices. I'm looking for feedback on this
approach and suggestions for resolving the open issues.

Block devices have concepts similar to holes:
- SCSI has Logical Block Provisioning where the "mapped" state would be
  considered data and other states would be considered holes.
- NBD has NBD_CMD_BLOCK_STATUS for querying whether blocks are present.
- Linux loop block devices and dm-linear targets can pass through queries to
  the backing file or device.
- dm-thin targets can query metadata to find holes.
- ...and you may be able to think of more examples.

Therefore it is possible to offer this functionality in block drivers.
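Any driver that can report allocation status as a list of extents (for
example from NBD_CMD_BLOCK_STATUS replies or dm-thin metadata) could answer
these queries with a simple scan. The sketch below is a hypothetical Python
model of SEEK_DATA/SEEK_HOLE semantics, not code from the series; the Extent
type and seek_extents() are made up for illustration, and it assumes the
extents are sorted, non-overlapping, and together cover the whole device.

```python
import errno
import os
from typing import NamedTuple, Sequence


class Extent(NamedTuple):
    start: int     # byte offset where the extent begins
    length: int    # length in bytes
    is_data: bool  # True if mapped/allocated, False if a hole


def seek_extents(extents: Sequence[Extent], size: int, offset: int, whence: int) -> int:
    """Answer SEEK_DATA/SEEK_HOLE from sorted extents covering [0, size)."""
    if whence not in (os.SEEK_DATA, os.SEEK_HOLE):
        raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
    if offset < 0 or offset >= size:
        raise OSError(errno.ENXIO, os.strerror(errno.ENXIO))
    want_data = whence == os.SEEK_DATA
    for ext in extents:
        if ext.start + ext.length <= offset:
            continue  # extent lies entirely before the offset
        if ext.is_data == want_data:
            # Either offset is inside a matching extent, or this is the next one.
            return max(ext.start, offset)
    if want_data:
        # Only holes remain after offset.
        raise OSError(errno.ENXIO, os.strerror(errno.ENXIO))
    return size  # the end of the device acts as an implicit hole
```

A real driver would query its metadata on demand rather than materialize
every extent; the linear scan only keeps the semantics easy to see.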

In my use case a QEMU process in userspace copies the contents of a dm-thin
target. QEMU already uses SEEK_HOLE, but that does not work on dm-thin targets
without this patch series.

Others have also wished for block device support for SEEK_HOLE. Here is an open
issue from the BorgBackup project:
https://github.com/borgbackup/borg/issues/5609

With these patches userspace can identify holes in loop, dm-linear, and dm-thin
devices. This is done by adding a seek_hole_data() callback to struct
block_device_operations. When the callback is NULL, the entire device is
considered data. Device-mapper is extended along the same lines so that targets
can provide seek_hole_data() callbacks.
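The NULL-callback default resembles how lseek(2) treats files on filesystems
without hole tracking: SEEK_DATA returns the offset itself and SEEK_HOLE
returns the end of the device. A rough Python model of that dispatch, plus a
pass-through callback in the spirit of dm-linear (translate the offset into
the backing device, forward the query, translate the answer back, and clamp
at the target's end), might look like this. All names are hypothetical and
the clamping policy is an assumption, not taken from the patches.

```python
import errno
import os


def _enxio() -> OSError:
    return OSError(errno.ENXIO, os.strerror(errno.ENXIO))


def blkdev_seek_hole_data(size, offset, whence, callback=None):
    """Hypothetical model of the block layer's SEEK_DATA/SEEK_HOLE dispatch."""
    if whence not in (os.SEEK_DATA, os.SEEK_HOLE):
        raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
    if offset < 0 or offset >= size:
        raise _enxio()  # nothing to find at or beyond the end of the device
    if callback is None:
        # No driver support: the entire device is considered data, and the
        # only hole is the implicit one at the end of the device.
        return offset if whence == os.SEEK_DATA else size
    return callback(offset, whence)


def make_linear_callback(backing_seek, backing_start, target_len):
    """Hypothetical dm-linear-style callback forwarding to a backing device.

    The target maps [0, target_len) onto
    [backing_start, backing_start + target_len) of the backing device,
    whose queries are answered by backing_seek(offset, whence).
    """
    def callback(offset, whence):
        # May raise ENXIO if the backing device has no data past this point.
        result = backing_seek(backing_start + offset, whence) - backing_start
        if result >= target_len:
            # The answer lies beyond this target's range.
            if whence == os.SEEK_DATA:
                raise _enxio()
            return target_len  # the end of the target counts as a hole
        return result
    return callback
```

Usage: with a 1 MiB backing device that has no callback (all data), a target
mapping 8 KiB starting at backing offset 4096 reports data at offset 100 and
its first hole at its own end, 8192.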

I'm unfamiliar with much of this code and have probably missed locking
requirements. Since llseek() executes synchronously, like ioctl(), rather than
as an asynchronous I/O request, my changes to the loop block driver and dm-thin
may be broken (e.g. what if the loop device's backing fd is changed during
llseek()?).

To run the tests:

  # make TARGETS=block_seek_hole -C tools/testing/selftests run_tests

The code is also available here:
https://gitlab.com/stefanha/linux/-/tree/block-seek-hole

Please take a look and let me know your thoughts. Thanks!

Stefan Hajnoczi (9):
  block: add llseek(SEEK_HOLE/SEEK_DATA) support
  loop: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add loop block driver tests
  dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add dm-zero test
  dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add dm-linear test
  dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add dm-thin test

 tools/testing/selftests/Makefile              |   1 +
 .../selftests/block_seek_hole/Makefile        |  17 +++
 include/linux/blkdev.h                        |   7 ++
 include/linux/device-mapper.h                 |   5 +
 block/fops.c                                  |  43 ++++++-
 drivers/block/loop.c                          |  36 +++++-
 drivers/md/dm-linear.c                        |  25 ++++
 drivers/md/dm-thin.c                          |  77 ++++++++++++
 drivers/md/dm.c                               |  68 ++++++++++
 .../testing/selftests/block_seek_hole/config  |   3 +
 .../selftests/block_seek_hole/dm_thin.sh      |  80 ++++++++++++
 .../selftests/block_seek_hole/dm_zero.sh      |  31 +++++
 .../selftests/block_seek_hole/map_holes.py    |  37 ++++++
 .../testing/selftests/block_seek_hole/test.py | 117 ++++++++++++++++++
 14 files changed, 540 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
 create mode 100644 tools/testing/selftests/block_seek_hole/config
 create mode 100755 tools/testing/selftests/block_seek_hole/dm_thin.sh
 create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh
 create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
 create mode 100755 tools/testing/selftests/block_seek_hole/test.py

-- 
2.44.0



Thread overview: 38+ messages
2024-03-28 20:39 Stefan Hajnoczi [this message]
2024-03-28 20:39 ` [RFC 1/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-28 23:50   ` Eric Blake
2024-03-28 20:39 ` [RFC 2/9] loop: " Stefan Hajnoczi
2024-03-29  0:00   ` Eric Blake
2024-03-29 12:54     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 3/9] selftests: block_seek_hole: add loop block driver tests Stefan Hajnoczi
2024-03-29  0:11   ` Eric Blake
2024-04-03 13:50     ` Stefan Hajnoczi
2024-03-29 12:38   ` Eric Blake
2024-04-03 13:51     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-29  0:38   ` Eric Blake
2024-04-03 14:11     ` Stefan Hajnoczi
2024-04-03 17:02       ` Eric Blake
2024-04-03 17:58         ` Stefan Hajnoczi
2024-04-03 19:28           ` Eric Blake
2024-03-28 20:39 ` [RFC 5/9] selftests: block_seek_hole: add dm-zero test Stefan Hajnoczi
2024-03-28 22:19   ` Eric Blake
2024-03-28 22:32     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-29  0:54   ` Eric Blake
2024-04-03 14:22     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 7/9] selftests: block_seek_hole: add dm-linear test Stefan Hajnoczi
2024-03-29  0:59   ` Eric Blake
2024-04-03 14:23     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-29  1:31   ` Eric Blake
2024-04-03 15:03     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 9/9] selftests: block_seek_hole: add dm-thin test Stefan Hajnoczi
2024-03-28 22:16 ` [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Eric Blake
2024-03-28 22:29   ` Eric Blake
2024-03-28 23:09   ` Stefan Hajnoczi
2024-04-02 12:26 ` Christoph Hellwig
2024-04-02 13:04   ` Stefan Hajnoczi
2024-04-05  7:02     ` Christoph Hellwig
2024-04-02 13:31   ` Eric Blake
2024-04-05  7:02     ` Christoph Hellwig
