From: Mike Snitzer <snitzer@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>,
Johannes Weiner <hannes@cmpxchg.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
dm-devel@lists.linux.dev
Subject: Re: [GIT PULL] Block updates for 6.9-rc1
Date: Tue, 12 Mar 2024 11:22:53 -0400 [thread overview]
Message-ID: <ZfBzTWM7NBbGymsY@redhat.com> (raw)
In-Reply-To: <Ze-rRvKpux44ueao@infradead.org>
On Mon, Mar 11 2024 at 9:09P -0400,
Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Mar 11, 2024 at 08:28:50PM -0400, Mike Snitzer wrote:
> > All for Jens being made to suffer with dm-crypt but I think we need a
> > proper root cause of what is happening for you and Johannes ;)
>
> I'm going to try to stay out of the cranking, but I think the reason is
> that the limits stacking inherits the max_segment_size, nvme has weird
> rules for them due their odd PRPs, and dm-crypt set it's own
> max_segment_size to split out each page. The regression here is
> that we now actually verify that conflict.
>
> So this happens only for dm-crypt on nvme. The fix is probably
> to not inherit low-level limits like max_segment_size, but I need
> to think about it a bit more and come up with an automated test case
> using say nvme-loop.
Yeah, I generally agree.
I looked at the latest code to more fully understand why this failed.
1) dm-crypt.c:crypt_io_hints() sets limits->max_segment_size = PAGE_SIZE;
2) drivers/nvme/host/core.c:nvme_set_ctrl_limits() sets:
lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
lim->max_segment_size = UINT_MAX;
3) blk_stack_limits(t=dm-crypt, b=nvme-pci) will combine limits:
t->virt_boundary_mask = min_not_zero(t->virt_boundary_mask,
b->virt_boundary_mask);
t->max_segment_size = min_not_zero(t->max_segment_size,
b->max_segment_size);
4) blk_validate_limits() will reject the limits that
blk_stack_limits() created:
/*
* Devices that require a virtual boundary do not support scatter/gather
* I/O natively, but instead require a descriptor list entry for each
* page (which might not be identical to the Linux PAGE_SIZE). Because
* of that they are not limited by our notion of "segment size".
*/
if (lim->virt_boundary_mask) {
if (WARN_ON_ONCE(lim->max_segment_size &&
lim->max_segment_size != UINT_MAX))
return -EINVAL;
lim->max_segment_size = UINT_MAX;
} else {
/*
* The maximum segment size has an odd historic 64k default that
* drivers probably should override. Just like the I/O size we
* require drivers to at least handle a full page per segment.
*/
if (!lim->max_segment_size)
lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
return -EINVAL;
}
blk_validate_limits() is currently very pedantic. I discussed with Jens
briefly and we're thinking it might make sense for blk_validate_limits()
to be more forgiving by _not_ imposing hard -EINVAL failure. That in
the interim, during this transition to more curated and atomic limits, a
WARN_ON_ONCE() splat should serve as enough notice to developers (be it
lower level nvme or higher-level virtual devices like DM).
BUT for this specific max_segment_size case, the constraints of dm-crypt
are actually more conservative due to crypto requirements. Yet nvme's
more general "don't care, but will care if non-nvme driver does" for
this particular max_segment_size limit is being imposed when validating
the combined limits that dm-crypt will impose at the top-level.
All said, the above "if (lim->virt_boundary_mask)" check in
blk_validate_limits() looks bogus for stacked device limits.
Mike
next parent reply other threads:[~2024-03-12 15:22 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <eaeec3b6-75c2-4b65-8c50-2d37450ccdd9@kernel.dk>
[not found] ` <20240311235023.GA1205@cmpxchg.org>
[not found] ` <CAHk-=wgOfw8NBQ2Qyh8QUjhp5z4v--PuciLE7drn5LJkTtgPVw@mail.gmail.com>
[not found] ` <e3fea6c3-7704-46cd-b078-0c6e8d966474@kernel.dk>
[not found] ` <CAHk-=wgXZ6dE1yHK_jQmnSjbEbndyzZHC5dJNsmQYTD2K-m61w@mail.gmail.com>
[not found] ` <Ze-hwnd3ocfJc9xU@redhat.com>
[not found] ` <Ze-rRvKpux44ueao@infradead.org>
2024-03-12 15:22 ` Mike Snitzer [this message]
2024-03-12 16:28 ` [GIT PULL] Block updates for 6.9-rc1 Keith Busch
2024-03-12 21:10 ` Christoph Hellwig
2024-03-12 22:22 ` Mike Snitzer
2024-03-12 22:30 ` Christoph Hellwig
2024-03-12 22:50 ` Mike Snitzer
2024-03-12 22:58 ` Christoph Hellwig
2024-04-11 20:15 ` [PATCH for-6.10 0/2] dm: use late bio-splitting and queue_limits_set Mike Snitzer
2024-04-11 20:15 ` [PATCH for-6.10 1/2] dm-crypt: stop constraining max_segment_size to PAGE_SIZE Mike Snitzer
2024-04-12 6:11 ` Christoph Hellwig
2024-04-15 14:08 ` Mikulas Patocka
2024-04-23 7:32 ` Ming Lei
2024-04-11 20:15 ` [PATCH for-6.10 2/2] dm: use queue_limits_set Mike Snitzer
2024-04-23 7:33 ` Ming Lei
2024-03-13 13:11 ` [GIT PULL] Block updates for 6.9-rc1 Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfBzTWM7NBbGymsY@redhat.com \
--to=snitzer@kernel.org \
--cc=axboe@kernel.dk \
--cc=dm-devel@lists.linux.dev \
--cc=hannes@cmpxchg.org \
--cc=hch@infradead.org \
--cc=linux-block@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).