DM-Devel Archive mirror
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	dm-devel@lists.linux.dev
Subject: Re: [GIT PULL] Block updates for 6.9-rc1
Date: Tue, 12 Mar 2024 11:22:53 -0400	[thread overview]
Message-ID: <ZfBzTWM7NBbGymsY@redhat.com> (raw)
In-Reply-To: <Ze-rRvKpux44ueao@infradead.org>

On Mon, Mar 11 2024 at  9:09P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Mon, Mar 11, 2024 at 08:28:50PM -0400, Mike Snitzer wrote:
> > All for Jens being made to suffer with dm-crypt but I think we need a
> > proper root cause of what is happening for you and Johannes ;)
> 
> I'm going to try to stay out of the cranking, but I think the reason is
> that the limits stacking inherits the max_segment_size, nvme has weird
> rules for them due their odd PRPs, and dm-crypt set it's own
> max_segment_size to split out each page.  The regression here is
> that we now actually verify that conflict.
> 
> So this happens only for dm-crypt on nvme.  The fix is probably
> to not inherit low-level limits like max_segment_size, but I need
> to think about it a bit more and come up with an automated test case
> using say nvme-loop.

Yeah, I generally agree.

I looked at the latest code to more fully understand why this failed.

1) dm-crypt.c:crypt_io_hints() sets limits->max_segment_size = PAGE_SIZE;

2) drivers/nvme/host/core.c:nvme_set_ctrl_limits() sets:
   lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
   lim->max_segment_size = UINT_MAX;

3) blk_stack_limits(t=dm-crypt, b=nvme-pci) will combine limits:
        t->virt_boundary_mask = min_not_zero(t->virt_boundary_mask,
                                            b->virt_boundary_mask);
        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);

4) blk_validate_limits() will reject the limits that
   blk_stack_limits() created:
        /*
         * Devices that require a virtual boundary do not support scatter/gather
         * I/O natively, but instead require a descriptor list entry for each
         * page (which might not be identical to the Linux PAGE_SIZE).  Because
         * of that they are not limited by our notion of "segment size".
         */
	if (lim->virt_boundary_mask) {
                if (WARN_ON_ONCE(lim->max_segment_size &&
                                 lim->max_segment_size != UINT_MAX))
                        return -EINVAL;
                lim->max_segment_size = UINT_MAX;
	} else {
                /*
                 * The maximum segment size has an odd historic 64k default that
                 * drivers probably should override.  Just like the I/O size we
                 * require drivers to at least handle a full page per segment.
                 */
		if (!lim->max_segment_size)
                        lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
                if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
                	return -EINVAL;
        }

blk_validate_limits() is currently very pedantic. I discussed with Jens
briefly and we're thinking it might make sense for blk_validate_limits()
to be more forgiving by _not_ imposing hard -EINVAL failure.  That in
the interim, during this transition to more curated and atomic limits, a
WARN_ON_ONCE() splat should serve as enough notice to developers (be it
lower level nvme or higher-level virtual devices like DM).

BUT for this specific max_segment_size case, the constraints of dm-crypt
are actually more conservative due to crypto requirements. Yet nvme's
more general "don't care, but will care if non-nvme driver does" for
this particular max_segment_size limit is being imposed when validating
the combined limits that dm-crypt will impose at the top-level.

All said, the above "if (lim->virt_boundary_mask)" check in
blk_validate_limits() looks bogus for stacked device limits.

Mike

       reply	other threads:[~2024-03-12 15:22 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <eaeec3b6-75c2-4b65-8c50-2d37450ccdd9@kernel.dk>
     [not found] ` <20240311235023.GA1205@cmpxchg.org>
     [not found]   ` <CAHk-=wgOfw8NBQ2Qyh8QUjhp5z4v--PuciLE7drn5LJkTtgPVw@mail.gmail.com>
     [not found]     ` <e3fea6c3-7704-46cd-b078-0c6e8d966474@kernel.dk>
     [not found]       ` <CAHk-=wgXZ6dE1yHK_jQmnSjbEbndyzZHC5dJNsmQYTD2K-m61w@mail.gmail.com>
     [not found]         ` <Ze-hwnd3ocfJc9xU@redhat.com>
     [not found]           ` <Ze-rRvKpux44ueao@infradead.org>
2024-03-12 15:22             ` Mike Snitzer [this message]
2024-03-12 16:28               ` [GIT PULL] Block updates for 6.9-rc1 Keith Busch
2024-03-12 21:10               ` Christoph Hellwig
2024-03-12 22:22                 ` Mike Snitzer
2024-03-12 22:30                   ` Christoph Hellwig
2024-03-12 22:50                     ` Mike Snitzer
2024-03-12 22:58                       ` Christoph Hellwig
2024-04-11 20:15                         ` [PATCH for-6.10 0/2] dm: use late bio-splitting and queue_limits_set Mike Snitzer
2024-04-11 20:15                         ` [PATCH for-6.10 1/2] dm-crypt: stop constraining max_segment_size to PAGE_SIZE Mike Snitzer
2024-04-12  6:11                           ` Christoph Hellwig
2024-04-15 14:08                           ` Mikulas Patocka
2024-04-23  7:32                           ` Ming Lei
2024-04-11 20:15                         ` [PATCH for-6.10 2/2] dm: use queue_limits_set Mike Snitzer
2024-04-23  7:33                           ` Ming Lei
2024-03-13 13:11                 ` [GIT PULL] Block updates for 6.9-rc1 Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfBzTWM7NBbGymsY@redhat.com \
    --to=snitzer@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@lists.linux.dev \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).