From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
linux-cachefs@redhat.com, dhowells@redhat.com,
gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep
Date: Thu, 7 Dec 2023 15:41:56 +1100 [thread overview]
Message-ID: <ZXFNFKai3ILLFdnR@dread.disaster.area> (raw)
In-Reply-To: <20231207041650.3tzzmv2jfrr5vppl@moria.home.lan>
On Wed, Dec 06, 2023 at 11:16:50PM -0500, Kent Overstreet wrote:
> On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > hash-bl nests spinlocks inside the bit locks. This causes problems
> > for CONFIG_PREEMPT_RT which converts spin locks to sleeping locks,
> > and we're not allowed to sleep while holding a spinning lock.
> >
> > Further, lockdep does not support bit locks, so we lose lockdep
> > coverage of the inode hash table with the hash-bl conversion.
> >
> > To enable these configs to work, add an external per-chain spinlock
> > to the hlist_bl_head() and add helpers to use this instead of the
> > bit spinlock when preempt_rt or lockdep are enabled.
> >
> > This converts all users of hlist-bl to use the external spinlock in
> > these situations, so we also gain lockdep coverage of things like
> > the dentry cache hash table with this change.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> Sleepable bit locks can be done with wait_on_bit(), is that worth
> considering for PREEMPT_RT? Or are the other features of real locks
> important there?
I think wait_on_bit() is not scalable. It hashes down to one of 256
shared struct wait_queue_heads which have thundering herd
behaviours, and it requires the locker to always run
prepare_to_wait() and finish_wait(). This means there is at least
one spinlock_irqsave()/unlock pair needed, sometimes two, just to
get an uncontended sleeping bit lock.
So as a fast path operation that requires lock scalability, it's
going to be better to use a straight spinlock that doesn't require
irq safety as it's far less expensive than a sleeping bit lock.
Whether CONFIG_PREEMPT_RT changes that equation at all is not at all
clear to me, and so I'll leave that consideration to RT people if
they see a need to address it. In the mean time, we need to use an
external spinlock for lockdep validation so it really doesn't make
any sense at all to add a third locking variant with completely
different semantics just for PREEMPT_RT...
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-12-07 4:41 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-06 6:05 [PATCH 0/11] vfs: inode cache scalability improvements Dave Chinner
2023-12-06 6:05 ` [PATCH 01/11] lib/dlock-list: Distributed and lock-protected lists Dave Chinner
2023-12-07 2:23 ` Al Viro
2023-12-06 6:05 ` [PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants Dave Chinner
2023-12-07 2:26 ` Al Viro
2023-12-07 4:18 ` Kent Overstreet
2023-12-06 6:05 ` [PATCH 03/11] vfs: Use dlock list for superblock's inode list Dave Chinner
2023-12-07 2:40 ` Al Viro
2023-12-07 4:59 ` Dave Chinner
2023-12-07 5:03 ` Kent Overstreet
2023-12-06 6:05 ` [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list Dave Chinner
2023-12-07 4:31 ` Kent Overstreet
2023-12-07 5:42 ` Kent Overstreet
2023-12-07 6:25 ` Dave Chinner
2023-12-07 6:49 ` Al Viro
2023-12-06 6:05 ` [PATCH 05/11] selinux: use dlist for isec inode list Dave Chinner
2023-12-06 21:52 ` Paul Moore
2023-12-06 23:04 ` Dave Chinner
2023-12-07 0:36 ` Paul Moore
2023-12-06 6:05 ` [PATCH 06/11] vfs: factor out inode hash head calculation Dave Chinner
2023-12-07 3:02 ` Al Viro
2023-12-06 6:05 ` [PATCH 07/11] hlist-bl: add hlist_bl_fake() Dave Chinner
2023-12-07 3:05 ` Al Viro
2023-12-06 6:05 ` [PATCH 08/11] vfs: inode cache conversion to hash-bl Dave Chinner
2023-12-07 4:58 ` Kent Overstreet
2023-12-07 6:03 ` Dave Chinner
2023-12-07 6:42 ` Al Viro
2023-12-06 6:05 ` [PATCH 09/11] hash-bl: explicitly initialise hash-bl heads Dave Chinner
2023-12-07 3:15 ` Al Viro
2023-12-06 6:05 ` [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep Dave Chinner
2023-12-07 4:16 ` Kent Overstreet
2023-12-07 4:41 ` Dave Chinner [this message]
2023-12-06 6:05 ` [PATCH 11/11] hlist-bl: introduced nested locking for dm-snap Dave Chinner
2023-12-07 17:08 ` [PATCH 0/11] vfs: inode cache scalability improvements Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXFNFKai3ILLFdnR@dread.disaster.area \
--to=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=dm-devel@lists.linux.dev \
--cc=gfs2@lists.linux.dev \
--cc=kent.overstreet@linux.dev \
--cc=linux-block@vger.kernel.org \
--cc=linux-cachefs@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=selinux@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).