gfs2.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-cachefs@redhat.com, dhowells@redhat.com,
	gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/11] vfs: inode cache conversion to hash-bl
Date: Wed, 6 Dec 2023 23:58:44 -0500	[thread overview]
Message-ID: <20231207045844.u26r5vn26gtmqwe5@moria.home.lan> (raw)
In-Reply-To: <20231206060629.2827226-9-david@fromorbit.com>

On Wed, Dec 06, 2023 at 05:05:37PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Scalability of the global inode_hash_lock really sucks for
> filesystems that use the vfs inode cache (i.e. everything but XFS).

Ages ago, we talked about (and I attempted, but ended up swearing at
inode lifetime rules) - conversion to rhashtable instead, which I still
believe would be preferable since that code is fully lockless (and
resizeable, of course). But it turned out to be a much bigger project...

But IIRC the bulk of the work was going to be "clean up inode
refcounting/lifetime rules into something sane/modern" - maybe we could
leave some breadcrumbs/comments in fs/inode.c for what that would take,
if/when someone else is sufficiently motivated?

> threads		vanilla	 patched	vanilla	patched
> 2		 7.923	  7.358		 8.003	 7.276
> 4		 8.152	  7.530		 9.097	 8.506
> 8		13.090	  7.871		11.752	10.015
> 16		24.602	  9.540		24.614	13.989
> 32		49.536	 19.314		49.179	25.982

nice

> The big wins here are at >= 8 threads, with both filesytsems now
> being limited by internal filesystem algorithms, not the VFS inode
> cache scalability.
> 
> Ext4 contention moves to the buffer cache on directory block
> lookups:
> 
> -   66.45%     0.44%  [kernel]              [k] __ext4_read_dirblock
>    - 66.01% __ext4_read_dirblock
>       - 66.01% ext4_bread
>          - ext4_getblk
>             - 64.77% bdev_getblk
>                - 64.69% __find_get_block
>                   - 63.01% _raw_spin_lock
>                      - 62.96% do_raw_spin_lock
>                           59.21% __pv_queued_spin_lock_slowpath
> 
> bcachefs contention moves to internal btree traversal locks.
> 
>  - 95.37% __lookup_slow
>     - 93.95% bch2_lookup
>        - 82.57% bch2_vfs_inode_get
> 	  - 65.44% bch2_inode_find_by_inum_trans
> 	     - 65.41% bch2_inode_peek_nowarn
> 		- 64.60% bch2_btree_iter_peek_slot
> 		   - 64.55% bch2_btree_path_traverse_one
> 		      - bch2_btree_path_traverse_cached
> 			 - 63.02% bch2_btree_path_traverse_cached_slowpath
> 			    - 56.60% mutex_lock

dlist-lock ought to be perfect for solving this one

Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>

  reply	other threads:[~2023-12-07  4:58 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-06  6:05 [PATCH 0/11] vfs: inode cache scalability improvements Dave Chinner
2023-12-06  6:05 ` [PATCH 01/11] lib/dlock-list: Distributed and lock-protected lists Dave Chinner
2023-12-07  2:23   ` Al Viro
2023-12-06  6:05 ` [PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants Dave Chinner
2023-12-07  2:26   ` Al Viro
2023-12-07  4:18   ` Kent Overstreet
2023-12-06  6:05 ` [PATCH 03/11] vfs: Use dlock list for superblock's inode list Dave Chinner
2023-12-07  2:40   ` Al Viro
2023-12-07  4:59     ` Dave Chinner
2023-12-07  5:03       ` Kent Overstreet
2023-12-06  6:05 ` [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list Dave Chinner
2023-12-07  4:31   ` Kent Overstreet
2023-12-07  5:42   ` Kent Overstreet
2023-12-07  6:25     ` Dave Chinner
2023-12-07  6:49   ` Al Viro
2023-12-06  6:05 ` [PATCH 05/11] selinux: use dlist for isec inode list Dave Chinner
2023-12-06 21:52   ` Paul Moore
2023-12-06 23:04     ` Dave Chinner
2023-12-07  0:36       ` Paul Moore
2023-12-06  6:05 ` [PATCH 06/11] vfs: factor out inode hash head calculation Dave Chinner
2023-12-07  3:02   ` Al Viro
2023-12-06  6:05 ` [PATCH 07/11] hlist-bl: add hlist_bl_fake() Dave Chinner
2023-12-07  3:05   ` Al Viro
2023-12-06  6:05 ` [PATCH 08/11] vfs: inode cache conversion to hash-bl Dave Chinner
2023-12-07  4:58   ` Kent Overstreet [this message]
2023-12-07  6:03     ` Dave Chinner
2023-12-07  6:42   ` Al Viro
2023-12-06  6:05 ` [PATCH 09/11] hash-bl: explicitly initialise hash-bl heads Dave Chinner
2023-12-07  3:15   ` Al Viro
2023-12-06  6:05 ` [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep Dave Chinner
2023-12-07  4:16   ` Kent Overstreet
2023-12-07  4:41     ` Dave Chinner
2023-12-06  6:05 ` [PATCH 11/11] hlist-bl: introduced nested locking for dm-snap Dave Chinner
2023-12-07 17:08 ` [PATCH 0/11] vfs: inode cache scalability improvements Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231207045844.u26r5vn26gtmqwe5@moria.home.lan \
    --to=kent.overstreet@linux.dev \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=gfs2@lists.linux.dev \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=selinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).