From: Shyam Prasad N <nspmangalore@gmail.com>
To: CIFS <linux-cifs@vger.kernel.org>,
Steve French <smfrench@gmail.com>,
Bharath SM <bharathsm.hsk@gmail.com>,
David Howells <dhowells@redhat.com>,
Enzo Matsumiya <ematsumiya@suse.de>,
Henrique Carvalho <henrique.carvalho@suse.com>,
Paulo Alcantara <pc@manguebit.org>
Subject: [PATCHSET v3] QueryDir and Directory lease improvements
Date: Tue, 28 Apr 2026 21:48:52 +0530
Message-ID: <CANT5p=pTSh=R7mj6gc=70YZ9fdahYYUT5E2tdxwxZJ=jOrKCjQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4076 bytes --]
Hi folks,
This patch set of 19 patches improves metadata performance, mainly in
readdir scenarios, both with and without directory leases.
The patches are available here:
https://github.com/sprasad-microsoft/smb3-kernel-client/compare/master...personal/sprasad/for-next
Here is the list of fixes and features:
cifs: change_conf needs to be called for session setup -> Fixes a bug
that could cause the client not to request a lease if readdir is run
immediately after mount with multichannel enabled
cifs: abort open_cached_dir if we don't request leases -> Fixes a bug
that could cause an avoidable extra round-trip
cifs: invalidate cfid on unlink/rename/rmdir -> Fixes a serious bug
that could cause stale dir enumeration after an unlink/rename/rmdir
cifs: define variable sized buffer for querydir responses -> Enables
the next set of changes by allowing QueryDir responses to be larger
than 64KB
cifs: optimize readdir for small directories -> Like the Windows
client, the Linux client now sends Open+QueryDir1+QueryDir2 in a
single compound request, avoiding an extra round-trip for all small
directories
cifs: optimize readdir for larger directories -> For directories that
need a second round-trip, we now request QueryDir responses of up to
rsize
cifs: reorganize cached dir helpers -> Cosmetic change for better readability
cifs: make cfid locks more granular -> Previously, with parallel
readdirs and dir leases enabled, the client serialized operations under
one big lock. This change introduces finer-grained locks for better
parallelism
cifs: query dir should reuse cfid even if not fully cached -> Avoids
unnecessarily breaking dir leases when parallel readdirs of the same
directory are in progress
cifs: back cached_dirents with page cache -> Significant
re-architecture of the dir cache code to enable faster lookups and fewer
allocations for large directories
cifs: in place changes to cached_dirents when dir lease is held ->
With this change, cached_dirents are updated in place, and the client
no longer revokes the dir lease when the directory is changed
cifs: register a shrinker to manage cached_dirents -> Allows the
kernel to call back into the SMB client to free up the dir cache under
memory pressure
cifs: option to disable time-based eviction of cache -> With the
shrinker in place, the client can now hold leases for a very long time
cifs: option to set unlimited number of cached dirs -> With the shrinker
registered, the client can now cache a very large number of dirs. This
helps workloads that contain a large number of small dirs
cifs: allow dcache population to happen asynchronously ->
Experimental module parameter to populate the dentry cache
asynchronously. This allows operations like plain directory enumeration
(without attribute requests) to complete much faster
cifs: trace points for cached_dir operations -> Aids debugging of dir
caching code
cifs: discard functions should not return failure -> Fixes a serious
but rare bug that can cause hangs in reads and readdirs when specific
network errors are returned
cifs: keep cfids in rbtree for efficient lookups -> With the ability
to cache a large number of dirs, keeping cfids on a linked list was
becoming a bottleneck. This enables faster lookups of dir cache data
cifs: invalidate cached_dirents if population aborted -> Invalidates
the dir cache when the application stops enumerating the directory
before reaching its end
Overall, these changes have significantly improved performance for many
metadata-heavy workloads.
I have attached a benchmark report for the various workloads I ran.
Compared with mainline, populating the cache with dir leases enabled
carries little penalty (at most +2.7% on cold deep_dir runs), and for
some workloads, such as cold flat_dir runs, it actually improves
performance.
With a warm cache, we've reduced server round-trips to a bare minimum
(pretty much only file opens/closes), so with directory leases enabled,
there is a significant reduction in the number of round-trips.
I was hoping to improve tar/untar performance as well. However, tar and
untar still involve serialized open/read/close calls to the server that
cannot be avoided even with dir leases.
--
Regards,
Shyam
[-- Attachment #2: dirlease-optimizations5-combined-win.md --]
[-- Type: application/octet-stream, Size: 8563 bytes --]
## Legend
- `ls_r`: Recursive directory listing using `ls -R` (name enumeration focused).
- `ls_lr`: Recursive long listing using `ls -lR` (name + attribute metadata).
- `ls_then_lsl`: Two-step run where iteration 1 is `ls -R` and iteration 2 is `ls -lR`.
- `find_nonempty`: File traversal using `find -type f -size +0c` (finds all regular files with non-zero size).
- `du_recursive`: Recursive disk usage scan using `du`.
- `Cold`: First iteration after remount (cold cache state).
- `Warm`: Second iteration immediately after the cold run (warm cache state).
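- `handlecache` / `nohandlecache`: Runs with directory handle caching (directory leases) enabled or disabled, via the cifs mount option of the same name.
- `flat_dir`: Flat namespace with a large number of files in one directory.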
- `deep_dir`: Deep hierarchical namespace with nested directories.
- `Total Time`: End-to-end elapsed runtime for that workload/iteration.
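- `QueryDirectories`: Number of SMB2 directory query (QUERY_DIRECTORY) operations observed during the run.
- `QueryInfos`: Number of SMB2 metadata query (QUERY_INFO) operations observed during the run.
- Sign convention in tables: values are shown as `baseline -> comparison (delta, % change)`, where delta = comparison - baseline and % change = delta / baseline × 100 (omitted when the baseline is 0); a negative `%` in `Total Time` means the comparison run is faster.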
## Summary
**Key Findings:**
1. **directory lease optimizations5 vs mainline** (mean `Total Time` change across the ten workload/iteration rows of each table):
- handlecache / flat_dir average: **-23.7%**
- handlecache / deep_dir average: **-37.6%**
- nohandlecache / flat_dir average: **-13.5%**
- nohandlecache / deep_dir average: **-30.4%**
2. **Within directory lease optimizations5 (handlecache vs nohandlecache)** (mean `Total Time` change across the ten workload/iteration rows of each table):
- flat_dir: handlecache is **-14.4%** vs nohandlecache (avg)
- deep_dir: handlecache is **-9.5%** vs nohandlecache (avg)
---
## Comparison A: directory lease optimizations5 vs mainline
### handlecache / flat_dir
| Workload | Iteration | Total Time | QueryDirectories | QueryInfos |
|---|---|---|---|---|
| ls_r | Cold | 5851 -> 4926 (-925, -15.8%) | 731 -> 9 (-722, -98.8%) | 1 -> 2 (+1, +100.0%) |
| ls_r | Warm | 1076 -> 108 (-968, -90.0%) | 731 -> 0 (-731, -100.0%) | 0 -> 0 |
| ls_lr | Cold | 8790 -> 7812 (-978, -11.1%) | 731 -> 9 (-722, -98.8%) | 1 -> 2 (+1, +100.0%) |
| ls_lr | Warm | 3946 -> 3089 (-857, -21.7%) | 731 -> 0 (-731, -100.0%) | 0 -> 0 |
| ls_then_lsl | Cold | 5845 -> 4968 (-877, -15.0%) | 731 -> 9 (-722, -98.8%) | 1 -> 2 (+1, +100.0%) |
| ls_then_lsl | Warm | 3886 -> 3089 (-797, -20.5%) | 731 -> 0 (-731, -100.0%) | 0 -> 0 |
| find_nonempty | Cold | 8501 -> 7636 (-865, -10.2%) | 731 -> 9 (-722, -98.8%) | 2 -> 3 (+1, +50.0%) |
| find_nonempty | Warm | 3689 -> 2865 (-824, -22.3%) | 731 -> 0 (-731, -100.0%) | 1 -> 1 (+0, +0.0%) |
| du_recursive | Cold | 8322 -> 7613 (-709, -8.5%) | 731 -> 9 (-722, -98.8%) | 2 -> 3 (+1, +50.0%) |
| du_recursive | Warm | 3516 -> 2749 (-767, -21.8%) | 731 -> 0 (-731, -100.0%) | 1 -> 1 (+0, +0.0%) |
### handlecache / deep_dir
| Workload | Iteration | Total Time | QueryDirectories | QueryInfos |
|---|---|---|---|---|
| ls_r | Cold | 39419 -> 39000 (-419, -1.1%) | 30010 -> 30010 (+0, +0.0%) | 3 -> 15007 (+15004, +500133.3%) |
| ls_r | Warm | 33549 -> 4855 (-28694, -85.5%) | 30010 -> 0 (-30010, -100.0%) | 2 -> 0 (-2, -100.0%) |
| ls_lr | Cold | 43849 -> 44657 (+808, +1.8%) | 30010 -> 30010 (+0, +0.0%) | 3 -> 15007 (+15004, +500133.3%) |
| ls_lr | Warm | 38035 -> 11213 (-26822, -70.5%) | 30010 -> 0 (-30010, -100.0%) | 2 -> 0 (-2, -100.0%) |
| ls_then_lsl | Cold | 37786 -> 38824 (+1038, +2.7%) | 30010 -> 30010 (+0, +0.0%) | 3 -> 15007 (+15004, +500133.3%) |
| ls_then_lsl | Warm | 36592 -> 11597 (-24995, -68.3%) | 30010 -> 0 (-30010, -100.0%) | 2 -> 0 (-2, -100.0%) |
| find_nonempty | Cold | 42691 -> 43284 (+593, +1.4%) | 30010 -> 30010 (+0, +0.0%) | 4 -> 15008 (+15004, +375100.0%) |
| find_nonempty | Warm | 36733 -> 8603 (-28130, -76.6%) | 30010 -> 0 (-30010, -100.0%) | 3 -> 1 (-2, -66.7%) |
| du_recursive | Cold | 42307 -> 43305 (+998, +2.4%) | 30010 -> 30010 (+0, +0.0%) | 4 -> 15008 (+15004, +375100.0%) |
| du_recursive | Warm | 49356 -> 8866 (-40490, -82.0%) | 30010 -> 0 (-30010, -100.0%) | 5 -> 1 (-4, -80.0%) |
### nohandlecache / flat_dir
| Workload | Iteration | Total Time | QueryDirectories | QueryInfos |
|---|---|---|---|---|
| ls_r | Cold | 5806 -> 5250 (-556, -9.6%) | 731 -> 9 (-722, -98.8%) | 1 -> 1 (+0, +0.0%) |
| ls_r | Warm | 1095 -> 592 (-503, -45.9%) | 731 -> 9 (-722, -98.8%) | 0 -> 0 |
| ls_lr | Cold | 8662 -> 8105 (-557, -6.4%) | 731 -> 9 (-722, -98.8%) | 1 -> 1 (+0, +0.0%) |
| ls_lr | Warm | 3938 -> 3478 (-460, -11.7%) | 731 -> 9 (-722, -98.8%) | 0 -> 0 |
| ls_then_lsl | Cold | 5855 -> 5180 (-675, -11.5%) | 731 -> 9 (-722, -98.8%) | 1 -> 1 (+0, +0.0%) |
| ls_then_lsl | Warm | 3901 -> 3491 (-410, -10.5%) | 731 -> 9 (-722, -98.8%) | 0 -> 0 |
| find_nonempty | Cold | 8463 -> 7935 (-528, -6.2%) | 731 -> 9 (-722, -98.8%) | 2 -> 2 (+0, +0.0%) |
| find_nonempty | Warm | 3684 -> 3221 (-463, -12.6%) | 731 -> 9 (-722, -98.8%) | 1 -> 1 (+0, +0.0%) |
| du_recursive | Cold | 8323 -> 7687 (-636, -7.6%) | 731 -> 9 (-722, -98.8%) | 2 -> 2 (+0, +0.0%) |
| du_recursive | Warm | 3495 -> 3050 (-445, -12.7%) | 731 -> 9 (-722, -98.8%) | 1 -> 1 (+0, +0.0%) |
### nohandlecache / deep_dir
| Workload | Iteration | Total Time | QueryDirectories | QueryInfos |
|---|---|---|---|---|
| ls_r | Cold | 44629 -> 24941 (-19688, -44.1%) | 30010 -> 30010 (+0, +0.0%) | 5 -> 2 (-3, -60.0%) |
| ls_r | Warm | 32693 -> 19146 (-13547, -41.4%) | 30010 -> 30010 (+0, +0.0%) | 2 -> 1 (-1, -50.0%) |
| ls_lr | Cold | 44230 -> 29613 (-14617, -33.0%) | 30010 -> 30010 (+0, +0.0%) | 4 -> 1 (-3, -75.0%) |
| ls_lr | Warm | 38217 -> 23779 (-14438, -37.8%) | 30010 -> 30010 (+0, +0.0%) | 2 -> 1 (-1, -50.0%) |
| ls_then_lsl | Cold | 38455 -> 25326 (-13129, -34.1%) | 30010 -> 30010 (+0, +0.0%) | 3 -> 1 (-2, -66.7%) |
| ls_then_lsl | Warm | 37118 -> 24100 (-13018, -35.1%) | 30010 -> 30010 (+0, +0.0%) | 2 -> 1 (-1, -50.0%) |
| find_nonempty | Cold | 42194 -> 38494 (-3700, -8.8%) | 30010 -> 30010 (+0, +0.0%) | 4 -> 4 (+0, +0.0%) |
| find_nonempty | Warm | 36431 -> 36495 (+64, +0.2%) | 30010 -> 30010 (+0, +0.0%) | 3 -> 3 (+0, +0.0%) |
| du_recursive | Cold | 42781 -> 28932 (-13849, -32.4%) | 30010 -> 30010 (+0, +0.0%) | 4 -> 2 (-2, -50.0%) |
| du_recursive | Warm | 36953 -> 23297 (-13656, -37.0%) | 30010 -> 30010 (+0, +0.0%) | 3 -> 2 (-1, -33.3%) |
---
## Comparison B: handlecache vs nohandlecache (within directory lease optimizations5)
### flat_dir (nohandlecache vs handlecache)
| Workload | Iteration | Total Time | QueryDirectories | QueryInfos |
|---|---|---|---|---|
| ls_r | Cold | 5250 -> 4926 (-324, -6.2%) | 9 -> 9 (+0, +0.0%) | 1 -> 2 (+1, +100.0%) |
| ls_r | Warm | 592 -> 108 (-484, -81.8%) | 9 -> 0 (-9, -100.0%) | 0 -> 0 |
| ls_lr | Cold | 8105 -> 7812 (-293, -3.6%) | 9 -> 9 (+0, +0.0%) | 1 -> 2 (+1, +100.0%) |
| ls_lr | Warm | 3478 -> 3089 (-389, -11.2%) | 9 -> 0 (-9, -100.0%) | 0 -> 0 |
| ls_then_lsl | Cold | 5180 -> 4968 (-212, -4.1%) | 9 -> 9 (+0, +0.0%) | 1 -> 2 (+1, +100.0%) |
| ls_then_lsl | Warm | 3491 -> 3089 (-402, -11.5%) | 9 -> 0 (-9, -100.0%) | 0 -> 0 |
| find_nonempty | Cold | 7935 -> 7636 (-299, -3.8%) | 9 -> 9 (+0, +0.0%) | 2 -> 3 (+1, +50.0%) |
| find_nonempty | Warm | 3221 -> 2865 (-356, -11.1%) | 9 -> 0 (-9, -100.0%) | 1 -> 1 (+0, +0.0%) |
| du_recursive | Cold | 7687 -> 7613 (-74, -1.0%) | 9 -> 9 (+0, +0.0%) | 2 -> 3 (+1, +50.0%) |
| du_recursive | Warm | 3050 -> 2749 (-301, -9.9%) | 9 -> 0 (-9, -100.0%) | 1 -> 1 (+0, +0.0%) |
### deep_dir (nohandlecache vs handlecache)
| Workload | Iteration | Total Time | QueryDirectories | QueryInfos |
|---|---|---|---|---|
| ls_r | Cold | 24941 -> 39000 (+14059, +56.4%) | 30010 -> 30010 (+0, +0.0%) | 2 -> 15007 (+15005, +750250.0%) |
| ls_r | Warm | 19146 -> 4855 (-14291, -74.6%) | 30010 -> 0 (-30010, -100.0%) | 1 -> 0 (-1, -100.0%) |
| ls_lr | Cold | 29613 -> 44657 (+15044, +50.8%) | 30010 -> 30010 (+0, +0.0%) | 1 -> 15007 (+15006, +1500600.0%) |
| ls_lr | Warm | 23779 -> 11213 (-12566, -52.8%) | 30010 -> 0 (-30010, -100.0%) | 1 -> 0 (-1, -100.0%) |
| ls_then_lsl | Cold | 25326 -> 38824 (+13498, +53.3%) | 30010 -> 30010 (+0, +0.0%) | 1 -> 15007 (+15006, +1500600.0%) |
| ls_then_lsl | Warm | 24100 -> 11597 (-12503, -51.9%) | 30010 -> 0 (-30010, -100.0%) | 1 -> 0 (-1, -100.0%) |
| find_nonempty | Cold | 38494 -> 43284 (+4790, +12.4%) | 30010 -> 30010 (+0, +0.0%) | 4 -> 15008 (+15004, +375100.0%) |
| find_nonempty | Warm | 36495 -> 8603 (-27892, -76.4%) | 30010 -> 0 (-30010, -100.0%) | 3 -> 1 (-2, -66.7%) |
| du_recursive | Cold | 28932 -> 43305 (+14373, +49.7%) | 30010 -> 30010 (+0, +0.0%) | 2 -> 15008 (+15006, +750300.0%) |
| du_recursive | Warm | 23297 -> 8866 (-14431, -61.9%) | 30010 -> 0 (-30010, -100.0%) | 2 -> 1 (-1, -50.0%) |