From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	 linux-mm@kvack.org,  Andrew Morton <akpm@linux-foundation.org>,
	 Chris Li <chrisl@kernel.org>,
	 Barry Song <v-songbaohua@oppo.com>,
	 Ryan Roberts <ryan.roberts@arm.com>,  Neil Brown <neilb@suse.de>,
	 Minchan Kim <minchan@kernel.org>,
	 Hugh Dickins <hughd@google.com>,
	 David Hildenbrand <david@redhat.com>,
	 Yosry Ahmed <yosryahmed@google.com>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] mm/swap: optimize swap cache search space
Date: Tue, 23 Apr 2024 09:29:03 +0800	[thread overview]
Message-ID: <87r0ewx3xc.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <CAMgjq7B1YTrvZOrnbtVYfVMVAmtMkkwiqcqc1AGup4=gvgxKhQ@mail.gmail.com> (Kairui Song's message of "Mon, 22 Apr 2024 23:20:19 +0800")

Kairui Song <ryncsn@gmail.com> writes:

> On Mon, Apr 22, 2024 at 3:56 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Hi, Kairui,
>>
>> Kairui Song <ryncsn@gmail.com> writes:
>>
>> > From: Kairui Song <kasong@tencent.com>
>> >
>> > Currently we use one swap_address_space for every 64M chunk to reduce
>> > lock contention; this is like having a set of smaller swap files inside
>> > one big swap file. But when doing a swap cache lookup or insert, we are
>> > still using the offset within the whole large swap file. This is OK for
>> > correctness, as the offset (key) is unique.
>> >
>> > But the Xarray is specially optimized for small indexes: it creates the
>> > radix tree levels lazily, just enough to fit the largest key stored in
>> > one Xarray. So we are wasting tree nodes unnecessarily.
>> >
>> > For a 64M chunk it should take at most 3 levels to contain everything.
>> > But we are using the offset from the whole swap file, so the offset
>> > (key) value will be way beyond 64M, and so will the tree depth.
>> >
>> > Optimize this by reducing the swap cache search space to a 64M scope.
>>
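
For context, the existing 64M split looks roughly like this (paraphrasing
include/linux/swap.h as of current mainline):

    /* One address_space per 64MB worth of swap slots:
     * 2^14 pages * 4KB page size = 64MB per chunk. */
    #define SWAP_ADDRESS_SPACE_SHIFT    14
    #define SWAP_ADDRESS_SPACE_PAGES    (1 << SWAP_ADDRESS_SPACE_SHIFT)

    #define swap_address_space(entry)                       \
        (&swapper_spaces[swp_type(entry)][swp_offset(entry) \
            >> SWAP_ADDRESS_SPACE_SHIFT])

With XA_CHUNK_SHIFT == 6 (64 slots per node), the 2^14 slots of one chunk
would fit in ceil(14/6) = 3 xarray levels, hence the "at most 3 levels"
above; but because the full swp_offset() is used as the index today, the
tree grows as deep as the largest offset in the whole swap file requires.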
>
> Hi,
>
> Thanks for the comments!
>
>> In general, I think that it makes sense to reduce the depth of the
>> xarray.
>>
>> One concern is that, IIUC, we make the swap cache behave like the file
>> cache if possible.  And your change makes the swap cache and file cache
>> diverge more.  Is it possible for us to keep them similar?
>
> So far in this series, I think there is no problem with that: the two
> main helpers for retrieving the file & cache offset, folio_index and
> folio_file_pos, will work fine and stay compatible with current users.
>
> And if we convert to sharing filemap_* functions between the swap cache
> and page cache, most of them already accept an index as an argument, so
> no trouble at all.
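
(For example, this helper already takes the index from the caller, per its
declaration in include/linux/pagemap.h:

    void *filemap_get_entry(struct address_space *mapping, pgoff_t index);

so a 64M-scoped index could be passed in without changing the helper
itself.)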
>
>>
>> For example,
>>
>> Is it possible to return the offset inside the 64M range in
>> __page_file_index() (maybe rename it)?
>
> Not sure what you mean by this; __page_file_index will be gone as we
> convert to folio.
> And this series did delete / rename it (this might not be easy to see,
> since the usage of these helpers was not well organized before this
> series, so some cleanup is involved).
> It was previously only used through page_index (deleted) /
> folio_index, and now folio_index will return the offset inside the
> 64M range.
>
> I guess I just did what you wanted? :)

Good!
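
IIUC the lookup key then becomes just the low bits of the global swap
offset, i.e. something like the sketch below (an illustration only; the
actual helper name and placement in the series may differ):

    /* Sketch: the index within one 64M address_space is the low
     * bits of the global swap offset. */
    static inline pgoff_t swap_cache_index(swp_entry_t entry)
    {
        return swp_offset(entry) & (SWAP_ADDRESS_SPACE_PAGES - 1);
    }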

> My cover letter and commit message might not be clear enough; I can
> update them.
>
>>
>> Is it possible to add "start_offset" support in xarray, so that
>> "start_offset" is subtracted from "index" before looking up / inserting?
>
> The xarray struct already seems very full, and this usage doesn't look
> generic to me; it might be better to fix this kind of issue case by case.

Just an open question.
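
To illustrate the idea (purely hypothetical -- struct xarray has no such
field today, so this sketch wraps the xarray rather than enlarging it):

    struct xarray_ranged {
        struct xarray   xa;
        unsigned long   start_offset;   /* subtracted from every index */
    };

    static inline void *xar_load(struct xarray_ranged *xar,
                                 unsigned long index)
    {
        return xa_load(&xar->xa, index - xar->start_offset);
    }

A wrapper like this would keep the stored keys small without touching the
xarray internals, at the cost of one extra indirection per call.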

>>
>> Is it possible to use multiple range locks to protect one xarray to
>> improve the lock scalability?  This is why we have multiple "struct
>> address_space" for one swap device.  And, we may have same lock
>> contention issue for large files too.
>
> Good question; this series can improve the tree depth issue for the
> swap cache, but contention on the address_space is still a thing.

The lock contention for the swap cache was reduced by using multiple
xarrays in commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB
trunks").  But that fixes it for the swap cache only, not for the file
cache in general.  We have observed a similar lock contention issue for
the file cache too.  And the method isn't perfect either, as shown by the
issue you found here.  In general, the question is what a "file" is for a
swap device.
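
For reference, what 4b3ef9daa4fc set up is roughly the following (a
condensed paraphrase of init_swap_address_space() in mm/swap_state.c; the
address_space field setup beyond the xarray is omitted):

    int init_swap_address_space(unsigned int type, unsigned long nr_pages)
    {
        struct address_space *spaces;
        unsigned int i, nr;

        /* One address_space, hence one xa_lock, per 64MB chunk. */
        nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
        spaces = kvcalloc(nr, sizeof(struct address_space), GFP_KERNEL);
        if (!spaces)
            return -ENOMEM;
        for (i = 0; i < nr; i++)
            xa_init_flags(&spaces[i].i_pages, XA_FLAGS_LOCK_IRQ);
        nr_swapper_spaces[type] = nr;
        swapper_spaces[type] = spaces;
        return 0;
    }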

> A more generic solution might involve changes to the xarray API or
> using some other data structure?
>
> (BTW, I think reducing the search space and resolving lock contention
> are not necessarily related; reducing the search space by having a
> large table of small trees should still perform better for the swap
> cache.)

--
Best Regards,
Huang, Ying

Thread overview: 40+ messages
2024-04-17 16:08 [PATCH 0/8] mm/swap: optimize swap cache search space Kairui Song
2024-04-17 16:08 ` [PATCH 1/8] NFS: remove nfs_page_lengthg and usage of page_index Kairui Song
2024-04-17 16:08 ` [PATCH 2/8] nilfs2: drop " Kairui Song
2024-04-17 16:14   ` Matthew Wilcox
2024-04-18  2:42     ` Kairui Song
2024-04-17 16:08 ` [PATCH 3/8] f2fs: " Kairui Song
2024-04-17 16:08 ` [PATCH 4/8] ceph: " Kairui Song
2024-04-18  0:28   ` Xiubo Li
2024-04-18  1:30     ` Matthew Wilcox
2024-04-18  1:40       ` Xiubo Li
2024-04-22 15:34         ` Kairui Song
2024-04-23  0:32           ` Xiubo Li
2024-04-17 16:08 ` [PATCH 5/8] cifs: drop usage of page_file_offset Kairui Song
2024-04-17 16:25   ` Matthew Wilcox
2024-04-17 16:08 ` [PATCH 6/8] mm/swap: get the swap file offset directly Kairui Song
2024-04-18 18:43   ` kernel test robot
2024-04-23  1:41   ` Huang, Ying
2024-04-23 13:33     ` Kairui Song
2024-04-17 16:08 ` [PATCH 7/8] mm: drop page_index/page_file_offset and convert swap helpers to use folio Kairui Song
2024-04-18  1:55   ` Barry Song
2024-04-18  2:42     ` Kairui Song
2024-04-18 10:19       ` Barry Song
2024-04-18  3:30     ` Matthew Wilcox
2024-04-18  3:55       ` Barry Song
2024-04-17 16:08 ` [PATCH 8/8] mm/swap: reduce swap cache search space Kairui Song
2024-04-18 18:21   ` kernel test robot
2024-04-18 18:21   ` kernel test robot
2024-04-22  7:54 ` [PATCH 0/8] mm/swap: optimize " Huang, Ying
2024-04-22 15:20   ` Kairui Song
2024-04-23  1:29     ` Huang, Ying [this message]
2024-04-23  3:20   ` Matthew Wilcox
2024-04-24  2:24     ` Huang, Ying
2024-04-26 23:16       ` Chris Li
2024-04-28  1:14         ` Huang, Ying
2024-04-28  2:43           ` Chris Li
2024-04-28  3:21             ` Huang, Ying
2024-04-28 17:26               ` Chris Li
2024-04-28 17:37         ` Kairui Song
2024-04-28 17:45           ` Kairui Song
2024-04-29  5:50           ` Chris Li
