From: Benjamin Berg <benjamin@sipsolutions.net>
To: Anton Ivanov <anton.ivanov@cambridgegreys.com>,
Johannes Berg <johannes@sipsolutions.net>,
linux-um@lists.infradead.org
Subject: Re: [RFC PATCH 0/3] um: clean up mm creation - another attempt
Date: Wed, 17 Jan 2024 20:54:35 +0100 [thread overview]
Message-ID: <478ac27fd53fa20b4f735b1d792639cd61d5eda4.camel@sipsolutions.net> (raw)
In-Reply-To: <57c2ec52-29a6-4ce7-9334-e0ee436ba630@cambridgegreys.com>
On Wed, 2024-01-17 at 19:45 +0000, Anton Ivanov wrote:
> On 17/01/2024 17:17, Benjamin Berg wrote:
> > Hi,
> >
> > On Wed, 2023-09-27 at 11:52 +0200, Benjamin Berg wrote:
> > > [SNIP]
> > > Once we are there, we can look for optimizations. The fundamental
> > > problem is that page faults (even minor ones) are extremely expensive
> > > for us.
> > >
> > > Just throwing out ideas on what we could do:
> > > 1. SECCOMP as that reduces the amount of context switches.
> > > (Yes, I know I should resubmit the patchset)
> > > 2. Maybe we can disable/cripple page access tracking? If we assume
> > > initially mark all pages as accessed by userspace (i.e.
> > > pte_mkyoung), then we avoid a minor page fault on first access.
> > > Doing that will mess with page eviction though.
> > > 3. Do DAX (direct_access) for files. i.e. mmap files directly in the
> > > host kernel rather than through UM.
> > > With a hostfs like file system, one should be able to add an
> > > intermediate block device that maps host files to physical pages,
> > > then do DAX in the FS.
> > > For disk images, the existing iomem infrastructure should be
> > > usable, this should work with any DAX enabled filesystems (ext2,
> > > ext4, xfs, virtiofs, erofs).
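Idea 2. can be modelled in user space; this is just an illustrative sketch
(not the kernel pte API): a PTE installed with the accessed bit already set
never takes the first-access minor fault, at the cost of less precise
page-eviction information.

```c
#include <assert.h>
#include <stdint.h>

/* Toy PTE model: bit 0 = present, bit 1 = accessed ("young"). */
#define PTE_PRESENT  (1u << 0)
#define PTE_ACCESSED (1u << 1)

/* Install a PTE; optionally preset the accessed bit (the pte_mkyoung idea). */
static uint32_t install_pte(int preset_young)
{
    uint32_t pte = PTE_PRESENT;
    if (preset_young)
        pte |= PTE_ACCESSED;   /* first access will not fault */
    return pte;
}

/* First access: returns 1 if a minor fault was needed to set the bit. */
static int access_page(uint32_t *pte)
{
    if (*pte & PTE_ACCESSED)
        return 0;              /* no fault */
    *pte |= PTE_ACCESSED;      /* minor fault just to mark the page young */
    return 1;
}
```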
> >
> > So, I experimented quite a bit over Christmas (including getting DAX to
> > work with virtiofs). At the end of all this my conclusion is that
> > insufficient page table synchronization is our main problem.
> >
> > Basically, right now we rely on the flush_tlb_* functions from the
> > kernel, but these are only called when TLB entries are removed, *not*
> > when new PTEs are added (there is also update_mmu_cache, but it isn't
> > enough either). Effectively this means that new page table entries will
> > often only be synced because the userspace code runs into an
> > unnecessary segfault.
> >
> > Really, what we need is a set_pte_at() implementation that marks the
> > memory range for synchronization. Then we can make sure we sync it
> > before switching to the userspace process (the equivalent of running
> > flush_tlb_mm_range right now).
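A minimal user-space model of that set_pte_at() bookkeeping (struct and
function names here are made up for illustration): each PTE update only
widens a per-mm range, and the single sync before switching to the
userspace process covers exactly that range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-mm bookkeeping: the span of addresses whose PTEs
 * changed since the last sync with the host process. */
struct sync_range {
    uint64_t start;
    uint64_t end;    /* exclusive; start > end means "empty" */
};

static void sync_range_reset(struct sync_range *r)
{
    r->start = UINT64_MAX;
    r->end = 0;
}

/* What a set_pte_at() hook would do: no syscall, just widen the range. */
static void sync_range_track(struct sync_range *r, uint64_t addr, uint64_t len)
{
    if (addr < r->start)
        r->start = addr;
    if (addr + len > r->end)
        r->end = addr + len;
}

/* Before returning to userspace, sync [start, end) once -- the moral
 * equivalent of today's flush_tlb_mm_range -- and then reset. */
static int sync_range_pending(const struct sync_range *r)
{
    return r->start < r->end;
}
```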
> >
> > I think we should:
> > * Rewrite the userspace syscall code
> > - Support delaying the execution of syscalls
> > - Only support mmap/munmap/mprotect and LDT
> > - Do simple compression of consecutive syscalls here
> > - Drop the hand-written assembler
> > * Improve the tlb.c code
> > - remove the HVC abstraction
>
> Cool. That was not working particularly well. I tried to improve it a
> few times, but ripping it out and replacing it is probably a better idea.
Hm, now I realise that we still want mmap() syscall compression for the
kernel itself in tlb.c.
> > - never force immediate syscall execution
> > * Let set_pte_at() track which memory ranges that need syncing
> > * At that point we should be able to:
> > - drop copy_context_skas0
> > - make flush_tlb_* no-ops
> > - drop flush_tlb_page from handle_page_fault
> > - move unmap() from flush_thread to init_new_context
> > (or do it as part of start_userspace)
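The "simple compression of consecutive syscalls" step could look roughly
like this (an illustrative user-space sketch, not the actual stub
protocol): adjacent mmap requests with the same protection collapse into a
single queued host syscall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued stub operation: map [addr, addr+len) with prot. */
struct stub_mmap {
    uint64_t addr;
    uint64_t len;
    int prot;
};

/* Push an op onto the queue, coalescing with the previous one when the
 * ranges are contiguous and the protection matches.  Returns the new
 * queue length, i.e. the number of host syscalls actually needed. */
static size_t stub_queue_push(struct stub_mmap *q, size_t n,
                              const struct stub_mmap *op)
{
    if (n > 0 &&
        q[n - 1].prot == op->prot &&
        q[n - 1].addr + q[n - 1].len == op->addr) {
        q[n - 1].len += op->len;   /* extend the previous mapping instead */
        return n;
    }
    q[n] = *op;
    return n + 1;
}
```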
> >
> > So, I did try this using nasty hacks and IIRC one of my runs was going
> > from 21s to 16s and another from 63s to 56s. Which seems like a nice
> > improvement.
>
> Excellent. I assume you were using hostfs as usual, right? If so, the
> difference is likely to be even more noticeable on ubd.
Yes, I was mostly testing hostfs. Initially also virtiofs with DAX, but
I went back as that didn't result in a pagefault count improvement once
I made some other adjustments.
Benjamin
>
> >
> > Benjamin
> >
> >
> > PS: As for DAX, it doesn't really seem to help performance. It didn't
> > seem to lower the number of page faults in UML. And, from my
> > perspective, it isn't really worth it just for the memory sharing.
> >
> > PPS: dirty/young tracking seemed to cause only a small number of
> > page faults in the grand scheme of things. So probably not something
> > worth following up on.
> >
>
Thread overview: 19+ messages
2023-09-22 22:37 [RFC PATCH 0/3] um: clean up mm creation - another attempt Johannes Berg
2023-09-22 22:37 ` [RFC PATCH 1/3] um/x86: remove ldt mutex and use mmap lock instead Johannes Berg
2023-09-22 22:37 ` [RFC PATCH 2/3] um: clean up init_new_context() Johannes Berg
2023-09-22 22:37 ` [RFC PATCH 3/3] um: don't force-flush in mm/userspace process start Johannes Berg
2023-09-25 13:29 ` [RFC PATCH 0/3] um: clean up mm creation - another attempt Anton Ivanov
2023-09-25 13:33 ` Johannes Berg
2023-09-25 13:34 ` Anton Ivanov
2023-09-25 14:27 ` Anton Ivanov
2023-09-25 14:44 ` Johannes Berg
2023-09-25 15:20 ` Anton Ivanov
2023-09-26 12:16 ` Anton Ivanov
2023-09-26 12:38 ` Johannes Berg
2023-09-26 13:04 ` Anton Ivanov
2023-09-27 9:52 ` Benjamin Berg
2023-09-27 9:59 ` Anton Ivanov
2023-09-27 10:42 ` Benjamin Berg
2024-01-17 17:17 ` Benjamin Berg
2024-01-17 19:45 ` Anton Ivanov
2024-01-17 19:54 ` Benjamin Berg [this message]