gfs2.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
	kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
	brauner@kernel.org, linux-xfs@vger.kernel.org,
	gfs2@lists.linux.dev, linux-bcachefs@vger.kernel.org
Subject: Re: [PATCH v4 00/16] fanotify: add pre-content hooks
Date: Fri, 30 Aug 2024 16:22:45 -0700	[thread overview]
Message-ID: <20240830232245.GQ6216@frogsfrogsfrogs> (raw)
In-Reply-To: <CAOQ4uxh+zD1A18VPyoRHeaBt+XCpt2LB18K6ZHQJR-VqdGrCVw@mail.gmail.com>

On Fri, Aug 30, 2024 at 10:55:10AM +0200, Amir Goldstein wrote:
> On Thu, Aug 29, 2024 at 11:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Wed, Aug 14, 2024 at 05:25:18PM -0400, Josef Bacik wrote:
> > > v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
> > > v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
> > > v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
> > >
> > > v3->v4:
> > > - Trying to send a final verson Friday at 5pm before you go on vacation is a
> > >   recipe for silly mistakes, fixed the xfs handling yet again, per Christoph's
> > >   review.
> > > - Reworked the file system helper so it's handling of fpin was a little less
> > >   silly, per Chinner's suggestion.
> > > - Updated the return values to not or in VM_FAULT_RETRY, as we have a comment
> > >   in filemap_fault that says if VM_FAULT_ERROR is set we won't have
> > >   VM_FAULT_RETRY set.
> > >
> > > v2->v3:
> > > - Fix the pagefault path to do MAY_ACCESS instead, updated the perm handler to
> > >   emit PRE_ACCESS in this case, so we can avoid the extraneous perm event as per
> > >   Amir's suggestion.
> > > - Reworked the exported helper so the per-filesystem changes are much smaller,
> > >   per Amir's suggestion.
> > > - Fixed the screwup for DAX writes per Chinner's suggestion.
> > > - Added Christian's reviewed-by's where appropriate.
> > >
> > > v1->v2:
> > > - reworked the page fault logic based on Jan's suggestion and turned it into a
> > >   helper.
> > > - Added 3 patches per-fs where we need to call the fsnotify helper from their
> > >   ->fault handlers.
> > > - Disabled readahead in the case that there's a pre-content watch in place.
> > > - Disabled huge faults when there's a pre-content watch in place (entirely
> > >   because it's untested, theoretically it should be straightforward to do).
> > > - Updated the command numbers.
> > > - Addressed the random spelling/grammer mistakes that Jan pointed out.
> > > - Addressed the other random nits from Jan.
> > >
> > > --- Original email ---
> > >
> > > Hello,
> > >
> > > These are the patches for the bare bones pre-content fanotify support.  The
> > > majority of this work is Amir's, my contribution to this has solely been around
> > > adding the page fault hooks, testing and validating everything.  I'm sending it
> > > because Amir is traveling a bunch, and I touched it last so I'm going to take
> > > all the hate and he can take all the credit.
> > >
> > > There is a PoC that I've been using to validate this work, you can find the git
> > > repo here
> > >
> > > https://github.com/josefbacik/remote-fetch
> > >
> > > This consists of 3 different tools.
> > >
> > > 1. populate.  This just creates all the stub files in the directory from the
> > >    source directory.  Just run ./populate ~/linux ~/hsm-linux and it'll
> > >    recursively create all of the stub files and directories.
> > > 2. remote-fetch.  This is the actual PoC, you just point it at the source and
> > >    destination directory and then you can do whatever.  ./remote-fetch ~/linux
> > >    ~/hsm-linux.
> > > 3. mmap-validate.  This was to validate the pagefault thing, this is likely what
> > >    will be turned into the selftest with remote-fetch.  It creates a file and
> > >    then you can validate the file matches the right pattern with both normal
> > >    reads and mmap.  Normally I do something like
> > >
> > >    ./mmap-validate create ~/src/foo
> > >    ./populate ~/src ~/dst
> > >    ./rmeote-fetch ~/src ~/dst
> > >    ./mmap-validate validate ~/dst/foo
> > >
> > > I did a bunch of testing, I also got some performance numbers.  I copied a
> > > kernel tree, and then did remote-fetch, and then make -j4
> > >
> > > Normal
> > > real    9m49.709s
> > > user    28m11.372s
> > > sys     4m57.304s
> > >
> > > HSM
> > > real    10m6.454s
> > > user    29m10.517s
> > > sys     5m2.617s
> > >
> > > So ~17 seconds more to build with HSM.  I then did a make mrproper on both trees
> > > to see the size
> > >
> > > [root@fedora ~]# du -hs /src/linux
> > > 1.6G    /src/linux
> > > [root@fedora ~]# du -hs dst
> > > 125M    dst
> > >
> > > This mirrors the sort of savings we've seen in production.
> > >
> > > Meta has had these patches (minus the page fault patch) deployed in production
> > > for almost a year with our own utility for doing on-demand package fetching.
> > > The savings from this has been pretty significant.
> > >
> > > The page-fault hooks are necessary for the last thing we need, which is
> > > on-demand range fetching of executables.  Some of our binaries are several gigs
> > > large, having the ability to remote fetch them on demand is a huge win for us
> > > not only with space savings, but with startup time of containers.
> >
> > So... does this pre-content fetcher already work for regular reads and
> > writes, and now you're looking to wire up page faults?  Or does it only
> > handle page faults?  Judging from Amir's patches I'm guessing the
> > FAN_PRE_{ACCESS,MODIFY} events are new, so this only sends notifications
> > prior to read and write page faults?  The XFS change looks reasonable to
> > me, but I'm left wondering "what does this shiny /do/?"
> >
> 
> I *think* I understand the confusion.
> 
> Let me try to sort it out.
> 
> This patch set collaboration aims to add the functionality of HSM
> service by adding events FS_PRE_{ACCESS,MODIFY} prior to
> read/write and page faults.
> 
> Maybe you are puzzled by not seeing any new read/write hooks?
> This is because the HSM events utilize the existing fsnotify_file_*perm()
> hooks that are in place for the legacy FS_ACCESS_PERM event.
> 
> So why is a new FS_PRE_ACCESS needed?
> Let me first quote commit message of patch 2/16 [1]:
> ---
> The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> but it meant for a different use case of filling file content before
> access to a file range, so it has slightly different semantics.
> 
> Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> 
> FS_PRE_MODIFY is a new permission event, with similar semantics as
> FS_PRE_ACCESS, which is called before a file is modified.
> 
> FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> pre-content events are only reported for regular files and dirs.
> 
> The pre-content events are meant to be used by hierarchical storage
> managers that want to fill the content of files on first access.
> ---
> 
> And from my man page draft [2]:
> ---
>        FAN_PRE_ACCESS (since Linux 6.x)
>               Create  an  event before a read from a directory or a file range,
>               that provides an opportunity for the event listener to
> modify the content of the object
>               before the reader is going to access the content in the
> specified range.
>               An additional information record of type
> FAN_EVENT_INFO_TYPE_RANGE is returned
>              for each event in the read buffer.
> ...
> ---
> 
> So the semantics of the two events is slightly different, but also the
> meaning of "an opportunity for the event listener to modify the content".
> 
> FS_ACCESS_PERM already provided this opportunity on old kernels,
> but prior to "Tidy up file permission hooks" series [3] in v6.8, writing
> file content in FS_ACCESS_PERM event context was prone to deadlocks.
> 
> Therefore, an application using FS_ACCESS_PERM may be prone
> for deadlocks, while an application using FAN_PRE_ACCESS should
> be safe in that regard.

Ah, ok, that's where the missing pieces are. :)

--D

> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/all/a96217d84dfebb15582a04524dc9821ba3ea1406.1723670362.git.josef@toxicpanda.com/
> [2] https://github.com/amir73il/man-pages/commits/fan_pre_path
> [3] https://lore.kernel.org/all/20231122122715.2561213-1-amir73il@gmail.com/
> 

      reply	other threads:[~2024-08-30 23:22 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-14 21:25 [PATCH v4 00/16] fanotify: add pre-content hooks Josef Bacik
2024-08-14 21:25 ` [PATCH v4 01/16] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-08-14 21:25 ` [PATCH v4 02/16] fsnotify: introduce pre-content permission event Josef Bacik
2024-08-14 21:25 ` [PATCH v4 03/16] fsnotify: generate pre-content permission event on open Josef Bacik
2024-08-14 21:25 ` [PATCH v4 04/16] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-08-14 21:25 ` [PATCH v4 05/16] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
2024-08-14 21:25 ` [PATCH v4 06/16] fanotify: pass optional file access range in pre-content event Josef Bacik
2024-08-14 21:25 ` [PATCH v4 07/16] fanotify: rename a misnamed constant Josef Bacik
2024-08-14 21:25 ` [PATCH v4 08/16] fanotify: report file range info with pre-content events Josef Bacik
2024-08-29 10:13   ` Jan Kara
2024-08-14 21:25 ` [PATCH v4 09/16] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-08-29 10:25   ` Jan Kara
2024-08-14 21:25 ` [PATCH v4 10/16] fanotify: add a helper to check for pre content events Josef Bacik
2024-08-14 21:25 ` [PATCH v4 11/16] fanotify: disable readahead if we have pre-content watches Josef Bacik
2024-08-29 10:48   ` Jan Kara
2024-08-29 12:44     ` Josef Bacik
2024-08-14 21:25 ` [PATCH v4 12/16] mm: don't allow huge faults for files with pre content watches Josef Bacik
2024-08-29 10:51   ` Jan Kara
2024-08-14 21:25 ` [PATCH v4 13/16] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-08-29 11:07   ` Jan Kara
2024-08-14 21:25 ` [PATCH v4 14/16] bcachefs: add pre-content fsnotify hook to fault Josef Bacik
2024-08-29 11:10   ` Jan Kara
2024-08-29 11:26     ` Kent Overstreet
2024-08-29 12:46       ` Josef Bacik
2024-08-29 12:55         ` Kent Overstreet
2024-08-29 12:25     ` Kent Overstreet
2024-08-14 21:25 ` [PATCH v4 15/16] gfs2: " Josef Bacik
2024-08-29 11:15   ` Jan Kara
2024-08-29 11:26     ` Amir Goldstein
2024-08-29 11:43       ` Jan Kara
2024-08-29 12:49         ` Amir Goldstein
2024-08-29 12:42     ` Josef Bacik
2024-08-14 21:25 ` [PATCH v4 16/16] xfs: add pre-content fsnotify hook for write faults Josef Bacik
2024-08-29 11:17   ` Jan Kara
2024-08-30 23:28     ` Darrick J. Wong
2024-09-02 10:23       ` Jan Kara
2024-09-02 11:19         ` Christian Brauner
2024-08-29 21:41 ` [PATCH v4 00/16] fanotify: add pre-content hooks Darrick J. Wong
2024-08-30  8:55   ` Amir Goldstein
2024-08-30 23:22     ` Darrick J. Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240830232245.GQ6216@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=gfs2@lists.linux.dev \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).