Linux-Fsdevel Archive mirror
From: John Groves <John@groves.net>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthew Wilcox <willy@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>,
	 Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>,
	 Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	 Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	 linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	nvdimm@lists.linux.dev,  John Groves <jgroves@micron.com>,
	john@jagalactic.com, Dave Chinner <david@fromorbit.com>,
	 Christoph Hellwig <hch@infradead.org>,
	dave.hansen@linux.intel.com, gregory.price@memverge.com,
	 Randy Dunlap <rdunlap@infradead.org>,
	Jerome Glisse <jglisse@google.com>,
	 Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>,
	 Eishan Mirakhur <emirakhur@micron.com>,
	Ravi Shankar <venkataravis@micron.com>,
	 Srinivasulu Thanneeru <sthanneeru@micron.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 Amir Goldstein <amir73il@gmail.com>,
	Chandan Babu R <chandanbabu@kernel.org>,
	 Bagas Sanjaya <bagasdotme@gmail.com>,
	"Darrick J . Wong" <djwong@kernel.org>,
	 Steve French <stfrench@microsoft.com>,
	Nathan Lynch <nathanl@linux.ibm.com>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	 Julien Panis <jpanis@baylibre.com>,
	Stanislav Fomichev <sdf@google.com>,
	 Dongsheng Yang <dongsheng.yang@easystack.cn>
Subject: Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
Date: Tue, 30 Apr 2024 21:09:18 -0500	[thread overview]
Message-ID: <zwtglxnnbjbgtzcv7rhrbtqz6owmviv3yg25bcluenaayzzj4p@nng3gh5twcjo> (raw)
In-Reply-To: <h6tbyvdeq5hzkttsy4uyzq4v64xlkzeqfyo52ku3w4x3vvtpd7@4vxafcct62xh>

On 24/04/29 11:11PM, Kent Overstreet wrote:
> On Mon, Apr 29, 2024 at 09:24:19PM -0500, John Groves wrote:
> > On 24/04/29 07:08PM, Kent Overstreet wrote:
> > > On Mon, Apr 29, 2024 at 07:32:55PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > > > > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > > > > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > > > > CXL-specific in any way.
> > > > > 
> > > > > * Famfs creates a simple access method for storing and sharing data in
> > > > >   sharable memory. The memory is exposed and accessed as memory-mappable
> > > > >   dax files.
> > > > > * Famfs supports multiple hosts mounting the same file system from the
> > > > >   same memory (something existing fs-dax file systems don't do).
> > > > 
> > > > Yes, but we do already have two filesystems that support shared storage,
> > > > and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> > > > the pros and cons of improving either of those to support DAX rather
> > > > than starting again with a new filesystem?
> > > 
> > > I could see a shared memory filesystem as being a completely different
> > > beast than a shared block storage filesystem - and I've never heard
> > > anyone talking about gfs2 or ocfs2 as codebases we particularly liked.
> > 
> > Thanks for your attention on famfs, Kent.
> > 
> > I think of it as a completely different beast. See my reply to Willy re:
> > famfs being more of a memory allocator with the benefit of allocations 
> > being accessible (and memory-mappable) as files.
> 
> That's pretty much what I expected.
> 
> I would suggest talking to RDMA people; RDMA does similar things with
> exposing address spaces across machines, and an "external" memory
> allocator is a basic building block there as well - it'd be great if we
> could get that turned into some clean library code.
> 
> GPU people as well, possibly.

Thanks for your attention, Kent.

I'm on it. Part of the core idea behind famfs is that page-oriented data
movement can be avoided with actual shared memory. Yes, the memory is likely
to be slower (in bandwidth, latency, or both), but access is at cacheline
granularity rather than full-page (or larger) retrieval, which is a win for
some access patterns (and not for others).

Part of the issue is communicating the fact that shared access to cachelines
is possible.
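
To make that concrete, here's a minimal user-space sketch. The file path,
size, and stride below are made up for illustration; it assumes a famfs
mount at /mnt/famfs:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 1UL << 30;         /* assumed file size: 1 GiB */
        int fd = open("/mnt/famfs/dataset", O_RDONLY);

        if (fd < 0)
                return 1;

        /* MAP_SHARED on an fs-dax file maps the shared memory directly;
         * loads hit the memory itself, with no page-cache copy between */
        uint64_t *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        if (p == MAP_FAILED)
                return 1;

        /* A sparse probe: each load pulls one cacheline across the
         * fabric rather than a full page */
        uint64_t sum = 0;
        for (size_t i = 0; i < len / sizeof(*p); i += 4096)
                sum += p[i];

        printf("sum=%llu\n", (unsigned long long)sum);
        munmap(p, len);
        close(fd);
        return 0;
}

Each p[i] load above moves one cacheline; a block-based or page-cache path
would move at least a full page per touch.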

There are some interesting possibilities with GPUs retrieving famfs files
(or portions thereof), but I have no insight as to the motivations of GPU 
vendors.

> 
> > The famfs user space repo has some good documentation as to the on-
> > media structure of famfs. Scroll down on [1] (the documentation from
> > the famfs user space repo). There is quite a bit of info in the docs
> > from that repo.
> 
> Ok, looking through that now.
> 
> So you've got a metadata log; that looks more like a conventional
> filesystem than a conventional purely in-memory thing.
> 
> But you say it's a shared filesystem, and it doesn't say anything about
> that. Inter-node locking?
> 
> Perhaps the ocfs2/gfs2 comparison is appropriate, after all.

Famfs is intended to be mounted by more than one host from the same in-memory
image. A metadata log is kinda the simplest approach to make that work (let me
know your thoughts if you disagree on that). When a client mounts, playing the
log from the shared memory brings that client mount into sync with the source
(the Master).

No inter-node locking is currently needed because only the node that created
the file system (the Master) can write the log. Famfs is not intended to be 
a general-purpose FS...

The famfs log is currently append-only, and I think of it as a "code-first"
implementation of a shared-memory FS that gets the job done in something
approaching the simplest possible way.
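
For illustration, client-side log play is conceptually something like the
sketch below. The structure layout and every name in it are invented here
for the sketch - this is not famfs's actual on-media format:

#include <stdint.h>

/* Hypothetical, simplified layout -- NOT the real famfs log format */
struct famfs_log_entry {
        uint64_t seq;           /* append order */
        uint64_t ext_offset;    /* extent offset within the shared memory */
        uint64_t ext_len;       /* extent length */
        char     relpath[64];   /* path relative to the mount point */
};

struct famfs_log {
        uint64_t next_seq;      /* advanced only by the Master */
        struct famfs_log_entry entries[];
};

/* Stand-in for the real file-instantiation logic */
static void create_file_from_extent(const struct famfs_log_entry *e);

/*
 * Client-side log play at mount time: walk the append-only log and
 * instantiate each file.  No inter-node locking is needed: clients
 * never write, only the Master appends, and an entry is immutable
 * once next_seq has advanced past it.
 */
static void famfs_play_log(struct famfs_log *log)
{
        for (uint64_t i = 0; i < log->next_seq; i++)
                create_file_from_extent(&log->entries[i]);
}

Because the log is single-writer and append-only, a client can replay it
(and later re-scan from the last seq it has seen) with no coordination
beyond reading next_seq.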

If the approach evolves to full allocate-on-write, then moving to a file system
platform that handles that would make sense. If it remains (as I suspect it
will) a way to share collections of data sets, indexes, or other data that is
published and then consumed [all or mostly] read-only, this simple approach may
be sufficient long-term.

Regards,
John




Thread overview: 32+ messages
2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
2024-04-29 17:04 ` [RFC PATCH v2 01/12] famfs: Introduce famfs documentation John Groves
2024-04-30  6:46   ` Bagas Sanjaya
2024-04-29 17:04 ` [RFC PATCH v2 02/12] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c John Groves
2024-04-29 17:04 ` [RFC PATCH v2 03/12] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2024-04-29 17:04 ` [RFC PATCH v2 04/12] dev_dax_iomap: Save the kva from memremap John Groves
2024-04-29 17:04 ` [RFC PATCH v2 05/12] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
2024-04-29 17:04 ` [RFC PATCH v2 06/12] dev_dax_iomap: export dax_dev_get() John Groves
2024-04-29 17:04 ` [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super() John Groves
2024-05-02 18:17   ` Al Viro
2024-05-02 22:25     ` John Groves
2024-05-03  9:04       ` Christian Brauner
2024-05-03 15:38         ` John Groves
2024-04-29 17:04 ` [RFC PATCH v2 08/12] famfs: module operations & fs_context John Groves
2024-04-30 11:01   ` Christian Brauner
2024-05-02 15:51     ` John Groves
2024-05-03 14:15     ` John Groves
2024-05-02 18:23   ` Al Viro
2024-05-02 21:50     ` John Groves
2024-04-29 17:04 ` [RFC PATCH v2 09/12] famfs: Introduce inode_operations and super_operations John Groves
2024-04-29 17:04 ` [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write John Groves
2024-05-02 18:29   ` Al Viro
2024-05-02 21:51     ` John Groves
2024-04-29 17:04 ` [RFC PATCH v2 11/12] famfs: Introduce mmap and VM fault handling John Groves
2024-04-29 17:04 ` [RFC PATCH v2 12/12] famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops John Groves
2024-04-29 18:32 ` [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system Matthew Wilcox
2024-04-29 23:08   ` Kent Overstreet
2024-04-30  2:24     ` John Groves
2024-04-30  3:11       ` Kent Overstreet
2024-05-01  2:09         ` John Groves [this message]
2024-04-30  2:11   ` John Groves
2024-04-30 21:01     ` Matthew Wilcox
