From: David Howells <dhowells@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>, Xiubo Li <xiubli@redhat.com>,
	Greg Farnum <gfarnum@redhat.com>,
	Venky Shankar <vshankar@redhat.com>
Cc: dhowells@redhat.com, Jeff Layton <jlayton@kernel.org>,
	ceph-devel@vger.kernel.org
Subject: Modifying and fixing(?) the per-inode snap handling in ceph
Date: Mon, 15 Jan 2024 14:07:18 +0000
Message-ID: <2546440.1705327638@warthog.procyon.org.uk>

Hi, Ilya, Xiubo, Greg,

I'm trying to finish my patches to make ceph work with netfslib and I'm
wondering if snap handling on inodes can be made easier to work with.  Also, I
think there may be a bug in the interaction between ceph_queue_cap_snap() and
writable mmaps.

What I would like to do is to make page/folio->private point at the
ceph_cap_snap struct instead of pointing to ceph_snap_context.  This makes it
easier to fish the metadata details out in ceph when netfslib asks it to
perform a write operation.

Netfslib has the capability to pass a netfs_group struct through the API, and
I currently have this subclassed by ceph_snap_context, but that doesn't
directly carry sufficient information, as I presume a snap context is a global
thing and not an inode-specific thing.

However, it looks like capsnaps don't always exist, even on dirty inodes...

So what I'm thinking is:

 (1) Make struct ceph_cap_snap a subclass of netfs_group.  This would allow
     netfslib to manipulate them and attach them to dirty pages and do
     selective writeback.

 (2) Always keep a ceph_cap_snap on a dirty inode.  It can be treated
     specially when it's the only snap and at the head.

 (3) Offload some of the fields from ceph_inode_info into ceph_cap_snap
     (eg. truncate_size and truncate_seq) and update them directly there.

 (4) On entry to any sort of write routine, see if we need a new capsnap for
     that inode and, if so, create one.  This would include ->write_iter(),
     ->page_mkwrite(), ->setattr() and possibly ->setxattr().

 (5) In queue_realm_cap_snaps(), mark the capsnap as being obsolete and call
     unmap_mapping_pages() on each inode to force ->page_mkwrite() to be
     called[!] on further modification.

     queue_realm_cap_snaps() doesn't then need to create a new capsnap; this
     can be left to the various write routines.

     [!] This would fix the aforementioned potential bug whereby someone can
     continue writing to the inode even though a new snap has happened.

 (6) ceph_writepages() calls netfs_writepages_group() to flush out pages with
     the matching group, stepping through the capsnap list on the inode.
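
To make the intended lifecycle concrete, here is a rough userspace model of
steps (1), (2), (4), (5) and (6).  It is only a sketch under assumptions, not
kernel code: every name in it (group_model, capsnap_model, inode_model,
dirty_page(), snap_taken(), writepages()) is hypothetical and stands in for
netfs_group, ceph_cap_snap, the ceph inode and the real write/writeback paths.
The page array models folio->private, and "writable" models a writable PTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_PAGES 8

/* Hypothetical stand-in for netfs_group: just a count of attached pages. */
struct group_model {
	int refcount;
};

/* Hypothetical capsnap embedding the group as its "base class" (step 1). */
struct capsnap_model {
	struct group_model group;
	bool obsolete;			/* set when a newer snap exists (step 5) */
	struct capsnap_model *next;	/* list runs oldest -> newest */
};

struct inode_model {
	struct capsnap_model *oldest;
	struct capsnap_model *head;		/* newest capsnap, if any */
	struct group_model *page_group[NR_PAGES]; /* models folio->private */
	bool page_writable[NR_PAGES];		/* models a writable mapping */
};

/* Recover the capsnap from the group pointer stored on a page. */
static struct capsnap_model *page_capsnap(struct inode_model *ino, int idx)
{
	struct group_model *g = ino->page_group[idx];

	if (!g)
		return NULL;
	return (struct capsnap_model *)((char *)g -
					offsetof(struct capsnap_model, group));
}

/* Step 4: on entry to a write path, reuse the head capsnap or make one. */
static struct capsnap_model *get_write_capsnap(struct inode_model *ino)
{
	struct capsnap_model *cs = ino->head;

	if (cs && !cs->obsolete)
		return cs;
	cs = calloc(1, sizeof(*cs));
	if (!cs)
		return NULL;
	if (ino->head)
		ino->head->next = cs;
	else
		ino->oldest = cs;
	ino->head = cs;
	return cs;
}

/* Models ->write_iter() or ->page_mkwrite() dirtying one page. */
static int dirty_page(struct inode_model *ino, int idx)
{
	struct capsnap_model *cs = get_write_capsnap(ino);

	if (!cs)
		return -1;
	if (ino->page_group[idx] != &cs->group) {
		/* A page dirty under an older capsnap would need writing
		 * back first; the model treats that as instantaneous. */
		if (ino->page_group[idx])
			ino->page_group[idx]->refcount--;
		ino->page_group[idx] = &cs->group;
		cs->group.refcount++;
	}
	ino->page_writable[idx] = true;
	return 0;
}

/* Step 5: a new snap obsoletes the head and revokes writable mappings. */
static void snap_taken(struct inode_model *ino)
{
	if (ino->head)
		ino->head->obsolete = true;
	for (int i = 0; i < NR_PAGES; i++)
		ino->page_writable[i] = false;
}

/* Step 6: flush group by group, oldest first; returns pages written. */
static int writepages(struct inode_model *ino)
{
	struct capsnap_model *cs = ino->oldest;
	int total = 0;

	while (cs) {
		struct capsnap_model *next = cs->next;

		for (int i = 0; i < NR_PAGES; i++) {
			if (ino->page_group[i] != &cs->group)
				continue;
			assert(cs->group.refcount > 0);
			ino->page_group[i] = NULL;
			cs->group.refcount--;
			total++;
		}
		if (cs->obsolete && cs->group.refcount == 0) {
			/* Drained and superseded: drop it from the list. */
			ino->oldest = next;
			if (ino->head == cs)
				ino->head = NULL;
			free(cs);
		}
		cs = next;
	}
	return total;
}
```

In this model, a write after snap_taken() always lands on a fresh capsnap
because the writable flag was cleared and the write path has to be re-entered,
which is the behaviour I'd want from unmap_mapping_pages() plus
->page_mkwrite().  The model frees a drained capsnap once it is obsolete; how
that should interact with (2) and with cap flushing in the real code is an
open question.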

Any thoughts on whether this would work?  If I can do this, I can reduce
get_oldest_context() to almost nothing and don't need the ceph_writeback_ctl
struct anymore (I think).

Thanks,
David