From: David Howells <dhowells@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>, Xiubo Li <xiubli@redhat.com>,
Greg Farnum <gfarnum@redhat.com>,
Venky Shankar <vshankar@redhat.com>
Cc: dhowells@redhat.com, Jeff Layton <jlayton@kernel.org>,
ceph-devel@vger.kernel.org
Subject: Modifying and fixing(?) the per-inode snap handling in ceph
Date: Mon, 15 Jan 2024 14:07:18 +0000
Message-ID: <2546440.1705327638@warthog.procyon.org.uk>

Hi, Ilya, Xiubo, Greg,

I'm trying to finish my patches to make ceph work with netfslib and I'm
wondering if snap handling on inodes can be made easier to work with. Also, I
think there may be a bug in the interaction between ceph_queue_cap_snap() and
writable mmaps.

What I would like to do is to make page/folio->private point at the
ceph_cap_snap struct instead of pointing to ceph_snap_context. This makes it
easier to fish the metadata details out in ceph when netfslib asks it to
perform a write operation.

Netfslib has the capability to pass a netfs_group struct through the API, and
I currently have this subclassed by ceph_snap_context, but that doesn't
directly carry sufficient information as I presume that's a global thing and
not an inode-specific thing.
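
To illustrate the subclassing idea, here is a minimal userspace sketch of the embed-and-container_of pattern. Everything in it is an assumption made for illustration: netfs_group_model and cap_snap_model are hypothetical stand-ins rather than the real netfs_group or ceph_cap_snap layouts, and model_container_of() only mimics the kernel's container_of(). The point is that a per-inode type embedding the group lets the filesystem get from the group pointer netfslib hands back to inode-specific metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for netfslib's group type; the real struct is not reproduced. */
struct netfs_group_model {
	int refcount;
};

/* Hypothetical per-inode capsnap that embeds the group as its base. */
struct cap_snap_model {
	struct netfs_group_model group;   /* embedded "base class" */
	unsigned long long follows;       /* snap sequence this capsnap follows */
	unsigned long long truncate_size; /* per-inode metadata carried along */
	unsigned int truncate_seq;
};

/* Userspace equivalent of the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the per-inode capsnap from the group pointer attached to a page. */
static struct cap_snap_model *group_to_capsnap(struct netfs_group_model *g)
{
	assert(g != NULL);
	return model_container_of(g, struct cap_snap_model, group);
}
```

With this shape, a write routine that is only given the group attached to a dirty page can still reach fields such as truncate_size without a separate lookup.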

However, it looks like capsnaps don't always exist, even on dirty inodes...

So what I'm thinking is:

 (1) Make struct ceph_cap_snap a subclass of netfs_group. This would allow
     netfslib to manipulate them and attach them to dirty pages and do
     selective writeback.

 (2) Always keep a ceph_cap_snap on a dirty inode. It can be treated
     specially when it's the only snap and at the head.

 (3) Offload some of the fields from ceph_inode_info into ceph_cap_snap
     (e.g. truncate_size and truncate_seq) and update them directly there.

 (4) On entry to any sort of write routine, see if we need a new capsnap for
     that inode and, if so, create one. This would include ->write_iter(),
     ->page_mkwrite(), ->setattr() and possibly ->setxattr().

 (5) In queue_realm_cap_snaps(), mark the capsnap as being obsolete and call
     unmap_mapping_pages() on each inode to force ->page_mkwrite() to be
     called[!] on further modification.

     queue_realm_cap_snaps() doesn't then need to create a new capsnap; this
     can be left to the various write routines.

     [!] This would fix the aforementioned potential bug whereby someone can
     continue writing to the inode even though a new snap has happened.

 (6) ceph_writepages() calls netfs_writepages_group() to flush out pages with
     the matching group, stepping through the capsnap list on the inode.
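
Steps (4) to (6) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the ceph or netfslib code: capsnap_model, inode_model, capsnap_for_write(), snapshot_taken() and writeback_all() are hypothetical names, pages are reduced to a counter, and the unmap_mapping_pages() call is represented by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capsnap: one per snapshot epoch in which the inode was dirtied. */
struct capsnap_model {
	struct capsnap_model *next;   /* towards newer capsnaps */
	bool obsolete;                /* a newer snapshot exists; take no new writes */
	int dirty_pages;              /* pages currently attached to this group */
};

/* Hypothetical inode: capsnaps are kept ordered oldest to newest. */
struct inode_model {
	struct capsnap_model *oldest;
	struct capsnap_model *head;   /* newest capsnap, NULL if never dirtied */
};

/* Step (4): called on entry to any write path; returns the capsnap to dirty. */
static struct capsnap_model *capsnap_for_write(struct inode_model *inode)
{
	struct capsnap_model *cs = inode->head;

	if (cs && !cs->obsolete)
		return cs;                /* current head is still usable */

	cs = calloc(1, sizeof(*cs));
	if (!cs)
		return NULL;
	if (inode->head)
		inode->head->next = cs;   /* append after the obsolete head */
	else
		inode->oldest = cs;
	inode->head = cs;
	return cs;
}

/* Step (5): a snapshot happened; retire the head instead of creating a new one. */
static void snapshot_taken(struct inode_model *inode)
{
	if (inode->head)
		inode->head->obsolete = true;
	/* The real code would also unmap the inode's pages here so that the
	 * next mmap store faults and passes through the write-entry check. */
}

/* Step (6): flush group by group, oldest first; returns pages written. */
static int writeback_all(struct inode_model *inode)
{
	int written = 0;

	for (struct capsnap_model *cs = inode->oldest; cs; cs = cs->next) {
		assert(cs->dirty_pages >= 0);
		written += cs->dirty_pages;
		cs->dirty_pages = 0;
	}
	return written;
}
```

In this model a write after snapshot_taken() always lands in a fresh capsnap, which is the property that forcing ->page_mkwrite() is meant to guarantee for mmap writers as well.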

Any thoughts on whether this would work? If I can do this, I can reduce
get_oldest_context() to almost nothing and don't need the ceph_writeback_ctl
struct anymore (I think).

Thanks,
David