All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-14 13:45 ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

I'm breaking this piece out of the open file cache work for nfsd to see
if we can get this piece settled before I re-post the whole set. If this
looks like a reasonable approach we can sort out how it should be merged
(either by you directly, or via Bruce's tree with the rest of the open
file cache patches).

For those just joining in, some background:

We want to add an open file cache for nfsd to reduce the open/close
overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
The basic idea is to keep a cache of open files, and close them down on
certain sorts of activity -- primarily, after an unlink that takes the
link count to 0, or before setting a lease.

The setlease part is problematic though. The plan is to have a notifier
callback into nfsd from vfs_setlease that will tell nfsd to close any
open files that are associated with the inode so we don't block lease
attempts solely due to cached but otherwise idle nfsd files. That means
that we need to be able to close out the files and ensure that the final
__fput runs before we try to set a lease.

My latest pass involved making __fput_sync available to userland tasks,
but Al had concerns that that could lead to stack blowouts. This
patchset is an alternative approach that allows userland tasks to use
the delayed_fput infrastructure instead. The idea is that we'd have the
pre-setlease notifier do a fput_queue() and then call flush_delayed_fput
to ensure that any queued __fput() calls complete before setting the
lease.

There's also a fix for a potential race in flush_delayed_fput in here
and some doc comment cleanups.

Al, does this look more acceptable than using __fput_sync?

Jeff Layton (4):
  fs: have flush_delayed_fput flush the workqueue job
  fs: add a kerneldoc header to fput
  fs: add fput_queue
  fs: export flush_delayed_fput

 fs/file_table.c      | 57 +++++++++++++++++++++++++++++++++++++++++-----------
 include/linux/file.h |  1 +
 2 files changed, 46 insertions(+), 12 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-14 13:45 ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

I'm breaking this piece out of the open file cache work for nfsd to see
if we can get this piece settled before I re-post the whole set. If this
looks like a reasonable approach we can sort out how it should be merged
(either by you directly, or via Bruce's tree with the rest of the open
file cache patches).

For those just joining in, some background:

We want to add an open file cache for nfsd to reduce the open/close
overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
The basic idea is to keep a cache of open files, and close them down on
certain sorts of activity -- primarily, after an unlink that takes the
link count to 0, or before setting a lease.

The setlease part is problematic though. The plan is to have a notifier
callback into nfsd from vfs_setlease that will tell nfsd to close any
open files that are associated with the inode so we don't block lease
attempts solely due to cached but otherwise idle nfsd files. That means
that we need to be able to close out the files and ensure that the final
__fput runs before we try to set a lease.

My latest pass involved making __fput_sync available to userland tasks,
but Al had concerns that that could lead to stack blowouts. This
patchset is an alternative approach that allows userland tasks to use
the delayed_fput infrastructure instead. The idea is that we'd have the
pre-setlease notifier do a fput_queue() and then call flush_delayed_fput
to ensure that any queued __fput() calls complete before setting the
lease.

There's also a fix for a potential race in flush_delayed_fput in here
and some doc comment cleanups.

Al, does this look more acceptable than using __fput_sync?

Jeff Layton (4):
  fs: have flush_delayed_fput flush the workqueue job
  fs: add a kerneldoc header to fput
  fs: add fput_queue
  fs: export flush_delayed_fput

 fs/file_table.c      | 57 +++++++++++++++++++++++++++++++++++++++++-----------
 include/linux/file.h |  1 +
 2 files changed, 46 insertions(+), 12 deletions(-)

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] fs: have flush_delayed_fput flush the workqueue job
  2015-09-14 13:45 ` Jeff Layton
  (?)
@ 2015-09-14 13:45 ` Jeff Layton
  -1 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_work.  It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.

Close this by making flush_delayed_fput use flush_delayed_work instead,
which should immediately schedule the work to run if it's not already,
and block until the workqueue job completes.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 7f9d407c7595..f4833af62eae 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -243,6 +243,8 @@ static void ____fput(struct callback_head *work)
 	__fput(container_of(work, struct file, f_u.fu_rcuhead));
 }
 
+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -255,11 +257,9 @@ static void ____fput(struct callback_head *work)
  */
 void flush_delayed_fput(void)
 {
-	delayed_fput(NULL);
+	flush_delayed_work(&delayed_fput_work);
 }
 
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] fs: add a kerneldoc header to fput
  2015-09-14 13:45 ` Jeff Layton
  (?)
  (?)
@ 2015-09-14 13:45 ` Jeff Layton
  -1 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

...and move its EXPORT_SYMBOL just below the function.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index f4833af62eae..d63f4a399d39 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -260,6 +260,21 @@ void flush_delayed_fput(void)
 	flush_delayed_work(&delayed_fput_work);
 }
 
+/**
+ * fput - put a struct file reference
+ * @file: file of which to put the reference
+ *
+ * This function decrements the reference count for the struct file reference,
+ * and queues it up for destruction if the count goes to zero. In the case of
+ * most tasks we queue it to the task_work infrastructure, which will be run
+ * just before the task returns back to userspace. kthreads however never
+ * return to userspace, so for those we add them to a global list and schedule
+ * a delayed workqueue job to do the work of putting them.
+ *
+ * Why not just do it synchronously? __fput can be quite stack intensive, so
+ * doing a final fput has the possibility of blowing up if we don't take steps
+ * to ensure that we have enough stack space to make it work.
+ */
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
@@ -280,6 +295,7 @@ void fput(struct file *file)
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
+EXPORT_SYMBOL(fput);
 
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
@@ -298,7 +314,6 @@ void __fput_sync(struct file *file)
 	}
 }
 
-EXPORT_SYMBOL(fput);
 
 void put_filp(struct file *file)
 {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] fs: add fput_queue
@ 2015-09-14 13:45   ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c      | 18 ++++++++++++++++++
 include/linux/file.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index d63f4a399d39..1ad2e3fd2064 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -297,6 +297,24 @@ void fput(struct file *file)
 }
 EXPORT_SYMBOL(fput);
 
+/**
+ * fput_queue - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * If we need to ensure that the final __fput is done on a file before
+ * returning to userland, then we can't queue it to task_work. For that we
+ * borrow the infrastructure used by kthreads, and the task can then just
+ * called flush_delayed_fput to ensure that the final fput has completed.
+ */
+void fput_queue(struct file *file)
+{
+	if (atomic_long_dec_and_test(&file->f_count)) {
+		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+			schedule_delayed_work(&delayed_fput_work, 1);
+	}
+}
+EXPORT_SYMBOL(fput_queue);
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..543b0e2faf2c 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
 struct file;
 
 extern void fput(struct file *);
+extern void fput_queue(struct file *);
 
 struct file_operations;
 struct vfsmount;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] fs: add fput_queue
@ 2015-09-14 13:45   ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
---
 fs/file_table.c      | 18 ++++++++++++++++++
 include/linux/file.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index d63f4a399d39..1ad2e3fd2064 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -297,6 +297,24 @@ void fput(struct file *file)
 }
 EXPORT_SYMBOL(fput);
 
+/**
+ * fput_queue - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * If we need to ensure that the final __fput is done on a file before
+ * returning to userland, then we can't queue it to task_work. For that we
+ * borrow the infrastructure used by kthreads, and the task can then just
+ * called flush_delayed_fput to ensure that the final fput has completed.
+ */
+void fput_queue(struct file *file)
+{
+	if (atomic_long_dec_and_test(&file->f_count)) {
+		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+			schedule_delayed_work(&delayed_fput_work, 1);
+	}
+}
+EXPORT_SYMBOL(fput_queue);
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..543b0e2faf2c 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
 struct file;
 
 extern void fput(struct file *);
+extern void fput_queue(struct file *);
 
 struct file_operations;
 struct vfsmount;
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] fs: export flush_delayed_fput
  2015-09-14 13:45 ` Jeff Layton
                   ` (3 preceding siblings ...)
  (?)
@ 2015-09-14 13:45 ` Jeff Layton
  -1 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 13:45 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

...and clean up the comments over it a bit.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 1ad2e3fd2064..2b145b513274 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -246,19 +246,19 @@ static void ____fput(struct callback_head *work)
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
 /*
- * If kernel thread really needs to have the final fput() it has done
- * to complete, call this.  The only user right now is the boot - we
- * *do* need to make sure our writes to binaries on initramfs has
- * not left us with opened struct file waiting for __fput() - execve()
- * won't work without that.  Please, don't add more callers without
- * very good reasons; in particular, never call that with locks
- * held and never call that from a thread that might need to do
- * some work on any kind of umount.
+ * If kernel thread or task that has used fput_queue really needs to have the
+ * final fput() it has done to complete, call this. The only user right now is
+ * the boot - we *do* need to make sure our writes to binaries on initramfs has
+ * not left us with opened struct file waiting for __fput() - execve() won't
+ * work without that.  Please, don't add more callers without very good
+ * reasons; in particular, never call that with locks held and never call that
+ * from a thread that might need to do some work on any kind of umount.
  */
 void flush_delayed_fput(void)
 {
 	flush_delayed_work(&delayed_fput_work);
 }
+EXPORT_SYMBOL(flush_delayed_fput);
 
 /**
  * fput - put a struct file reference
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] fs: add fput_queue
  2015-09-14 13:45   ` Jeff Layton
  (?)
@ 2015-09-14 14:15   ` Al Viro
  2015-09-14 14:19     ` Jeff Layton
  -1 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2015-09-14 14:15 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

On Mon, Sep 14, 2015 at 09:45:54AM -0400, Jeff Layton wrote:
> +/**
> + * fput_queue - do an fput without using task_work
> + * @file: file of which to put the reference
> + *
> + * If we need to ensure that the final __fput is done on a file before
> + * returning to userland, then we can't queue it to task_work. For that we
                                                       ?????????
> + * borrow the infrastructure used by kthreads, and the task can then just
> + * called flush_delayed_fput to ensure that the final fput has completed.

Are you sure that it's not a typo?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] fs: add fput_queue
  2015-09-14 14:15   ` Al Viro
@ 2015-09-14 14:19     ` Jeff Layton
  2015-09-14 16:39       ` Al Viro
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 14:19 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

On Mon, 14 Sep 2015 15:15:39 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Sep 14, 2015 at 09:45:54AM -0400, Jeff Layton wrote:
> > +/**
> > + * fput_queue - do an fput without using task_work
> > + * @file: file of which to put the reference
> > + *
> > + * If we need to ensure that the final __fput is done on a file before
> > + * returning to userland, then we can't queue it to task_work. For that we
>                                                        ?????????
> > + * borrow the infrastructure used by kthreads, and the task can then just
> > + * called flush_delayed_fput to ensure that the final fput has completed.
> 
> Are you sure that it's not a typo?

I don't think so, but it could be clearer. Something like this maybe?

     "then we can't queue it via task_work_add."

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-14 14:48   ` J. Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2015-09-14 14:48 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-nfs, linux-fsdevel, linux-kernel

On Mon, Sep 14, 2015 at 09:45:51AM -0400, Jeff Layton wrote:
> I'm breaking this piece out of the open file cache work for nfsd to see
> if we can get this piece settled before I re-post the whole set. If this
> looks like a reasonable approach we can sort out how it should be merged
> (either by you directly, or via Bruce's tree with the rest of the open
> file cache patches).
> 
> For those just joining in, some background:
> 
> We want to add an open file cache for nfsd to reduce the open/close
> overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
> The basic idea is to keep a cache of open files, and close them down on
> certain sorts of activity -- primarily, after an unlink that takes the
> link count to 0, or before setting a lease.
> 
> The setlease part is problematic though. The plan is to have a notifier
> callback into nfsd from vfs_setlease that will tell nfsd to close any
> open files that are associated with the inode so we don't block lease
> attempts solely due to cached but otherwise idle nfsd files. That means
> that we need to be able to close out the files and ensure that the final
> __fput runs before we try to set a lease.

I think I probably asked something similar before, but just to be sure I
understand....  Do leases really need to be 100% reliable, or can we get
away with saying "sorry, I don't feel like granting one right now".  An
entry in the filehandle cache suggests we're likely to recall the thing
soon anyway.  We use that option to get out of corner cases in the
delegation case, but I don't know if it makes sense for oplocks.

--b.

> 
> My latest pass involved making __fput_sync available to userland tasks,
> but Al had concerns that that could lead to stack blowouts. This
> patchset is an alternative approach that allows userland tasks to use
> the delayed_fput infrastructure instead. The idea is that we'd have the
> pre-setlease notifier do a fput_queue() and then call flush_delayed_fput
> to ensure that any queued __fput() calls complete before setting the
> lease.
> 
> There's also a fix for a potential race in flush_delayed_fput in here
> and some doc comment cleanups.
> 
> Al, does this look more acceptable than using __fput_sync?
> 
> Jeff Layton (4):
>   fs: have flush_delayed_fput flush the workqueue job
>   fs: add a kerneldoc header to fput
>   fs: add fput_queue
>   fs: export flush_delayed_fput
> 
>  fs/file_table.c      | 57 +++++++++++++++++++++++++++++++++++++++++-----------
>  include/linux/file.h |  1 +
>  2 files changed, 46 insertions(+), 12 deletions(-)
> 
> -- 
> 2.4.3

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-14 14:48   ` J. Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2015-09-14 14:48 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, Sep 14, 2015 at 09:45:51AM -0400, Jeff Layton wrote:
> I'm breaking this piece out of the open file cache work for nfsd to see
> if we can get this piece settled before I re-post the whole set. If this
> looks like a reasonable approach we can sort out how it should be merged
> (either by you directly, or via Bruce's tree with the rest of the open
> file cache patches).
> 
> For those just joining in, some background:
> 
> We want to add an open file cache for nfsd to reduce the open/close
> overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
> The basic idea is to keep a cache of open files, and close them down on
> certain sorts of activity -- primarily, after an unlink that takes the
> link count to 0, or before setting a lease.
> 
> The setlease part is problematic though. The plan is to have a notifier
> callback into nfsd from vfs_setlease that will tell nfsd to close any
> open files that are associated with the inode so we don't block lease
> attempts solely due to cached but otherwise idle nfsd files. That means
> that we need to be able to close out the files and ensure that the final
> __fput runs before we try to set a lease.

I think I probably asked something similar before, but just to be sure I
understand....  Do leases really need to be 100% reliable, or can we get
away with saying "sorry, I don't feel like granting one right now".  An
entry in the filehandle cache suggests we're likely to recall the thing
soon anyway.  We use that option to get out of corner cases in the
delegation case, but I don't know if it makes sense for oplocks.

--b.

> 
> My latest pass involved making __fput_sync available to userland tasks,
> but Al had concerns that that could lead to stack blowouts. This
> patchset is an alternative approach that allows userland tasks to use
> the delayed_fput infrastructure instead. The idea is that we'd have the
> pre-setlease notifier do a fput_queue() and then call flush_delayed_fput
> to ensure that any queued __fput() calls complete before setting the
> lease.
> 
> There's also a fix for a potential race in flush_delayed_fput in here
> and some doc comment cleanups.
> 
> Al, does this look more acceptable than using __fput_sync?
> 
> Jeff Layton (4):
>   fs: have flush_delayed_fput flush the workqueue job
>   fs: add a kerneldoc header to fput
>   fs: add fput_queue
>   fs: export flush_delayed_fput
> 
>  fs/file_table.c      | 57 +++++++++++++++++++++++++++++++++++++++++-----------
>  include/linux/file.h |  1 +
>  2 files changed, 46 insertions(+), 12 deletions(-)
> 
> -- 
> 2.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-14 15:21     ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 15:21 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel, linux-kernel

On Mon, 14 Sep 2015 10:48:37 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Sep 14, 2015 at 09:45:51AM -0400, Jeff Layton wrote:
> > I'm breaking this piece out of the open file cache work for nfsd to see
> > if we can get this piece settled before I re-post the whole set. If this
> > looks like a reasonable approach we can sort out how it should be merged
> > (either by you directly, or via Bruce's tree with the rest of the open
> > file cache patches).
> > 
> > For those just joining in, some background:
> > 
> > We want to add an open file cache for nfsd to reduce the open/close
> > overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
> > The basic idea is to keep a cache of open files, and close them down on
> > certain sorts of activity -- primarily, after an unlink that takes the
> > link count to 0, or before setting a lease.
> > 
> > The setlease part is problematic though. The plan is to have a notifier
> > callback into nfsd from vfs_setlease that will tell nfsd to close any
> > open files that are associated with the inode so we don't block lease
> > attempts solely due to cached but otherwise idle nfsd files. That means
> > that we need to be able to close out the files and ensure that the final
> > __fput runs before we try to set a lease.
> 
> I think I probably asked something similar before, but just to be sure I
> understand....  Do leases really need to be 100% reliable, or can we get
> away with saying "sorry, I don't feel like granting one right now".  An
> entry in the filehandle cache suggests we're likely to recall the thing
> soon anyway.  We use that option to get out of corner cases in the
> delegation case, but I don't know if it makes sense for oplocks.
> 


They don't need to be 100% reliable, but with the current design nfsd
will hold files open in the cache indefinitely, until one of the
following events occurs:

1) the exports cache is flushed (which is always done after unexporting)

2) an unlink event occurs that drops the i_nlink count to zero

3) userland attempts to set a lease

4) the shrinker kicks in

5) nfsd is shut down

So you could easily have a situation where a NFSv3 client does some
WRITE activity, and then an hour later samba comes along and asks for a
lease. I don't think we'd want to block the lease in that situation as
there's no reason to believe that we'd end up recalling it anytime
soon. The NFS client may be long gone at that point.

We could implement some heuristic that proactively closes out open
files that are idle for a certain amount of time. My first pass did
just that actually, but Christoph didn't much care for it, and I think
he was right. That's not as good a design as just keeping them open
until there's a real reason to close them.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-14 15:21     ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 15:21 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, 14 Sep 2015 10:48:37 -0400
"J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote:

> On Mon, Sep 14, 2015 at 09:45:51AM -0400, Jeff Layton wrote:
> > I'm breaking this piece out of the open file cache work for nfsd to see
> > if we can get this piece settled before I re-post the whole set. If this
> > looks like a reasonable approach we can sort out how it should be merged
> > (either by you directly, or via Bruce's tree with the rest of the open
> > file cache patches).
> > 
> > For those just joining in, some background:
> > 
> > We want to add an open file cache for nfsd to reduce the open/close
> > overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
> > The basic idea is to keep a cache of open files, and close them down on
> > certain sorts of activity -- primarily, after an unlink that takes the
> > link count to 0, or before setting a lease.
> > 
> > The setlease part is problematic though. The plan is to have a notifier
> > callback into nfsd from vfs_setlease that will tell nfsd to close any
> > open files that are associated with the inode so we don't block lease
> > attempts solely due to cached but otherwise idle nfsd files. That means
> > that we need to be able to close out the files and ensure that the final
> > __fput runs before we try to set a lease.
> 
> I think I probably asked something similar before, but just to be sure I
> understand....  Do leases really need to be 100% reliable, or can we get
> away with saying "sorry, I don't feel like granting one right now".  An
> entry in the filehandle cache suggests we're likely to recall the thing
> soon anyway.  We use that option to get out of corner cases in the
> delegation case, but I don't know if it makes sense for oplocks.
> 


They don't need to be 100% reliable, but with the current design nfsd
will hold files open in the cache indefinitely, until one of the
following events occurs:

1) the exports cache is flushed (which is always done after unexporting)

2) an unlink event occurs that drops the i_nlink count to zero

3) userland attempts to set a lease

4) the shrinker kicks in

5) nfsd is shut down

So you could easily have a situation where a NFSv3 client does some
WRITE activity, and then an hour later samba comes along and asks for a
lease. I don't think we'd want to block the lease in that situation as
there's no reason to believe that we'd end up recalling it anytime
soon. The NFS client may be long gone at that point.

We could implement some heuristic that proactively closes out open
files that are idle for a certain amount of time. My first pass did
just that actually, but Christoph didn't much care for it, and I think
he was right. That's not as good a design as just keeping them open
until there's a real reason to close them.

-- 
Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] fs: add fput_queue
  2015-09-14 14:19     ` Jeff Layton
@ 2015-09-14 16:39       ` Al Viro
  2015-09-14 17:30         ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2015-09-14 16:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

On Mon, Sep 14, 2015 at 10:19:18AM -0400, Jeff Layton wrote:
> > > + * borrow the infrastructure used by kthreads, and the task can then just
> > > + * called flush_delayed_fput to ensure that the final fput has completed.
> > 
> > Are you sure that it's not a typo?
> 
> I don't think so, but it could be clearer. Something like this maybe?
> 
>      "then we can't queue it via task_work_add."

Huh?

task_work_add() callbacks *will* run before we return to userland

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] fs: add fput_queue
  2015-09-14 16:39       ` Al Viro
@ 2015-09-14 17:30         ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2015-09-14 17:30 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

On Mon, 14 Sep 2015 17:39:54 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Sep 14, 2015 at 10:19:18AM -0400, Jeff Layton wrote:
> > > > + * borrow the infrastructure used by kthreads, and the task can then just
> > > > + * called flush_delayed_fput to ensure that the final fput has completed.
> > > 
> > > Are you sure that it's not a typo?
> > 
> > I don't think so, but it could be clearer. Something like this maybe?
> > 
> >      "then we can't queue it via task_work_add."
> 
> Huh?
> 
> task_work_add() callbacks *will* run before we return to userland

Right, but only just before. We need it to run before we try to set the
lease in the context of a fcntl() call. How about this text instead
then? I'll fix up the patch if this sounds reasonable:

"When fput is called in the context of a userland process, it'll queue
the actual work (__fput()) to be done just before returning to userland. In
some cases however, we need to ensure that the __fput runs before that
point. There is no safe way to flush work that has been queued via
task_work_add however, so to do this we borrow the delayed_fput
infrastructure that kthreads use. The userland process can use
fput_queue() on one or more struct files and then call
flush_delayed_fput() to ensure that they are completely closed."

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-09-14 17:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-14 13:45 [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
2015-09-14 13:45 ` Jeff Layton
2015-09-14 13:45 ` [PATCH 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
2015-09-14 13:45 ` [PATCH 2/4] fs: add a kerneldoc header to fput Jeff Layton
2015-09-14 13:45 ` [PATCH 3/4] fs: add fput_queue Jeff Layton
2015-09-14 13:45   ` Jeff Layton
2015-09-14 14:15   ` Al Viro
2015-09-14 14:19     ` Jeff Layton
2015-09-14 16:39       ` Al Viro
2015-09-14 17:30         ` Jeff Layton
2015-09-14 13:45 ` [PATCH 4/4] fs: export flush_delayed_fput Jeff Layton
2015-09-14 14:48 ` [PATCH 0/4] fs: allow userland tasks to use delayed_fput infrastructure J. Bruce Fields
2015-09-14 14:48   ` J. Bruce Fields
2015-09-14 15:21   ` Jeff Layton
2015-09-14 15:21     ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.