unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
* TeeInput leaks file handles/space
@ 2015-04-22 16:42 Mulvaney, Mike
  2015-04-22 18:38 ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Mulvaney, Mike @ 2015-04-22 16:42 UTC (permalink / raw)
  To: unicorn-public@bogomips.org

When I upload large files to my unicorn process, TeeInput is streaming them through a @tmp instance variable which writes to /tmp/0.123456789 (some random number).  The file is immediately deleted but the file handle is not closed, so the files are not really deleted by the file system.

The files are eventually deleted when GC runs, but before GC runs they continue to take up space.  You can see this in lsof output, for example:

ruby       2783   webuser  23u      REG               0,17  6128086    3556917 /tmp/0.04249158625633187 (deleted)

This can cause problems if you have big files and a small /tmp, such as a tmpfs disk mounted in ram.  If someone sends in several 100MB files, you could easily get 2-3 open files for each of 6 unicorn processes, which would take up 1200MB of disk space until GC decides to run.

I looked into fixing this but it doesn't look easy.  I can reach into the TeeInput variable and close out the @tmp instance variable in my application, and that does fix the problem.  But obviously that is not a good solution.  I think there would have to be some kind of "close" method on http_request that would close out all the open resources such as these files.

-Mike

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: TeeInput leaks file handles/space
  2015-04-22 16:42 TeeInput leaks file handles/space Mulvaney, Mike
@ 2015-04-22 18:38 ` Eric Wong
  2015-04-22 19:10   ` Mulvaney, Mike
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2015-04-22 18:38 UTC (permalink / raw)
  To: Mulvaney, Mike; +Cc: unicorn-public@bogomips.org

How about this patch to tee_input?  It won't pollute the common code
path, at least, and unicorn won't support multiple clients/tee_inputs

I might release 4.8.4 with this, but it does break behavior for
hijackers, but 5.0 will have lots of tiny internal incompatibilities
for corner-cases anyways.
-----------------------------8<---------------------------
Subject: [PATCH] tee_input: close temporary file upon allocating the next

This prevents unicorn from using up too much space due to large
uploads while not polluting the common code path of most clients.

This is only fine for unicorn since unicorn itself will never
support multiple clients in the same process.

This may break users of rack.hijack (are there any on unicorn?), but
they can call IO#dup on the file if they're relying on hijack and
server internals like that.

In a perfect world, we could even truncate and rewind to avoid the
FD/inode deallocation/allocations in the kernel; but that would
break any IO#dup workarounds used by hijackers.

Notes: StringIO#close does not release memory immediately, so
there's no point in calling it here

Ref: <CY1PR0301MB078011EB5A22B733EB222A45A4EE0@CY1PR0301MB0780.namprd03.prod.outlook.com>
Reported-by: Mike Mulvaney <MMulvaney@bna.com>
---
 lib/unicorn/tee_input.rb | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/lib/unicorn/tee_input.rb b/lib/unicorn/tee_input.rb
index 637c583..6ff8210 100644
--- a/lib/unicorn/tee_input.rb
+++ b/lib/unicorn/tee_input.rb
@@ -9,12 +9,17 @@
 # strict interpretation of Rack::Lint::InputWrapper functionality and
 # will not support any deviations from it.
 #
+# This is an internal class meant for Unicorn only, any use beyond
+# what appears in the Rack specificiation is unsupported and subject
+# to change.
+#
 # When processing uploads, Unicorn exposes a TeeInput object under
 # "rack.input" of the Rack environment.
 class Unicorn::TeeInput < Unicorn::StreamInput
   # The maximum size (in +bytes+) to buffer in memory before
   # resorting to a temporary file.  Default is 112 kilobytes.
   @@client_body_buffer_size = Unicorn::Const::MAX_BODY
+  @@cur_tmpio = nil
 
   # sets the maximum size of request bodies to buffer in memory,
   # amounts larger than this are buffered to the filesystem
@@ -28,13 +33,21 @@ class Unicorn::TeeInput < Unicorn::StreamInput
     @@client_body_buffer_size
   end
 
+  # This class is essentially a singleton since unicorn will never support
+  # multiple clients; and we don't want to pollute the common code path
+  # with extra ivars/state
+  def new_tmpio
+    @@cur_tmpio.close if @@cur_tmpio
+    @@cur_tmpio = Unicorn::TmpIO.new
+  end
+
   # Initializes a new TeeInput object.  You normally do not have to call
   # this unless you are writing an HTTP server.
   def initialize(socket, request)
     @len = request.content_length
     super
     @tmp = @len && @len <= @@client_body_buffer_size ?
-           StringIO.new("") : Unicorn::TmpIO.new
+           StringIO.new("") : new_tmpio
   end
 
   # :call-seq:
-- 
EW

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: TeeInput leaks file handles/space
  2015-04-22 18:38 ` Eric Wong
@ 2015-04-22 19:10   ` Mulvaney, Mike
  2015-04-22 19:16     ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Mulvaney, Mike @ 2015-04-22 19:10 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public@bogomips.org

That looks reasonable to me -- this way you would only have one file still open per process at a maximum, right?  I think that's a good solution.

Thanks,

-Mike

-----Original Message-----
From: Eric Wong [mailto:e@80x24.org] 
Sent: Wednesday, April 22, 2015 2:38 PM
To: Mulvaney, Mike
Cc: unicorn-public@bogomips.org
Subject: Re: TeeInput leaks file handles/space

How about this patch to tee_input?  It won't pollute the common code path, at least, and unicorn won't support multiple clients/tee_inputs

I might release 4.8.4 with this, but it does break behavior for hijackers, but 5.0 will have lots of tiny internal incompatibilities for corner-cases anyways.
-----------------------------8<---------------------------
Subject: [PATCH] tee_input: close temporary file upon allocating the next

This prevents unicorn from using up too much space due to large uploads while not polluting the common code path of most clients.

This is only fine for unicorn since unicorn itself will never support multiple clients in the same process.

This may break users of rack.hijack (are there any on unicorn?), but they can call IO#dup on the file if they're relying on hijack and server internals like that.

In a perfect world, we could even truncate and rewind to avoid the FD/inode deallocation/allocations in the kernel; but that would break any IO#dup workarounds used by hijackers.

Notes: StringIO#close does not release memory immediately, so there's no point in calling it here

Ref: <CY1PR0301MB078011EB5A22B733EB222A45A4EE0@CY1PR0301MB0780.namprd03.prod.outlook.com>
Reported-by: Mike Mulvaney <MMulvaney@bna.com>
---
 lib/unicorn/tee_input.rb | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/lib/unicorn/tee_input.rb b/lib/unicorn/tee_input.rb index 637c583..6ff8210 100644
--- a/lib/unicorn/tee_input.rb
+++ b/lib/unicorn/tee_input.rb
@@ -9,12 +9,17 @@
 # strict interpretation of Rack::Lint::InputWrapper functionality and  # will not support any deviations from it.
 #
+# This is an internal class meant for Unicorn only, any use beyond # 
+what appears in the Rack specificiation is unsupported and subject # to 
+change.
+#
 # When processing uploads, Unicorn exposes a TeeInput object under  # "rack.input" of the Rack environment.
 class Unicorn::TeeInput < Unicorn::StreamInput
   # The maximum size (in +bytes+) to buffer in memory before
   # resorting to a temporary file.  Default is 112 kilobytes.
   @@client_body_buffer_size = Unicorn::Const::MAX_BODY
+  @@cur_tmpio = nil
 
   # sets the maximum size of request bodies to buffer in memory,
   # amounts larger than this are buffered to the filesystem @@ -28,13 +33,21 @@ class Unicorn::TeeInput < Unicorn::StreamInput
     @@client_body_buffer_size
   end
 
+  # This class is essentially a singleton since unicorn will never 
+ support  # multiple clients; and we don't want to pollute the common 
+ code path  # with extra ivars/state  def new_tmpio
+    @@cur_tmpio.close if @@cur_tmpio
+    @@cur_tmpio = Unicorn::TmpIO.new
+  end
+
   # Initializes a new TeeInput object.  You normally do not have to call
   # this unless you are writing an HTTP server.
   def initialize(socket, request)
     @len = request.content_length
     super
     @tmp = @len && @len <= @@client_body_buffer_size ?
-           StringIO.new("") : Unicorn::TmpIO.new
+           StringIO.new("") : new_tmpio
   end
 
   # :call-seq:
--
EW

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: TeeInput leaks file handles/space
  2015-04-22 19:10   ` Mulvaney, Mike
@ 2015-04-22 19:16     ` Eric Wong
  2015-04-22 19:24       ` Mulvaney, Mike
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2015-04-22 19:16 UTC (permalink / raw)
  To: Mulvaney, Mike; +Cc: unicorn-public@bogomips.org

"Mulvaney, Mike" <MMulvaney@bna.com> wrote:
> That looks reasonable to me -- this way you would only have one file
> still open per process at a maximum, right?  I think that's a good
> solution.

Right.

Below is a barely-tested alternative patch for Rack::TempfileReaper,
for Rack 1.6+ users only.  I'm not sure how prevalent 1.6+ was only
released in December 2014...

It's more standardized, but maybe Rack 1.6 isn't prevalent enough, yet.
What do you think?

(Sorry, in a rush, no commit message, yet)

diff --git a/lib/unicorn/tee_input.rb b/lib/unicorn/tee_input.rb
index 637c583..7f6baa2 100644
--- a/lib/unicorn/tee_input.rb
+++ b/lib/unicorn/tee_input.rb
@@ -28,13 +28,20 @@ class Unicorn::TeeInput < Unicorn::StreamInput
     @@client_body_buffer_size
   end
 
+  # for Rack::TempfileReaper in rack 1.6+
+  def new_tmpio
+    tmpio = Unicorn::TmpIO.new
+    (@parser.env['rack.tempfiles'] ||= []) << tmpio
+    tmpio
+  end
+
   # Initializes a new TeeInput object.  You normally do not have to call
   # this unless you are writing an HTTP server.
   def initialize(socket, request)
     @len = request.content_length
     super
     @tmp = @len && @len <= @@client_body_buffer_size ?
-           StringIO.new("") : Unicorn::TmpIO.new
+           StringIO.new("") : new_tmpio
   end
 
   # :call-seq:
diff --git a/lib/unicorn/tmpio.rb b/lib/unicorn/tmpio.rb
index c97979a..db88ed3 100644
--- a/lib/unicorn/tmpio.rb
+++ b/lib/unicorn/tmpio.rb
@@ -21,4 +21,7 @@ class Unicorn::TmpIO < File
     fp.sync = true
     fp
   end
+
+  # pretend we're Tempfile for Rack::TempfileReaper
+  alias close! close
 end

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: TeeInput leaks file handles/space
  2015-04-22 19:16     ` Eric Wong
@ 2015-04-22 19:24       ` Mulvaney, Mike
  2015-04-24  3:08         ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Mulvaney, Mike @ 2015-04-22 19:24 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public@bogomips.org

Oh yeah, I like that even better.

The app I'm currently working on is using Rails 4.1 and Rack 1.5.x.  I don't have any problem with upgrading rack, I just haven't done it yet.

I think it would be reasonable to fix this for Rack 1.6+.  It won't cause any problems for Rack 1.5 users, right?  The environment variable gets set and then ignored, so the app would behave exactly the same way.  If they want to use the new cleanup code, they can upgrade rack.

-Mike

-----Original Message-----
From: Eric Wong [mailto:e@80x24.org] 
Sent: Wednesday, April 22, 2015 3:16 PM
To: Mulvaney, Mike
Cc: unicorn-public@bogomips.org
Subject: Re: TeeInput leaks file handles/space

"Mulvaney, Mike" <MMulvaney@bna.com> wrote:
> That looks reasonable to me -- this way you would only have one file 
> still open per process at a maximum, right?  I think that's a good 
> solution.

Right.

Below is a barely-tested alternative patch for Rack::TempfileReaper, for Rack 1.6+ users only.  I'm not sure how prevalent 1.6+ was only released in December 2014...

It's more standardized, but maybe Rack 1.6 isn't prevalent enough, yet.
What do you think?

(Sorry, in a rush, no commit message, yet)

diff --git a/lib/unicorn/tee_input.rb b/lib/unicorn/tee_input.rb index 637c583..7f6baa2 100644
--- a/lib/unicorn/tee_input.rb
+++ b/lib/unicorn/tee_input.rb
@@ -28,13 +28,20 @@ class Unicorn::TeeInput < Unicorn::StreamInput
     @@client_body_buffer_size
   end
 
+  # for Rack::TempfileReaper in rack 1.6+  def new_tmpio
+    tmpio = Unicorn::TmpIO.new
+    (@parser.env['rack.tempfiles'] ||= []) << tmpio
+    tmpio
+  end
+
   # Initializes a new TeeInput object.  You normally do not have to call
   # this unless you are writing an HTTP server.
   def initialize(socket, request)
     @len = request.content_length
     super
     @tmp = @len && @len <= @@client_body_buffer_size ?
-           StringIO.new("") : Unicorn::TmpIO.new
+           StringIO.new("") : new_tmpio
   end
 
   # :call-seq:
diff --git a/lib/unicorn/tmpio.rb b/lib/unicorn/tmpio.rb index c97979a..db88ed3 100644
--- a/lib/unicorn/tmpio.rb
+++ b/lib/unicorn/tmpio.rb
@@ -21,4 +21,7 @@ class Unicorn::TmpIO < File
     fp.sync = true
     fp
   end
+
+  # pretend we're Tempfile for Rack::TempfileReaper  alias close! close
 end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: TeeInput leaks file handles/space
  2015-04-22 19:24       ` Mulvaney, Mike
@ 2015-04-24  3:08         ` Eric Wong
  2015-04-24 11:56           ` Mulvaney, Mike
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2015-04-24  3:08 UTC (permalink / raw)
  To: Mulvaney, Mike; +Cc: unicorn-public@bogomips.org

"Mulvaney, Mike" <MMulvaney@bna.com> wrote:
> I think it would be reasonable to fix this for Rack 1.6+.  It won't
> cause any problems for Rack 1.5 users, right?  The environment
> variable gets set and then ignored, so the app would behave exactly
> the same way.  If they want to use the new cleanup code, they can
> upgrade rack.

Right, this only affects 1.6 users.  1.5 users will need to upgrade for
this.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: TeeInput leaks file handles/space
  2015-04-24  3:08         ` Eric Wong
@ 2015-04-24 11:56           ` Mulvaney, Mike
  0 siblings, 0 replies; 7+ messages in thread
From: Mulvaney, Mike @ 2015-04-24 11:56 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public@bogomips.org

That sounds great. Thanks for fixing this so fast!

-Mike

> On Apr 23, 2015, at 11:08 PM, Eric Wong <e@80x24.org> wrote:
> 
> "Mulvaney, Mike" <MMulvaney@bna.com> wrote:
>> I think it would be reasonable to fix this for Rack 1.6+.  It won't
>> cause any problems for Rack 1.5 users, right?  The environment
>> variable gets set and then ignored, so the app would behave exactly
>> the same way.  If they want to use the new cleanup code, they can
>> upgrade rack.
> 
> Right, this only affects 1.6 users.  1.5 users will need to upgrade for
> this.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-04-24 11:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-22 16:42 TeeInput leaks file handles/space Mulvaney, Mike
2015-04-22 18:38 ` Eric Wong
2015-04-22 19:10   ` Mulvaney, Mike
2015-04-22 19:16     ` Eric Wong
2015-04-22 19:24       ` Mulvaney, Mike
2015-04-24  3:08         ` Eric Wong
2015-04-24 11:56           ` Mulvaney, Mike

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).