Rainbows! Rack HTTP server user/dev discussion
* streaming input for large requests
From: John Leach @ 2010-08-10 22:23 UTC
  To: rainbows-talk@rubyforge.org

Hi,

I'm looking to be able to get access to the request body as it is
available on the socket, so I can process uploads on the fly, as they
stream in.

The docs suggest this is possible with rack.input:

"Exposes a streaming “rack.input” to the Rack application that reads
data off the socket as the application reads it (while retaining
rewindable semantics as required by Rack). This allows Rack-compliant
apps/middleware to implement things such as real-time upload progress
monitoring."
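As a rough illustration of the real-time upload progress monitoring the docs mention, here is a minimal Ruby sketch. `ProgressInput` is a hypothetical wrapper (not part of Rainbows! or Rack) that forwards `read` to the underlying input and reports the running byte total through a callback; a `StringIO` stands in for `rack.input`. Only `read` is wrapped here; a Rack-compliant wrapper would also need `gets`, `each` and `rewind`.

```ruby
require 'stringio'

# Hypothetical wrapper (not a Rainbows! or Rack class): counts the bytes
# the application reads and reports the running total via a callback.
class ProgressInput
  attr_reader :bytes_read

  def initialize(input, &on_progress)
    @input = input
    @on_progress = on_progress
    @bytes_read = 0
  end

  # Accepts the same arguments as rack.input's read: (), (len), (len, buf)
  def read(*args)
    buf = @input.read(*args)
    if buf && !buf.empty?
      @bytes_read += buf.bytesize
      @on_progress.call(@bytes_read) if @on_progress
    end
    buf
  end
end

# Usage: simulate a 10-byte upload read in 4-byte chunks.
seen = []
input = ProgressInput.new(StringIO.new('a' * 10)) { |total| seen << total }
while input.read(4)
  # discard the data; only the progress callback matters here
end
p seen # prints [4, 8, 10]
```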

But to be rewindable, I'm assuming they're being stored somewhere?  I'd
like to be able to handle huge request bodies bit by bit without having
them written to disk (or worse, stored in ram!).  Is there some way to
do this?

Thanks,

John.



_______________________________________________
Rainbows! mailing list - rainbows-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


* Re: streaming input for large requests
From: Eric Wong @ 2010-08-10 23:25 UTC
  To: Rainbows! list

John Leach <john-6Yy1h0nn+LyySYScBAwQ8Q@public.gmane.org> wrote:
> Hi,
> 
> I'm looking to be able to get access to the request body as it is
> available on the socket, so I can process uploads on the fly, as they
> stream in.

Hi John,

Cool!  If you need some example code, you should check out upr,
http://upr.bogomips.org/  Sadly the demo machine is down, but the one
application I helped somebody write (on a private LAN somewhere) still
works well :)

There are also tests/examples in the Rainbows! source tree:

  t/sha1*.ru
  t/content-md5.ru

> But to be rewindable, I'm assuming they're being stored somewhere?  I'd
> like to be able to handle huge request bodies bit by bit without having
> them written to disk (or worse, stored in ram!).  Is there some way to
> do this?

Yes, we store uploads to an unlinked temporary file if the body is
larger than 112 Kbytes (this threshold was established by Mongrel back
in the day).

Rack currently requires rewindability, but this requirement will
most likely become optional in Rack 2.x, and we'll update our code
to match then.
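To make the rewindability requirement concrete, here is a simplified Ruby sketch of the "tee" idea: each chunk read from the socket is also written to a buffer, so `rewind` can replay the body. `SketchTeeInput` is hypothetical and is not Unicorn's actual `TeeInput`; the rule that a body with a known length up to 112K stays in a `StringIO` while anything else goes to an unlinked `Tempfile` is an assumption modeled on the threshold described above. A `StringIO` stands in for the client socket.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical, simplified tee input (not Unicorn::TeeInput): bytes read
# from the socket are copied into a buffer so the body can be rewound.
class SketchTeeInput
  MEMORY_LIMIT = 112 * 1024 # threshold mentioned in the thread

  attr_reader :tmp

  def initialize(socket, content_length)
    @socket = socket
    @tmp = if content_length && content_length <= MEMORY_LIMIT
      StringIO.new(String.new) # small body with a known length: keep in RAM
    else
      file = Tempfile.new('tee-sketch')
      file.unlink # no name on disk; the open handle keeps the data alive
      file.binmode
      file
    end
  end

  # Serve previously buffered bytes first (after a rewind), then pull new
  # bytes off the socket and tee them into the buffer.
  def read(len)
    return @tmp.read(len) if @tmp.pos < @tmp.size

    buf = @socket.read(len)
    @tmp.write(buf) if buf
    buf
  end

  def rewind
    @tmp.rewind
  end
end

# Helper for the example: drain an input in fixed-size chunks.
def read_all(input, len = 4096)
  out = String.new
  while (chunk = input.read(len))
    out << chunk
  end
  out
end

# Read a small body twice; the second pass is replayed from the buffer.
small = SketchTeeInput.new(StringIO.new('hello world'), 11)
first = read_all(small, 4)
small.rewind
second = read_all(small, 4)
```

The buffering is exactly the cost John wants to avoid: for a huge upload, the unlinked temp file grows as large as the body.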

Meanwhile, you can either:

1. Write a module to disable writes to tmp for the Unicorn::TeeInput
   class (or monkey patch it).

2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
   Redirect the temporary file to /dev/null:

    input = env["rack.input"]
    if input.respond_to?(:tmp)
      tmp = input.tmp
      # StringIO is used for bodies <112K, can't reopen those
      tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
    end
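Option 2 relies on `IO#reopen` swapping the file descriptor underneath an existing Ruby IO object, so code that keeps writing to that object (here, the server's temp file) silently feeds the null device instead of the disk. The standalone sketch below demonstrates only that mechanism, with an ordinary `File` standing in for the server's temp file; it uses `File::NULL`, Ruby's portable name for `/dev/null`, and does not touch Rainbows! internals.

```ruby
require 'tmpdir'

# An ordinary File stands in for the server's temporary body file.
size_on_disk = Dir.mktmpdir do |dir|
  path = File.join(dir, 'body')
  tmp = File.open(path, 'w+')
  tmp.write('first chunk') # these 11 bytes reach the real file
  tmp.flush

  # Same Ruby object, but from now on it is backed by the null device.
  tmp.reopen(File::NULL, 'wb')
  tmp.write('x' * 1_000_000) # discarded: never reaches the file at `path`
  tmp.close

  File.size(path)
end

puts size_on_disk # prints 11
```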

-- 
Eric Wong

* Re: streaming input for large requests
From: John Leach @ 2010-08-11  8:54 UTC
  To: Rainbows! list

On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote:

> Yes, we store uploads to an unlinked temporary file if the body is
> larger than 112 Kbytes (this threshold was established by Mongrel back
> in the day).
> 
> Rack currently requires rewindability, but this requirement will
> most likely be optional in Rack 2.x, and we'll update our code
> to match, then.
> 
> Meanwhile, you can either:
> 
> 1. Write a module to disable writes to tmp for the Unicorn::TeeInput
>    class (or monkey patch it).
> 
> 2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
>    Redirect the temporary file to /dev/null:
> 
>     input = env["rack.input"]
>     if input.respond_to?(:tmp)
>       tmp = input.tmp
>       # StringIO is used for bodies <112K, can't reopen those
>       tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>     end
> 

Thanks for the detailed response, Eric. I'll give this a go.

John.




* Re: streaming input for large requests
From: John Leach @ 2010-08-11 22:41 UTC
  To: Rainbows! list

On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote: 
> 2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
>    Redirect the temporary file to /dev/null:
> 
>     input = env["rack.input"]
>     if input.respond_to?(:tmp)
>       tmp = input.tmp
>       # StringIO is used for bodies <112K, can't reopen those
>       tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>     end
> 

I knocked together a little test app to do this as you suggested; it
works a treat:

http://gist.github.com/519915

Thanks again Eric!

John.

#
# This little test app generates SHA1 hashes for HTTP uploads on the
# fly, without storing them on disk.
# By John Leach <john-vU9/idRNkX5kbu+0n/iG1Q@public.gmane.org> (with help from Eric Wong)
#
# Start the server like this:
#
#  rainbows -c rainbows.conf.rb rainbows-sha1.ru
#
# I've been testing this with Revactor, which requires Ruby 1.9 
# 
# Use with the following rainbows.conf.rb:
#
#  ENV['RACK_ENV'] = nil # we don't want lint to be loaded
#  worker_processes 2
#  Rainbows! do
#   use :Revactor
#   client_max_body_size nil
#  end
#
# You can upload files like this:
#
#  curl -v -T /path/to/a/file/to/upload http://localhost:8080/
#
# You can upload an endless stream of data to test concurrency like this:
#
#  dd if=/dev/zero bs=16k | curl -v -T - http://localhost:8080/test.bin
#
# Spawn as many of these as you like :) You'll notice regular debug
# output from the server telling you the upload progress of each
# concurrent upload.
#
# If all is well, your disk space should not decrease during the
# uploads and the ram usage of the server should not balloon.

bs = ENV['bs'] ? ENV['bs'].to_i : 16384
require 'digest/sha1'
use Rack::CommonLogger
use Rack::ShowExceptions
use Rack::ContentLength

app = lambda do |env|

  # Accept every "Expect: 100-continue" request: returning status 100
  # asks the server to send "100 Continue" and call the app again
  /\A100-continue\z/i =~ env['HTTP_EXPECT'] and
    return [ 100, {}, [] ]

  input = env["rack.input"]
 
  if input.respond_to?(:tmp)
    tmp = input.tmp
    # Hack: redirect the server's temp file to /dev/null so the
    # request body is never written to disk
    tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'w+')
  end

  digest = Digest::SHA1.new

  recv_bytes = 0
  last_time = Time.now.to_i
  last_recv_bytes = 0
  req_id = rand(0xffff)

  while buf = input.read(bs)
    recv_bytes += buf.size
    digest.update(buf)
    # Log progress roughly once every 10,000 blocks (about 156M with 16K blocks)
    if (recv_bytes / bs) % 10000 == 9999
      # "+ 1" avoids dividing by zero when under a second has elapsed
      time_diff = Time.now.to_i - last_time + 1
      recv_bytes_diff = recv_bytes - last_recv_bytes
      speed = (recv_bytes_diff / time_diff) / 1024
      recv_meg = recv_bytes / 1024 / 1024
      msg = "req #{req_id}: #{recv_meg}M so far, (#{speed}k/s)\n"
      env['rack.errors'].write msg
      last_time = Time.now.to_i
      last_recv_bytes = recv_bytes
    end
  end
  
  [ 200, {
      'Content-Type' => 'text/plain', 
      'SHA1' => digest.hexdigest, 
      'Received-Bytes' => recv_bytes.to_s
    }, [''] ]

end
run app
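The heart of the app is incremental hashing: the body is read in fixed-size blocks and each block is fed to `Digest::SHA1#update`, so only one block is held in memory at a time. The sketch below isolates that loop so it can be tried without a server, with a `StringIO` standing in for `rack.input`. As a small variation on the app, it passes a reusable buffer as the second argument to `read` (the Rack input spec allows `read(length, buffer)`) to avoid allocating a new String per block; `streaming_sha1` is a hypothetical helper name.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical helper: hash an IO-like object block by block, returning
# the hex digest and the number of bytes consumed.
def streaming_sha1(input, bs = 16_384)
  digest = Digest::SHA1.new
  total = 0
  buf = String.new # reused for every block instead of allocating new Strings
  while input.read(bs, buf)
    total += buf.bytesize
    digest.update(buf)
  end
  [digest.hexdigest, total]
end

data = 'a' * 100_000
hex, total = streaming_sha1(StringIO.new(data))
puts total # prints 100000
puts hex == Digest::SHA1.hexdigest(data) # prints true
```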




Code repositories for project(s) associated with this public inbox

	https://yhbt.net/rainbows.git/
