From: John Leach <john-6Yy1h0nn+LyySYScBAwQ8Q@public.gmane.org>
To: Rainbows! list <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Subject: Re: streaming input for large requests
Date: Wed, 11 Aug 2010 23:41:32 +0100
Message-ID: <1281566492.25884.6.camel@dogen>
In-Reply-To: <20100810232527.GA14486-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote:
> John Leach <john-6Yy1h0nn+LyySYScBAwQ8Q@public.gmane.org> wrote:
> > Hi,
> >
> > I'm looking to be able to get access to the request body as it is
> > available on the socket, so I can process uploads on the fly, as they
> > stream in.
>
> Hi John,
>
> Cool! If you need some example code, you should check out upr:
> http://upr.bogomips.org/
> Sadly the demo machine is down, but the one application I helped
> somebody write (on a private LAN somewhere) still works well :)
>
> There's also the test/examples in the Rainbows! source tree:
>
> t/sha1*.ru
> t/content-md5.ru
>
> > But to be rewindable, I'm assuming they're being stored somewhere? I'd
> > like to be able to handle huge request bodies bit by bit without having
> > them written to disk (or worse, stored in ram!). Is there some way to
> > do this?
>
> Yes, we store uploads to an unlinked temporary file if the body is
> larger than 112 Kbytes (this threshold was established by Mongrel back
> in the day).
>
> Rack currently requires rewindability, but this requirement will
> most likely be optional in Rack 2.x, and we'll update our code
> to match, then.
>
> Meanwhile, you can either:
>
> 1. Write a module to disable writes to tmp for the Unicorn::TeeInput
> class (or monkey patch it).
>
> 2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
> Redirect the temporary file to /dev/null:
>
>     input = env["rack.input"]
>     if input.respond_to?(:tmp)
>       tmp = input.tmp
>       # StringIO is used for bodies <112K, can't reopen those
>       tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>     end
>
I knocked together a little test app to do this as you suggested, and it
works a treat:
http://gist.github.com/519915
Thanks again Eric!
John.
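
For anyone wanting to check the /dev/null redirection quoted above in
isolation, here is a minimal sketch. It assumes a Unix-like system, and
it uses a plain File in a scratch directory as a stand-in for the
server's temporary file; the helper name redirect_to_dev_null_demo is
made up for this illustration and is not part of Unicorn or Rainbows!.

```ruby
require 'tmpdir'

# Hypothetical demo helper (not part of Unicorn or Rainbows!).
# A plain File stands in for the server's temporary upload file.
def redirect_to_dev_null_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'body.tmp')
    tmp = File.open(path, 'w+')
    begin
      # Same call as in the quoted snippet: the IO object now points
      # at /dev/null instead of the file it was opened on.
      tmp.reopen('/dev/null', 'w+')
      tmp.write('x' * 65_536)
      tmp.flush
      tmp.rewind
      # Report how much reached the original file, and what a read
      # after rewinding gives back.
      { bytes_on_disk: File.size(path), read_back: tmp.read(16) }
    ensure
      tmp.close
    end
  end
end

result = redirect_to_dev_null_demo
puts result.inspect
```

Nothing written after the reopen should reach the original file, and
reading after a rewind hits end-of-file straight away. In other words
the body can no longer be replayed, which is presumably why anything
that wraps or rewinds env["rack.input"] has to stay out of the stack.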
#
# This little test app generates SHA1 hashes for HTTP uploads on the
# fly, without storing them on disk.
# By John Leach <john-vU9/idRNkX5kbu+0n/iG1Q@public.gmane.org> (with help from Eric Wong)
#
# Start the server like this:
#
#   rainbows -c rainbows.conf.rb rainbows-sha1.ru
#
# I've been testing this with Revactor, which requires Ruby 1.9
#
# Use with the following rainbows.conf.rb:
#
#   ENV['RACK_ENV'] = nil # we don't want lint to be loaded
#   worker_processes 2
#   Rainbows! do
#     use :Revactor
#     client_max_body_size nil
#   end
#
# You can upload files like this:
#
#   curl -v -T /path/to/a/file/to/upload http://localhost:8080/
#
# You can upload infinite data to test concurrency like this:
#
#   dd if=/dev/zero bs=16k | curl -v -T - http://localhost:8080/test.bin
#
# Spawn as many of these as you like :) You'll notice regular debug
# output from the server telling you the upload progress of each
# concurrent upload.
#
# If all is well, your disk space should not decrease during the
# uploads and the ram usage of the server should not balloon.
bs = ENV['bs'] ? ENV['bs'].to_i : 16384
require 'digest/sha1'
use Rack::CommonLogger
use Rack::ShowExceptions
use Rack::ContentLength
app = lambda do |env|
  # Tell all expect requests we're happy to accept
  /\A100-continue\z/i =~ env['HTTP_EXPECT'] and
    return [ 100, {}, [] ]
  input = env["rack.input"]
  if input.respond_to?(:tmp)
    tmp = input.tmp
    # Hack to prevent request being written to disk
    tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'w+')
  end
  digest = Digest::SHA1.new
  recv_bytes = 0
  last_time = Time.now.to_i
  last_recv_bytes = 0
  req_id = rand(0xffff)
  while buf = input.read(bs)
    recv_bytes += buf.size
    digest.update(buf)
    if (recv_bytes / bs) % 10000 == 9999
      time_diff = Time.now.to_i - last_time + 1
      recv_bytes_diff = recv_bytes - last_recv_bytes
      speed = (recv_bytes_diff / time_diff) / 1024
      recv_meg = recv_bytes / 1024 / 1024
      msg = "req #{req_id}: #{recv_meg}M so far, (#{speed}k/s)\n"
      env['rack.errors'].write msg
      last_time = Time.now.to_i
      last_recv_bytes = recv_bytes
    end
  end
  [ 200, {
    'Content-Type' => 'text/plain',
    'SHA1' => digest.hexdigest,
    'Received-Bytes' => recv_bytes.to_s
  }, [''] ]
end
run app
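
The chunked read loop in the app above can also be tried without a
server. The following is only a sketch: a StringIO stands in for
env["rack.input"], and sha1_stream is a hypothetical helper name
introduced here, not something provided by Rack or Rainbows!.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical helper: hash an IO-like object chunk by chunk, so that
# at most chunk_size bytes are held at once by this loop.
def sha1_stream(input, chunk_size = 16_384)
  digest = Digest::SHA1.new
  received = 0
  # read(n) returns nil at end-of-file, which ends the loop
  while (buf = input.read(chunk_size))
    received += buf.bytesize
    digest.update(buf)
  end
  [digest.hexdigest, received]
end

# StringIO is an assumption standing in for the request body
body = 'a' * 100_000
hex, bytes = sha1_stream(StringIO.new(body))
puts "#{bytes} bytes, sha1 #{hex}"
```

Feeding the digest incrementally gives the same result as hashing the
whole body in one go, which is what lets the app avoid buffering the
upload.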