* streaming input for large requests
From: John Leach @ 2010-08-10 22:23 UTC
To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw
Hi,
I'm looking to be able to get access to the request body as it is
available on the socket, so I can process uploads on the fly, as they
stream in.
The docs suggest this is possible with rack.input:
"Exposes a streaming "rack.input" to the Rack application that reads
data off the socket as the application reads it (while retaining
rewindable semantics as required by Rack). This allows Rack-compliant
apps/middleware to implement things such as real-time upload progress
monitoring."
But to be rewindable, I'm assuming they're being stored somewhere? I'd
like to be able to handle huge request bodies bit by bit without having
them written to disk (or worse, stored in ram!). Is there some way to
do this?
Thanks,
John.
_______________________________________________
Rainbows! mailing list - rainbows-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying
* Re: streaming input for large requests
From: Eric Wong @ 2010-08-10 23:25 UTC
To: Rainbows! list
John Leach <john-6Yy1h0nn+LyySYScBAwQ8Q@public.gmane.org> wrote:
> Hi,
>
> I'm looking to be able to get access to the request body as it is
> available on the socket, so I can process uploads on the fly, as they
> stream in.
Hi John,
Cool! If you need some example code, you should check out upr
(http://upr.bogomips.org/). Sadly, the demo machine is down, but the one
application I helped somebody write (on a private LAN somewhere) still
works well :)
There are also tests/examples in the Rainbows! source tree:
  t/sha1*.ru
  t/content-md5.ru
> But to be rewindable, I'm assuming they're being stored somewhere? I'd
> like to be able to handle huge request bodies bit by bit without having
> them written to disk (or worse, stored in ram!). Is there some way to
> do this?
Yes, we store uploads to an unlinked temporary file if the body is
larger than 112 Kbytes (this threshold was established by Mongrel back
in the day).
Rack currently requires rewindability, but this requirement will
most likely be optional in Rack 2.x, and we'll update our code
to match, then.
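To make the trade-off concrete, here is a minimal Ruby sketch of a rewindable "tee" input: every chunk handed to the application is also copied into a backing store, a StringIO for small bodies or an unlinked temporary file for large ones. This is not the actual Unicorn::TeeInput code. The class name TeeSketch, its constructor arguments, and the assumption that the body size is known up front (for example from Content-Length) are all hypothetical.

```ruby
require 'stringio'
require 'tempfile'

# Illustrative sketch only; NOT the real Unicorn::TeeInput.
class TeeSketch
  MEMORY_LIMIT = 112 * 1024 # threshold quoted in the message above

  attr_reader :tmp

  # source:        any IO-like object with read(length), e.g. a socket
  # expected_size: body size in bytes (assumed known up front)
  def initialize(source, expected_size)
    @source = source
    @tmp = if expected_size > MEMORY_LIMIT
             Tempfile.new('tee-sketch').tap(&:unlink) # unlinked temp file
           else
             StringIO.new
           end
    @tmp.binmode
    @copied = 0
    @source_done = false
  end

  def read(length)
    # After a rewind, replay what was already copied.
    return @tmp.read(length) if @tmp.pos < @copied
    return nil if @source_done

    chunk = @source.read(length)
    if chunk.nil? || chunk.empty?
      @source_done = true
      return nil
    end
    @tmp.write(chunk) # this copy is the price of rewindability
    @copied += chunk.bytesize
    chunk
  end

  def rewind
    @tmp.rewind
  end
end
```

The copy made in read is what makes rewind possible, and it is also why a huge upload ends up on disk unless that copy is suppressed.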
Meanwhile, you can either:
1. Write a module to disable writes to tmp for the Unicorn::TeeInput
   class (or monkey patch it).

2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
   Redirect the temporary file to /dev/null:

     input = env["rack.input"]
     if input.respond_to?(:tmp)
       tmp = input.tmp
       # StringIO is used for bodies <112K, can't reopen those
       tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
     end
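The effect of the reopen call can be checked outside a server. The sketch below uses an ordinary file as a stand-in for the server's temporary file and shows that, once the IO object is redirected, later writes no longer reach the original file and a rewind followed by a read returns nothing. The helper name discard_writes and the demo method are hypothetical. The sketch also assumes the server's temporary file is a File (or a subclass) and checks for that explicitly, because Ruby's StringIO defines its own reopen method as well; that extra check is an addition here, not part of the suggestion above. File::NULL is Ruby's portable name for the null device.

```ruby
require 'tmpdir'
require 'stringio'

# Hypothetical helper (not part of Unicorn or Rainbows!): point a
# file-backed IO at the null device so later writes are discarded.
# Returns true if the IO was redirected, false if it was left alone.
def discard_writes(io)
  return false unless io.is_a?(File) # skip StringIO and anything else
  io.reopen(File::NULL, 'w+')
  true
end

def demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'body.tmp')
    File.open(path, 'w+') do |tmp|
      tmp.write('first chunk')
      tmp.flush
      before = File.size(path)

      redirected = discard_writes(tmp)
      tmp.write('x' * 4096) # goes to the null device, not to path
      tmp.flush
      tmp.rewind
      reread = tmp.read # the null device reports EOF immediately

      { redirected: redirected, before: before,
        after: File.size(path), reread: reread }
    end
  end
end
```

Calling demo returns a hash whose before and after sizes are equal, which is the "disk space should not decrease" property, and whose reread value is empty, which is the rewindability that is given up in exchange.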
--
Eric Wong
* Re: streaming input for large requests
From: John Leach @ 2010-08-11 8:54 UTC
To: Rainbows! list
On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote:
> Yes, we store uploads to an unlinked temporary file if the body is
> larger than 112 Kbytes (this threshold was established by Mongrel back
> in the day).
>
> Rack currently requires rewindability, but this requirement will
> most likely be optional in Rack 2.x, and we'll update our code
> to match, then.
>
> Meanwhile, you can either:
>
> 1. Write a module to disable writes to tmp for the Unicorn::TeeInput
>    class (or monkey patch it).
>
> 2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
>    Redirect the temporary file to /dev/null:
>
>      input = env["rack.input"]
>      if input.respond_to?(:tmp)
>        tmp = input.tmp
>        # StringIO is used for bodies <112K, can't reopen those
>        tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>      end
>
Thanks for the detailed response, Eric. I'll give this a go.
John.
* Re: streaming input for large requests
From: John Leach @ 2010-08-11 22:41 UTC
To: Rainbows! list
On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote:
> Meanwhile, you can either:
>
> 1. Write a module to disable writes to tmp for the Unicorn::TeeInput
>    class (or monkey patch it).
>
> 2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
>    Redirect the temporary file to /dev/null:
>
>      input = env["rack.input"]
>      if input.respond_to?(:tmp)
>        tmp = input.tmp
>        # StringIO is used for bodies <112K, can't reopen those
>        tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>      end
>
I knocked together a little test app to do this as you suggested; it
works a treat:
http://gist.github.com/519915
Thanks again Eric!
John.
#
# This little test app generates SHA1 hashes for HTTP uploads on the
# fly, without storing them on disk.
# By John Leach <john-vU9/idRNkX5kbu+0n/iG1Q@public.gmane.org> (with help from Eric Wong)
#
# Start the server like this:
#
#   rainbows -c rainbows.conf.rb rainbows-sha1.ru
#
# I've been testing this with Revactor, which requires Ruby 1.9
#
# Use with the following rainbows.conf.rb:
#
#   ENV['RACK_ENV'] = nil # we don't want lint to be loaded
#   worker_processes 2
#   Rainbows! do
#     use :Revactor
#     client_max_body_size nil
#   end
#
# You can upload files like this:
#
#   curl -v -T /path/to/a/file/to/upload http://localhost:8080/
#
# You can upload infinite data to test concurrency like this:
#
#   dd if=/dev/zero bs=16k | curl -v -T - http://localhost:8080/test.bin
#
# Spawn as many of these as you like :) You'll notice regular debug
# output from the server telling you the upload progress of each
# concurrent upload.
#
# If all is well, your disk space should not decrease during the
# uploads and the RAM usage of the server should not balloon.
bs = ENV['bs'] ? ENV['bs'].to_i : 16384
require 'digest/sha1'
use Rack::CommonLogger
use Rack::ShowExceptions
use Rack::ContentLength
app = lambda do |env|
  # Tell all expect requests we're happy to accept
  /\A100-continue\z/i =~ env['HTTP_EXPECT'] and
    return [ 100, {}, [] ]
  input = env["rack.input"]
  if input.respond_to?(:tmp)
    tmp = input.tmp
    # Hack to prevent request being written to disk
    tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'w+')
  end
  digest = Digest::SHA1.new
  recv_bytes = 0
  last_time = Time.now.to_i
  last_recv_bytes = 0
  req_id = rand(0xffff)
  while buf = input.read(bs)
    recv_bytes += buf.size
    digest.update(buf)
    if (recv_bytes / bs) % 10000 == 9999
      time_diff = Time.now.to_i - last_time + 1
      recv_bytes_diff = recv_bytes - last_recv_bytes
      speed = (recv_bytes_diff / time_diff) / 1024
      recv_meg = recv_bytes / 1024 / 1024
      msg = "req #{req_id}: #{recv_meg}M so far, (#{speed}k/s)\n"
      env['rack.errors'].write msg
      last_time = Time.now.to_i
      last_recv_bytes = recv_bytes
    end
  end
  [ 200, {
    'Content-Type' => 'text/plain',
    'SHA1' => digest.hexdigest,
    'Received-Bytes' => recv_bytes.to_s
  }, [''] ]
end
run app
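The hashing loop in the app can be exercised without a server by feeding it any object that responds to read(length). The sketch below pulls the loop into a hypothetical helper, streaming_sha1 (a name introduced here for illustration, not part of the gist), so that the incremental digest can be compared against a one-shot digest of the same data.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical helper mirroring the read loop above: hash an IO-like
# object block by block and return [hex digest, bytes received].
def streaming_sha1(input, block_size = 16_384)
  digest = Digest::SHA1.new
  received = 0
  while (buf = input.read(block_size))
    received += buf.bytesize
    digest.update(buf)
  end
  [digest.hexdigest, received]
end

data = 'abc' * 50_000
hex, received = streaming_sha1(StringIO.new(data))
puts "match: #{hex == Digest::SHA1.hexdigest(data)}, bytes: #{received}"
```

Only one block is held in memory at a time, so memory use depends on the block size rather than on the size of the upload.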
Code repositories for project(s) associated with this public inbox
https://yhbt.net/rainbows.git/