Rainbows! Rack HTTP server user/dev discussion
From: John Leach <john-6Yy1h0nn+LyySYScBAwQ8Q@public.gmane.org>
To: Rainbows! list <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Subject: Re: streaming input for large requests
Date: Wed, 11 Aug 2010 23:41:32 +0100
Message-ID: <1281566492.25884.6.camel@dogen>
In-Reply-To: <20100810232527.GA14486-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>

On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote: 
> John Leach <john-6Yy1h0nn+LyySYScBAwQ8Q@public.gmane.org> wrote:
> > Hi,
> > 
> > I'm looking to be able to get access to the request body as it is
> > available on the socket, so I can process uploads on the fly, as they
> > stream in.
> 
> Hi John,
> 
> Cool!  If you need some example code, you should check out upr,
> http://upr.bogomips.org/  Sadly the demo machine is down, but the one
> application I helped somebody write (on a private LAN somewhere) still
> works well :)
> 
> There's also the test/examples in the Rainbows! source tree:
> 
>   t/sha1*.ru
>   t/content-md5.ru
> 
> > But to be rewindable, I'm assuming they're being stored somewhere?  I'd
> > like to be able to handle huge request bodies bit by bit without having
> > them written to disk (or worse, stored in ram!).  Is there some way to
> > do this?
> 
> Yes, we store uploads to an unlinked temporary file if the body is
> larger than 112 Kbytes (this threshold was established by Mongrel back
> in the day).
> 
> Rack currently requires rewindability, but this requirement will
> most likely be optional in Rack 2.x, and we'll update our code
> to match, then.
> 
> Meanwhile, you can either:
> 
> 1. Write a module to disable writes to tmp for the Unicorn::TeeInput
>    class (or monkey patch it).
> 
> 2. Without loading Rack::Lint (or anything that wraps env["rack.input"]):
>    Redirect the temporary file to /dev/null:
> 
>     input = env["rack.input"]
>     if input.respond_to?(:tmp)
>       tmp = input.tmp
>       # StringIO is used for bodies <112K, can't reopen those
>       tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>     end
> 

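The redirect in option 2 can be tried in isolation, outside any server. The following is only a minimal sketch: `discard_demo` is a hypothetical helper name, and an ordinary file in a scratch directory stands in for the object the server exposes as `env["rack.input"].tmp` (which, per the quoted text, is an unlinked temporary file). It shows that after `reopen`, writes through the same object no longer land on disk and nothing can be read back.

```ruby
require 'tmpdir'

# Hypothetical helper: reopens a scratch file onto the null device and
# reports how many bytes reached the disk and what can be read back.
def discard_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'upload-body')
    # Stand-in for the server's temporary upload file (assumption).
    tmp = File.open(path, 'w+b')

    # Same idea as the quoted snippet; File::NULL is "/dev/null" on Unix.
    tmp.reopen(File::NULL, 'w+b')

    # Writes through the same object are now discarded.
    tmp.write('x' * 65_536)
    tmp.flush
    size_on_disk = File.size(path)

    # Nothing comes back after rewinding, so the body is not rewindable.
    tmp.rewind
    readback = tmp.read(16)
    tmp.close

    [size_on_disk, readback]
  end
end

size_on_disk, readback = discard_demo
puts "bytes on disk: #{size_on_disk}, read back: #{readback.inspect}"
```

The trade-off is visible in the second value: once the temporary file points at the null device, anything that later tries to rewind and re-read the body gets nothing, which is why the suggestion depends on not loading Rack::Lint or other wrappers.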
I knocked together a little test app to do this as you suggested, and it
works a treat:

http://gist.github.com/519915

Thanks again Eric!

John.

#
# This little test app generates SHA1 hashes for HTTP uploads on the
# fly, without storing them on disk.
# By John Leach <john-vU9/idRNkX5kbu+0n/iG1Q@public.gmane.org> (with help from Eric Wong)
#
# Start the server like this:
#
#  rainbows -c rainbows.conf.rb rainbows-sha1.ru
#
# I've been testing this with Revactor, which requires Ruby 1.9 
# 
# Use with the following rainbows.conf.rb:
#
#  ENV['RACK_ENV'] = nil # we don't want lint to be loaded
#  worker_processes 2
#  Rainbows! do
#   use :Revactor
#   client_max_body_size nil
#  end
#
# You can upload files like this:
#
#  curl -v -T /path/to/a/file/to/upload http://localhost:8080/
#
# You can upload infinite data to test concurrency like this:
#
#  dd if=/dev/zero bs=16k | curl -v -T - http://localhost:8080/test.bin
#
# Spawn as many of these as you like :) You'll notice regular debug
# output from the server telling you the upload progress of each
# concurrent upload.
#
# If all is well, your disk space should not decrease during the
# uploads and the ram usage of the server should not balloon.

bs = ENV['bs'] ? ENV['bs'].to_i : 16384
require 'digest/sha1'
use Rack::CommonLogger
use Rack::ShowExceptions
use Rack::ContentLength

app = lambda do |env|

  # Tell all expect requests we're happy to accept
  /\A100-continue\z/i =~ env['HTTP_EXPECT'] and
    return [ 100, {}, [] ]

  input = env["rack.input"]
 
  if input.respond_to?(:tmp)
    tmp = input.tmp
    # Hack to prevent request being written to disk
    tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'w+')
  end

  digest = Digest::SHA1.new

  recv_bytes = 0
  last_time = Time.now.to_i
  last_recv_bytes = 0
  req_id = rand(0xffff)

  while buf = input.read(bs)
    recv_bytes += buf.size
    digest.update(buf)
    if (recv_bytes / bs) % 10000 == 9999
      time_diff = Time.now.to_i - last_time + 1
      recv_bytes_diff = recv_bytes - last_recv_bytes
      speed = (recv_bytes_diff / time_diff) / 1024
      recv_meg = recv_bytes / 1024 / 1024
      msg = "req #{req_id}: #{recv_meg}M so far, (#{speed}k/s)\n"
      env['rack.errors'].write msg
      last_time = Time.now.to_i
      last_recv_bytes = recv_bytes
    end
  end
  
  [ 200, {
      'Content-Type' => 'text/plain', 
      'SHA1' => digest.hexdigest, 
      'Received-Bytes' => recv_bytes.to_s
    }, [''] ]

end
run app
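
To check the result from the client side, the `SHA1` response header set by the app above can be compared with a hash computed locally. This is only a sketch: `local_sha1` and `remote_sha1` are hypothetical helper names, and the upload assumes the server is listening at the URL used in the curl examples (http://localhost:8080/).

```ruby
require 'digest/sha1'
require 'net/http'
require 'uri'

# Hash a local file in fixed-size chunks, mirroring the server's read loop.
def local_sha1(path, bs = 16_384)
  digest = Digest::SHA1.new
  File.open(path, 'rb') do |f|
    while (buf = f.read(bs))
      digest.update(buf)
    end
  end
  digest.hexdigest
end

# PUT the file to the server, streaming it from disk, and return the
# value of the SHA1 response header (nil if the header is absent).
def remote_sha1(path, url)
  uri = URI.parse(url)
  File.open(path, 'rb') do |f|
    req = Net::HTTP::Put.new(uri.request_uri)
    req.body_stream = f
    req.content_length = f.size
    Net::HTTP.start(uri.host, uri.port) { |http| http.request(req)['SHA1'] }
  end
end
```

With the server running, `local_sha1(path) == remote_sha1(path, 'http://localhost:8080/')` should hold for any file if the streaming digest is working as intended.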



