From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Leach
Newsgroups: gmane.comp.lang.ruby.rainbows.general
Subject: Re: streaming input for large requests
Date: Wed, 11 Aug 2010 23:41:32 +0100
Organization: Brightbox
Message-ID: <1281566492.25884.6.camel@dogen>
References: <1281478992.8692.15.camel@dogen> <20100810232527.GA14486@dcvr.yhbt.net>
In-Reply-To: <20100810232527.GA14486-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
To: Rainbows! list

On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote:
> John Leach wrote:
> > Hi,
> >
> > I'm looking to be able to get access to the request body as it is
> > available on the socket, so I can process uploads on the fly, as they
> > stream in.
>
> Hi John,
>
> Cool!  If you need some example code, you should check out upr,
> http://upr.bogomips.org/  Sadly the demo machine is down, but the one
> application I helped somebody write (on a private LAN somewhere) still
> works well :)
>
> There's also the test/examples in the Rainbows! source tree:
>
>   t/sha1*.ru
>   t/content-md5.ru
>
> > But to be rewindable, I'm assuming they're being stored somewhere? I'd
> > like to be able to handle huge request bodies bit by bit without having
> > them written to disk (or worse, stored in ram!).  Is there some way to
> > do this?
>
> Yes, we store uploads to an unlinked temporary file if the body is
> larger than 112 Kbytes (this threshold was established by Mongrel back
> in the day).
>
> Rack currently requires rewindability, but this requirement will
> most likely be optional in Rack 2.x, and we'll update our code
> to match, then.
>
> Meanwhile, you can either:
>
>   1. Write a module to disable writes to tmp for the Unicorn::TeeInput
>      class (or monkey patch it).
>
>   2.
>      Without loading Rack::Lint (or anything that wraps env["rack.input"]):
>      Redirect the temporary file to /dev/null:
>
>        input = env["rack.input"]
>        if input.respond_to?(:tmp)
>          tmp = input.tmp
>          # StringIO is used for bodies <112K, can't reopen those
>          tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>        end
>

I knocked together a little test app to do this as you suggested, it
works a treat:

http://gist.github.com/519915

Thanks again Eric!

John.

#
# This little test app generates SHA1 hashes for HTTP uploads on the
# fly, without storing them on disk.
# By John Leach (with help from Eric Wong)
#
# Start the server like this:
#
#   rainbows -c rainbows.conf.rb rainbows-sha1.ru
#
# I've been testing this with Revactor, which requires Ruby 1.9
#
# Use with the following rainbows.conf.rb:
#
#   ENV['RACK_ENV'] = nil # we don't want lint to be loaded
#   worker_processes 2
#   Rainbows! do
#     use :Revactor
#     client_max_body_size nil
#   end
#
# You can upload files like this:
#
#   curl -v -T /path/to/a/file/to/upload http://localhost:8080/
#
# You can upload infinite data to test concurrency like this:
#
#   dd if=/dev/zero bs=16k | curl -v -T - http://localhost:8080/test.bin
#
# Spawn as many of these as you like :)  You'll notice regular debug
# output from the server telling you the upload progress of each
# concurrent upload.
#
# If all is well, your disk space should not decrease during the
# uploads and the ram usage of the server should not balloon.

bs = ENV['bs'] ?
ENV['bs'].to_i : 16384

require 'digest/sha1'

use Rack::CommonLogger
use Rack::ShowExceptions
use Rack::ContentLength

app = lambda do |env|
  # Tell all expect requests we're happy to accept
  /\A100-continue\z/i =~ env['HTTP_EXPECT'] and return [ 100, {}, [] ]

  input = env["rack.input"]
  if input.respond_to?(:tmp)
    tmp = input.tmp
    # Hack to prevent request being written to disk
    tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'w+')
  end

  digest = Digest::SHA1.new
  recv_bytes = 0
  last_time = Time.now.to_i
  last_recv_bytes = 0
  req_id = rand(0xffff)

  while buf = input.read(bs)
    recv_bytes += buf.size
    digest.update(buf)
    if (recv_bytes / bs) % 10000 == 9999
      time_diff = Time.now.to_i - last_time + 1
      recv_bytes_diff = recv_bytes - last_recv_bytes
      speed = (recv_bytes_diff / time_diff) / 1024
      recv_meg = recv_bytes / 1024 / 1024
      msg = "req #{req_id}: #{recv_meg}M so far, (#{speed}k/s)\n"
      env['rack.errors'].write msg
      last_time = Time.now.to_i
      last_recv_bytes = recv_bytes
    end
  end

  [ 200, { 'Content-Type' => 'text/plain',
           'SHA1' => digest.hexdigest,
           'Received-Bytes' => recv_bytes.to_s }, [''] ]
end

run app

_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying
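
For anyone who wants to try the chunked hashing loop from the test app without starting a server, here is a minimal sketch. The helper name streaming_sha1 is hypothetical (it is not part of Rainbows!, Unicorn or Rack), and a StringIO stands in for env["rack.input"]; it only assumes the input responds to read(n) and returns nil at EOF.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical helper, for illustration only: hash an IO-like object
# chunk by chunk, the way the upload loop in the test app does, so that
# roughly one chunk of +bs+ bytes is held in memory at a time.
def streaming_sha1(input, bs = 16384)
  digest = Digest::SHA1.new
  recv_bytes = 0
  # read(n) returns nil at EOF, which ends the loop
  while buf = input.read(bs)
    recv_bytes += buf.bytesize
    digest.update(buf)
  end
  [digest.hexdigest, recv_bytes]
end

# StringIO stands in for env["rack.input"] here (an assumption)
body = "x" * 50_000
hex, bytes = streaming_sha1(StringIO.new(body), 4096)

# The chunked digest should match hashing the whole body in one go
puts hex == Digest::SHA1.hexdigest(body)
puts bytes
```

The same comparison can be used to check the SHA1 response header from the test app against a digest computed locally for the uploaded file.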