From: Eric Wong <normalperson@yhbt.net>
To: unicorn list <mongrel-unicorn@rubyforge.org>
Subject: Re: Is a client uploading a file a slow client from unicorn's point of view?
Date: Tue, 9 Oct 2012 01:58:00 +0000
Message-ID: <20121009015800.GA1772@dcvr.yhbt.net>
In-Reply-To: <CAHStS5iqBqJxoPZGEUaSHL2qSVf_tfZfbdedc_bRcjcTbcKwAQ@mail.gmail.com>

Jimmy Soho <jimmy.soho@gmail.com> wrote:
> Hi All,
> 
> I was wondering what would happen when large files were uploaded to
> our system in parallel to endpoints that don't process file uploads.
> In particular I was wondering if we're vulnerable to a simple DoS
> attack.

nginx will protect you by buffering large requests to disk, so slow
requests are taken care of (of course, you may still run out of disk
space).

> The setup I tested with was nginx v1.2.4 with upload module (v2.2.0)
> configured only for location /uploads with 2 unicorn (v4.3.1) workers
> with timeout 30 secs, all running on 1 small unix box.
> 
> In a few terminals I started this command 3 times in parallel:
> 
>    $ curl -i -F importer_input=@/Users/admin/largefile.tar.gz \
>        https://mywebserver.com/doesnotexist
> 
> In a browser I then tried to go to a page that would be served by a unicorn worker.
> 
> My expectation was that I would not get to see the web page as all
> unicorn workers would be busy with receiving / saving the upload. As
> discussed in for example this article:
> http://stackoverflow.com/questions/9592664/unicorn-rails-large-uploads.
> Or as https://github.com/dwilkie/carrierwave_direct describes it:
> 
>   "Processing and saving file uploads are typically long running tasks
> and should be done in a background process."

That is true.  It's good to move slow jobs to background processes
when the bottleneck is either:

a) your application processing
b) the storage destination of your app (e.g. cloud storage)

However, if your only bottleneck is client <-> your app, then nginx
will take care of that part for you.

> But I don't see this. The page is served just fine in my setup. The
> requests for the file uploads appear in the nginx access log at the
> same time the curl upload command eventually finishes minutes later
> client side, and then it's handed off to a unicorn/rack worker
> process, which quickly returns a 404 page not found. Response times of
> less than 50ms.
> 
> What am I missing here? I'm starting to wonder what's the use of the
> nginx upload module? My understanding was that its use was to keep
> unicorn workers available as long as a file upload was in progress,
> but it seems that without that module it does the same thing.

I'm not familiar with the nginx upload module, but stock nginx will
already do full request buffering for you.  It looks like the nginx
upload module[1] is mostly meant for standalone apps written for
nginx, and not for setups where nginx is a proxy for a Rails app.

[1] http://www.grid.net.ru/nginx/upload.en.html

> Another question (more an nginx question though I guess): is there a
> way to kill an upload request as early as possible if the request is
> not made against known / accepted URI locations, instead of waiting
> for it to be completely uploaded to our system and/or waiting for it
> to reach the unicorn workers?

I'm not sure if nginx has this functionality, but unicorn lazily buffers
uploads.  So your upload will be fully read by nginx, but unicorn
will only read the uploaded request body if your application wants to
read it.
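
To illustrate what "only if your application wants to read it" means
in Rack terms, here is a minimal sketch.  CountingInput, LazyApp and
build_env are hypothetical names made up for this example; CountingInput
merely stands in for the input object a server would hand to the app,
and is not how unicorn implements its buffering.

```ruby
require "stringio"

# Hypothetical stand-in for the server-provided "rack.input" object.
# It counts how many bytes the application actually pulls from the body.
class CountingInput
  attr_reader :bytes_read

  def initialize(io)
    @io = io
    @bytes_read = 0
  end

  def read(*args)
    data = @io.read(*args)
    @bytes_read += data.bytesize if data
    data
  end
end

# Minimal Rack-style app: only /upload_allowed ever touches the body.
LazyApp = lambda do |env|
  if env["PATH_INFO"] == "/upload_allowed"
    body = env["rack.input"].read
    [200, {}, ["got #{body.bytesize} bytes\n"]]
  else
    [404, {}, ["Not found\n"]] # body is never read on this branch
  end
end

# Build a bare-bones env hash for a POST with the given payload.
def build_env(path, payload)
  input = CountingInput.new(StringIO.new(payload))
  env = {
    "REQUEST_METHOD" => "POST",
    "PATH_INFO" => path,
    "rack.input" => input
  }
  [env, input]
end

["/doesnotexist", "/upload_allowed"].each do |path|
  env, input = build_env(path, "x" * 1024)
  status, = LazyApp.call(env)
  puts "#{path} -> #{status}, #{input.bytes_read} bytes read"
end
```

The 404 branch returns without reading a single byte of the body, while
the allowed path reads all 1024 bytes.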

Unfortunately, I think most application frameworks (e.g. Rails) will
attempt to do all the multipart parsing up front.  To get around this,
you'll probably want some middleware along the following lines (placed
in front of whichever part of your stack calls
Rack::Multipart.parse_multipart):

class BadUploadStopper
  def initialize(app)
    @app = app
  end

  def call(env)
    case env["REQUEST_METHOD"]
    when "POST", "PUT"
      case env["PATH_INFO"]
      when "/upload_allowed"
        @app.call(env) # forward to the app
      else
        # bad path, don't waste time with @app.call
        [ 403, {}, [ "Go away\n" ] ]
      end
    else
      @app.call(env)  # forward to the app
    end
  end
end

------------------- config.ru ---------------------
use BadUploadStopper
run YourApp.new
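
If more than one path should accept uploads, the same idea can be
written with the allowed paths passed in as an argument.  This is only
a sketch: UploadPathFilter, RecordingApp and build_stack are
hypothetical names invented here, and RecordingApp exists only to show
that rejected requests never reach the wrapped app.

```ruby
# Hypothetical variant of the middleware above; the list of paths that
# may receive a request body is supplied by the caller.
class UploadPathFilter
  BODY_METHODS = %w[POST PUT].freeze

  def initialize(app, allowed_paths)
    @app = app
    @allowed_paths = allowed_paths
  end

  def call(env)
    if BODY_METHODS.include?(env["REQUEST_METHOD"]) &&
       !@allowed_paths.include?(env["PATH_INFO"])
      # bad path, reject before anything downstream parses the body
      return [403, {}, ["Go away\n"]]
    end
    @app.call(env) # forward to the app
  end
end

# Stub app that counts how often it is reached.
class RecordingApp
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def call(_env)
    @calls += 1
    [200, {}, ["ok\n"]]
  end
end

# Wire the filter in front of a fresh stub app.
def build_stack
  inner = RecordingApp.new
  [UploadPathFilter.new(inner, ["/upload_allowed"]), inner]
end

stack, inner = build_stack
status, = stack.call({ "REQUEST_METHOD" => "POST", "PATH_INFO" => "/doesnotexist" })
puts "#{status}, inner app called #{inner.calls} times"
```

In config.ru this would be wired up with
"use UploadPathFilter, [\"/upload_allowed\"]", since Rack::Builder#use
passes any extra arguments on to the middleware's constructor.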