unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
* A barrage of unexplained timeouts
@ 2013-08-20 14:47 nick
  2013-08-20 16:37 ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-20 14:47 UTC (permalink / raw)
  To: mongrel-unicorn

We've been running unicorn-3.6.2 on REE 1.8.7 2011.12 in production for quite some time and we use monit to monitor each unicorn worker.  Occasionally, I'll get a notification that a worker has timed-out and has been re-spawned.  In all these cases, when I look at the rails logs, I can see the last request that the worker handled, and they all have appeared to complete successfully from the client's perspective (rails and nginx respond with 200), but the unicorn log shows that it was killed due to timeout.  This has always been relatively rare and I thought it was a non-problem.

Until today.

Today, for about a 7 minute period, our workers would continually report as having timed-out and would be killed by the master.  After re-spawning, the workers would serve a handful of requests and then eventually be killed again.

During this time, our servers (Web, PG DB, and redis) were not under load and IO was normal.  After the last monit notification at 8:30, everything went back to normal.  I understand why unicorns would timeout if they were waiting (>120 secs) on IO, but there aren't any orphaned requests in the rails log.  For each request line, there's a corresponding completion line.  No long running queries to blame on PG, either.

I know we're probably due for an upgrade, but I'm hoping to get to the bottom of these unexplained timeouts.

Thanks for your help!

-Nick

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 14:47 A barrage of unexplained timeouts nick
@ 2013-08-20 16:37 ` Eric Wong
  2013-08-20 17:27   ` nick
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2013-08-20 16:37 UTC (permalink / raw)
  To: unicorn list

nick@auger.net wrote:
> We've been running unicorn-3.6.2 on REE 1.8.7 2011.12 in production
> for quite some time and we use monit to monitor each unicorn worker.
> Occasionally, I'll get a notification that a worker has timed-out and
> has been re-spawned.  In all these cases, when I look at the rails
> logs, I can see the last request that the worker handled, and they all
> have appeared to complete successfully from the client's perspective
> (rails and nginx respond with 200), but the unicorn log shows that it
> was killed due to timeout.  This has always been relatively rare and I
> thought it was a non-problem.
> 
> During this time, our servers (Web, PG DB, and redis) were not under
> load and IO was normal.  After the last monit notification at 8:30,
> everything went back to normal.  I understand why unicorns would
> timeout if they were waiting (>120 secs) on IO, but there aren't any
> orphaned requests in the rails log.  For each request line, there's a
> corresponding completion line.  No long running queries to blame on
> PG, either.

Can you take a look at the nginx error and access logs?  From what
you're saying, there's a chance a request never even got to the Rails
layer.  However, nginx should be logging failed/long-running requests to
unicorn.
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 16:37 ` Eric Wong
@ 2013-08-20 17:27   ` nick
  2013-08-20 17:40     ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-20 17:27 UTC (permalink / raw)
  To: unicorn list

"Eric Wong" <normalperson@yhbt.net> said:
> Can you take a look at the nginx error and access logs?  From what
> you're saying, there's a chance a request never even got to the Rails
> layer.  However, nginx should be logging failed/long-running requests to
> unicorn.

The nginx access logs show frequent 499 responses.  The error logs are filled with:

connect() failed (110: Connection timed out) while connecting to upstream
upstream timed out (110: Connection timed out) while reading response header from upstream

What specific pieces of information should I be looking for in the logs?

Thanks Again,

-Nick

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 17:27   ` nick
@ 2013-08-20 17:40     ` Eric Wong
  2013-08-20 18:11       ` nick
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2013-08-20 17:40 UTC (permalink / raw)
  To: unicorn list

nick@auger.net wrote:
> "Eric Wong" <normalperson@yhbt.net> said:
> > Can you take a look at the nginx error and access logs?  From what
> > you're saying, there's a chance a request never even got to the Rails
> > layer.  However, nginx should be logging failed/long-running requests to
> > unicorn.
> 
> The nginx access logs show frequent 499 responses.  The error logs are filled with:
> 
> connect() failed (110: Connection timed out) while connecting to upstream
> upstream timed out (110: Connection timed out) while reading response header from upstream
> 
> What specific pieces of information should I be looking for in the logs?

Do you have any other requests in your logs which could be taking
a long time and hogging workers, but not high enough to trigger the
unicorn kill timeout.

(enable $request_time in nginx access logs if you haven't already)

Is this with Unix or TCP sockets?  If it's over a LAN, maybe there's
still a bad switch/port/cable somewhere (that happens often to me).

With Unix sockets, I don't recall encountering recent problems under
Linux.  Which OS are you running?

I hit several TCP bugs in the early 3.7 Linux kernels, too...
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 17:40     ` Eric Wong
@ 2013-08-20 18:11       ` nick
  2013-08-20 18:49         ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-20 18:11 UTC (permalink / raw)
  To: unicorn list

"Eric Wong" <normalperson@yhbt.net> said:
> nick@auger.net wrote:
>> "Eric Wong" <normalperson@yhbt.net> said:
>> > Can you take a look at the nginx error and access logs?  From what
>> > you're saying, there's a chance a request never even got to the Rails
>> > layer.  However, nginx should be logging failed/long-running requests to
>> > unicorn.
>>
>> The nginx access logs show frequent 499 responses.  The error logs are filled
>> with:
>>
>> connect() failed (110: Connection timed out) while connecting to upstream
>> upstream timed out (110: Connection timed out) while reading response header from
>> upstream
>>
>> What specific pieces of information should I be looking for in the logs?
> 
> Do you have any other requests in your logs which could be taking
> a long time and hogging workers, but not high enough to trigger the
> unicorn kill timeout.

I don't *think* so.  Most requests finish <300ms.  We do have some more intensive code-paths, but they're administrative and called much less frequently.  Most of these pages complete in <3seconds.

For requests that made it to rails logging, the LAST processed request before the worker timed-out all completed very quickly (and no real pattern in terms of which page may be triggering it.)

> (enable $request_time in nginx access logs if you haven't already)

I'll enable this.

> Is this with Unix or TCP sockets?  If it's over a LAN, maybe there's
> still a bad switch/port/cable somewhere (that happens often to me).

TCP sockets, with nginx and unicorn running on the same box.

> With Unix sockets, I don't recall encountering recent problems under
> Linux.  Which OS are you running?

Stock RHEL 5, kernel 2.6.18.

Thanks again,

-Nick



_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 18:11       ` nick
@ 2013-08-20 18:49         ` Eric Wong
  2013-08-20 20:03           ` nick
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2013-08-20 18:49 UTC (permalink / raw)
  To: unicorn list

nick@auger.net wrote:
> "Eric Wong" <normalperson@yhbt.net> said:
> > 
> > Do you have any other requests in your logs which could be taking
> > a long time and hogging workers, but not high enough to trigger the
> > unicorn kill timeout.

> I don't *think* so.  Most requests finish <300ms.  We do have some
> more intensive code-paths, but they're administrative and called much
> less frequently.  Most of these pages complete in <3seconds.
> 
> For requests that made it to rails logging, the LAST processed request
> before the worker timed-out all completed very quickly (and no real
> pattern in terms of which page may be triggering it.)

This is really strange.  This was only really bad for a 7s period?
Has it happened again?  Anything else going on with the system at that
time?  Swapping, particularly...

And if you're inside a VM, maybe your neighbors were hogging things.
Large PUT/POST requests which require filesystem I/O are particularly
sensitive to this.

> > Is this with Unix or TCP sockets?  If it's over a LAN, maybe there's
> > still a bad switch/port/cable somewhere (that happens often to me).
> 
> TCP sockets, with nginx and unicorn running on the same box.

OK, that probably rules out a bunch of problems.
Just to be thorough, anything interesting in dmesg or syslogs?

> > With Unix sockets, I don't recall encountering recent problems under
> > Linux.  Which OS are you running?
> 
> Stock RHEL 5, kernel 2.6.18.

RHEL 5.0 or 5.x?  I can't remember /that/ far back to 5.0 (I don't think
I even tried it until 5.2), but don't recall anything being obviously
broken in those...
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 18:49         ` Eric Wong
@ 2013-08-20 20:03           ` nick
  2013-08-20 20:42             ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-20 20:03 UTC (permalink / raw)
  To: unicorn list

"Eric Wong" <normalperson@yhbt.net> said:

> nick@auger.net wrote:
>> "Eric Wong" <normalperson@yhbt.net> said:
>> >
>> > Do you have any other requests in your logs which could be taking
>> > a long time and hogging workers, but not high enough to trigger the
>> > unicorn kill timeout.
> 
>> I don't *think* so.  Most requests finish <300ms.  We do have some
>> more intensive code-paths, but they're administrative and called much
>> less frequently.  Most of these pages complete in <3seconds.
>>
>> For requests that made it to rails logging, the LAST processed request
>> before the worker timed-out all completed very quickly (and no real
>> pattern in terms of which page may be triggering it.)
> 
> This is really strange.  This was only really bad for a 7s period?

It was a 7 minute period.  All of the workers would become busy and exceed their >120s timeout.  Master would kill and re-spawn them, they'd start to respond to a handful of requests (anywhere from 5-50) after which they'd become "busy" again, and get force killed by master.  This pattern happened 3 times over a 7 minute period.

> Has it happened again?

No

> Anything else going on with the system at that time?  Swapping, particularly...

No swap activity or high load.  Our munin graphs indicate a peak of web/app server disk latency around that time, although our graphs show many other similar peaks, without incident.

> And if you're inside a VM, maybe your neighbors were hogging things.
> Large PUT/POST requests which require filesystem I/O are particularly
> sensitive to this.

We can't blame virtualization as we're on dedicated hardware.

>> > Is this with Unix or TCP sockets?  If it's over a LAN, maybe there's
>> > still a bad switch/port/cable somewhere (that happens often to me).
>>
>> TCP sockets, with nginx and unicorn running on the same box.
> 
> OK, that probably rules out a bunch of problems.
> Just to be thorough, anything interesting in dmesg or syslogs?

Nothing interesting in /var/log/messages.  dmesg (no timestamps) includes lines like:

TCP: Treason uncloaked! Peer ###.###.###.###:63554/80 shrinks window 1149440448:1149447132. Repaired.

>> > With Unix sockets, I don't recall encountering recent problems under
>> > Linux.  Which OS are you running?
>>
>> Stock RHEL 5, kernel 2.6.18.
> 
> RHEL 5.0 or 5.x?  I can't remember /that/ far back to 5.0 (I don't think
> I even tried it until 5.2), but don't recall anything being obviously
> broken in those...

We're on RHEL 5.6.

Thanks,

-Nick

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 20:03           ` nick
@ 2013-08-20 20:42             ` Eric Wong
  2013-08-20 21:19               ` nick
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2013-08-20 20:42 UTC (permalink / raw)
  To: unicorn list

nick@auger.net wrote:
> "Eric Wong" <normalperson@yhbt.net> said:
> > This is really strange.  This was only really bad for a 7s period?
> 
> It was a 7 minute period.  All of the workers would become busy and
> exceed their >120s timeout.  Master would kill and re-spawn them,
> they'd start to respond to a handful of requests (anywhere from 5-50)
> after which they'd become "busy" again, and get force killed by
> master.  This pattern happened 3 times over a 7 minute period.
> 
> > Has it happened again?
> 
> No
> 
> > Anything else going on with the system at that time?  Swapping,
> > particularly...
> 
> No swap activity or high load.  Our munin graphs indicate a peak of
> web/app server disk latency around that time, although our graphs show
> many other similar peaks, without incident.

I'm stumped :<

Do you have any background threads running that could be hanging the
workers?   This is Ruby 1.8, after all, so there's more likely to be
some blocking call hanging the entire process.  AFAIK, some monitoring
software runs a background thread in the unicorn worker and maybe the
OpenSSL extension doesn't work as well if it encountered network
problems under Ruby 1.8

Otherwise, I don't know...
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 20:42             ` Eric Wong
@ 2013-08-20 21:19               ` nick
  2013-08-20 21:32                 ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-20 21:19 UTC (permalink / raw)
  To: unicorn list

"Eric Wong" <normalperson@yhbt.net> said:

> nick@auger.net wrote:
>> "Eric Wong" <normalperson@yhbt.net> said:
>> > This is really strange.  This was only really bad for a 7s period?
>>
>> It was a 7 minute period.  All of the workers would become busy and
>> exceed their >120s timeout.  Master would kill and re-spawn them,
>> they'd start to respond to a handful of requests (anywhere from 5-50)
>> after which they'd become "busy" again, and get force killed by
>> master.  This pattern happened 3 times over a 7 minute period.
>>
>> > Has it happened again?
>>
>> No
>>
>> > Anything else going on with the system at that time?  Swapping,
>> > particularly...
>>
>> No swap activity or high load.  Our munin graphs indicate a peak of
>> web/app server disk latency around that time, although our graphs show
>> many other similar peaks, without incident.
> 
> I'm stumped :<

I was afraid you'd say that :(.

> Do you have any background threads running that could be hanging the
> workers?   This is Ruby 1.8, after all, so there's more likely to be
> some blocking call hanging the entire process.  AFAIK, some monitoring
> software runs a background thread in the unicorn worker and maybe the
> OpenSSL extension doesn't work as well if it encountered network
> problems under Ruby 1.8

We don't explicitly create any threads in our rails code.  We do communicate with backgroundrb worker processes, although, none of the strangeness today involved any routes that would hit backgroundrb workers.

Is there any instrumentation that I could add that might help debugging in the future? ($request_time and $upstream_response_time are now in my nginx logs.)  We have noticed these "unexplainable timeouts" before, but typically for a single worker.  If there's some debugging that could be added I might be able to track it down during these one-off events.

Thoughts?

I absolutely appreciate all your help!

-Nick

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 21:19               ` nick
@ 2013-08-20 21:32                 ` Eric Wong
  2013-08-21 13:33                   ` nick
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2013-08-20 21:32 UTC (permalink / raw)
  To: unicorn list

nick@auger.net wrote:
> "Eric Wong" <normalperson@yhbt.net> said:
> > nick@auger.net wrote:
> >> "Eric Wong" <normalperson@yhbt.net> said:
> > I'm stumped :<
> 
> I was afraid you'd say that :(.

Actually, another potential issue is DNS lookups timing out.  But they
shouldn't take *that* long...

> > Do you have any background threads running that could be hanging the
> > workers?   This is Ruby 1.8, after all, so there's more likely to be
> > some blocking call hanging the entire process.  AFAIK, some monitoring
> > software runs a background thread in the unicorn worker and maybe the
> > OpenSSL extension doesn't work as well if it encountered network
> > problems under Ruby 1.8
> 
> We don't explicitly create any threads in our rails code.  We do
> communicate with backgroundrb worker processes, although, none of the
> strangeness today involved any routes that would hit backgroundrb
> workers.

I proactively audit every piece of code (including external
libraries/gems) loaded by an app for potentially blocking calls (hits to
the filesystem, socket calls w/o timeout/blocking).   I use strace to
help me find that sometimes...

> Is there any instrumentation that I could add that might help
> debugging in the future? ($request_time and $upstream_response_time
> are now in my nginx logs.)  We have noticed these "unexplainable
> timeouts" before, but typically for a single worker.  If there's some
> debugging that could be added I might be able to track it down during
> these one-off events.

As an experiment, can you replay traffic a few minutes leading up to and
including that 7m period in a test setup with only one straced worker?

Run "strace -T -f -o $FILE -p $PID_OF_WORKER" and see if there's any
unexpected/surprising dependencies (connect() to unrecognized addresses,
open() to networked filesystems, fcntl locks, etc...).

You can play around with some other strace options (-v/-s SIZE/-e filters)

Maybe you'll find something, there.
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-20 21:32                 ` Eric Wong
@ 2013-08-21 13:33                   ` nick
  2013-08-22  2:32                     ` Jimmy Soho
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-21 13:33 UTC (permalink / raw)
  To: unicorn list

"Eric Wong" <normalperson@yhbt.net> said:
> nick@auger.net wrote:
>> "Eric Wong" <normalperson@yhbt.net> said:
>> > nick@auger.net wrote:
>> >> "Eric Wong" <normalperson@yhbt.net> said:
>> > I'm stumped :<
>>
>> I was afraid you'd say that :(.
> 
> Actually, another potential issue is DNS lookups timing out.  But they
> shouldn't take *that* long...
> 
>> > Do you have any background threads running that could be hanging the
>> > workers?   This is Ruby 1.8, after all, so there's more likely to be
>> > some blocking call hanging the entire process.  AFAIK, some monitoring
>> > software runs a background thread in the unicorn worker and maybe the
>> > OpenSSL extension doesn't work as well if it encountered network
>> > problems under Ruby 1.8
>>
>> We don't explicitly create any threads in our rails code.  We do
>> communicate with backgroundrb worker processes, although, none of the
>> strangeness today involved any routes that would hit backgroundrb
>> workers.
> 
> I proactively audit every piece of code (including external
> libraries/gems) loaded by an app for potentially blocking calls (hits to
> the filesystem, socket calls w/o timeout/blocking).   I use strace to
> help me find that sometimes...
> 
>> Is there any instrumentation that I could add that might help
>> debugging in the future? ($request_time and $upstream_response_time
>> are now in my nginx logs.)  We have noticed these "unexplainable
>> timeouts" before, but typically for a single worker.  If there's some
>> debugging that could be added I might be able to track it down during
>> these one-off events.
> 
> As an experiment, can you replay traffic a few minutes leading up to and
> including that 7m period in a test setup with only one straced worker?

I've replayed this in a development environment without triggering the issue.  Lack of POST payloads and slight differences between the two environments, make this difficult to test.

> Run "strace -T -f -o $FILE -p $PID_OF_WORKER" and see if there's any
> unexpected/surprising dependencies (connect() to unrecognized addresses,
> open() to networked filesystems, fcntl locks, etc...).
> 
> You can play around with some other strace options (-v/-s SIZE/-e filters)

I'll do my best to look at the output for anything out of the ordinary.  Unfortunately, strace is a new tool for me and is a bit out of my wheelhouse.


_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-21 13:33                   ` nick
@ 2013-08-22  2:32                     ` Jimmy Soho
  2013-08-22  4:09                       ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Jimmy Soho @ 2013-08-22  2:32 UTC (permalink / raw)
  To: unicorn list

fwiw, we experience this too irregularly, about once a week or 2 weeks for a few seconds, and sometimes up to a minute or 2 minutes long. for no apparent reasons indeed. and at random times.

my suspicion so far has always been that this is due to bad (wifi) networks of clients. what happens when a request is handled successfully by rails, but for whatever reason the last bit of the response can't be delivered because the receiving end is not consuming in a timely fashion?

If i'm not mistaken i think that rails logs a 200 success message before the response is actually send back?

Is a response send back by rails / unicorn fully buffered by nginx? Or is a unicorn worker blocked until the response is confirmed to have been successfully received by the client?

Aside: how do you protect your unicorn workers against a barrage of client requests that eat the response veeeeeery sloooooooowly?


--
JS

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-22  2:32                     ` Jimmy Soho
@ 2013-08-22  4:09                       ` Eric Wong
  2013-08-22 13:10                         ` nick
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2013-08-22  4:09 UTC (permalink / raw)
  To: unicorn list

Jimmy Soho <jimmy.soho@gmail.com> wrote:
> fwiw, we experience this too irregularly, about once a week or 2 weeks
> for a few seconds, and sometimes up to a minute or 2 minutes long. for
> no apparent reasons indeed. and at random times.

Wait, based on what you say below, is this happening to unicorn or
nginx?

> my suspicion so far has always been that this is due to bad (wifi)
> networks of clients. what happens when a request is handled
> successfully by rails, but for whatever reason the last bit of the
> response can't be delivered because the receiving end is not consuming
> in a timely fashion?

So nginx is complaining about disconnected clients, not unicorn,
correct?

> If i'm not mistaken i think that rails logs a 200 success message
> before the response is actually send back?

Correct, rails/unicorn won't know if the client disconnected until
nginx breaks the connection to unicorn.

> Is a response send back by rails / unicorn fully buffered by nginx? Or
> is a unicorn worker blocked until the response is confirmed to have
> been successfully received by the client?

nginx fully buffers both requests and responses by default.
Which leads us to...

> Aside: how do you protect your unicorn workers against a barrage of
> client requests that eat the response veeeeeery sloooooooowly?

That's the whole point of using nginx with unicorn.  nginx shields
unicorn from slow clients.
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-22  4:09                       ` Eric Wong
@ 2013-08-22 13:10                         ` nick
  2013-08-22 17:05                           ` Eric Wong
  0 siblings, 1 reply; 15+ messages in thread
From: nick @ 2013-08-22 13:10 UTC (permalink / raw)
  To: unicorn list



"Eric Wong" <normalperson@yhbt.net> said:

> Jimmy Soho <jimmy.soho@gmail.com> wrote:
>> fwiw, we experience this too irregularly, about once a week or 2 weeks
>> for a few seconds, and sometimes up to a minute or 2 minutes long. for
>> no apparent reasons indeed. and at random times.
> 
> Wait, based on what you say below, is this happening to unicorn or
> nginx?
> 
>> my suspicion so far has always been that this is due to bad (wifi)
>> networks of clients. what happens when a request is handled
>> successfully by rails, but for whatever reason the last bit of the
>> response can't be delivered because the receiving end is not consuming
>> in a timely fashion?
> 
> So nginx is complaining about disconnected clients, not unicorn,
> correct?
> 
>> If i'm not mistaken i think that rails logs a 200 success message
>> before the response is actually send back?
> 
> Correct, rails/unicorn won't know if the client disconnected until
> nginx breaks the connection to unicorn.
> 
>> Is a response send back by rails / unicorn fully buffered by nginx? Or
>> is a unicorn worker blocked until the response is confirmed to have
>> been successfully received by the client?
> 
> nginx fully buffers both requests and responses by default.
> Which leads us to...
> 
>> Aside: how do you protect your unicorn workers against a barrage of
>> client requests that eat the response veeeeeery sloooooooowly?
> 
> That's the whole point of using nginx with unicorn.  nginx shields
> unicorn from slow clients.

Is it possible that the behavior could be related to nginx?  Nginx hanging onto the connection too long from the worker, so that the worker never makes it back into the pool and eventually get's killed for not checking in?

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: A barrage of unexplained timeouts
  2013-08-22 13:10                         ` nick
@ 2013-08-22 17:05                           ` Eric Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2013-08-22 17:05 UTC (permalink / raw)
  To: unicorn list

nick@auger.net wrote:
> "Eric Wong" <normalperson@yhbt.net> said:
> > Jimmy Soho <jimmy.soho@gmail.com> wrote:
> >> Is a response send back by rails / unicorn fully buffered by nginx? Or
> >> is a unicorn worker blocked until the response is confirmed to have
> >> been successfully received by the client?
> > 
> > nginx fully buffers both requests and responses by default.
> > Which leads us to...
> > 
> >> Aside: how do you protect your unicorn workers against a barrage of
> >> client requests that eat the response veeeeeery sloooooooowly?
> > 
> > That's the whole point of using nginx with unicorn.  nginx shields
> > unicorn from slow clients.
 
> Is it possible that the behavior could be related to nginx?  Nginx
> hanging onto the connection too long from the worker, so that the
> worker never makes it back into the pool and eventually get's killed
> for not checking in?

Unlikely.  That'd be a serious bug in nginx since nginx was originally
designed to do this buffering since the earliest versions.  Which
version of nginx are you running?  Can you be sure "proxy_buffering off"
is _not_ set anywhere?
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-08-22 17:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-20 14:47 A barrage of unexplained timeouts nick
2013-08-20 16:37 ` Eric Wong
2013-08-20 17:27   ` nick
2013-08-20 17:40     ` Eric Wong
2013-08-20 18:11       ` nick
2013-08-20 18:49         ` Eric Wong
2013-08-20 20:03           ` nick
2013-08-20 20:42             ` Eric Wong
2013-08-20 21:19               ` nick
2013-08-20 21:32                 ` Eric Wong
2013-08-21 13:33                   ` nick
2013-08-22  2:32                     ` Jimmy Soho
2013-08-22  4:09                       ` Eric Wong
2013-08-22 13:10                         ` nick
2013-08-22 17:05                           ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).