Rainbows! Rack HTTP server user/dev discussion
 help / color / mirror / code / Atom feed
* Page request roundtrip time increases substantially after a bit of use
@ 2011-01-24 21:03 chris mckenzie
       [not found] ` <571697.98064.qm-oNR6tK37MtiB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: chris mckenzie @ 2011-01-24 21:03 UTC (permalink / raw)
  To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw

--- Note ---
I'm sorry if you get this twice.  I was sending this email from Yahoo, and I 
guess they defaulted to HTML for the mail.  My apologies if this was the case.  
If not, please ignore this message.  Thank you for your time.
--- Note ---


Hi,

First of all, let me thank all of you for creating such a wonderful product.  
Rainbows is a unique solution and is the perfect candidate to solve our complex 
problems.  I don't know where this current project could possibly be without 
your fine work.  :-)

Now about the problem.  First a little background on the architecture, so you 
can get the context:

I'm dealing with some code that I can't just legally paste for example (although 

I can probably make a simple proof of concept if needed) ... Here's the design:

I have a cascading long-poll connection, which listens for various JSON 
messages.  The throughput is quite low (a few a second) and I have a policy of 
falling over based on either a large amount of traffic, or a predefined amount  
of time (30 seconds) lapsing.

Essentially I have a 30 second connection and at second 25, a new one opens up 
... the first one closes, and that one lasts for 30 seconds, etc. 


This is designed for web browsers and it's implemented through hidden iframes.

This is a problem that has persisted across Firefox 3.6/OS X and Firefox 3.6/XP 
and Firefox 4.0b6/XP along with IE8/XP and Chrome/OS X and FF 3.6/Ubuntu so I 
don't think that any nuance of a browser or OS could be considered culpable.

I'm also not using any middle webserver like nginx and am connecting directly to 

rainbows!

*** The Problem ***
When serving static files along with my long polled connections, I will get a 
round trip time of ten millisecond or so, usually.  This is perfectly 
acceptable.

Every now and then, however, it will be about 2.5 seconds.  This will then be 
followed by a bunch of the snappy millisecond level transaction  times.  This is 

one browser with 1 persistent connection against rainbows configured with 10 
worker_processes and 100 worker_connections.

Everything changes about 5-10 minutes into things. 

Then every transaction takes about 2-4 seconds.  Static files that are 10 bytes 
in size, 2-4 seconds. Ruby code to emit "Hello World"? 2-4 seconds.  Every 
request.  Still using just 1 browser.

After I exit all browsers and then do a netstat on the client machine to see 
that the connections have closed,  I can then do a curl command for a static 
file; again 2-4 seconds.

On the machine running rainbows if I do a netstat, I get this:

tcp        1      0 10.10.192.12:7788       10.10.131.165:17443     CLOSE_WAIT
tcp        1      0  10.10.192.12:7788       10.10.131.165:17352     CLOSE_WAIT
tcp        1    196 10.10.192.12:7788       10.10.131.165:17317     CLOSE_WAIT
tcp        1    196 10.10.192.12:7788       10.10.131.165:17310     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17437     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17366     CLOSE_WAIT
tcp        1      0  10.10.192.12:7788       10.10.131.165:17410     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17447     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17357     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17449     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17347     CLOSE_WAIT
^^^ rainbows is running on 7788 on this machine ^^^.  

The reference  machine, in this case, windows, claims that all the connections 
are closed:
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:443            0.0.0.0:0              LISTENING
  TCP     0.0.0.0:445            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:6060           0.0.0.0:0              LISTENING
  TCP    0.0.0.0:63508          0.0.0.0:0              LISTENING
  TCP    10.10.131.165:139      0.0.0.0:0              LISTENING
  TCP    10.10.131.165:16841    187.39.33.180:11827    ESTABLISHED
  TCP    10.10.131.165:16844     69.63.181.105:5222     ESTABLISHED
  TCP    10.10.131.165:16849    10.10.10.71:5222       ESTABLISHED
  TCP    10.10.131.165:16883    64.12.29.50:443        ESTABLISHED
  TCP    10.10.131.165:16904    64.12.28.222:443       ESTABLISHED
  TCP    10.10.131.165:16915    64.12.165.99:443       ESTABLISHED
  TCP    10.10.131.165:16918    64.12.202.37:443       ESTABLISHED
  TCP    10.10.131.165:16958    205.188.248.151:443    ESTABLISHED
  TCP    10.10.131.165:16961    205.188.254.83:443      ESTABLISHED
  TCP    10.10.131.165:17102    10.10.131.136:22       ESTABLISHED
  TCP    10.10.131.165:17466    10.10.192.12:22        ESTABLISHED
  TCP    10.10.131.165:17470    10.0.0.29:515          SYN_SENT
  TCP    127.0.0.1:1030         0.0.0.0:0              LISTENING
  TCP    192.168.56.1:139       0.0.0.0:0              LISTENING
  TCP    [::]:135                [::]:0                 LISTENING       0

I can disconnect the windows client machine; turn it off even, and this problem 
persists.

I think that somewhere in the ruby stack, the connections are not closing.  If I 

increase my worker_process count and prolong the long poll, then yes, I'll 
survive for 15 minutes instead of 5; but the problem will still eventually occur 

and I will hit the wall.

I have yet to try to test unicorn or zbatery for this style of solution because 
I need the keep-alive; and although I know that unicorn put in the keep-alive 
support for rainbows, I haven't really taken the time necessary to know how to 
invoke it.  If you think this would be instructive, I'd be happy to do so.

As far as my ruby set-up, I'm using ruby1.8 and I have the ThreadSpawn model.  
We are running on a modern  version of Ubuntu without any serious customization.

If you think that another configuration would do the trick or if you know how to 

squash this bug, it would be
very helpful.   This problem has become of great concern for us.  We love 
rainbows and all that it is. Thanks for the project and keep up the good work.

Cheers,
~chris.


      
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found] ` <571697.98064.qm-oNR6tK37MtiB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
@ 2011-01-24 21:54   ` Eric Wong
       [not found]     ` <20110124215440.GA25489-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2011-01-24 21:54 UTC (permalink / raw)
  To: Rainbows! list

chris mckenzie <kristopolous-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
> Every now and then, however, it will be about 2.5 seconds.  This will then be 
> followed by a bunch of the snappy millisecond level transaction  times.  This is 

That is probably GC, but...

> Everything changes about 5-10 minutes into things. 
> 
> Then every transaction takes about 2-4 seconds.  Static files that are 10 bytes 
> in size, 2-4 seconds. Ruby code to emit "Hello World"? 2-4 seconds.  Every 
> request.  Still using just 1 browser.
> 
> After I exit all browsers and then do a netstat on the client machine to see 
> that the connections have closed,  I can then do a curl command for a static 
> file; again 2-4 seconds.
> 
> On the machine running rainbows if I do a netstat, I get this:
> 
> tcp        1      0 10.10.192.12:7788       10.10.131.165:17443     CLOSE_WAIT
> tcp        1      0  10.10.192.12:7788       10.10.131.165:17352     CLOSE_WAIT
> tcp        1    196 10.10.192.12:7788       10.10.131.165:17317     CLOSE_WAIT
> tcp        1    196 10.10.192.12:7788       10.10.131.165:17310     CLOSE_WAIT

             ^ Strange that Send-Q is 1 across all those connections..

Did you see the machine/connection that ran curl in there?  How does
hitting Rainbows! from localhost work?

Are you dropping packets?

> I think that somewhere in the ruby stack, the connections are not
> closing.

I would do an lsof on some of the worker processes to see if they
still think a connection is open.  Do you see the connection from
curl in netstat?

> increase my worker_process count and prolong the long poll, then yes, I'll 
> survive for 15 minutes instead of 5; but the problem will still eventually occur 
> 
> and I will hit the wall.

> I have yet to try to test unicorn or zbatery for this style of
> solution because I need the keep-alive; and although I know that
> unicorn put in the keep-alive support for rainbows, I haven't really
> taken the time necessary to know how to invoke it.  If you think this
> would be instructive, I'd be happy to do so.

The Unicorn parser supports keepalive for Rainbows!, but Unicorn itself
does not.  Rainbows! "use :Base" (the default) is basically the same
thing with Unicorn+keepalive.  Zbatery doesn't do anything differently
for managing client connections than Rainbows!

> As far as my ruby set-up, I'm using ruby1.8 and I have the ThreadSpawn
> model.  We are running on a modern  version of Ubuntu without any
> serious customization.

What's your Rainbows! keepalive_timeout set to?  The default is 5s.
What's your worker_connections setting?

> If you think that another configuration would do the trick or if you
> know how to squash this bug, it would be very helpful.   This problem
> has become of great concern for us.  We love rainbows and all that it
> is. Thanks for the project and keep up the good work.

Do you notice your Rainbows! worker processes growing in memory usage
over time?  Which Ruby 1.8 patch level?  Are you running any custom
GC?  If your app supports it, try Ruby 1.9.2-p136, too.

Do you always set Content-Length or "Transfer-Encoding: chunked" in your
app responses?  Missing/inaccurate values will throw off clients if you
have keepalive, otherwise I've never seen anything like the problem you
describe :<

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]     ` <20110124215440.GA25489-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
@ 2011-01-25  0:11       ` Eric Wong
       [not found]         ` <20110125001107.GA1921-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2011-01-25  0:11 UTC (permalink / raw)
  To: Rainbows! list

Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> chris mckenzie <kristopolous-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
> > On the machine running rainbows if I do a netstat, I get this:
> > 
> > tcp        1      0 10.10.192.12:7788       10.10.131.165:17443     CLOSE_WAIT
> > tcp        1      0  10.10.192.12:7788       10.10.131.165:17352     CLOSE_WAIT
> > tcp        1    196 10.10.192.12:7788       10.10.131.165:17317     CLOSE_WAIT
> > tcp        1    196 10.10.192.12:7788       10.10.131.165:17310     CLOSE_WAIT
> 
>              ^ Strange that Send-Q is 1 across all those connections..
> 
> Did you see the machine/connection that ran curl in there?  How does
> hitting Rainbows! from localhost work?

One more thing, do you use Thread#{kill,exit,terminate}! or anything
that would prevent an ensure statement from firing and calling IO#close
on the client socket?

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]         ` <20110125001107.GA1921-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
@ 2011-01-25  1:14           ` chris mckenzie
       [not found]             ` <443004.19531.qm-9Fhxc66Fx5iB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
  2011-01-25  2:55           ` chris mckenzie
  1 sibling, 1 reply; 10+ messages in thread
From: chris mckenzie @ 2011-01-25  1:14 UTC (permalink / raw)
  To: Rainbows! list

Hi Eric,

I'll prepare a more formal response in a bit, but here is my test run:

My rackup file for this is:
use Rack::ShowExceptions
use Rack::ShowStatus
map "/static/" do
  run Rack::File.new(File.dirname(__FILE__))
end




My config file is:
pid "/tmp/my-pid.pid"

timeout 300
listen "*:7788", :backlog => 2048

stderr_path "/tmp/my-log.stderr.log"
stdout_path "/tmp/my-log.stdout.log"

worker_processes 10
Rainbows! do
  use :ThreadSpawn
  worker_connections 400
end


I have a static file, test.txt with the contents "Hello world"

After doing
rainbows -c config.rb rackup.ru

I then executed the shell script below:
#!/bin/tcsh
rm testrun
loop:
echo -n `/bin/date +%s.%N`," "  >> testrun
curl -s -w "%{time_connect}, %{time_pretransfer}, %{time_starttransfer}, 
%{time_total}\n" $1 -o /dev/null >> testrun
goto loop

On the localhost, for some time.

The memory footprint remained flat.
The CPU usage did not spike noticeably
netstat -an did reveal some CLOSE_WAIT values on the ports but nothing that 
hadn't previously been pointed out.

The (CSV) output of the test can be seen here:
http://qaa.ath.cx/single-request.csv.gz

For those of you without any visualization software, I made a rudimentary graph 
from the data here:

http://qaa.ath.cx/single-request.png

You can clearly see how the delay increases and then doesn't ever go back down 
to previous levels.


No web browser was running while this test was done and a grep -v on the stderr 
revealed that no other request other then for the localhost was satisfied.



I'll research answers to your previous questions now.  Thanks for looking into 
this!

~chris.
----- Original Message ----
From: Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org>
To: Rainbows! list <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Sent: Mon, January 24, 2011 4:11:07 PM
Subject: Re: Page request roundtrip time increases substantially after a bit of 
use

Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> chris mckenzie <kristopolous-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
> > On the machine running rainbows if I do a netstat, I get this:
> > 
> > tcp        1      0 10.10.192.12:7788       10.10.131.165:17443     
>CLOSE_WAIT
> > tcp        1      0  10.10.192.12:7788       10.10.131.165:17352     
>CLOSE_WAIT
> > tcp        1    196 10.10.192.12:7788       10.10.131.165:17317     
>CLOSE_WAIT
> > tcp        1    196 10.10.192.12:7788       10.10.131.165:17310     
>CLOSE_WAIT
> 
>              ^ Strange that Send-Q is 1 across all those connections..
> 
> Did you see the machine/connection that ran curl in there?  How does
> hitting Rainbows! from localhost work?

One more thing, do you use Thread#{kill,exit,terminate}! or anything
that would prevent an ensure statement from firing and calling IO#close
on the client socket?

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying



      
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]             ` <443004.19531.qm-9Fhxc66Fx5iB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
@ 2011-01-25  2:02               ` Eric Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2011-01-25  2:02 UTC (permalink / raw)
  To: Rainbows! list

chris mckenzie <kristopolous-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
> Hi Eric,
> 
> I'll prepare a more formal response in a bit, but here is my test run:
> rainbows -c config.rb rackup.ru

Thanks, I'm trying to reproduce it now.  Are you able to reproduce it
more quickly by throwing some ab or httperf runs in the mix to make more
requests?  Reducing worker_processes usually help reproduce issues more
quickly, too, especially if it's a resource leak.

> The memory footprint remained flat.
> The CPU usage did not spike noticeably
> netstat -an did reveal some CLOSE_WAIT values on the ports but nothing that 
> hadn't previously been pointed out.

Also, do you have any iptables/firewall/QoS settings on that machine
that could be interfering?

I haven't noticed anything in CLOSE_WAIT since I started testing it a
few minutes ago.  Maybe it takes longer, but CLOSE_WAIT has always been
a rarity to see in my years of working with TCP client/servers...

> The (CSV) output of the test can be seen here:
> http://qaa.ath.cx/single-request.csv.gz

Just covering all my bases here, you don't have a super slow
disk/filesystem that bogs down your entire system once your logs grow to
a certain size, right?

> For those of you without any visualization software, I made a rudimentary graph 
> from the data here:
> 
> http://qaa.ath.cx/single-request.png
> 
> You can clearly see how the delay increases and then doesn't ever go back down 
> to previous levels.

Very strange.

I'm testing on my 32-bit machine with ruby 1.8.7 (2010-12-23 patchlevel
330) [i686-linux] straight off of ruby-lang.org .  I usually use
Rainbows! with 1.9.2-p136 and will try that if I can reproduce the bug
under 1.8.7, too.

> I'll research answers to your previous questions now.  Thanks for looking into 
> this!

Alright, thanks!

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]         ` <20110125001107.GA1921-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
  2011-01-25  1:14           ` chris mckenzie
@ 2011-01-25  2:55           ` chris mckenzie
       [not found]             ` <288407.18061.qm-oNR6tK37MtiB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: chris mckenzie @ 2011-01-25  2:55 UTC (permalink / raw)
  To: Rainbows! list

chris mckenzie <kristopolous at yahoo.com> wrote:
> Hi Eric,
> 
> I'll prepare a more formal response in a bit, but here is my test run:
> rainbows -c config.rb rackup.ru

>Thanks, I'm trying to reproduce it now.  Are you able to reproduce it
>more quickly by throwing some ab or httperf runs in the mix to make more
>requests?  Reducing worker_processes usually help reproduce issues more
>quickly, too, especially if it's a resource leak.

It's a finicky bug.  That data below was perfect; exactly what I was 
experiencing; but now it refuses to show its face again.  Just to be sure; it's 
the plateau that concerns me,not the occasional GC spikes ... not even those few 
huge ones.


Another interesting note is that when I use the full stack that I have, the 
bitrate throughput goes to the tube too.  It may be a problem with chunked 
transfer? I don't know really; probably ethtool and wireshark would tell us.

Here's the basic pattern though...

My application load pulls down < 100k JS.  Usually it's about 60ms or so.  When 
the plateau hits, curl reports that the throughput goes from 10.9M/s to 32K/s 
... however, this could be an unrelated problem.

If it's not, however, and If this problem was scarce resource acquisition 
entirely, then I would probably think that a 10 byte file and an 80k file would 
take about the same time ... say 2 seconds + however long it took to transfer.


> Also, do you have any iptables/firewall/QoS settings on that machine
> that could be interfering?

No.  This is a vanilla install.  It's a desktop linux system and so it has X and 
firefox and a few terminals; some minimal window manager; nothing extraordinary; 
htop reveals 9GB ram free, although I just noticed that ruby is taking up double 
digit cpu per core; perhaps it's just the nature of ruby but I'll see what I can 
do about graphing that during a lifecycle.




> I haven't noticed anything in CLOSE_WAIT since I started testing it a
> few minutes ago.  Maybe it takes longer, but CLOSE_WAIT has always been
> a rarity to see in my years of working with TCP client/servers...

We are looking at "TIME_WAIT" for a lot of them ... but after the browser is 
down and the thread has been idling for 10 minutes ... it falls into 
"CLOSE_WAIT" until I bring it down and back up.


> The (CSV) output of the test can be seen here:
> http://qaa.ath.cx/single-request.csv.gz

> Just covering all my bases here, you don't have a super slow
> disk/filesystem that bogs down your entire system once your logs grow to
> a certain size, right?

I *thought* about this ... truly.  I haven't factored it out to be honest but 
no, the await from iostat is good, the machine itself is still responsive to my 
cp and mv commands for the data during the tests and there is no noticable 
slowdown of anything else.  It's a two way quad core xeon with 12GB of memory 
(t7500) so I don't think that the physical hardware is to blame (although it 
could be memory related; who knows!)

> For those of you without any visualization software, I made a rudimentary graph 
>
>
> from the data here:
> 
> http://qaa.ath.cx/single-request.png
> 
> You can clearly see how the delay increases and then doesn't ever go back down 


> to previous levels.

> Very strange.

> I'm testing on my 32-bit machine with ruby 1.8.7 (2010-12-23 patchlevel
> 330) [i686-linux] straight off of ruby-lang.org .  I usually use
> Rainbows! with 1.9.2-p136 and will try that if I can reproduce the bug
> under 1.8.7, too.


Let me upgrade my ruby to that today or early tomorrow ... I'm just using the 
one that ubuntu gives me after apt-get update/upgrade (which is patch 72).  I 
probably can't go too custom as this has to be part of a deployable sdk; but 
it's worth a shot.  Thanks again.

~chris.

> I'll research answers to your previous questions now.  Thanks for looking into 


> this!

Alright, thanks!

-- 
Eric Wong



      
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]             ` <288407.18061.qm-oNR6tK37MtiB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
@ 2011-01-25  3:50               ` Eric Wong
       [not found]                 ` <20110125035048.GA8124-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2011-01-25  3:50 UTC (permalink / raw)
  To: Rainbows! list

chris mckenzie <kristopolous-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
> chris mckenzie <kristopolous at yahoo.com> wrote:
> > Hi Eric,
> > 
> > I'll prepare a more formal response in a bit, but here is my test run:
> > rainbows -c config.rb rackup.ru
> 
> >Thanks, I'm trying to reproduce it now.  Are you able to reproduce it
> >more quickly by throwing some ab or httperf runs in the mix to make more
> >requests?  Reducing worker_processes usually help reproduce issues more
> >quickly, too, especially if it's a resource leak.
> 
> It's a finicky bug.  That data below was perfect; exactly what I was 
> experiencing; but now it refuses to show its face again.  Just to be sure; it's 
> the plateau that concerns me,not the occasional GC spikes ... not even those few 
> huge ones.

Yeah, GC spikes are unavoidable in MRI (but so is Internet latency and
users can't tell :).  The plateau is strange/disturbing and I haven't
been able to reproduce it.

I'm still running my test and will probably run it for a few more
hours to be on the safe side.

> Another interesting note is that when I use the full stack that I have, the 
> bitrate throughput goes to the tube too.  It may be a problem with chunked 
> transfer? I don't know really; probably ethtool and wireshark would tell us.

Your test managed to reproduce over loopback, though.

> > Also, do you have any iptables/firewall/QoS settings on that machine
> > that could be interfering?
> 
> No.  This is a vanilla install.  It's a desktop linux system and so it has X and 
> firefox and a few terminals; some minimal window manager; nothing extraordinary; 
> htop reveals 9GB ram free, although I just noticed that ruby is taking up double 
> digit cpu per core; perhaps it's just the nature of ruby but I'll see what I can 
> do about graphing that during a lifecycle.

Aha! Double-digit CPU usage is definitely atypical for MRI.  Is it some
logrotate job hitting Rainbows! with USR1 signals repeatedly?  Is it
stuck in the double-digits or just spiking?

Debian (and presumably Ubuntu) build Ruby 1.8 with the (non-default)
--enable-pthreads option which can lead to this *occasionally* and also
hurt performance.  If you're stuck on 1.8, I would always build Ruby to
not use pthreads (or try some of the patches from the guy at
timetobleed.com that makes 1.8+pthreads better).

> > I haven't noticed anything in CLOSE_WAIT since I started testing it a
> > few minutes ago.  Maybe it takes longer, but CLOSE_WAIT has always been
> > a rarity to see in my years of working with TCP client/servers...
> 
> We are looking at "TIME_WAIT" for a lot of them ... but after the browser is 
> down and the thread has been idling for 10 minutes ... it falls into 
> "CLOSE_WAIT" until I bring it down and back up.

It's common to have many TIME_WAIT sockets lying around.  Until you get
many thousands of non-keepalive requests a second, they're mostly
harmless and the kernel will GC them.

> Let me upgrade my ruby to that today or early tomorrow ... I'm just using the 
> one that ubuntu gives me after apt-get update/upgrade (which is patch 72).  I 
> probably can't go too custom as this has to be part of a deployable sdk; but 
> it's worth a shot.  Thanks again.

Definitely try that and going without --enable-pthreads if you have to
use 1.8.  I just "./configure --prefix=$HOME && make && make install"
but I've heard RVM is popular these days.

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]                 ` <20110125035048.GA8124-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
@ 2011-01-25 21:38                   ` Eric Wong
       [not found]                     ` <20110125213835.GA9421-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2011-01-25 21:38 UTC (permalink / raw)
  To: Rainbows! list

Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> I'm still running my test and will probably run it for a few more
> hours to be on the safe side.

I left your test running overnight and everything appeared normal and
responsive.

Let me know if you or anybody else can reproduce this more reliably or
if you figured it was something unique to your setup.

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]                     ` <20110125213835.GA9421-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
@ 2011-02-03  2:54                       ` Eric Wong
       [not found]                         ` <20110203025415.GA3812-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2011-02-03  2:54 UTC (permalink / raw)
  To: Rainbows! list

Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> Let me know if you or anybody else can reproduce this more reliably or
> if you figured it was something unique to your setup.

Is this still a problem?  Silence implies that it's not a problem with
Rainbows! :>

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Page request roundtrip time increases substantially after a bit of use
       [not found]                         ` <20110203025415.GA3812-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
@ 2011-02-03  4:44                           ` chris mckenzie
  0 siblings, 0 replies; 10+ messages in thread
From: chris mckenzie @ 2011-02-03  4:44 UTC (permalink / raw)
  To: Rainbows! list

I don't know who is to blame.  I know how to keep rainbows happy though.  So 
what I do is make sure that rainbows is tasked with closing all connections ... 
the client doesn't cut off the long polls; that goes pretty far.  Then there are 
a few other tricks I believe I used ... but I don't remember.  




----- Original Message ----
From: Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org>
To: Rainbows! list <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Sent: Wed, February 2, 2011 6:54:15 PM
Subject: Re: Page request roundtrip time increases substantially after a bit of 
use

Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> Let me know if you or anybody else can reproduce this more reliably or
> if you figured it was something unique to your setup.

Is this still a problem?  Silence implies that it's not a problem with
Rainbows! :>

-- 
Eric Wong
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying



      
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-02-03  4:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-24 21:03 Page request roundtrip time increases substantially after a bit of use chris mckenzie
     [not found] ` <571697.98064.qm-oNR6tK37MtiB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
2011-01-24 21:54   ` Eric Wong
     [not found]     ` <20110124215440.GA25489-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
2011-01-25  0:11       ` Eric Wong
     [not found]         ` <20110125001107.GA1921-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
2011-01-25  1:14           ` chris mckenzie
     [not found]             ` <443004.19531.qm-9Fhxc66Fx5iB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
2011-01-25  2:02               ` Eric Wong
2011-01-25  2:55           ` chris mckenzie
     [not found]             ` <288407.18061.qm-oNR6tK37MtiB9c0Qi4KiSlZ8N9CAUha/QQ4Iyu8u01E@public.gmane.org>
2011-01-25  3:50               ` Eric Wong
     [not found]                 ` <20110125035048.GA8124-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
2011-01-25 21:38                   ` Eric Wong
     [not found]                     ` <20110125213835.GA9421-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
2011-02-03  2:54                       ` Eric Wong
     [not found]                         ` <20110203025415.GA3812-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
2011-02-03  4:44                           ` chris mckenzie

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/rainbows.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).