From: Ilya Maykov
Newsgroups: gmane.comp.lang.ruby.rainbows.general
Subject: Rainbows! + EventMachine + Sinatra::Synchrony == pegged CPU when idle?
Date: Tue, 19 Jun 2012 02:06:19 -0700

Hi all,

We're using Rainbows + EventMachine + Sinatra::Synchrony to run a fleet of RESTful web servers backed by a Cassandra cluster. We use the EventMachineTransport to talk to Cassandra, with an EM::Synchrony::ConnectionPool in each Rainbows worker. A Storm cluster pushes a large stream of real-time data into the Rainbows fleet using HTTP PUT requests.

We're running into some very strange performance issues and need help figuring out what's going on. When load is low, everything looks good. When we crank up the load, the CPU suddenly gets pegged, request latencies go waaaay up, and requests start timing out. Once this state is reached, the high CPU usage remains even if we completely shut off all incoming traffic: 4 Rainbows worker processes at ~50% each on a 2-core machine add up to ~200% of one core, i.e. both cores are nearly saturated.
Taking a look with strace -p, it looks like the Rainbows worker processes are writing ASCII NUL characters to file descriptor 7 (which is a FIFO) as fast as the kernel will let them. My guess is that the worker is trying to communicate with the Rainbows master process via the FIFO. I'm not sure what triggers this behavior, but I'd like to know if anyone else has ever seen something like this.

This thread sounded like it could have been a similar issue, but it died out without any conclusion:
http://rubyforge.org/pipermail/rainbows-talk/2012-April/000345.html

Some details about the setup:

  6-node Cassandra cluster
  3 nodes running Rainbows web servers
  4 Rainbows workers per node
  max of 50 Cassandra connections per Rainbows worker

rainbows.conf has:

    Rainbows! do
      use :EventMachine
      worker_connections 50
      keepalive_requests 1000
      keepalive_timeout 10
    end

So each Rainbows node can handle 4 * 50 = 200 simultaneous connections.

  12 Storm worker processes writing to the Rainbows web servers
  each Storm worker has a max of 10 connections open to each of the 3 Rainbows nodes

So each Rainbows node has 12 * 10 = 120 incoming connections from Storm.

I've been playing around with the numbers. The bug (assuming it is a bug) seems to be easier to trigger when I increase the number of incoming connections from Storm workers, even when they are well below what the Rainbows servers can take (60-70% of the max connections is usually enough; 120 of 200 is already 60%). The bug is also easier to trigger when we increase the volume of data we push through Storm: at hundreds or thousands of requests per minute, no bug; at hundreds of thousands of requests per minute, bug.

Cassandra is not the issue: it can easily take the write load we're generating and is basically idle.

Any help in figuring this out would be greatly appreciated.

Thanks,
--
Ilya
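To confirm from outside a worker what a descriptor seen in strace output (such as fd 7) actually is, one option on Linux is to inspect /proc/<pid>/fd/<n>. The sketch below assumes a Linux /proc filesystem and permission to inspect the target process; the helper names fd_is_fifo? and fd_target are hypothetical and not part of Rainbows, Unicorn, or EventMachine.

```ruby
# Minimal sketch, assuming Linux: /proc/<pid>/fd/<n> is a symlink to the
# open file, and File.stat follows it to report the descriptor's type.
# fd_is_fifo? and fd_target are hypothetical helpers for illustration.

# true if descriptor `fd` of process `pid` is a pipe/FIFO.
def fd_is_fifo?(pid, fd)
  File.stat("/proc/#{pid}/fd/#{fd}").pipe?
rescue Errno::ENOENT, Errno::EACCES
  # No such process or descriptor, or not permitted to inspect it.
  false
end

# The link target as `ls -l /proc/<pid>/fd` shows it, e.g. "pipe:[123456]".
# Both ends of the same pipe share the inode number in brackets.
def fd_target(pid, fd)
  File.readlink("/proc/#{pid}/fd/#{fd}")
rescue Errno::ENOENT, Errno::EACCES
  nil
end

# Usage: ruby fdcheck.rb <worker_pid> [fd]   (fd defaults to 7)
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  pid = Integer(ARGV[0])
  fd  = Integer(ARGV.fetch(1, "7"))
  puts "fd #{fd} of pid #{pid}: #{fd_target(pid, fd).inspect} fifo=#{fd_is_fifo?(pid, fd)}"
end
```

Comparing the bracketed inode numbers across the master and worker processes would show which processes hold the two ends of that pipe, which is one way to test the guess about who the worker is talking to.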
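The connection arithmetic in the setup details can be reproduced in a few lines of Ruby. This is a sketch assuming every Storm worker holds its maximum number of connections and that load spreads evenly over nodes and workers; the constant and method names are hypothetical, and the 300,000 requests/minute figure used in the test below is only an illustrative stand-in for "hundreds of thousands", not a measured value.

```ruby
# Sketch of the capacity arithmetic from the setup details.
# Assumes maximum connections are open and load is spread evenly;
# all names here are hypothetical.

WORKERS_PER_NODE     = 4   # Rainbows workers per node
WORKER_CONNECTIONS   = 50  # worker_connections in rainbows.conf
STORM_WORKERS        = 12  # Storm worker processes
STORM_CONNS_PER_NODE = 10  # max connections per Storm worker per Rainbows node
RAINBOWS_NODES       = 3

# Maximum simultaneous client connections one node can hold.
def node_capacity(workers = WORKERS_PER_NODE, conns = WORKER_CONNECTIONS)
  workers * conns
end

# Connections Storm can open against a single node.
def incoming_per_node(storm_workers = STORM_WORKERS, per_node = STORM_CONNS_PER_NODE)
  storm_workers * per_node
end

# Fraction of a node's connection slots in use.
def utilization
  incoming_per_node.fdiv(node_capacity)
end

# Per-worker request rate (req/s) for a cluster-wide rate given in req/min,
# assuming an even spread over all nodes and workers.
def per_worker_rps(requests_per_minute, nodes = RAINBOWS_NODES, workers = WORKERS_PER_NODE)
  requests_per_minute.fdiv(60 * nodes * workers)
end

puts node_capacity     # 200
puts incoming_per_node # 120
puts utilization       # 0.6
```

With the posted numbers this gives 120 of 200 slots, i.e. 0.6, at the low end of the 60-70% range the post says is usually enough to trigger the problem. For an assumed 300,000 requests per minute, per_worker_rps(300_000) works out to roughly 417 requests per second per worker under the even-spread assumption.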