Random crash when sending USR2 + QUIT signals to Unicorn process

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

* Random crash when sending USR2 + QUIT signals to Unicorn process
@ 2017-07-13 18:48 Pere Joan Martorell
  2017-07-13 19:34 ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Pere Joan Martorell @ 2017-07-13 18:48 UTC (permalink / raw)
  To: unicorn-public

Hi, I'm experimenting with a Rack application and Unicorn, and I get a
random error when upgrading my app. When upgrading (sending USR2 and
then QUIT signals), about 10% of the time the unicorn master process and
its 2 worker processes start, and after serving the first request they
crash.

This is the stderr log:

> 91.126.37.106 - - [13/Jul/2017:18:21:09 +0000] "GET /login HTTP/1.0" 200 1571 0.0473
> I, [2017-07-13T18:21:14.351677 #21426]  INFO -- : executing ["/home/deployer/.rbenv/versions/2.4.1/bin/unicorn", "-c", "/home/deployer/apps/suppliers/current/config/unicorn.rb", "-E", "deployment", "-D", {12=>#<Kgio::UNIXServer:fd 12>}] (in /home/deployer/apps/suppliers/releases/20170713180855)
> I, [2017-07-13T18:21:14.351881 #21426]  INFO -- : forked child re-executing...
> I, [2017-07-13T18:21:14.441075 #21426]  INFO -- : inherited addr=/home/deployer/apps/suppliers/current/tmp/sockets/unicorn.socket fd=12
> I, [2017-07-13T18:21:14.441414 #21426]  INFO -- : Refreshing Gem list
> I, [2017-07-13T18:21:14.576843 #21426]  INFO -- : worker=0 spawning...
> I, [2017-07-13T18:21:14.577589 #21426]  INFO -- : worker=1 spawning...
> I, [2017-07-13T18:21:14.578117 #21426]  INFO -- : master process ready
> I, [2017-07-13T18:21:14.578866 #21430]  INFO -- : worker=0 spawned pid=21430
> I, [2017-07-13T18:21:14.579053 #21430]  INFO -- : worker=0 ready
> I, [2017-07-13T18:21:14.579649 #21432]  INFO -- : worker=1 spawned pid=21432
> I, [2017-07-13T18:21:14.579790 #21432]  INFO -- : worker=1 ready
> I, [2017-07-13T18:21:16.449066 #21369]  INFO -- : reaped #<Process::Status: pid 21373 exit 0> worker=0
> I, [2017-07-13T18:21:16.449174 #21369]  INFO -- : reaped #<Process::Status: pid 21375 exit 0> worker=1
> I, [2017-07-13T18:21:16.449207 #21369]  INFO -- : master complete
> /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in `parse': method `hash' called on unexpected T_NODE object (0x0055b15b973508 flags=0xaa31b) (NotImplementedError)
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in `read'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:606:in `process_client'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:702:in `worker_loop'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:549:in `spawn_missing_workers'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:142:in `start'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/bin/unicorn:126:in `<top (required)>'
> from /home/deployer/.rbenv/versions/2.4.1/bin/unicorn:22:in `load'
> from /home/deployer/.rbenv/versions/2.4.1/bin/unicorn:22:in `<main>'
> /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in `parse': method `hash' called on unexpected T_NODE object (0x0055b15b973508 flags=0xaa31b) (NotImplementedError)
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in `read'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:606:in `process_client'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:702:in `worker_loop'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:549:in `spawn_missing_workers'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_server.rb:142:in `start'
> from /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/bin/unicorn:126:in `<top (required)>'
> from /home/deployer/.rbenv/versions/2.4.1/bin/unicorn:22:in `load'
> from /home/deployer/.rbenv/versions/2.4.1/bin/unicorn:22:in `<main>'
> E, [2017-07-13T18:21:29.147001 #21426] ERROR -- : reaped #<Process::Status: pid 21432 exit 1> worker=1
> I, [2017-07-13T18:21:29.147127 #21426]  INFO -- : worker=1 spawning...
> I, [2017-07-13T18:21:29.150907 #21439]  INFO -- : worker=1 spawned pid=21439
> I, [2017-07-13T18:21:29.151109 #21439]  INFO -- : worker=1 ready
> E, [2017-07-13T18:21:29.153360 #21426] ERROR -- : reaped #<Process::Status: pid 21430 exit 1> worker=0
> I, [2017-07-13T18:21:29.153424 #21426]  INFO -- : worker=0 spawning...
> I, [2017-07-13T18:21:29.154226 #21442]  INFO -- : worker=0 spawned pid=21442
> I, [2017-07-13T18:21:29.154383 #21442]  INFO -- : worker=0 ready



This is the configuration of unicorn:

> DEPLOY_TO = "/home/deployer/apps/suppliers" # The path in which capistrano caches deployed versions of the source
> CURRENT = "#{DEPLOY_TO}/current" # The current, deployed release
> UNICORN_PID = "#{CURRENT}/tmp/pids/unicorn.pid" # Unicorn server process id file
> UNICORN_STDOUT = "#{CURRENT}/log/unicorn.stdout.log" # Log file for messages on standard output
> UNICORN_STDERR = "#{CURRENT}/log/unicorn.stderr.log" # Log file for messages on standard error
> UNICORN_SOCKET = "#{CURRENT}/tmp/sockets/unicorn.socket" # Server socket
>
> # Set unicorn options
> working_directory CURRENT
> worker_processes 2
> preload_app true
> timeout 30
>
> # Set up socket location
> listen UNICORN_SOCKET, :backlog => 64
>
> # Logging
> stderr_path UNICORN_STDERR
> stdout_path UNICORN_STDOUT
>
> # Set master PID location
> pid UNICORN_PID


Any idea what is happening?

Thanks!
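
For readers unfamiliar with the upgrade sequence described above, here is a
minimal sketch of it in Ruby. It relies on the unicorn convention that the old
master's pid file is renamed with an ".oldbin" suffix after USR2. The helper
names (read_pid, upgrade_unicorn) and the injectable `signaler` parameter are
hypothetical and exist only for illustration; this is not how the original
poster's deploy scripts work.

```ruby
# Read a pid file, returning nil while it is missing or only partially written.
def read_pid(path)
  Integer(File.read(path).strip)
rescue Errno::ENOENT, ArgumentError
  nil
end

# Hypothetical helper: send USR2 to the old master, wait for the new master's
# pid file to appear, then send QUIT to the old master.
# `signaler` defaults to Process.kill and can be swapped out for a dry run.
def upgrade_unicorn(pid_path, timeout: 30, signaler: Process.method(:kill))
  old_pid = read_pid(pid_path) or raise "no running master at #{pid_path}"
  signaler.call(:USR2, old_pid) # old master forks and re-executes the binary

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    new_pid = read_pid(pid_path)
    # After re-exec, the old pid lives in "<pid_path>.oldbin" and the new
    # master writes its own pid to the original path.
    if new_pid && new_pid != old_pid && read_pid("#{pid_path}.oldbin") == old_pid
      signaler.call(:QUIT, old_pid) # graceful shutdown of the old master
      return new_pid
    end
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "new master did not come up within #{timeout}s"
    end
    sleep 0.1
  end
end
```

Note that this only checks the pid files. It does not verify that the new
workers can actually serve a request, which is exactly where the crash in
this report shows up.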


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-13 18:48 Random crash when sending USR2 + QUIT signals to Unicorn process Pere Joan Martorell
@ 2017-07-13 19:34 ` Eric Wong
  2017-07-14 10:21   ` Pere Joan Martorell
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-07-13 19:34 UTC (permalink / raw)
  To: Pere Joan Martorell
  Cc: unicorn-public, Philip Cunningham, Jonathan del Strother

+Cc: Philip and Jonathan  since they encountered this three years
ago, but we never heard back from them:

	https://bogomips.org/unicorn-public/?q=T_NODE+d:..20170713

Pere Joan Martorell <pere.joan@camaloon.com> wrote:
> > /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in `parse': method `hash' called on unexpected T_NODE object (0x0055b15b973508 flags=0xaa31b) (NotImplementedError)

> Any idea what is happening?

This is most likely a bug in a C extension not using write
barriers correctly (perhaps via undocumented C-API functions in
Ruby).

I don't think I've seen this on ruby-core bug reports in a few years:

	https://public-inbox.org/ruby-core/?q=T_NODE

Fwiw, Appendix D on Generational GC in the Ruby source is
worth reading for any C extension authors:

	https://80x24.org/mirrors/ruby.git/plain/doc/extension.rdoc

There are probably build warnings when using some dangerous methods/macros;
maybe you can check the build logs for const warnings.

Can you share the list of RubyGems you have loaded and maybe try
upgrading/replacing/eliminating the ones with C extensions
one-by-one until the error stops?

Also, perhaps the output of "pmap $PID_OF_WORKER" if you're on
Linux (or equivalent command if you're on another OS).

Anyways, I didn't notice anything suspicious in your config.

I'll do another self-audit of the unicorn + kgio + raindrops
extensions, too, but judging from the lack of reports and how
much they get used, I suspect the bug is elsewhere (more eyes
welcome, of course).

Thanks for the report and any more info you can provide!
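
One way to produce the requested gem list is sketched below. It assumes it is
run inside the application process (for example from a console after the app
has been required), since only gems that are already activated show up. The
method name loaded_c_extension_gems is hypothetical.

```ruby
# List the currently loaded gems that ship a native (C) extension.
# Gem::Specification#extensions holds the paths of the build files
# (typically extconf.rb); it is empty for pure-Ruby gems.
def loaded_c_extension_gems
  Gem.loaded_specs.values
     .reject { |spec| spec.extensions.empty? }
     .map { |spec| "#{spec.name} #{spec.version}" }
     .sort
end

puts loaded_c_extension_gems
```

The gems printed here are the candidates to upgrade, replace, or remove one
at a time when hunting for a GC-related bug.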


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-13 19:34 ` Eric Wong
@ 2017-07-14 10:21   ` Pere Joan Martorell
  2017-07-14 21:16     ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Pere Joan Martorell @ 2017-07-14 10:21 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public, Philip Cunningham, Jonathan del Strother

2017-07-13 21:34 GMT+02:00 Eric Wong <e@80x24.org>:
> +Cc: Philip and Jonathan  since they encountered this three years
> ago, but we never heard back from them:
>
>         https://bogomips.org/unicorn-public/?q=T_NODE+d:..20170713
>
>
> Pere Joan Martorell <pere.joan@camaloon.com> wrote:
>> > /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in `parse': method `hash' called on unexpected T_NODE object (0x0055b15b973508 flags=0xaa31b) (NotImplementedError)
>
>> Any idea what is happening?
>
> This is most likely a bug in a C extension not using write
> barriers correctly (perhaps via undocumented C-API functions in
> Ruby).
>
> I don't think I've seen this on ruby-core bug reports in a few years:
>
>         https://public-inbox.org/ruby-core/?q=T_NODE
>
> Fwiw, Appendix D on Generational GC in the Ruby source is
> worth reading for any C extension authors:
>
>         https://80x24.org/mirrors/ruby.git/plain/doc/extension.rdoc
>
> There are probably build warnings when using some dangerous methods/macros;
> maybe you can check the build logs for const warnings.
>
>
> Can you share the list of RubyGems you have loaded and maybe try
> upgrading/replacing/eliminating the ones with C extensions
> one-by-one until the error stops?

Thank you very much for your fast reply. I'm not using Bundler to
manage my dependencies, but I checked and there isn't any conflict
between gem versions.
It seems that I solved the issue by removing some of the gems. This was my gem list:

    cuba -v 3.8.0
    slim -v 3.0.8
    cutest -v 1.2.3
    rack-test -v 0.6.3
    sequel -v 4.46.0
    pg -v 0.20.0
    shotgun -v 0.9.2
    shield -v 2.1.1
    sequel_pg -v 1.6.19
    unicorn -v 5.3.0
    capistrano-rbenv -v 2.1.1

And I finally removed these gems:

    cutest -v 1.2.3
    rack-test -v 0.6.3
    shotgun -v 0.9.2
    sequel_pg -v 1.6.19

I suspect that the conflicting gem was 'sequel_pg' (sequel_pg
overwrites the inner loop of the Sequel postgres adapter row fetching
code with a C version. The C version is significantly faster than the
pure ruby version that Sequel uses by default), but given that I didn't
remove these gems one by one, I can't be completely sure of that.

If the problem reemerges I'll keep you informed.

Thanks!! :)


>
> Also, perhaps the output of "pmap $PID_OF_WORKER" if you're on
> Linux (or equivalent command if you're on another OS).
>
> Anyways, I didn't notice anything suspicious in your config.
>
> I'll do another self-audit of the unicorn + kgio + raindrops
> extensions, too, but judging from the lack of reports and how
> much they get used, I suspect the bug is elsewhere (more eyes
> welcome, of course).
>
> Thanks for the report and any more info you can provide!


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-14 10:21   ` Pere Joan Martorell
@ 2017-07-14 21:16     ` Eric Wong
  2017-07-14 22:50       ` Jeremy Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-07-14 21:16 UTC (permalink / raw)
  To: Pere Joan Martorell
  Cc: unicorn-public, Philip Cunningham, Jonathan del Strother,
	Jeremy Evans

Pere Joan Martorell <pere.joan@camaloon.com> wrote:
> I suspect that the conflicting gem was 'sequel_pg' (sequel_pg
> overwrites the inner loop of the Sequel postgres adapter row fetching
> code with a C version. The C version is significantly faster than the
> pure ruby version that Sequel uses by default), but given that I didn't
> remove these gems one by one, I can't be completely sure of that.
> 
> If the problem reemerges I'll keep you informed.
> 
> Thanks!! :)

Thanks for the info.  I've added Jeremy Evans, the author of
sequel_pg to the Cc: even though I think he reads this list...

Anyways, I think I've spotted one potential bug in sequel_pg
w.r.t. RB_GC_GUARD usage, and the fix is below:

  git clone https://github.com/jeremyevans/sequel_pg && cd sequel_pg
  curl https://80x24.org/spew/20170714210918.3332-1-e@80x24.org/raw | git am

(more in-depth explanation is in the commit message)

Pere: perhaps you can give it a shot

Keep in mind I've only compile-tested this.  I didn't find
automated tests in the code and I don't have a usable Postgres
instance, at the moment.


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-14 21:16     ` Eric Wong
@ 2017-07-14 22:50       ` Jeremy Evans
  2017-07-15  0:15         ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Evans @ 2017-07-14 22:50 UTC (permalink / raw)
  To: Eric Wong
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

On 07/14 09:16, Eric Wong wrote:
> Pere Joan Martorell <pere.joan@camaloon.com> wrote:
> > I suspect that the conflicting gem was 'sequel_pg' (sequel_pg
> > overwrites the inner loop of the Sequel postgres adapter row fetching
> > code with a C version. The C version is significantly faster than the
> > pure ruby version that Sequel uses by default), but given that I didn't
> > remove these gems one by one, I can't be completely sure of that.
> > 
> > If the problem reemerges I'll keep you informed.
> > 
> > Thanks!! :)
> 
> Thanks for the info.  I've added Jeremy Evans, the author of
> sequel_pg to the Cc: even though I think he reads this list...
> 
> Anyways, I think I've spotted one potential bug in sequel_pg
> w.r.t. RB_GC_GUARD usage, and the fix is below:
> 
>   git clone https://github.com/jeremyevans/sequel_pg && cd sequel_pg
>   curl https://80x24.org/spew/20170714210918.3332-1-e@80x24.org/raw | git am
> 
> (more in-depth explanation is in the commit message)
> 
> Pere: perhaps you can give it a shot
> 
> Keep in mind I've only compile-tested this.  I didn't find
> automated tests in the code and I don't have a usable Postgres
> instance, at the moment.

Eric,

Thanks for this patch.  I'm not an RB_GC_GUARD expert, but the changes
look fine to me. The existing RB_GC_GUARD calls were added by me in
2012 to fix an earlier segfault.[1] This is the first reported
RB_GC_GUARD-related segfault in sequel_pg since then.

Pere, I would appreciate if you could test this patch and see if it
fixes your issue.  I will also test it and will release a new sequel_pg
version with this patch if it fixes the issue.

Thanks,
Jeremy

[1] https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-14 22:50       ` Jeremy Evans
@ 2017-07-15  0:15         ` Eric Wong
  2017-07-15  1:34           ` Jeremy Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-07-15  0:15 UTC (permalink / raw)
  To: Jeremy Evans
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

Jeremy Evans <code@jeremyevans.net> wrote:
> Thanks for this patch.  I'm not an RB_GC_GUARD expert, but the changes
> look fine to me. The existing RB_GC_GUARD calls were added by me in
> 2012 to fix an earlier segfault.[1] This is the first reported
> RB_GC_GUARD-related segfault in sequel_pg since then.

No worries; I don't consider myself a RB_GC_GUARD expert, either(*).

> [1] https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff

I suspect your original guards were lucky enough for C compilers
in 2012, but compilers have gotten more clever since then.  So
there's a higher likelihood of exposing bugs given the
conservative GC in Ruby(**).

Historical note:

  Back in the day, "volatile" alone was enough to defeat
  compiler optimizations in C Ruby.  Eventually, compilers got
  better, so RB_GC_GUARD was introduced.  And in the future,
  RB_GC_GUARD may evolve to accommodate even more clever
  compilers.

> Pere, I would appreciate if you could test this patch and see if it
> fixes your issue.  I will also test it and will release a new sequel_pg
> version with this patch if it fixes the issue.

Yes, actually testing the code is important, everything else
I've written here is theory ;)


(*) Fwiw, I am not fluent in reading asm for systems I run Ruby on,
    but I know how clever compilers can be, and have written
    about it some on ruby-core:
    https://public-inbox.org/ruby-core/?q=RB_GC_GUARD&d:..20170714

(**) similar story with lock-free multi-threading in other projects


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-15  0:15         ` Eric Wong
@ 2017-07-15  1:34           ` Jeremy Evans
  2017-07-15  4:45             ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Evans @ 2017-07-15  1:34 UTC (permalink / raw)
  To: Eric Wong
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

On 07/15 12:15, Eric Wong wrote:
> Jeremy Evans <code@jeremyevans.net> wrote:
> > Thanks for this patch.  I'm not an RB_GC_GUARD expert, but the changes
> > look fine to me. The existing RB_GC_GUARD calls were added by me in
> > 2012 to fix an earlier segfault.[1] This is the first reported
> > RB_GC_GUARD-related segfault in sequel_pg since then.
> 
> No worries; I don't consider myself a RB_GC_GUARD expert, either(*).
> 
> > [1] https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff
> 
> I suspect your original guards were lucky enough for C compilers
> in 2012, but compilers have gotten more clever since then.  So
> there's a higher likelihood of exposing bugs given the
> conservative GC in Ruby(**).
> 
> Historical note:
> 
>   Back in the day, "volatile" alone was enough to defeat
>   compiler optimizations in C Ruby.  Eventually, compilers got
>   better, so RB_GC_GUARD was introduced.  And in the future,
>   RB_GC_GUARD may evolve to accommodate even more clever
>   compilers.
> 
> > Pere, I would appreciate if you could test this patch and see if it
> > fixes your issue.  I will also test it and will release a new sequel_pg
> > version with this patch if it fixes the issue.
> 
> Yes, actually testing the code is important, everything else
> I've written here is theory ;)

All of Sequel's postgres adapter tests still pass with this, so I merged
this into the master branch.  I'll do some more testing of my apps, but
unless I run into problems I plan to release this as sequel_pg 1.7.1
early next week.

Thanks,
Jeremy


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-15  1:34           ` Jeremy Evans
@ 2017-07-15  4:45             ` Eric Wong
  2017-07-15  7:56               ` Jeremy Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-07-15  4:45 UTC (permalink / raw)
  To: Jeremy Evans
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

Jeremy Evans <code@jeremyevans.net> wrote:
> All of Sequel's postgres adapter tests still pass with this, so I merged
> this into the master branch.  I'll do some more testing of my apps, but
> unless I run into problems I plan to release this as sequel_pg 1.7.1
> early next week.

Thanks for the update.  Btw, did you get a chance to test with
GC.stress?  It's not 100% reliable (and it is slow), but
probably could've caught problems like this one.
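
As a rough illustration of the GC.stress suggestion, the sketch below wraps a
block so that GC.stress is enabled only for its duration and restored
afterwards. The helper name with_gc_stress and the workload inside the block
are assumptions for illustration; a real test would call into the C extension
under suspicion instead.

```ruby
# Run the given block with GC.stress enabled, restoring the previous setting
# afterwards. With GC.stress = true, Ruby runs the garbage collector at every
# opportunity, which is very slow but can surface missing GC guards.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Placeholder workload: allocate and transform a handful of strings.
result = with_gc_stress do
  Array.new(50) { |i| "row #{i}".dup }.map(&:upcase)
end
puts result.length
```

Keeping the stressed region small helps with run time, since every allocation
inside the block can trigger a collection.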


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-15  4:45             ` Eric Wong
@ 2017-07-15  7:56               ` Jeremy Evans
  2017-07-17 14:32                 ` Jeremy Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Evans @ 2017-07-15  7:56 UTC (permalink / raw)
  To: Eric Wong
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

On 07/15 04:45, Eric Wong wrote:
> Jeremy Evans <code@jeremyevans.net> wrote:
> > All of Sequel's postgres adapter tests still pass with this, so I merged
> > this into the master branch.  I'll do some more testing of my apps, but
> > unless I run into problems I plan to release this as sequel_pg 1.7.1
> > early next week.
> 
> Thanks for the update.  Btw, did you get a chance to test with
> GC.stress?  It's not 100% reliable (and it is slow), but
> probably could've caught problems like this one.

I hadn't tested with GC.stress before. You weren't kidding about it being
slow.  I'll let it run overnight with the previous code (without your
patch), to see if this is something it would have caught.

Thanks,
Jeremy


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-15  7:56               ` Jeremy Evans
@ 2017-07-17 14:32                 ` Jeremy Evans
  2017-07-24  1:25                   ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Evans @ 2017-07-17 14:32 UTC (permalink / raw)
  To: Eric Wong
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

On 07/15 12:56, Jeremy Evans wrote:
> On 07/15 04:45, Eric Wong wrote:
> > Jeremy Evans <code@jeremyevans.net> wrote:
> > > All of Sequel's postgres adapter tests still pass with this, so I merged
> > > this into the master branch.  I'll do some more testing of my apps, but
> > > unless I run into problems I plan to release this as sequel_pg 1.7.1
> > > early next week.
> > 
> > Thanks for the update.  Btw, did you get a chance to test with
> > GC.stress?  It's not 100% reliable (and it is slow), but
> > probably could've caught problems like this one.
> 
> I hadn't tested with GC.stress before. You weren't kidding about it being
> slow.  I'll let it run overnight with the previous code (without your
> patch), to see if this is something it would have caught.

Running with GC.stress didn't catch the error for me.  But I'm using a
fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
something that only shows up on a newer compiler that does more
optimizations.

Thanks,
Jeremy


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-17 14:32                 ` Jeremy Evans
@ 2017-07-24  1:25                   ` Eric Wong
  2017-08-07  6:16                     ` Jeremy Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-07-24  1:25 UTC (permalink / raw)
  To: Jeremy Evans, Pere Joan Martorell
  Cc: unicorn-public, Philip Cunningham, Jonathan del Strother

Jeremy Evans <code@jeremyevans.net> wrote:
> Running with GC.stress didn't catch the error for me.  But I'm using a
> fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
> something that only shows up on a newer compiler that does more
> optimizations.

Pere: just curious if you've had a chance to test my patch for
sequel_pg from Jeremy's latest sequel_pg.git

In any case, I'm certain my patch fixes a bug which manifests
in a compiler-dependent manner; but there could always be other
bugs in a similar vein.  Thanks.


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-07-24  1:25                   ` Eric Wong
@ 2017-08-07  6:16                     ` Jeremy Evans
  2017-08-07 20:18                       ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Evans @ 2017-08-07  6:16 UTC (permalink / raw)
  To: Eric Wong
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

On 07/24 01:25, Eric Wong wrote:
> Jeremy Evans <code@jeremyevans.net> wrote:
> > Running with GC.stress didn't catch the error for me.  But I'm using a
> > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
> > something that only shows up on a newer compiler that does more
> > optimizations.
> 
> Pere: just curious if you've had a chance to test my patch for
> sequel_pg from Jeremy's latest sequel_pg.git
> 
> In any case, I'm certain my patch fixes a bug which manifests
> in a compiler-dependent manner; but there could always be other
> bugs in a similar vein.  Thanks.

I can't get it to crash with sequel_pg 1.7.0 when compiled using clang
4.0.0 either.  I even tried to build a special program designed to
trigger the crash.

Compiler used:

$ cc -v
OpenBSD clang version 4.0.0 (tags/RELEASE_400/final) (based on LLVM 4.0.0)
Target: amd64-unknown-openbsd6.1
Thread model: posix
InstalledDir: /usr/bin

Program used:

require 'sequel'
DB = Sequel.postgres(:test=>false)
DB.extension :pg_array
# pg_array.txt contains ([[1] * 100] * 100) in PostgreSQL array format
t = File.read('pg_array.txt')
dot = '.'
trap(:HUP){}
Thread.new do
  while true
    sleep 1
    Process.kill(:HUP, $$)
  end
end
GC.stress = true
(0..2).map do
  Thread.new do
    i = 0
    pr = lambda{|v| print dot if ((i+=1) % 100) == 0; "#{v}#{v}"}
    while true
      print 'L'
      Sequel::Postgres.parse_pg_array(t.dup, pr)
    end
  end
end.map(&:join)

Pere, can you test this program and see if it crashes in your
environment?  If not, can you put together a reproducible example that
does crash in your environment?

Thanks,
Jeremy
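
The test program above reads pg_array.txt, which is not included in the
message. A possible way to generate it is sketched below, assuming the usual
PostgreSQL array literal syntax with curly braces and comma separators. The
helper name pg_array_literal is hypothetical, and it only handles nested
arrays of integers (no quoting or NULL handling).

```ruby
# Render a (possibly nested) Ruby array of integers as a PostgreSQL array
# literal, e.g. [[1, 2], [3, 4]] becomes "{{1,2},{3,4}}".
def pg_array_literal(value)
  return value.to_s unless value.is_a?(Array)
  "{#{value.map { |element| pg_array_literal(element) }.join(',')}}"
end

puts pg_array_literal([[1, 2], [3, 4]])

# To create the input file for the program above (writes to the current
# directory, so it is left commented out here):
# File.write('pg_array.txt', pg_array_literal([[1] * 100] * 100))
```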


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-08-07  6:16                     ` Jeremy Evans
@ 2017-08-07 20:18                       ` Eric Wong
  2017-10-03 14:52                         ` Xuanzhong Wei
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-08-07 20:18 UTC (permalink / raw)
  To: Jeremy Evans
  Cc: Pere Joan Martorell, unicorn-public, Philip Cunningham,
	Jonathan del Strother

Jeremy Evans <code@jeremyevans.net> wrote:
> On 07/24 01:25, Eric Wong wrote:
> > Jeremy Evans <code@jeremyevans.net> wrote:
> > > Running with GC.stress didn't catch the error for me.  But I'm using a
> > > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
> > > something that only shows up on a newer compiler that does more
> > > optimizations.
> > 
> > Pere: just curious if you've had a chance to test my patch for
> > sequel_pg from Jeremy's latest sequel_pg.git
> > 
> > In any case, I'm certain my patch fixes a bug which manifests
> > in a compiler-dependent manner; but there could always be other
> > bugs in a similar vein.  Thanks.
> 
> I can't get it to crash with sequel_pg 1.7.0 when compiled using clang
> 4.0.0 either.  I even tried to build a special program designed to
> trigger the crash.

From anecdotes on ruby-core, clang still seems less aggressive
at optimizations than modern gcc.

Fwiw, a few GC bugs in Ruby trunk got fixed recently and the
fixes should be in 2.4.2 (soon):
https://public-inbox.org/ruby-core/?q=T_NONE+d%3A20161225..20170808

Not identical to the T_NODE error which Pere got, but both T_NONE
and T_NODE errors are symptoms of GC bugs.


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-08-07 20:18                       ` Eric Wong
@ 2017-10-03 14:52                         ` Xuanzhong Wei
  2017-10-03 17:15                           ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Xuanzhong Wei @ 2017-10-03 14:52 UTC (permalink / raw)
  To: e; +Cc: code, maillist, pere.joan, philip, unicorn-public, Xuanzhong Wei

We have the same issue here.

IMHO, it is a bug introduced by 979ebcf91705709be5041a3be4514e5f1f6ec02c.
The `mark_ary` gets GCed before we add it to Ruby's global_list
since we are doing memory allocations before calling rb_global_variable.

A simple test can be found here:
https://github.com/azrle/ruby_c_ext_test

I will try to submit a patch later.


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-10-03 14:52                         ` Xuanzhong Wei
@ 2017-10-03 17:15                           ` Eric Wong
  2017-10-03 18:20                             ` Xuanzhong Wei
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2017-10-03 17:15 UTC (permalink / raw)
  To: Xuanzhong Wei; +Cc: code, maillist, pere.joan, philip, unicorn-public

Xuanzhong Wei <azrlew@gmail.com> wrote:
> We have the same issue here.
> 
> IMHO, it is a bug introduced by 979ebcf91705709be5041a3be4514e5f1f6ec02c.
> The `mark_ary` gets GCed before we add it to Ruby's global_list
> since we are doing memory allocations before calling rb_global_variable.
> 
> A simple test can be found here:
> https://github.com/azrle/ruby_c_ext_test

Thanks, which compiler and version did you use?

> I will try to submit a patch later.

https://bogomips.org/unicorn-public/20171003145718.30404-1-azrlew@gmail.com/raw

Yes, seems correct since the compiler doesn't need to keep
mark_ary anymore once it only needs the address (&mark_ary).
OBJ_FREEZE is an inline which does nothing to prevent
the compiler from only keeping RBasic->flags around and
not the actual VALUE.


* Re: Random crash when sending USR2 + QUIT signals to Unicorn process
  2017-10-03 17:15                           ` Eric Wong
@ 2017-10-03 18:20                             ` Xuanzhong Wei
  0 siblings, 0 replies; 16+ messages in thread
From: Xuanzhong Wei @ 2017-10-03 18:20 UTC (permalink / raw)
  To: e; +Cc: azrlew, code, maillist, pere.joan, philip, unicorn-public

Thanks for the quick response!

Here is the compiler version:
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

Furthermore, although I was not able to reproduce the issue with the code
that caused the problem, I generated some core files of normal and abnormal
masters and workers.

Here is a glimpse of `ruby_current_vm.objspace.global_list`,
compared between normal and abnormal masters running the same code.

// normal
(gdb)
$72 = (struct gc_list *) 0x7feb65625680
$73 = 5
---------------------------------------------
(gdb)
$74 = (struct gc_list *) 0x7feb6424e2a0
$75 = 7 # T_ARRAY
---------------------------------------------
(gdb)
$76 = (struct gc_list *) 0x7feb63a8c120
$77 = 8

// abnormal
(gdb)
$72 = (struct gc_list *) 0x7f7547651cc0
$73 = 5
---------------------------------------------
(gdb)
$74 = (struct gc_list *) 0x7f7546939760
$75 = 27 # T_NODE
---------------------------------------------
(gdb)
$76 = (struct gc_list *) 0x7f7546583e70
$77 = 8

Note that I have checked the whole global_list, and the order looks the same.
Hence, I am quite sure that the T_NODE in the abnormal one comes from the same
module as the T_ARRAY in the normal one.
And from the contents of the T_ARRAY, the module should be unicorn_http.
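
The type numbers in the gdb output above (7 and 27) and the flags value in the
original error message (flags=0xaa31b) can be related with a small decoder.
This is a sketch only: the constants below are a subset of the
ruby_value_type enum as recalled from Ruby 2.4's include/ruby/ruby.h, and
they may differ in other Ruby versions. The names RUBY_T_MASK,
RUBY_VALUE_TYPES, and builtin_type are local to this example.

```ruby
# The low 5 bits of RBasic->flags hold the built-in type of an object.
RUBY_T_MASK = 0x1f

# Subset of ruby_value_type (assumed values from Ruby 2.4's ruby.h).
RUBY_VALUE_TYPES = {
  0x00 => "T_NONE",
  0x05 => "T_STRING",
  0x07 => "T_ARRAY",
  0x08 => "T_HASH",
  0x1b => "T_NODE"
}.freeze

# Map a flags word (or a bare type number) to a type name.
def builtin_type(flags)
  RUBY_VALUE_TYPES.fetch(flags & RUBY_T_MASK) do |type|
    format("unknown (0x%02x)", type)
  end
end

puts builtin_type(0xaa31b) # flags value from the original backtrace
puts builtin_type(27)      # abnormal global_list entry
puts builtin_type(7)       # normal global_list entry
```

Under these assumptions, the flags value from the crash and the abnormal
global_list entry decode to the same type, which is consistent with the
conclusion that a freed array slot was reused for a node.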


Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
