From: Xiao Yu <xyu@automattic.com>
To: Eric Wong <e@80x24.org>
Cc: Xiao Yu <xyu@automattic.com>, Arkadi Colson <arkadi@smartbit.be>,
	cmogstored-public@yhbt.net
Subject: Re: Segfaults on http_close?
Date: Wed, 20 Jan 2021 05:21:44 +0000
Message-ID: <CABfxMcWGu+7X9Y-A7gTSYwsXpDRyYxGDtom3imV6-J3vT_qJPA@mail.gmail.com>
In-Reply-To: <20210117095109.GA28219@dcvr>

Thanks for the quick response! Sorry about the delay, but I ran into
a couple of issues (I'm still learning gdb and compiling binaries in
general on the go here) and haven't been able to capture more useful
logs yet, as the crashes seem to have slowed or stopped since
recompiling and reloading. For the record, I recompiled cmogstored
with the newer RH `devtoolset-9-toolchain` (9.1) and it has not
crashed since.

Also, sorry about the lack of useful logs in my initial message, but
neither the kernel logs nor the system messages contained anything
interesting around the segfaults. Making matters worse, we didn't
consistently reload cmogstored as various versions of the compiled
binary were installed across the cluster, and we didn't save the
debugging symbols from each compilation, so I can't reply with a more
useful stack trace from the current core dumps.

For what it's worth, we also run another cluster on 1.7.3 that we've
upgraded over the years, and we have never seen this issue there.
Those nodes are on a different distro (Debian), if that makes any
difference.

On Sun, Jan 17, 2021 at 9:51 AM Eric Wong <e@80x24.org> wrote:
>
> Eric Wong <e@80x24.org> wrote:
> > Xiao Yu <xyu@automattic.com> wrote:
> > > Howdy, we are running a 96 node cmogstored cluster and have noticed
> > > that when the cluster is busy with lots of writes we occasionally get
> > > segfaults in cmogstored. This has happened 7 times in the past week
> > > each time on a random and different cmogstored node. Looking at the
> > > abrt backtrace of the core dump shows something similar to the
> > > following in each instance:
> >
> > Thanks for the bug report, sorry this caused you trouble
> > and I wonder if this is the same issue Arkadi was hitting
> > last year...
>
> Hi Xiao and Arkadi: Can either of you try the 1-line patch
> below to disable pthread_attr_setstacksize?

I'm going to let the current version run for a bit longer and hope we
get a core dump. I think I finally have things cleaned up so that all
96 of our servers are running the same binary, and I've saved the
corresponding debugging-symbols file so we can check the stack trace
later. If we see another crash I'll recompile with the patch below
and try again. :)

> I took another look at the code and couldn't find any other
> culprits... (though I admit I'm not mentally as sharp given
> pandemic-induced stress and anxiety :<).

Oof, sorry to hear that, take care of yourself man!

> Given the mysterious nature of this problem and my inability to
> reproduce it; I wonder if there's stack corruption with certain
> compilers/glibc happening and blowing past the 4K guard page...
>
> @Arkadi: Xiao recently brought up this (or similar) issue again:
>   https://yhbt.net/cmogstored-public/20210111212621.GA12555@dcvr/T/
>
> diff --git a/thrpool.c b/thrpool.c
> index bc67ea0..bd71f95 100644
> --- a/thrpool.c
> +++ b/thrpool.c
> @@ -141,7 +141,7 @@ thrpool_add(struct mog_thrpool *tp, unsigned size, unsigned long *nr_eagain)
>
>         CHECK(int, 0, pthread_attr_init(&attr));
>
> -       if (stacksize > 0)
> +       if (0 && stacksize > 0)
>                 CHECK(int, 0, pthread_attr_setstacksize(&attr, stacksize));
>
>         thr = &tp->threads[tp->n_threads].thr;
>
>
>
> In retrospect, running a small stack is unnecessary on 64-bit
> systems due to practically unlimited virtual address space and
> lazy allocation.  It may still make sense for 32-bit (some
> embedded systems), though they can set RLIMIT_STACK before
> launching.
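
A hypothetical launcher along those lines (the path and the 1M limit
are made up for illustration): glibc uses RLIMIT_STACK as the default
stack size for threads created without an explicit
pthread_attr_setstacksize(), so lowering the limit before exec'ing
still bounds per-thread stacks even with the patch above applied.

#include <sys/resource.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* 1M soft/hard limit; glibc also uses this as the default
	 * stack size for threads created without an explicit
	 * pthread_attr_setstacksize() */
	struct rlimit rl = { 1024 * 1024, 1024 * 1024 };

	(void)argc;
	if (setrlimit(RLIMIT_STACK, &rl)) {
		perror("setrlimit");
		return 1;
	}
	/* path is illustrative; argv[0] will be the launcher's name */
	execv("/usr/local/bin/cmogstored", argv);
	perror("execv");
	return 1;
}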

