cmogstored dev/user discussion/issues/patches/etc
From: Eric Wong <e@80x24.org>
To: Xiao Yu <xyu@automattic.com>
Cc: cmogstored-public@yhbt.net
Subject: Re: Segfaults on http_close?
Date: Mon, 11 Jan 2021 21:26:21 +0000
Message-ID: <20210111212621.GA12555@dcvr>
In-Reply-To: <CABfxMcXPr7q8o1ayRdn1x-Fukuh7-s3YG=04KX=vAzz4DYqhuQ@mail.gmail.com>

Xiao Yu <xyu@automattic.com> wrote:
> Howdy, we are running a 96 node cmogstored cluster and have noticed
> that when the cluster is busy with lots of writes we occasionally get
> segfaults in cmogstored. This has happened 7 times in the past week
> each time on a random and different cmogstored node. Looking at the
> abrt backtrace of the core dump shows something similar to the
> following in each instance:

Thanks for the bug report, and sorry this caused you trouble.
I wonder if this is the same issue Arkadi was hitting
last year...

> ---
> {   "signal": 11
> ,   "executable": "/usr/local/bin/cmogstored"
> ,   "stacktrace":
>       [ {   "crash_thread": true
>         ,   "frames":
>               [ {   "address": 140389358944542
>                 ,   "build_id": "3c61131d1dac9da79b73188e7702bef786c2ad54"
>                 ,   "build_id_offset": 528670
>                 ,   "function_name": "_int_free"
>                 ,   "file_name": "/usr/lib64/libc-2.17.so"

Is there anything in stderr or the dmesg kernel logs from that?
glibc malloc will usually print an error to stderr when it
detects heap corruption, I think...

>                 }
>               , {   "address": 4225373
>                 ,   "build_id": "9ca387b687027c0bac678943337d72b109fdf1e7"
>                 ,   "build_id_offset": 31069
>                 ,   "function_name": "http_close"
>                 ,   "file_name": "/usr/local/bin/cmogstored"
>                 }
>               , {   "address": 4228819
>                 ,   "build_id": "9ca387b687027c0bac678943337d72b109fdf1e7"
>                 ,   "build_id_offset": 34515
>                 ,   "function_name": "mog_http_queue_step"
>                 ,   "file_name": "/usr/local/bin/cmogstored"
>                 }

<snip>

That's a pretty standard code path, though a better backtrace
and core dumps with line numbers and pointer values would be
more useful.

> We are using the latest 1.8.0 release on SL 7
> (5.8.7-1.el7.elrepo.x86_64) and here's what it's linked against:

I'll need more time to investigate, but can you try mixing in
some older versions (1.6, 1.7.x) and see whether it reproduces
on a test cluster?

I know 1.6 has gone through several PB of traffic without
problems (but with older kernels and IPv4); the newer releases
are more focused on my smaller home setup.

I don't know how long you've been running cmogstored, which
kernels you've tried it with, or which versions you've used in
the past.

Is this your first time seeing this?

<snip>

> Looking at http_close() it does not appear to really do all that much
> and mog_rbuf_free() appears to already test to see if the rbuf pointer
> is null before freeing it so I'm not sure what the issue is. (Sorry
> I'm not really a C dev so don't have a strong grasp on what is
> happening.) I'm not really sure how to debug this issue further, is
> there any other data I could collect or something I can do to try and
> track down the issue?

Actually, the free could also be coming from mog_packaddr_free
if you're using IPv6 (I haven't used IPv6 heavily).

But it could also be a double-free in mog_rbuf_free.

Would you happen to have anybody on staff that can look at core
dumps and poke at it from gdb?

Basically you need to ensure you're getting backtraces that
gdb can inspect.  It would be helpful to see exactly the
contents and owner of the pointer being freed.

The default build uses -ggdb3 to maximize debug info.

Thread overview: 13+ messages
2021-01-11 20:48 Segfaults on http_close? Xiao Yu
2021-01-11 21:26 ` Eric Wong [this message]
2021-01-17  9:51   ` Eric Wong
2021-01-20  5:21     ` Xiao Yu
2021-01-20  8:57       ` Eric Wong
2021-01-20 21:13         ` Xiao Yu
2021-01-20 21:22           ` Eric Wong
2021-01-25 17:36             ` Xiao Yu
2021-01-25 17:47               ` Eric Wong
2021-01-25 19:27                 ` Xiao Yu
2021-02-12  6:54                   ` Eric Wong
2021-02-12 21:18                     ` Xiao Yu
2021-02-13  2:19                       ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/cmogstored.git/
