Date: Sun, 17 Jan 2021 09:51:09 +0000
From: Eric Wong
To: Xiao Yu, Arkadi Colson
Cc: cmogstored-public@yhbt.net
Subject: Re: Segfaults on http_close?
Message-ID: <20210117095109.GA28219@dcvr>
References: <20210111212621.GA12555@dcvr>
In-Reply-To: <20210111212621.GA12555@dcvr>

Eric Wong wrote:
> Xiao Yu wrote:
> > Howdy, we are running a 96 node cmogstored cluster and have noticed
> > that when the cluster is busy with lots of writes we occasionally get
> > segfaults in cmogstored. This has happened 7 times in the past week
> > each time on a random and different cmogstored node. Looking at the
> > abrt backtrace of the core dump shows something similar to the
> > following in each instance:
>
> Thanks for the bug report, sorry this caused you trouble
> and I wonder if this is the same issue Arkadi was hitting
> last year...

Hi Xiao and Arkadi:

Can either of you try the 1-line patch below to disable
pthread_attr_setstacksize?  I took another look at the code and
couldn't find any other culprits...
(though I admit I'm not mentally as sharp given pandemic-induced
stress and anxiety :<).

Given the mysterious nature of this problem and my inability to
reproduce it, I wonder if there's stack corruption with certain
compilers/glibc happening and blowing past the 4K guard page...

@Arkadi: Xiao recently brought up this (or similar) issue again:
https://yhbt.net/cmogstored-public/20210111212621.GA12555@dcvr/T/

diff --git a/thrpool.c b/thrpool.c
index bc67ea0..bd71f95 100644
--- a/thrpool.c
+++ b/thrpool.c
@@ -141,7 +141,7 @@ thrpool_add(struct mog_thrpool *tp, unsigned size, unsigned long *nr_eagain)
 
 	CHECK(int, 0, pthread_attr_init(&attr));
 
-	if (stacksize > 0)
+	if (0 && stacksize > 0)
 		CHECK(int, 0, pthread_attr_setstacksize(&attr, stacksize));
 
 	thr = &tp->threads[tp->n_threads].thr;

In retrospect, running a small stack is unnecessary on 64-bit
systems due to practically unlimited virtual address space and
lazy allocation.  It may still make sense for 32-bit (some
embedded systems), though they can set RLIMIT_STACK before
launching.
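
For anyone who wants to see the effect of the patch in isolation, here
is a minimal sketch of the two code paths: starting a thread with an
explicit stack size versus leaving the attribute at the platform
default.  It is not cmogstored code; worker() and run_with_stacksize()
are hypothetical names made up for this illustration, and only
standard POSIX pthread calls are used.  The helper also reports the
guard size stored in the attribute, since the guard page is what
should catch an overflow of a small stack.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: touches a little stack, then flags that it ran. */
static void *worker(void *arg)
{
	volatile char buf[4096];
	size_t i;

	for (i = 0; i < sizeof(buf); i++)
		buf[i] = 0;
	*(int *)arg = 1;
	return NULL;
}

/*
 * Hypothetical helper (not from cmogstored).  Starts and joins one
 * thread.  stacksize == 0 keeps the platform default, which is what
 * the patched "if (0 && stacksize > 0)" always does; a non-zero value
 * takes the original pthread_attr_setstacksize path.
 * The stack and guard sizes held by the attribute are written to
 * *stack_out and *guard_out.
 * Returns 0 on success, a pthread error number, or -1 if the worker
 * did not run.
 */
static int run_with_stacksize(size_t stacksize, size_t *stack_out,
			      size_t *guard_out)
{
	pthread_attr_t attr;
	pthread_t thr;
	int ran = 0;
	int rc;

	rc = pthread_attr_init(&attr);
	if (rc != 0)
		return rc;

	if (stacksize > 0)
		rc = pthread_attr_setstacksize(&attr, stacksize);
	if (rc == 0)
		rc = pthread_attr_getstacksize(&attr, stack_out);
	if (rc == 0)
		rc = pthread_attr_getguardsize(&attr, guard_out);
	if (rc == 0)
		rc = pthread_create(&thr, &attr, worker, &ran);
	if (rc == 0)
		rc = pthread_join(thr, NULL);
	if (rc == 0 && !ran)
		rc = -1;

	pthread_attr_destroy(&attr);
	return rc;
}
```

Calling run_with_stacksize(256 * 1024, &stack, &guard) and then
run_with_stacksize(0, &stack, &guard) shows how much stack a thread
gets in each case on a given system.  The default value and the guard
size are implementation-defined, so the numbers will differ between
libc versions and platforms.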