From: Xiao Yu
Date: Mon, 11 Jan 2021 20:48:53 +0000
Subject: Segfaults on http_close?
To: cmogstored-public@yhbt.net

Howdy,

We are running a 96-node cmogstored cluster and have noticed that when the cluster is busy with lots of writes, we occasionally get segfaults in cmogstored. This has happened 7 times in the past week, each time on a different, seemingly random cmogstored node.

The abrt backtrace of the core dump looks similar to the following in each instance:

---
{   "signal": 11
,   "executable": "/usr/local/bin/cmogstored"
,   "stacktrace":
      [ {   "crash_thread": true
        ,   "frames":
              [ {   "address": 140389358944542
                ,   "build_id": "3c61131d1dac9da79b73188e7702bef786c2ad54"
                ,   "build_id_offset": 528670
                ,   "function_name": "_int_free"
                ,   "file_name": "/usr/lib64/libc-2.17.so"
                }
              , {   "address": 4225373
                ,   "build_id": "9ca387b687027c0bac678943337d72b109fdf1e7"
                ,   "build_id_offset": 31069
                ,   "function_name": "http_close"
                ,   "file_name": "/usr/local/bin/cmogstored"
                }
              , {   "address": 4228819
                ,   "build_id": "9ca387b687027c0bac678943337d72b109fdf1e7"
                ,   "build_id_offset": 34515
                ,   "function_name": "mog_http_queue_step"
                ,   "file_name": "/usr/local/bin/cmogstored"
                }
              , {   "address": 4256381
                ,   "build_id": "9ca387b687027c0bac678943337d72b109fdf1e7"
                ,   "build_id_offset": 62077
                ,   "function_name": "mog_queue_loop"
                ,   "file_name": "/usr/local/bin/cmogstored"
                }
              , {   "address": 140389362433493
                ,   "build_id": "3d9441083d079dc2977f1bd50c8068d11767232d"
                ,   "build_id_offset": 32213
                ,   "function_name": "start_thread"
                ,   "file_name": "/usr/lib64/libpthread-2.17.so"
                }
              , {   "address": 140389359455917
                ,   "build_id": "3c61131d1dac9da79b73188e7702bef786c2ad54"
                ,   "build_id_offset": 1040045
                ,   "function_name": "__clone"
                ,   "file_name": "/usr/lib64/libc-2.17.so"
                } ]
        } ]
}
---

We are using the latest release, 1.8.0, on SL 7 (kernel 5.8.7-1.el7.elrepo.x86_64), and here's what it's linked against:

---
# ldd -v /usr/local/bin/cmogstored
        linux-vdso.so.1 =>  (0x00007ffc2898d000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5c0ccff000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5c0c932000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5c0cf1b000)

        Version information:
        /usr/local/bin/cmogstored:
                libpthread.so.0 (GLIBC_2.3.2) => /lib64/libpthread.so.0
                libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
                libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.9) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.8) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.10) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.17) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libpthread.so.0:
                ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
                libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
                libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libc.so.6:
                ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
---

Looking at http_close(), it does not appear to do all that much, and mog_rbuf_free() appears to already check whether the rbuf pointer is null before freeing it, so I'm not sure what the issue is. (Sorry, I'm not really a C dev, so I don't have a strong grasp of what is happening.)

I'm not sure how to debug this further. Is there any other data I could collect, or something I can do to try to track down the issue?

Thanks!
Xiao Yu
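
For comparing the seven core dumps against each other, the crashing thread's frames can be pulled out of each abrt report with a few lines of Python. This is only a rough sketch: the field names are the ones that appear in the report above, but `crash_frames` and `SAMPLE` are hypothetical names made up for illustration, and the sample is trimmed to two of the frames. Addresses are printed in hex because that is the form symbolizers such as addr2line take, assuming the binary was built with debug info.

```python
import json

# Two frames trimmed from the abrt report in this message.
# A real report would be read from the file abrt wrote (path not shown here).
SAMPLE = """
{ "signal": 11,
  "executable": "/usr/local/bin/cmogstored",
  "stacktrace": [
    { "crash_thread": true,
      "frames": [
        { "address": 140389358944542, "build_id_offset": 528670,
          "function_name": "_int_free",
          "file_name": "/usr/lib64/libc-2.17.so" },
        { "address": 4225373, "build_id_offset": 31069,
          "function_name": "http_close",
          "file_name": "/usr/local/bin/cmogstored" }
      ] } ] }
"""


def crash_frames(report):
    """Return (function, file, hex address, hex offset) per frame of the
    thread marked "crash_thread", or an empty list if there is none."""
    for thread in report.get("stacktrace", []):
        if thread.get("crash_thread"):
            return [
                (
                    frame["function_name"],
                    frame["file_name"],
                    hex(frame["address"]),
                    hex(frame["build_id_offset"]),
                )
                for frame in thread["frames"]
            ]
    return []


if __name__ == "__main__":
    # One line per frame, innermost first, as in the report.
    for func, path, addr, offset in crash_frames(json.loads(SAMPLE)):
        print(f"{func:<12} {addr:>16} (+{offset}) {path}")
```

If every dump yields the same offsets inside http_close, that would suggest one code path rather than several unrelated crashes.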