cmogstored dev/user discussion/issues/patches/etc
* Heavy load
@ 2019-12-11 13:54 Arkadi Colson
  2019-12-11 17:06 ` Eric Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2019-12-11 13:54 UTC (permalink / raw)
  To: cmogstored-public@bogomips.org

Hi
Under heavy load our cmogstored daemons are crashing frequently with the 
following error in the syslog:
Dec 11 12:28:08 hostname kernel: [1995674.355262] cmogstored[15660]: 
segfault at 7fd981ae3fd8 ip 00007fd9825f0dc8 sp 00007fd981ae3fe0 error 6 
in libc-2.24.so[7fd9825aa000+195000]
Does anybody have an idea why?

Best regards
Arkadi Colson


* Re: Heavy load
  2019-12-11 13:54 Heavy load Arkadi Colson
@ 2019-12-11 17:06 ` Eric Wong
  2019-12-12  7:30   ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2019-12-11 17:06 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> Hi
> Under heavy load our cmogstored daemons are crashing frequently with the 
> following error in the syslog:
> Dec 11 12:28:08 hostname kernel: [1995674.355262] cmogstored[15660]: 
> segfault at 7fd981ae3fd8 ip 00007fd9825f0dc8 sp 00007fd981ae3fe0 error 6 
> in libc-2.24.so[7fd9825aa000+195000]
> Does anybody have an idea why?

That's not good and not something I've seen.  Do you have a
backtrace and debugging symbols for libc and cmogstored?

Which version of cmogstored?

Thanks.


* Re: Heavy load
  2019-12-11 17:06 ` Eric Wong
@ 2019-12-12  7:30   ` Arkadi Colson
  2019-12-12  7:59     ` Eric Wong
  2019-12-12 19:16     ` Eric Wong
  0 siblings, 2 replies; 20+ messages in thread
From: Arkadi Colson @ 2019-12-12  7:30 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org

Hi

We are running cmogstored version 1.7.1 compiled and running on Debian 
stretch. How do I capture a backtrace and debugging symbols?

Best regards
Arkadi Colson

On 11/12/19 18:06, Eric Wong wrote:
> Arkadi Colson <arkadi@smartbit.be> wrote:
>> Hi
>> Under heavy load our cmogstored daemons are crashing frequently with the
>> following error in the syslog:
>> Dec 11 12:28:08 hostname kernel: [1995674.355262] cmogstored[15660]:
>> segfault at 7fd981ae3fd8 ip 00007fd9825f0dc8 sp 00007fd981ae3fe0 error 6
>> in libc-2.24.so[7fd9825aa000+195000]
>> Does anybody have an idea why?
> That's not good and not something I've seen.  Do you have a
> backtrace and debugging symbols for libc and cmogstored?
>
> Which version of cmogstored?
>
> Thanks.


* Re: Heavy load
  2019-12-12  7:30   ` Arkadi Colson
@ 2019-12-12  7:59     ` Eric Wong
  2019-12-12 19:16     ` Eric Wong
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Wong @ 2019-12-12  7:59 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> Hi
> 
> We are running cmogstored version 1.7.1 compiled and running on Debian 
> stretch. How do I capture a backtrace and debugging symbols?

OK, on Linux, you need to make sure the core file size limit is big
enough to hold the entire core dump.  I normally just use
"unlimited" since I have more free disk space than RAM.

Via init scripts, use "ulimit -c unlimited" in the shell
script before spawning cmogstored (from the same shell
process).

With systemd, it's the "LimitCORE=" property (see systemd.exec
manpage).
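
For example, two untested sketches (the paths and unit name here are
only illustrations, adjust them for your setup):

	# init script: raise the limit in the same shell that execs cmogstored
	ulimit -c unlimited
	exec /usr/bin/cmogstored --config=/etc/cmogstored.conf

	# hypothetical systemd unit excerpt (cmogstored.service)
	[Service]
	LimitCORE=infinity
	ExecStart=/usr/bin/cmogstored --config=/etc/cmogstored.conf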

You can check "Max core file size" in the
"/proc/$PID_OF_CMOGSTORED/limits" file once cmogstored is running
to verify.
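
For example, something like this one-liner (assuming a single
cmogstored process, so pgrep -o finds it):

	grep 'Max core file size' /proc/$(pgrep -o cmogstored)/limits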

AFAIK, core dumps go to the working directory of the process by
default.  Daemonized processes like cmogstored might try to
write to "/" (and fail due to lack of permissions).

I've used the following to get all my core dumps in
/var/tmp/core.$CRASHED_PID:

	echo /var/tmp/core >/proc/sys/kernel/core_pattern
	echo 1 >/proc/sys/kernel/core_uses_pid

But modern kernels have more options for piping core dumps
to processes and such...
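
If you want those settings to survive a reboot, sysctl works, too;
a sketch (the .conf file name is arbitrary):

	# /etc/sysctl.d/50-coredump.conf
	kernel.core_pattern = /var/tmp/core
	kernel.core_uses_pid = 1

Then "sysctl --system" applies it without rebooting.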

cmogstored builds by default with '-ggdb3' in the CFLAGS, which
is the most verbose debug info.  Some installers strip the debug
info, unfortunately.  Running "file /path/to/cmogstored" will
tell you if there's debug_info and if it's stripped
(it's probably easiest to install the unstripped one).
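
If yours was stripped, rebuilding from source along these lines
should bring the debug info back (a sketch; -O2 is just a
reasonable default here):

	./configure CFLAGS='-ggdb3 -O2'
	make
	file ./cmogstored	# expect "with debug_info, not stripped"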

Once you have a core dump from a crashed process, you can run
gdb against it with an unstripped binary to get lots of info.

  gdb /path/to/cmogstored /var/tmp/core.$CRASHED_PID

And then you can type "bt" to get a backtrace.
You can also use "t 1", "t 2", etc. to switch between
threads and get the backtraces of other threads; gdb has a
host of other commands, too.
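
To capture everything non-interactively, gdb batch mode works, too
(same binary/core arguments as above):

	gdb -batch -ex 'thread apply all bt' \
		/path/to/cmogstored /var/tmp/core.$CRASHED_PID >backtrace.txt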

I'm not an expert in using gdb, as I can usually figure
everything out from just the backtrace.

Hopefully that's enough to get the info I need to
fix the problem.  There's also a lot of material out there
on debugging C programs (and I'm tired atm and
about to pass out).


* Re: Heavy load
  2019-12-12  7:30   ` Arkadi Colson
  2019-12-12  7:59     ` Eric Wong
@ 2019-12-12 19:16     ` Eric Wong
  2019-12-17  7:40       ` Arkadi Colson
  2019-12-17  7:41       ` Arkadi Colson
  1 sibling, 2 replies; 20+ messages in thread
From: Eric Wong @ 2019-12-12 19:16 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> We are running cmogstored version 1.7.1 compiled and running on Debian 
> stretch. How do I capture a backtrace and debugging symbols?

Btw, did you run older versions of cmogstored w/o troubles?
Thanks.


* Re: Heavy load
  2019-12-12 19:16     ` Eric Wong
@ 2019-12-17  7:40       ` Arkadi Colson
  2019-12-17  8:43         ` Eric Wong
  2019-12-17  7:41       ` Arkadi Colson
  1 sibling, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2019-12-17  7:40 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org

Hi

Yes, we already had this on older versions, but less frequently. Below you 
can find the backtrace:

[Current thread is 1 (Thread 0x7f5e98655700 (LWP 8317))]
(gdb) bt
#0  __find_specmb (format=0x7f5e980de374 "%s") at printf-parse.h:108
#1  _IO_vfprintf_internal (s=0x7f5e98650560, format=0x7f5e980de374 "%s", 
ap=0x7f5e98652c08) at vfprintf.c:1312
#2  0x00007f5e97fc3c53 in buffered_vfprintf (s=0x7f5e98314520 
<_IO_2_1_stderr_>, format=<optimized out>, args=<optimized out>) at 
vfprintf.c:2325
#3  0x00007f5e97fc0f25 in _IO_vfprintf_internal 
(s=s@entry=0x7f5e98314520 <_IO_2_1_stderr_>, 
format=format@entry=0x7f5e980de374 "%s", ap=ap@entry=0x7f5e98652c08) at 
vfprintf.c:1293
#4  0x00007f5e97fe09b2 in __fxprintf (fp=0x7f5e98314520 
<_IO_2_1_stderr_>, fp@entry=0x0, fmt=fmt@entry=0x7f5e980de374 "%s") at 
fxprintf.c:50
#5  0x00007f5e97fa5de0 in __assert_fail_base (fmt=0x7f5e980df310 
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
assertion=assertion@entry=0x563930234ee7 "0 && \"BUG: unset HTTP 
method\"", file=file@entry=0x563930234ee0 "http.c", line=line@entry=80, 
function=function@entry=0x563930235440 "http_process_client")
     at assert.c:64
#6  0x00007f5e97fa5f12 in __GI___assert_fail (assertion=0x563930234ee7 
"0 && \"BUG: unset HTTP method\"", file=0x563930234ee0 "http.c", 
line=80, function=0x563930235440 "http_process_client") at assert.c:101
#7  0x000056393020b66f in ?? ()
#8  0x000056393020bfa5 in ?? ()
#9  0x000056393020c10e in ?? ()
#10 0x000056393021328d in ?? ()
#11 0x00007f5e983204a4 in start_thread (arg=0x7f5e98655700) at 
pthread_create.c:456
#12 0x00007f5e98062d0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

(gdb) t 2
[Switching to thread 2 (Thread 0x7f5e98591700 (LWP 8345))]
#0  0x00007f5e98059c0a in sendfile64 () at 
../sysdeps/unix/syscall-template.S:84
84    ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f5e98059c0a in sendfile64 () at 
../sysdeps/unix/syscall-template.S:84
#1  0x000056393020cedf in ?? ()
#2  0x000056393020bfa5 in ?? ()
#3  0x000056393020c10e in ?? ()
#4  0x000056393021328d in ?? ()
#5  0x00007f5e983204a4 in start_thread (arg=0x7f5e98591700) at 
pthread_create.c:456
#6  0x00007f5e98062d0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

(gdb) t 3
[Switching to thread 3 (Thread 0x7f5e9870b700 (LWP 8291))]
#0  0x00007f5e98063303 in epoll_wait () at 
../sysdeps/unix/syscall-template.S:84
84    ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f5e98063303 in epoll_wait () at 
../sysdeps/unix/syscall-template.S:84
#1  0x00005639302130ab in ?? ()
#2  0x00005639302132c0 in ?? ()
#3  0x00007f5e983204a4 in start_thread (arg=0x7f5e9870b700) at 
pthread_create.c:456
#4  0x00007f5e98062d0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Any idea? If you need more info, please just ask!

Best regards
Arkadi Colson

On 12/12/19 20:16, Eric Wong wrote:
> Arkadi Colson<arkadi@smartbit.be>  wrote:
>> We are running cmogstored version 1.7.1 compiled and running on Debian
>> stretch. How do I capture a backtrace and debugging symbols?
> Btw, did you run older versions of cmogstored w/o troubles?
> Thanks.


* Re: Heavy load
  2019-12-12 19:16     ` Eric Wong
  2019-12-17  7:40       ` Arkadi Colson
@ 2019-12-17  7:41       ` Arkadi Colson
  2019-12-17  8:31         ` Eric Wong
  1 sibling, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2019-12-17  7:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org

This is the output from another crash:

[Current thread is 1 (Thread 0x7f9aa16f2700 (LWP 3318))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f9aa217742a in __GI_abort () at abort.c:89
#2  0x00007f9aa21b3c00 in __libc_message (do_abort=do_abort@entry=2, 
fmt=fmt@entry=0x7f9aa22a8fd0 "*** Error in `%s': %s: 0x%s ***\n") at 
../sysdeps/posix/libc_fatal.c:175
#3  0x00007f9aa21b9fc6 in malloc_printerr (action=3, str=0x7f9aa22a5b8a 
"free(): invalid pointer", ptr=<optimized out>, ar_ptr=<optimized out>) 
at malloc.c:5049
#4  0x00007f9aa21ba80e in _int_free (av=0x7f9aa24dcb00 <main_arena>, 
p=0x7f9a580010f0, have_lock=0) at malloc.c:3905
#5  0x000055d501c98659 in ?? ()
#6  0x000055d501c9bfba in ?? ()
#7  0x000055d501c9c10e in ?? ()
#8  0x000055d501ca328d in ?? ()
#9  0x00007f9aa24e94a4 in start_thread (arg=0x7f9aa16f2700) at 
pthread_create.c:456
#10 0x00007f9aa222bd0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Best regards
Arkadi Colson

On 12/12/19 20:16, Eric Wong wrote:
> Arkadi Colson<arkadi@smartbit.be>  wrote:
>> We are running cmogstored version 1.7.1 compiled and running on Debian
>> stretch. How do I capture a backtrace and debugging symbols?
> Btw, did you run older versions of cmogstored w/o troubles?
> Thanks.


* Re: Heavy load
  2019-12-17  7:41       ` Arkadi Colson
@ 2019-12-17  8:31         ` Eric Wong
  2019-12-17  8:43           ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2019-12-17  8:31 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> This is the output from another crash:
> 
> [Current thread is 1 (Thread 0x7f9aa16f2700 (LWP 3318))]
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f9aa217742a in __GI_abort () at abort.c:89
> #2  0x00007f9aa21b3c00 in __libc_message (do_abort=do_abort@entry=2, 
> fmt=fmt@entry=0x7f9aa22a8fd0 "*** Error in `%s': %s: 0x%s ***\n") at 
> ../sysdeps/posix/libc_fatal.c:175
> #3  0x00007f9aa21b9fc6 in malloc_printerr (action=3, str=0x7f9aa22a5b8a 
> "free(): invalid pointer", ptr=<optimized out>, ar_ptr=<optimized out>) 
> at malloc.c:5049
> #4  0x00007f9aa21ba80e in _int_free (av=0x7f9aa24dcb00 <main_arena>, 
> p=0x7f9a580010f0, have_lock=0) at malloc.c:3905
> #5  0x000055d501c98659 in ?? ()
> #6  0x000055d501c9bfba in ?? ()
> #7  0x000055d501c9c10e in ?? ()
> #8  0x000055d501ca328d in ?? ()

Thanks, I really wish #5..#8 had more info instead of "??"

Was cmogstored compiled with -ggdb3? (should be the default)

When you run "file /path/to/cmogstored", it should
have something like: "with debug_info, not stripped"

You can check the "CFLAGS =" line in the Makefile after
running configure, e.g. (a quick sketch of the check):
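
	grep '^CFLAGS' Makefile	# should include -ggdb3

Thanks.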

> #9  0x00007f9aa24e94a4 in start_thread (arg=0x7f9aa16f2700) at 
> pthread_create.c:456
> #10 0x00007f9aa222bd0f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97


* Re: Heavy load
  2019-12-17  7:40       ` Arkadi Colson
@ 2019-12-17  8:43         ` Eric Wong
  2019-12-17  8:57           ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2019-12-17  8:43 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> Hi
> 
> Yes, we already had this on older versions, but less frequently. Below you 

Oh, I would've welcomed any bug reports much earlier :)

Which version(s)?  I had more time to work on this in the past.
Were any versions always good?

> can find the backtrace:
> 
> [Current thread is 1 (Thread 0x7f5e98655700 (LWP 8317))]
> (gdb) bt
> #0  __find_specmb (format=0x7f5e980de374 "%s") at printf-parse.h:108
> #1  _IO_vfprintf_internal (s=0x7f5e98650560, format=0x7f5e980de374 "%s", 
> ap=0x7f5e98652c08) at vfprintf.c:1312
> #2  0x00007f5e97fc3c53 in buffered_vfprintf (s=0x7f5e98314520 
> <_IO_2_1_stderr_>, format=<optimized out>, args=<optimized out>) at 
> vfprintf.c:2325
> #3  0x00007f5e97fc0f25 in _IO_vfprintf_internal 
> (s=s@entry=0x7f5e98314520 <_IO_2_1_stderr_>, 
> format=format@entry=0x7f5e980de374 "%s", ap=ap@entry=0x7f5e98652c08) at 
> vfprintf.c:1293
> #4  0x00007f5e97fe09b2 in __fxprintf (fp=0x7f5e98314520 
> <_IO_2_1_stderr_>, fp@entry=0x0, fmt=fmt@entry=0x7f5e980de374 "%s") at 
> fxprintf.c:50
> #5  0x00007f5e97fa5de0 in __assert_fail_base (fmt=0x7f5e980df310 
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=assertion@entry=0x563930234ee7 "0 && \"BUG: unset HTTP 
> method\"", file=file@entry=0x563930234ee0 "http.c", line=line@entry=80, 
> function=function@entry=0x563930235440 "http_process_client")
>      at assert.c:64
> #6  0x00007f5e97fa5f12 in __GI___assert_fail (assertion=0x563930234ee7 
> "0 && \"BUG: unset HTTP method\"", file=0x563930234ee0 "http.c", 
> line=80, function=0x563930235440 "http_process_client") at assert.c:101
> #7  0x000056393020b66f in ?? ()
> #8  0x000056393020bfa5 in ?? ()
> #9  0x000056393020c10e in ?? ()
> #10 0x000056393021328d in ?? ()

Yeah, -ggdb3 should be giving info on those "??" lines.
I just double-checked the HTTP parser and it should never
even get to http.c line=80 if the method was unrecognized;
so I suspect there's some other memory corruption bug
which isn't the parser...

> #11 0x00007f5e983204a4 in start_thread (arg=0x7f5e98655700) at 
> pthread_create.c:456
> #12 0x00007f5e98062d0f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> 
> (gdb) t 2
> [Switching to thread 2 (Thread 0x7f5e98591700 (LWP 8345))]

> (gdb) t 3
> [Switching to thread 3 (Thread 0x7f5e9870b700 (LWP 8291))]

<snip>

Any more threads?  (just the number of threads is fine).

> Any idea? If you need more info, please just ask!

How many "/devXYZ" devices do you have?  Are they all
on different partitions?


* Re: Heavy load
  2019-12-17  8:31         ` Eric Wong
@ 2019-12-17  8:43           ` Arkadi Colson
  2019-12-17  8:50             ` Eric Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2019-12-17  8:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org

root@mogstore:/tmp/cmogstored-1.7.1# file cmogstored
cmogstored: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), 
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for 
GNU/Linux 2.6.32, 
BuildID[sha1]=99ac0b8fa751bab384af658b8de346ac8e9fc6b7, not stripped

The CFLAGS are correct. I did a recompile because I'm not sure it was built 
with -ggdb3... Let's wait for the next crash.

Arkadi

On 17/12/19 09:31, Eric Wong wrote:
> Arkadi Colson<arkadi@smartbit.be>  wrote:
>> This is the output from another crash:
>>
>> [Current thread is 1 (Thread 0x7f9aa16f2700 (LWP 3318))]
>> (gdb) bt
>> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>> #1  0x00007f9aa217742a in __GI_abort () at abort.c:89
>> #2  0x00007f9aa21b3c00 in __libc_message (do_abort=do_abort@entry=2,
>> fmt=fmt@entry=0x7f9aa22a8fd0 "*** Error in `%s': %s: 0x%s ***\n") at
>> ../sysdeps/posix/libc_fatal.c:175
>> #3  0x00007f9aa21b9fc6 in malloc_printerr (action=3, str=0x7f9aa22a5b8a
>> "free(): invalid pointer", ptr=<optimized out>, ar_ptr=<optimized out>)
>> at malloc.c:5049
>> #4  0x00007f9aa21ba80e in _int_free (av=0x7f9aa24dcb00 <main_arena>,
>> p=0x7f9a580010f0, have_lock=0) at malloc.c:3905
>> #5  0x000055d501c98659 in ?? ()
>> #6  0x000055d501c9bfba in ?? ()
>> #7  0x000055d501c9c10e in ?? ()
>> #8  0x000055d501ca328d in ?? ()
> Thanks, I really wish #5..#8 had more info instead of "??"
>
> Was cmogstored compiled with -ggdb3? (should be the default)
>
> When you run "file /path/to/cmogstored", it should
> have something like: "with debug_info, not stripped"
>
> You can check the "CFLAGS =" line in the Makefile after
> running configure.  Thanks.
>
>> #9  0x00007f9aa24e94a4 in start_thread (arg=0x7f9aa16f2700) at
>> pthread_create.c:456
>> #10 0x00007f9aa222bd0f in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97


* Re: Heavy load
  2019-12-17  8:43           ` Arkadi Colson
@ 2019-12-17  8:50             ` Eric Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Wong @ 2019-12-17  8:50 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> root@mogstore:/tmp/cmogstored-1.7.1# file cmogstored
> cmogstored: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), 
> dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for 
> GNU/Linux 2.6.32, 
> BuildID[sha1]=99ac0b8fa751bab384af658b8de346ac8e9fc6b7, not stripped
> 
> The CFLAGS are correct. I did a recompile because I'm not sure it was built 
> with -ggdb3... Let's wait for the next crash.

Yeah, there's no "debug_info" above.
I'm tired, will try to think of other things later...


* Re: Heavy load
  2019-12-17  8:43         ` Eric Wong
@ 2019-12-17  8:57           ` Arkadi Colson
  2019-12-17 19:42             ` Eric Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2019-12-17  8:57 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org



On 17/12/19 09:43, Eric Wong wrote:
> Arkadi Colson<arkadi@smartbit.be>  wrote:
>> Hi
>>
>> Yes, we already had this on older versions, but less frequently. Below you
> Oh, I would've welcomed any bug reports much earlier :)
>
> Which version(s)?  I had more time to work on this in the past.
> Were any versions always good?

I know, I should have reported this earlier ;-)

The latest version, 1.7.1.

Hard to say; I remember crashes in the past, but not that frequently. 
Maybe it's because our cluster is much more stressed.

>> can find the backtrace:
>>
>> [Current thread is 1 (Thread 0x7f5e98655700 (LWP 8317))]
>> (gdb) bt
>> #0  __find_specmb (format=0x7f5e980de374 "%s") at printf-parse.h:108
>> #1  _IO_vfprintf_internal (s=0x7f5e98650560, format=0x7f5e980de374 "%s",
>> ap=0x7f5e98652c08) at vfprintf.c:1312
>> #2  0x00007f5e97fc3c53 in buffered_vfprintf (s=0x7f5e98314520
>> <_IO_2_1_stderr_>, format=<optimized out>, args=<optimized out>) at
>> vfprintf.c:2325
>> #3  0x00007f5e97fc0f25 in _IO_vfprintf_internal
>> (s=s@entry=0x7f5e98314520 <_IO_2_1_stderr_>,
>> format=format@entry=0x7f5e980de374 "%s", ap=ap@entry=0x7f5e98652c08) at
>> vfprintf.c:1293
>> #4  0x00007f5e97fe09b2 in __fxprintf (fp=0x7f5e98314520
>> <_IO_2_1_stderr_>, fp@entry=0x0, fmt=fmt@entry=0x7f5e980de374 "%s") at
>> fxprintf.c:50
>> #5  0x00007f5e97fa5de0 in __assert_fail_base (fmt=0x7f5e980df310
>> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>> assertion=assertion@entry=0x563930234ee7 "0 && \"BUG: unset HTTP
>> method\"", file=file@entry=0x563930234ee0 "http.c", line=line@entry=80,
>> function=function@entry=0x563930235440 "http_process_client")
>>       at assert.c:64
>> #6  0x00007f5e97fa5f12 in __GI___assert_fail (assertion=0x563930234ee7
>> "0 && \"BUG: unset HTTP method\"", file=0x563930234ee0 "http.c",
>> line=80, function=0x563930235440 "http_process_client") at assert.c:101
>> #7  0x000056393020b66f in ?? ()
>> #8  0x000056393020bfa5 in ?? ()
>> #9  0x000056393020c10e in ?? ()
>> #10 0x000056393021328d in ?? ()
> Yeah, -ggdb3 should be giving info on those "??" lines.
> I just double-checked the HTTP parser and it should never
> even get to http.c line=80 if the method was unrecognized;
> so I suspect there's some other memory corruption bug
> which isn't the parser...
>
>> #11 0x00007f5e983204a4 in start_thread (arg=0x7f5e98655700) at
>> pthread_create.c:456
>> #12 0x00007f5e98062d0f in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>>
>> (gdb) t 2
>> [Switching to thread 2 (Thread 0x7f5e98591700 (LWP 8345))]
>> (gdb) t 3
>> [Switching to thread 3 (Thread 0x7f5e9870b700 (LWP 8291))]
> <snip>
>
> Any more threads?  (just a number of threads is fine).

Other threads are reporting this:

(gdb) t 4
[Switching to thread 4 (Thread 0x7f5e986be700 (LWP 8302))]
#0  0x00007f5e98063303 in epoll_wait () at 
../sysdeps/unix/syscall-template.S:84
84    in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007f5e98063303 in epoll_wait () at 
../sysdeps/unix/syscall-template.S:84
#1  0x00005639302130ab in ?? ()
#2  0x00005639302132c0 in ?? ()
#3  0x00007f5e983204a4 in start_thread (arg=0x7f5e986be700) at 
pthread_create.c:456
#4  0x00007f5e98062d0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97


> Any idea? If you need more info, please just ask!
We have about 192 devices spread over about 23 cmogstored hosts. 
Each device is one disk with one partition...
> How many "/devXYZ" devices do you have?  Are they all
> on different partitions?


* Re: Heavy load
  2019-12-17  8:57           ` Arkadi Colson
@ 2019-12-17 19:42             ` Eric Wong
  2019-12-18  7:56               ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2019-12-17 19:42 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> > Any idea? If you need more info, please just ask!
> We have about 192 devices spread over about 23 cmogstored hosts. 
> Each device is one disk with one partition...
> > How many "/devXYZ" devices do you have?  Are they all
> > on different partitions?

OK, thanks.  I've only got a single host nowadays with 3
rotational HDD.  Most I ever had was 20 rotational HDD on a
host but that place is out-of-business.

Since your build did not include -ggdb3 debug_info (the default),
I wonder if there's something broken in your build system or
build scripts...  Which compiler are you using?

Can you share the output of "ldd /path/to/cmogstored" ?

Since this is Linux, you're not using libkqueue, are you?

Also, which Linux kernel is it?

Are you using "server aio_threads =" via mgmt interface?

Are you using the undocumented -W/--worker-processes or
-M (multi-config) option(s)?

Is your traffic read-heavy or write-heavy?
I just tested 10s of parallel PUTs and 100s of parallel GETs
without any problems.


* Re: Heavy load
  2019-12-17 19:42             ` Eric Wong
@ 2019-12-18  7:56               ` Arkadi Colson
  2019-12-18 17:58                 ` Eric Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2019-12-18  7:56 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org

On 17/12/19 20:42, Eric Wong wrote:
> Arkadi Colson <arkadi@smartbit.be> wrote:
>>> Any idea? If you need more info, please just ask!
>> We have about 192 devices spread over about 23 cmogstored hosts.
>> Each device is one disk with one partition...
>>> How many "/devXYZ" devices do you have?  Are they all
>>> on different partitions?
> OK, thanks.  I've only got a single host nowadays with 3
> rotational HDD.  Most I ever had was 20 rotational HDD on a
> host but that place is out-of-business.
>
> Since your build did not include -ggdb3 with debug_info by default;
> I wonder if there's something broken in your build system or
> build scripts...  Which compiler are you using?
>
> Can you share the output of "ldd /path/to/cmogstored" ?
root@mogstore:~# ldd /usr/bin/cmogstored
     linux-vdso.so.1 (0x00007fff6584b000)
     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
(0x00007f634a7f2000)
     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
     /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)

> Since this is Linux, you're not using libkqueue, are you?
>
> Also, which Linux kernel is it?

In fact it's a clean Debian stretch installation with this kernel:

root@mogstore:~# uname -a
Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 
(2019-11-11) x86_64 GNU/Linux

>
> Are you using "server aio_threads =" via mgmt interface?
I don't think so. How can I verify this?
>
> Are you using the undocumented -W/--worker-processes or
> > -M (multi-config) option(s)?

I don't think so: Config looks like this:

httplisten  = 0.0.0.0:7500
mgmtlisten  = 0.0.0.0:7501
maxconns    = 10000
docroot     = /var/mogdata
daemonize   = 1
server      = none

>
> Is your traffic read-heavy or write-heavy?
We saw peaks of 3Gb traffic on the newest cmogstored host when marking one 
host dead...
> I just tested 10s of parallel PUTs and 100s of parallel GETs
> without any problems.


* Re: Heavy load
  2019-12-18  7:56               ` Arkadi Colson
@ 2019-12-18 17:58                 ` Eric Wong
  2020-01-06  9:46                   ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2019-12-18 17:58 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Arkadi Colson <arkadi@smartbit.be> wrote:
> On 17/12/19 20:42, Eric Wong wrote:
> > Arkadi Colson <arkadi@smartbit.be> wrote:
> >>> Any idea? If you need more info, please just ask!
> >> We have about 192 devices spread over about 23 cmogstored hosts.
> >> Each device is one disk with one partition...
> >>> How many "/devXYZ" devices do you have?  Are they all
> >>> on different partitions?
> > OK, thanks.  I've only got a single host nowadays with 3
> > rotational HDD.  Most I ever had was 20 rotational HDD on a
> > host but that place is out-of-business.
> >
> > Since your build did not include -ggdb3 debug_info (the default),
> > I wonder if there's something broken in your build system or
> > build scripts...  Which compiler are you using?
> >
> > Can you share the output of "ldd /path/to/cmogstored" ?
> root@mogstore:~# ldd /usr/bin/cmogstored
>      linux-vdso.so.1 (0x00007fff6584b000)
>      libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
> (0x00007f634a7f2000)
>      libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
>      /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)
> 
> > Since this is Linux, you're not using libkqueue, are you?

I was hoping for a simple explanation with libkqueue being
the culprit, but that's not it.

Have you gotten a better backtrace with debug_info? (-ggdb3)

> > Also, which Linux kernel is it?
> 
> In fact it's a clean Debian stretch installation with this kernel:
> 
> root@mogstore:~# uname -a
> Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 
> (2019-11-11) x86_64 GNU/Linux

OK, I don't think there are known problems with that kernel;
cmogstored is pretty sensitive to OS bugs and bugs in
emulation layers like libkqueue.

> > Are you using "server aio_threads =" via mgmt interface?
> I don't think so. How can I verify this?

You'd have something connecting to the mgmt port (7501 in your
case) and setting "server aio_threads = $NUMBER".
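
For example, a sketch using nc (the port is from your config; the
thread count here is arbitrary, and some nc variants may need -q1):

	printf 'server aio_threads = 20\r\n' | nc 127.0.0.1 7501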

> > Are you using the undocumented -W/--worker-processes or
> > -M (multi-config) option(s)?
> 
> I don't think so: Config looks like this:
> 
> httplisten  = 0.0.0.0:7500
> mgmtlisten  = 0.0.0.0:7501
> maxconns    = 10000
> docroot     = /var/mogdata
> daemonize   = 1
> server      = none

OK

> >
> > Is your traffic read-heavy or write-heavy?
> We saw peaks of 3Gb traffic on the newest cmogstored host when marking one 
> host dead...

Are you able to reproduce the problem on a test instance with
just cmogstored?
(no need for full MogileFS instance, just PUT/GET over HTTP).
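
Something like this against a scratch instance should do (a sketch;
the /dev1/test.fid path is made up and assumes a dev1 directory
exists under your docroot):

	curl -T somefile http://127.0.0.1:7500/dev1/test.fid
	curl -s http://127.0.0.1:7500/dev1/test.fid >/dev/null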

Also, are you on SSD or HDD?  Lower latency of SSD could trigger
some bugs.  The design is for high-latency HDD, but it ought to
work well with SSD, too.  I haven't tested with SSD much,
unfortunately.


* Re: Heavy load
  2019-12-18 17:58                 ` Eric Wong
@ 2020-01-06  9:46                   ` Arkadi Colson
  2020-01-08  3:35                     ` Eric Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2020-01-06  9:46 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@bogomips.org



On 18/12/19 18:58, Eric Wong wrote:
> Arkadi Colson <arkadi@smartbit.be> wrote:
>> On 17/12/19 20:42, Eric Wong wrote:
>>> Arkadi Colson <arkadi@smartbit.be> wrote:
>>>>> Any idea? If you need more info, please just ask!
>>>> We have about 192 devices spread over about 23 cmogstored hosts.
>>>> Each device is one disk with one partition...
>>>>> How many "/devXYZ" devices do you have? Are they all
>>>>> on different partitions?
>>> OK, thanks. I've only got a single host nowadays with 3
>>> rotational HDD. Most I ever had was 20 rotational HDD on a
>>> host but that place is out-of-business.
>>>
>>> Since your build did not include -ggdb3 debug_info (the default),
>>> I wonder if there's something broken in your build system or
>>> build scripts... Which compiler are you using?
>>>
>>> Can you share the output of "ldd /path/to/cmogstored" ?
>> root@mogstore:~# ldd /usr/bin/cmogstored
>>     linux-vdso.so.1 (0x00007fff6584b000)
>>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>> (0x00007f634a7f2000)
>>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
>>     /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)
>>
>>> Since this is Linux, you're not using libkqueue, are you?
> I was hoping for a simple explanation with libkqueue being
> the culprit, but that's not it.
>
> Have you gotten a better backtrace with debug_info? (-ggdb3)
No, no other crashes so far...
>>> Also, which Linux kernel is it?
>> In fact it's a clean Debian stretch installation with this kernel:
>>
>> root@mogstore:~# uname -a
>> Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2
>> (2019-11-11) x86_64 GNU/Linux
> OK, I don't think there are known problems with that kernel;
> cmogstored is pretty sensitive to OS bugs and bugs in
> emulation layers like libkqueue.
>
>>> Are you using "server aio_threads =" via mgmt interface?
>> I don't think so. How can I verify this?
> You'd have something connecting to the mgmt port (7501 in your
> case) and setting "server aio_threads = $NUMBER".
I'm not sure I understand this setting. How can I set or read it? We 
did not configure anything like this anywhere, so I presume it uses the 
default?
>
>>> Are you using the undocumented -W/--worker-processes or
>>> -M (multi-config) option(s)?
>> I don't think so: Config looks like this:
>>
>> httplisten  = 0.0.0.0:7500
>> mgmtlisten  = 0.0.0.0:7501
>> maxconns    = 10000
>> docroot     = /var/mogdata
>> daemonize   = 1
>> server      = none
> OK
>
>>> Is your traffic read-heavy or write-heavy?
>> We saw peaks of 3Gb traffic on the newest cmogstored host when marking one
>> host dead...
> Are you able to reproduce the problem on a test instance with
> just cmogstored?
> (no need for full MogileFS instance, just PUT/GET over HTTP).
I haven't had time yet to try to reproduce the problem in another 
environment. We will try to do it as soon as possible...
> Also, are you on SSD or HDD? Lower latency of SSD could trigger
> some bugs. The design is for high-latency HDD, but it ought to
> work well with SSD, too. I haven't tested with SSD much,
> unfortunately.

We only use HDD, no SSD

I will come back to you with more debug info after the next crash. For 
me it's OK to put the case on hold for now... By the way, thanks a lot 
already for your help!

BR
Arkadi



* Re: Heavy load
  2020-01-06  9:46                   ` Arkadi Colson
@ 2020-01-08  3:35                     ` Eric Wong
  2020-01-08  9:40                       ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2020-01-08  3:35 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public@yhbt.net, cmogstored-public@bogomips.org

Arkadi Colson <arkadi@smartbit.be> wrote:
> On 18/12/19 18:58, Eric Wong wrote:
> > Arkadi Colson <arkadi@smartbit.be> wrote:
> >> On 17/12/19 20:42, Eric Wong wrote:
> >>> Arkadi Colson <arkadi@smartbit.be> wrote:
> >>>>> Any idea? If you need more info, please just ask!
> >>>> We have about 192 devices spread over about 23 cmogstored hosts.
> >>>> Each device is one disk with one partition...
> >>>>> How many "/devXYZ" devices do you have? Are they all
> >>>>> on different partitions?
> >>> OK, thanks. I've only got a single host nowadays with 3
> >>> rotational HDD. Most I ever had was 20 rotational HDD on a
> >>> host but that place is out-of-business.
> >>>
> >>> Since your build did not include -ggdb3 debug_info (the default),
> >>> I wonder if there's something broken in your build system or
> >>> build scripts... Which compiler are you using?
> >>>
> >>> Can you share the output of "ldd /path/to/cmogstored" ?
> >> root@mogstore:~# ldd /usr/bin/cmogstored
> >>     linux-vdso.so.1 (0x00007fff6584b000)
> >>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> >> (0x00007f634a7f2000)
> >>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
> >>     /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)
> >>
> >>> Since this is Linux, you're not using libkqueue, are you?
> > I was hoping for a simple explanation with libkqueue being
> > the culprit, but that's not it.
> >
> > Have you gotten a better backtrace with debug_info? (-ggdb3)
> No, no other crashes so far...

OK, so still no crash after a few weeks?

Btw, new address will be:  cmogstored-public@yhbt.net
bogomips.org is going away since I can't afford it
(and I hate ICANN for what they're doing to .org TLD)

> >>> Also, which Linux kernel is it?
> >> In fact it's a clean Debian stretch installation with this kernel:
> >>
> >> root@mogstore:~# uname -a
> >> Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2
> >> (2019-11-11) x86_64 GNU/Linux
> > OK, I don't think there are known problems with that kernel;
> > cmogstored is pretty sensitive to OS bugs and bugs in
> > emulation layers like libkqueue.
> >
> >>> Are you using "server aio_threads =" via mgmt interface?
> >> I don't think so. How can I verify this?
> > You'd have something connecting to the mgmt port (7501 in your
> > case) and setting "server aio_threads = $NUMBER".
> I'm not sure I understand this setting. How can I set or read it? We 
> did not configure anything like this anywhere, so I presume it uses the 
> default?

Yes, it's probably the default.  There's no way of reading the
actual value in either cmogstored or Perl mogstored, only
setting it.  You can, however, display the actual number of threads
the kernel is using by listing /proc/$PID/task/ or using tools like
ps(1)/top(1).
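
For example, a sketch (again assuming a single cmogstored process):

	ps -o nlwp= -p $(pgrep -o cmogstored)	# total thread count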

In either case, you can set the thread count by typing that
parameter in via telnet|nc|socat|whatever.

> >>> Are you using the undocumented -W/--worker-processes or
> >>> -M (multi-config) option(s)?
> >> I don't think so: Config looks like this:
> >>
> >> httplisten  = 0.0.0.0:7500
> >> mgmtlisten  = 0.0.0.0:7501
> >> maxconns    = 10000
> >> docroot     = /var/mogdata
> >> daemonize   = 1
> >> server      = none
> > OK
> >
> >>> Is your traffic read-heavy or write-heavy?
> >> We saw peaks of 3Gb traffic on the newest cmogstored host when marking one
> >> host dead...
> > Are you able to reproduce the problem on a test instance with
> > just cmogstored?
> > (no need for full MogileFS instance, just PUT/GET over HTTP).
> I haven't had time yet to try to reproduce the problem in another 
> environment. We will try to do it as soon as possible...

OK.  Since it was near holidays, was your traffic higher or lower?
(I know some shopping sites hit traffic spikes, but not sure
about your case).

> > Also, are you on SSD or HDD? Lower latency of SSD could trigger
> > some bugs. The design is for high-latency HDD, but it ought to
> > work well with SSD, too. I haven't tested with SSD much,
> > unfortunately.
> 
> We only use HDD, no SSD
> 
> I will come back to you with more debug info after the next crash. For 
> me it's OK to put the case on hold for now... By the way, thanks a lot 
> already for your help!

No problem.  I'm sorry for it crashing.  Btw, was that "ldd"
output from the binary after recompiling with -ggdb3?  If you
could get the ldd output from a binary which caused a crash,
it would be good to compare them in case the original build
was broken.

Fwiw, the only Linux segfault I ever saw in production was fixed in:
https://bogomips.org/cmogstored-public/e8217a1fe0cf341b/s/
And that was because it was using the -W/--worker-processes feature.
That instance saw many TB of traffic every day for years and never
saw any other problem.



* Re: Heavy load
  2020-01-08  3:35                     ` Eric Wong
@ 2020-01-08  9:40                       ` Arkadi Colson
  2020-01-30  0:35                         ` Eric Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Arkadi Colson @ 2020-01-08  9:40 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@yhbt.net, cmogstored-public@bogomips.org



With kind regards
Arkadi Colson

Smartschool • Digital School Platform
Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
www.smartschool.be • info@smartschool.be
T +32 11 64 08 80 • F +32 11 64 08 81


On 8/01/20 04:35, Eric Wong wrote:
> Arkadi Colson <arkadi@smartbit.be> wrote:
>> On 18/12/19 18:58, Eric Wong wrote:
>>> Arkadi Colson <arkadi@smartbit.be> wrote:
>>>> On 17/12/19 20:42, Eric Wong wrote:
>>>>> Arkadi Colson <arkadi@smartbit.be> wrote:
>>>>>>> Any idea? If you need more info, please just ask!
>>>>>> We have about 192 devices spread over about 23 cmogstored hosts.
>>>>>> Each device is one disk with one partition...
>>>>>>> How many "/devXYZ" devices do you have? Are they all
>>>>>>> on different partitions?
>>>>> OK, thanks. I've only got a single host nowadays with 3
>>>>> rotational HDD. Most I ever had was 20 rotational HDD on a
>>>>> host but that place is out-of-business.
>>>>>
>>>>> Since your build did not include -ggdb3 debug_info (the default),
>>>>> I wonder if there's something broken in your build system or
>>>>> build scripts... Which compiler are you using?
>>>>>
>>>>> Can you share the output of "ldd /path/to/cmogstored" ?
>>>> root@mogstore:~# ldd /usr/bin/cmogstored
>>>>      linux-vdso.so.1 (0x00007fff6584b000)
>>>>      libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>>>> (0x00007f634a7f2000)
>>>>      libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
>>>>      /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)
>>>>
>>>>> Since this is Linux, you're not using libkqueue, are you?
>>> I was hoping for a simple explanation with libkqueue being
>>> the culprit, but that's not it.
>>>
>>> Have you gotten a better backtrace with debug_info? (-ggdb3)
>> No, no other crashes so far...
> OK, so still no crash after a few weeks?
Nope, but during the holidays there is less activity on the application, 
so I'm afraid we have to wait until it happens again...
> Btw, new address will be:  cmogstored-public@yhbt.net
> bogomips.org is going away since I can't afford it
> (and I hate ICANN for what they're doing to .org TLD)
>
>>>>> Also, which Linux kernel is it?
>>>> In fact it's a clean Debian stretch installation with this kernel:
>>>>
>>>> root@mogstore:~# uname -a
>>>> Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2
>>>> (2019-11-11) x86_64 GNU/Linux
>>> OK, I don't think there are known problems with that kernel;
>>> cmogstored is pretty sensitive to OS bugs and bugs in
>>> emulation layers like libkqueue.
>>>
>>>>> Are you using "server aio_threads =" via mgmt interface?
>>>> I don't think so. How can I verify this?
>>> You'd have something connecting to the mgmt port (7501 in your
>>> case) and setting "server aio_threads = $NUMBER".
>> I'm not sure I understand this setting. How can I set or read it? We
>> did not configure anything like this anywhere, so I presume it uses the
>> default?
> Yes, it's probably the default.  There's no way of reading the
> actual value in either cmogstored or Perl mogstored, only
> setting it.  You can, however, display the actual number of threads
> the kernel is using by listing /proc/$PID/task/ or using tools like
> ps(1)/top(1).
root@mogstore:~# ls /proc/787/task/ | wc -l
114

> In either case, you can set the thread count by typing that
> parameter in via telnet|nc|socat|whatever.
>
>>>>> Are you using the undocumented -W/--worker-processes or
>>>>> -M (multi-config) option(s)?
>>>> I don't think so: Config looks like this:
>>>>
>>>> httplisten  = 0.0.0.0:7500
>>>> mgmtlisten  = 0.0.0.0:7501
>>>> maxconns    = 10000
>>>> docroot     = /var/mogdata
>>>> daemonize   = 1
>>>> server      = none
>>> OK
>>>
>>>>> Is your traffic read-heavy or write-heavy?
>>>> We saw peaks of 3Gb traffic on the newest cmogstored host when marking one
>>>> host dead...
>>> Are you able to reproduce the problem on a test instance with
>>> just cmogstored?
>>> (no need for full MogileFS instance, just PUT/GET over HTTP).
>> I haven't had time yet to try to reproduce the problem in another
>> environment. We will try to do it as soon as possible...
> OK.  Since it was near holidays, was your traffic higher or lower?
> (I know some shopping sites hit traffic spikes, but not sure
> about your case).
Lower in our case, so let's wait until it happens again and I will get 
back to you. I tried to reproduce the problem in a test environment, but 
could not...
>
>>> Also, are you on SSD or HDD? Lower latency of SSD could trigger
>>> some bugs. The design is for high-latency HDD, but it ought to
>>> work well with SSD, too. I haven't tested with SSD much,
>>> unfortunately.
>> We only use HDD, no SSD
>>
>> I will come back to you with more debug info after the next crash. For
>> me it's OK to put the case on hold for now... By the way, thanks a lot
>> already for your help!
> No problem.  I'm sorry for it crashing.  Btw, was that "ldd"
> output from the binary after recompiling with -ggdb3?  If you
> could get the ldd output from a binary which caused a crash,
> it would be good to compare them in case the original build
> was broken.
ldd looks the same before and after recompiling
>
> Fwiw, the only Linux segfault I ever saw in production was fixed in:
> https://bogomips.org/cmogstored-public/e8217a1fe0cf341b/s/
> And that was because it was using the -W/--worker-process feature.
> That instance saw many TB of traffic every day for years and never
> saw any other problem.
>


* Re: Heavy load
  2020-01-08  9:40                       ` Arkadi Colson
@ 2020-01-30  0:35                         ` Eric Wong
  2020-03-03 15:46                           ` Arkadi Colson
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wong @ 2020-01-30  0:35 UTC (permalink / raw)
  To: Arkadi Colson; +Cc: cmogstored-public

Hey Arkadi, it's been a few weeks and I expect school is in
session.  Any updates on how cmogstored is holding up?

(Or does it get busiest during exam times?)

Thanks.


* Re: Heavy load
  2020-01-30  0:35                         ` Eric Wong
@ 2020-03-03 15:46                           ` Arkadi Colson
  0 siblings, 0 replies; 20+ messages in thread
From: Arkadi Colson @ 2020-03-03 15:46 UTC (permalink / raw)
  To: Eric Wong; +Cc: cmogstored-public@yhbt.net

Hi

Sorry for the late answer, but everything is still working fine. I think 
it was a combination of high load with broken disks and fsck running, or 
something like that. Anyway, if it happens again I will get back to you 
with more debugging information.


Best regards
Arkadi Colson

On 30/01/20 01:35, Eric Wong wrote:
> Hey Arkadi, it's been a few weeks and I expect school is in
> session.  Any updates on how cmogstored is holding up?
>
> (Or does it get busiest during exam times?)
>
> Thanks.


end of thread

Thread overview: 20 messages
2019-12-11 13:54 Heavy load Arkadi Colson
2019-12-11 17:06 ` Eric Wong
2019-12-12  7:30   ` Arkadi Colson
2019-12-12  7:59     ` Eric Wong
2019-12-12 19:16     ` Eric Wong
2019-12-17  7:40       ` Arkadi Colson
2019-12-17  8:43         ` Eric Wong
2019-12-17  8:57           ` Arkadi Colson
2019-12-17 19:42             ` Eric Wong
2019-12-18  7:56               ` Arkadi Colson
2019-12-18 17:58                 ` Eric Wong
2020-01-06  9:46                   ` Arkadi Colson
2020-01-08  3:35                     ` Eric Wong
2020-01-08  9:40                       ` Arkadi Colson
2020-01-30  0:35                         ` Eric Wong
2020-03-03 15:46                           ` Arkadi Colson
2019-12-17  7:41       ` Arkadi Colson
2019-12-17  8:31         ` Eric Wong
2019-12-17  8:43           ` Arkadi Colson
2019-12-17  8:50             ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/cmogstored.git/
