Date | Commit message |
|
This unfortunate loop burned too much CPU on FreeBSD and caused
shutdown to take too long when using sched_yield. nanosleep for
10ms instead, hopefully allowing the system to accomplish some
disk I/O and other tasks before we poke it again.
Reported-by: Mikolaj Golub
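
A minimal sketch of the kind of bounded wait described above. The helper name `sleep_10ms` is hypothetical and the actual cmogstored loop is not shown here; resuming after EINTR is an assumption of this sketch, not something the commit states.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (not from cmogstored): sleep for about 10ms,
 * resuming with the remaining time if a signal interrupts the sleep.
 * Returns 0 on success, -1 on any error other than EINTR.
 */
static int sleep_10ms(void)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	struct timespec rem;

	while (nanosleep(&req, &rem) != 0) {
		if (errno != EINTR)
			return -1;
		req = rem; /* continue with the unslept remainder */
	}
	return 0;
}
```

A shutdown loop could call `sleep_10ms()` between checks where it previously called sched_yield(), giving up the CPU for a bounded time rather than spinning.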
|
|
It's unlikely we'll even come close to seeing 2-4 billion devices in a
MogileFS instance for a while. Meanwhile, it's also unlikely the
kernel will ever run that many threads, either. So make it easier
to pack and shrink data structures to save a few bytes and perhaps
get better memory alignment.
For reference, the POSIX semaphore API specifies initial values
with unsigned (int) values, too.
This leads to a minor size reduction (and we're not even packing):
$ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored
add/remove: 0/0 grow/shrink: 0/13 up/down: 0/-86 (-86)
function old new delta
mog_svc_dev_quit_prepare 13 12 -1
mog_mgmt_fn_aio_threads 147 146 -1
mog_dev_user_rescale_i 27 26 -1
mog_ioq_requeue_prepare 52 50 -2
mog_ioq_init 80 78 -2
mog_thrpool_start 101 96 -5
mog_svc_dev_user_rescale 143 137 -6
mog_svc_start_each 264 256 -8
mog_svc_aio_threads_handler 257 249 -8
mog_ioq_ready 263 255 -8
mog_ioq_next 303 295 -8
mog_svc_thrpool_rescale 206 197 -9
mog_thrpool_set_size 1028 1001 -27
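
To illustrate why narrower counters can shrink a structure, here is a sketch with two hypothetical layouts. The struct and field names are invented for illustration, and the assumption that the wider type was size_t is mine, not the commit's. For comparison, POSIX declares `sem_init(sem_t *sem, int pshared, unsigned value)`.

```c
#include <stddef.h>

/* Hypothetical "before" layout: counters stored as size_t */
struct pool_wide {
	size_t n_threads;
	size_t want_threads;
	void *arg;
};

/* Hypothetical "after" layout: counters stored as unsigned */
struct pool_narrow {
	unsigned n_threads;
	unsigned want_threads;
	void *arg;
};

/* Bytes saved per struct; nonzero only where size_t is wider than unsigned */
static size_t pool_bytes_saved(void)
{
	return sizeof(struct pool_wide) - sizeof(struct pool_narrow);
}
```

On a typical LP64 target the two unsigned fields share a single 8-byte slot, so the narrow layout needs no explicit packing to be smaller.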
|
|
We want to yield dying threads as soon as possible during
thread shutdown, so we check the quit flag and yield the
running thread to trigger a MOG_NEXT_ACTIVE.
|
|
While pthread_yield is non-standard, it is relatively common and
preferable for systems where pthreads are _not_ 1:1 mapped to kernel
threads. This also provides a stronger yield to weaken the priority
of the calling thread wherever we previously used sched_yield.
|
|
This should allow the threads we're terminating to more quickly
enter a safe state where they're allowed to exit. On SMP systems,
we need to yield the signalling thread more times to increase the
probability the interrupted thread can run (and exit).
|
|
Due to data/event loss, we cannot rely on normal syscalls
(accept/epoll_wait) being cancellation points. The benefits of
using a standardized API to terminate threads asynchronously are
lost when toggling cancellation flags.
This implementation allows us to be more explicit and obvious at the
few points where our worker threads may exit and reduces the amount
of code we have. By avoiding the calls to pthread_setcancelstate,
we should halve the number of atomic operations required in the
common case (where the thread is not marked for termination).
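
The idea of an explicit exit point can be sketched as follows. This is not the cmogstored implementation; the `worker` struct and all function names are hypothetical. It only shows a loop that pays for one atomic load per iteration instead of toggling cancellation state around each blocking call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread state; names are illustrative only */
struct worker {
	atomic_bool quit;   /* set by the thread requesting shutdown */
	unsigned long done; /* units of work completed */
};

static void worker_request_quit(struct worker *w)
{
	atomic_store_explicit(&w->quit, true, memory_order_release);
}

/* The single explicit exit check: one atomic load */
static bool worker_should_exit(struct worker *w)
{
	return atomic_load_explicit(&w->quit, memory_order_acquire);
}

/*
 * Run up to max_units of work, checking the flag only at the top of
 * each iteration, where no event has been dequeued and none can be lost.
 */
static unsigned long worker_run(struct worker *w, unsigned long max_units)
{
	while (w->done < max_units && !worker_should_exit(w))
		w->done++; /* stand-in for handling one event */
	return w->done;
}
```

Because the check sits at one known point, a thread never exits between accepting a client or event and handing it off.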
|
|
We're using per-svc-based thread pools, so different MogileFS
instances we serve no longer affect each other. This means
changing the aio_threads count only affects the svc of the
sidechannel port which triggered the change.
|
|
This simplifies code, reduces contention, and reduces the
chances of independent MogileFS instances (with one instance
of cmogstored) stepping over each other.
Most cmogstored deployments are single docroot (for a single
instance of MogileFS); however, cmogstored supports multiple
docroots for some rare configurations and we support them here.
|
|
I forgot why this bound was necessary, so add a comment
ensuring I do not forget again.
|
|
This will help ensure availability when new devices are added,
without additional user interaction to manually set aio_threads
via sidechannel.
|
|
Older glibc will return ENOMEM on mprotect() failures. This bug
was only fixed in 2011, so the long-term distros and old
installations may not have the necessary backports.
ref: http://www.sourceware.org/bugzilla/show_bug.cgi?id=386
|
|
pthread_create may return EAGAIN as a temporary failure;
do not abort a running process if this is the case.
For the initial mountlist scan, we must retry indefinitely for
cmogstored to be usable. However, with our thread pools, we can
always run fewer threads (as long as there is at least one
thread per-pool).
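
A sketch of the "run fewer threads" policy, under stated assumptions: `pool_start`, `spawn_fn`, and `fake_spawn` are hypothetical names, and the spawn callback stands in for a wrapper around pthread_create so the logic can be exercised without creating real threads. The 10ms pause between retries is also an assumption.

```c
#include <errno.h>
#include <time.h>

/* Like pthread_create: returns 0 on success or an errno value on failure */
typedef int (*spawn_fn)(void *arg);

/*
 * Hypothetical pool starter. Tries to start `want` threads via `spawn`.
 * On EAGAIN it settles for fewer threads once at least one is running;
 * with none running it sleeps briefly and retries. Any other error
 * stops the attempt. Returns the number of threads started.
 */
static unsigned pool_start(spawn_fn spawn, void *arg, unsigned want)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	unsigned started = 0;

	while (started < want) {
		int err = spawn(arg);

		if (err == 0) {
			started++;
		} else if (err == EAGAIN && started == 0) {
			nanosleep(&pause, NULL); /* need at least one thread */
		} else {
			break; /* EAGAIN with >= 1 running, or a hard error */
		}
	}
	return started;
}

/* Demo stand-in for a pthread_create wrapper: succeeds *budget times */
static int fake_spawn(void *arg)
{
	unsigned *budget = arg;

	if (*budget == 0)
		return EAGAIN;
	(*budget)--;
	return 0;
}
```

With a budget of 2 and a request for 5 threads, `pool_start` returns 2 rather than aborting, which is the degraded-but-running behavior the message describes.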
|
|
gnulib did it for us in m4/gnulib-cache.m4, we'll match.
|
|
This speeds up shutdown for kqueue users, as kevent() is not a
cancellation point.
While we're at it, remove the unnecessary check for mog_queue
before pthread_kill(). This check was a remnant of the old,
NOTE_TRIGGER-based implementation.
|
|
This allows tunable thread counts at runtime like regular
mogstored (using Perlbal).
|
|
Using pthread_cancel() and pthread_kill() allows us to do
shutdowns of individual threads in the future. EVFILT_USER will
just spam the kernel and the thread-specific "dying" hack won't
work if we only want to shut down a single thread.
kevent() is not a cancellation point in FreeBSD and will
not be in libkqueue, either. However libkqueue will set
errno==EINTR if it is interrupted, allowing cancellation
requests to go through.
|
|
By explicitly giving kevent() sleepers a chance to wakeup
and run, we can reduce the number of times we need to trigger
wakeups via NOTE_TRIGGER.
|
|
The kevent() function as implemented by libkqueue does not
support thread cancellation the same way a real kevent() (on
FreeBSD) appears to. So pretend no implementation of kevent()
is cancelable and handle cancellation ourselves using
pthread_testcancel(). This allows us to support any platform
where kevent() may work, since it's unclear if other *BSDs
implement kevent() as a cancellation point.
|
|
We don't know enough about the libc on other platforms to make an
intelligent choice about stack size, so just use the default to
avoid potential problems.
|
|
libkqueue appears to use a lot of stack, so just use
the default stack size to avoid unexplained segfaults.
|
|
BUFSIZ is only 1024 on FreeBSD; this is too small to be
optimal for large I/O operations.
|
|
This should really be tunable, but we can do that later.
|
|
The *printf() family of functions may allocate BUFSIZ on the stack.
We'll need those functions (including syslog(3)) in various
places, so it's safer to have more stack (and it can give more
meaningful assert() messages).
|
|
Nuked old history since it was missing copyright/GPLv3 notices.
|