Date | Commit message |
|
This only affects users of the undocumented --worker-processes
switch. Furthermore, this only affects non-Linux platforms which
rely on the pipe implementation of selfwake.
This prevents us from wasting one extraneous file descriptor slot
(and hence potentially wasting 128 bytes in userland).
|
|
readdir on the same DIR pointer is undefined if DIR was inherited by
multiple children. Using the reentrant readdir_r would not have
helped, since the underlying file descriptor and kernel file handle
were still shared (and we need rewinddir, too).
This readdir usage bug existed in cmogstored since the earliest
releases, but was harmless until the cmogstored 1.3 series.
This misuse of readdir led to hitting a leftover call to free().
So this bug only manifested since
commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b
(svc: implement top-level by_mog_devid hash)
Fortunately, these bugs only affect users of the undocumented
multi-process feature (not just multi-threaded).
|
|
This only triggered if the (undocumented) --worker-processes
option is used. This assertion is no longer valid as of
commit d5a52618ca1f9b5d7f6998716fbfe7714f927112
(refactor handling of "server aio_threads = " command)
|
|
Due to data/event loss, we cannot rely on normal syscalls
(accept/epoll_wait) being cancellation points. The benefits of
using a standardized API to terminate threads asynchronously are
lost when toggling cancellation flags.
This implementation allows us to be more explicit and obvious at the
few points where our worker threads may exit and reduces the amount
of code we have. By avoiding the calls to pthread_setcancelstate,
we should halve the number of atomic operations required in the
common case (where the thread is not marked for termination).
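
A minimal sketch of what explicit exit points could look like, assuming a
per-thread quit flag. The struct and function names below are hypothetical
and not taken from cmogstored; the point is only that the common case costs
one atomic load and the thread can exit in exactly one place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state; the names are not from cmogstored. */
struct worker {
	atomic_bool quit;	/* set by another thread to request exit */
	unsigned long handled;	/* events handled, for illustration only */
};

/* The common case (not marked for termination) costs one atomic load. */
static bool worker_should_exit(struct worker *w)
{
	return atomic_load_explicit(&w->quit, memory_order_acquire);
}

/* May be called from any thread to mark the worker for termination. */
static void worker_request_exit(struct worker *w)
{
	atomic_store_explicit(&w->quit, true, memory_order_release);
}

/*
 * Thread start routine (usable with pthread_create).
 * The loop condition is the only place where the thread may exit.
 */
static void *worker_main(void *arg)
{
	struct worker *w = arg;

	assert(w != NULL);
	while (!worker_should_exit(w)) {
		/* ... wait for and handle one event here ... */
		w->handled++;
	}
	return NULL;
}
```

A thread sleeping inside a syscall will not notice the flag by itself, so
something still has to wake it up (e.g. a signal causing EINTR).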
|
|
The "shutdown" command needs to trigger EINTR when using
epoll_pwait, otherwise the sleeping thread may not wake up properly.
|
|
We're using per-svc-based thread pools, so different MogileFS
instances we serve no longer affect each other. This means
changing the aio_threads count only affects the svc of the
sidechannel port which triggered the change.
|
|
This simplifies code, reduces contention, and reduces the
chances of independent MogileFS instances (with one instance
of cmogstored) stepping over each other.
Most cmogstored deployments are single docroot (for a single
instance of MogileFS), however cmogstored supports multiple
docroots for some rare configurations and we support them here.
|
|
Having too many acceptor threads does not help, as it leads to
lock contention in the accept syscalls and the EPOLL_CTL_ADD
paths. The fair FIFO ordering of _blocking_ accept/accept4
syscalls also means we trigger unnecessary task switching and
incur cache misses under high load.
It has been almost impossible for the acceptor threads to
be stuck on disk I/O since
commit 832316624f7a8f44b3e1d78a8a7a62a399241840
("acceptor threads push directly into event queue")
|
|
This will help ensure availability when new devices are added,
without additional user interaction to manually set aio_threads
via sidechannel.
|
|
There's no reason to be referencing FDs for these acceptors
since they're infrequently accessed by svc, so this should
make our internals more consistent. This also removes our
use of mog_fd_get (outside of test code).
|
|
Despite having an extensive test suite and minimal room for user
error, giving users the option to back out of a hot upgrade may
be worth supporting.
|
|
This fixes a missing prototype warning for cmogstored_exit()
when checking exit.c with sparse.
|
|
We need to atomically enable interrupts and sleep with
the same syscall. Fortunately, using pselect (through
mog_sleep) allows that and is POSIX-compliant, so use
that.
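
A minimal sketch of the idea, assuming signals are kept blocked during
normal operation and only unblocked inside pselect. sleep_interruptibly is
a hypothetical stand-in for mog_sleep, and block_and_handle is a helper
invented for this example; neither is the cmogstored code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void note_signal(int signo)
{
	got_signal = signo;
}

/* Block `signo`, install the handler, and save the previous mask in `old`. */
static int block_and_handle(int signo, sigset_t *old)
{
	struct sigaction sa;
	sigset_t set;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_signal;
	sigemptyset(&sa.sa_mask);
	sigemptyset(&set);
	sigaddset(&set, signo);
	if (sigprocmask(SIG_BLOCK, &set, old) != 0)
		return -1;
	return sigaction(signo, &sa, NULL);
}

/*
 * Sleep for up to `seconds`. The `unblocked` mask is in effect only while
 * the process is inside pselect, so a signal cannot slip in between
 * "enable interrupts" and "sleep".
 * Returns 0 on timeout, or -1 with errno == EINTR if a handler ran.
 */
static int sleep_interruptibly(const sigset_t *unblocked, long seconds)
{
	struct timespec ts;

	assert(seconds >= 0);
	ts.tv_sec = seconds;
	ts.tv_nsec = 0;
	return pselect(0, NULL, NULL, NULL, &ts, unblocked);
}
```

A signal that arrived while blocked stays pending and is delivered as soon
as pselect swaps the mask, so the wakeup is not lost.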
|
|
In the absence of a pselect/ppoll-like version of waitpid,
we must use a selfwake descriptor (pipe or eventfd) to
wake the master up whenever a signal is received.
So wait on the selfwake descriptor and always run waitpid
with WNOHANG in a loop to ensure all children are reaped.
The: mog_intr_disable(); waitpid(); mog_intr_enable()
sequence was completely stupid; I can't believe I wrote it.
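
A rough sketch of the pattern, assuming the pipe flavor of selfwake and a
signal handler (not shown) that writes one byte to the write end. All
function names here are hypothetical and are not the cmogstored internals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a pipe with both ends non-blocking. */
static int selfwake_pipe(int fds[2])
{
	int i;

	if (pipe(fds) != 0)
		return -1;
	for (i = 0; i < 2; i++) {
		int fl = fcntl(fds[i], F_GETFL);

		if (fl < 0 || fcntl(fds[i], F_SETFL, fl | O_NONBLOCK) != 0) {
			close(fds[0]);
			close(fds[1]);
			return -1;
		}
	}
	return 0;
}

/* Discard pending wakeup bytes; `fd` must be non-blocking. */
static void selfwake_drain(int fd)
{
	char buf[64];

	while (read(fd, buf, sizeof(buf)) > 0)
		;
}

/* Reap every child that has already exited, without blocking. */
static unsigned reap_all_children(void)
{
	unsigned reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, WNOHANG);

		if (pid > 0) {
			reaped++;	/* inspect `status` here if needed */
			continue;
		}
		if (pid < 0 && errno == EINTR)
			continue;
		/* pid == 0: children still running; ECHILD: none left */
		return reaped;
	}
}

/* Sleep on the selfwake descriptor, then reap whatever has exited. */
static unsigned master_wait_once(int selfwake_rd, int timeout_ms)
{
	struct pollfd pfd;

	assert(selfwake_rd >= 0);
	pfd.fd = selfwake_rd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, timeout_ms) > 0)
		selfwake_drain(selfwake_rd);
	return reap_all_children();
}
```

Reaping in a loop matters because several children exiting close together
may produce only one wakeup.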
|
|
If we receive both SIGUSR2 and SIGQUIT in a short
time period, we should trigger the upgrade before the
graceful exit, as no user will (intentionally) send
SIGQUIT before SIGUSR2.
|
|
This ensures the: inherited $ADDRESS:$PORT on fd=...
messages are prefixed with the PID in logs.
|
|
This project uses C99 features (and some GNU extensions),
so bool is usable.
|
|
Code is easier to follow when interrupts occur at well-defined
points. The worker processes (and master-less standalone) already
follow this.
|
|
USR2 now forks a new cmogstored process which inherits
listener file descriptors from the parent. The parent
renames its pidfile with a ".oldbin" suffix so the new
child can use the new PID file.
Clusters may now upgrade to future versions of cmogstored
without needing to mark hosts down via mogadm.
The behavior of this process should match that of nginx:
http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly
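
A small sketch of the pidfile hand-off only, assuming a plain rename(2) is
enough to move the old pidfile aside. The helper names are hypothetical;
the fork/exec and listener inheritance are not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()-ed "<path>.oldbin" string, or NULL if out of memory. */
static char *oldbin_path(const char *path)
{
	static const char suffix[] = ".oldbin";
	size_t len;
	char *out;

	assert(path != NULL);
	len = strlen(path);
	out = malloc(len + sizeof(suffix));
	if (out) {
		memcpy(out, path, len);
		/* sizeof(suffix) includes the terminating NUL */
		memcpy(out + len, suffix, sizeof(suffix));
	}
	return out;
}

/*
 * Move the parent's pidfile aside so the new child can create `path`.
 * Returns 0 on success, -1 on error (errno is set by rename or malloc).
 */
static int pidfile_move_aside(const char *path)
{
	char *old = oldbin_path(path);
	int rc;

	if (!old)
		return -1;
	rc = rename(path, old);
	free(old);
	return rc;
}
```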
|
|
To support transparent upgrades, we need to be able to reap
child processes regardless of what the child process was. So we
must do away with the iostat/worker-specific waitpid() calls and
use waitpid(-1) to cast a wide net to reap anything and
everything.
When we support transparent upgrades, the fork+exec-ed child
process may die, so the main process (master if
--worker-processes is used) needs to be capable of reaping
that new process.
|
|
This lets us inherit listen sockets from the parent
process. In the future, this will allow transparent
upgrades.
|
|
No need to clutter up the main file with graceful exit
functionality.
|
|
cmogstored.c is too big, and we can easily move the pidfile
functionality out to pidfile.c.
|
|
UINT_MAX worker processes should be more than enough for anyone.
|
|
gnulib did it for us in m4/gnulib-cache.m4, we'll match.
|
|
On SMP machines, EPOLL_CTL_MOD had a race condition under
Linux <= 3.7.1. This allowed events to be missed if they
arrived near the time the EPOLL_CTL_MOD request was issued.
ref: linux.git commit 128dd1759d96ad36c379240f8b9463e8acfd37a1
|
|
Creating the iostat pipe may fail when we're under FD
pressure. Ensure iostat can recover in the future
once FD pressure is reduced.
iostat retries are governed by the usage file generation
interval: 10 seconds
|
|
We now assume any non-zero timeout (not just infinite timeout)
is cancellable as well as interruptible. This means
mog_idleq_wait will no longer retry blindly on EINTR.
Handling of explicit device refreshes and aio_threads
is now easier to understand and follow.
|
|
This allows admins to reuse management scripts originally
written for Perl mogstored with cmogstored.
|
|
Unlike Perl mogstored, we currently do not load a default
configuration. However, this switch makes it easier to
use cmogstored as a drop-in replacement for mogstored when
testing MogileFS::Server
|
|
This avoids segfaulting on error messages with
non-glibc systems (e.g. FreeBSD 9.0).
|
|
Compared to our previous function, this:
* prefixes the message with the program name
* can show the strerror() message more consistently
* automatically adds a trailing newline for us
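
For illustration, a sketch of a formatter with those three properties. This
is not the function the commit switched to; format_error and its parameters
are hypothetical, and it writes into a caller-supplied buffer rather than
stderr so the result is easy to inspect.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<prog>: <message>[: <strerror(errnum)>]\n" into dst.
 * errnum == 0 omits the strerror() part.
 * Returns the length written, or -1 if the buffer is too small.
 */
static int format_error(char *dst, size_t size, const char *prog,
			int errnum, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(dst != NULL && size > 0);

	/* program name prefix */
	n = snprintf(dst, size, "%s: ", prog);
	if (n < 0 || (size_t)n >= size)
		return -1;

	/* caller's message */
	va_start(ap, fmt);
	m = vsnprintf(dst + n, size - (size_t)n, fmt, ap);
	va_end(ap);
	if (m < 0 || (size_t)m >= size - (size_t)n)
		return -1;
	n += m;

	/* optional strerror() text, then the trailing newline */
	if (errnum)
		m = snprintf(dst + n, size - (size_t)n, ": %s\n",
			     strerror(errnum));
	else
		m = snprintf(dst + n, size - (size_t)n, "\n");
	if (m < 0 || (size_t)m >= size - (size_t)n)
		return -1;
	return n + m;
}
```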
|
|
acceptor threads no longer touch the filesystem (and rarely ever
did so in practice before this), so there's no need to scale
them up based on device count.
|
|
Using pthread_cancel() and pthread_kill() allows us to do
shutdowns of individual threads in the future. EVFILT_USER will
just spam the kernel and the thread-specific "dying" hack won't
work if we only want to shut down a single thread.
kevent() is not a cancellation point in FreeBSD and will
not be in libkqueue, either. However libkqueue will set
errno==EINTR if it is interrupted, allowing cancellation
requests to go through.
|
|
The kevent() function as implemented by libkqueue does not
support thread cancellation the same way a real kevent() (on
FreeBSD) appears to. So pretend no implementation of kevent()
is cancelable and handle cancellation ourselves using
pthread_testcancel(). This allows us to support any platform
where kevent() may work, since it's unclear if other *BSDs
implement kevent() as a cancellation point.
|
|
Similarly, if folks continue to rely on the Perl mogstored
daemon for whatever reason, avoid potentially conflicting and
having unnecessary wakeups/activity for usage file changes.
|
|
Some folks may want to test cmogstored as a GET-only HTTP
server and leave certain functionality to the original Perl
mogstored.
|
|
We disable interrupts earlier so folks/scripts that are
trigger-happy with sending signals won't hit the default
signal handlers before we're ready.
|
|
No need to waste stack and registers for things we don't use.
|
|
I really hate supporting this hack, especially since the issue
is fixed for newer gcc/gcov users. However, many systems are on
older gcc and it takes a while for folks to upgrade, so it'd be
nice to encourage more coverage testing.
This isn't needed for newer gcov + gcc in Debian testing/unstable,
but gcov/gcc 4.4.5-8 on Debian squeeze fails to pass argc/argv/envp
to ((constructor)) functions when using gcov.
We'll drop this hack when support for Debian squeeze is terminated
(probably 2014-2015).
|
|
Forcing ourselves to be more descriptive...
|
|
This allows another server to be started without waiting for
connected clients to reconnect. While we're at it, refactor
this to avoid redundant code.
|
|
On GNU/Linux, this adds the setproctitle() function as implemented
by the libnostd project:
http://www.25thandclement.com/~william/projects/libnostd.html
|
|
Found by valgrind.
|
|
We don't want too many threads running accept() on one
process because it can lead to unfair load balancing.
This unfairness is difficult to avoid due to process/thread run
ordering at startup and the wake-one behavior we rely on.
So we just cut down on acceptors to minimize contention
for the listen queues in this case.
|
|
This makes it easy to support read-only HTTP traffic on a
different listen port.
This reduces listen queue contention and allows using iptables
to block off DAV traffic from certain hosts while serving
freely.
|
|
We'll ensure "server=none" setups will disable HTTP support
entirely.
|
|
Setting this to a positive value ensures we stay running if
there are any remotely triggerable crashes. Hopefully users
will still send (good) bug reports in this case so we can
fix them.
We may also be able to use this feature to reduce unavoidable
contention in some places, too:
* kernel FD table
* epoll/kqueue descriptor
* global active queue
* malloc()
|
|
This matches the behavior of Perl mogstored. Some
systems (like one of mine) may have many major devices
and fewer devices dedicated to MogileFS storage.
This really *should* be tunable, though...
|
|
By going into single-threaded mode, we can drastically simplify our
shutdown sequence to avoid race conditions. This also allows us
to not have additional overhead during normal runtime: as all the
shutdown-specific logic is isolated to only a few portions of
the code.
Like all graceful shutdown schemes, this one is still vulnerable to
race conditions due to network latency, but this one should be no worse
than any other server. Fortunately all requests we service are
idempotent.
|