cmogstored.git - alternative mogstored implementation for MogileFS

Date	Commit message
2013-12-02	selfwake: do not share pipe descriptors with workers
	This only affects users of the undocumented --worker-processes switch. Furthermore, this only affects non-Linux platforms which rely on the pipe implementation of selfwake. This prevents us from wasting one extraneous file descriptor slot (and hence potentially wasting 128 bytes in userland).
2013-10-12	avoid use-after-free with multi-process setups
	readdir on the same DIR pointer is undefined if DIR was inherited by multiple children. Using the reentrant readdir_r would not have helped, since the underlying file descriptor and kernel file handle were still shared (and we need rewinddir, too). This readdir usage bug existed in cmogstored since the earliest releases, but was harmless until the cmogstored 1.3 series. This misuse of readdir led to hitting a leftover call to free(). So this bug only manifested since commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b (svc: implement top-level by_mog_devid hash). Fortunately, these bugs only affect users of the undocumented multi-process feature (not just multi-threaded).
2013-07-10	remove assertion for handling iostat death
	This only triggered if the (undocumented) --worker-processes option was used. This assertion is no longer valid as of commit d5a52618ca1f9b5d7f6998716fbfe7714f927112 (refactor handling of "server aio_threads = " command).
2013-06-25	replace pthreads cancellation with explicit checks
	Due to data/event loss, we cannot rely on normal syscalls (accept/epoll_wait) being cancellation points. The benefits of using a standardized API to terminate threads asynchronously are lost when toggling cancellation flags. This implementation allows us to be more explicit and obvious at the few points where our worker threads may exit and reduces the amount of code we have. By avoiding the calls to pthread_setcancelstate, we should halve the number of atomic operations required in the common case (where the thread is not marked for termination).
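	A minimal sketch of the explicit-check pattern described above, assuming C11 atomics. The names struct worker, worker_init, worker_request_quit and worker_main are hypothetical and not taken from cmogstored; a real worker would block in a syscall where this sketch only counts laps, and waking a sleeping thread is not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker state; not a cmogstored structure. */
struct worker {
	atomic_bool quit;   /* set by any thread to request termination */
	unsigned long laps; /* work units done; written only by the worker */
};

static void worker_init(struct worker *w)
{
	atomic_init(&w->quit, false);
	w->laps = 0;
}

/* Called from another thread: mark the worker for termination. */
static void worker_request_quit(struct worker *w)
{
	atomic_store_explicit(&w->quit, true, memory_order_release);
}

/*
 * Thread entry point. The quit flag is tested at exactly one point per
 * lap, so the thread can only exit between work units, never inside one.
 */
static void *worker_main(void *arg)
{
	struct worker *w = arg;

	while (!atomic_load_explicit(&w->quit, memory_order_acquire))
		w->laps++; /* stand-in for waiting on and handling one event */
	return NULL;
}
```

	With pthreads, worker_main would be passed to pthread_create(), and the stopping thread would call worker_request_quit() followed by pthread_join(). The common case costs one atomic load per lap.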
2013-06-25	fix "shutdown" over sidechannel with epoll_pwait
	The "shutdown" command needs to trigger EINTR when using epoll_pwait, otherwise the sleeping thread may not wake up properly.
2013-06-25	refactor handling of "server aio_threads = " command
	We're using per-svc-based thread pools, so different MogileFS instances we serve no longer affect each other. This means changing the aio_threads count only affects the svc of the sidechannel port which triggered the change.
2013-06-25	switch to per-svc (per-docroot) queues
	This simplifies code, reduces contention, and reduces the chances of independent MogileFS instances (with one instance of cmogstored) stepping over each other. Most cmogstored deployments are single docroot (for a single instance of MogileFS); however, cmogstored supports multiple docroots for some rare configurations and we support them here.
2013-06-25	limit acceptors to reduce contention on large machines
	Having too many acceptor threads does not help, as it leads to lock contention in the accept syscalls and the EPOLL_CTL_ADD paths. The fair FIFO ordering of _blocking_ accept/accept4 syscalls also means we trigger unnecessary task switching and incur cache misses under high load. It has been almost impossible for the acceptor threads to be stuck on disk I/O since commit 832316624f7a8f44b3e1d78a8a7a62a399241840 ("acceptor threads push directly into event queue").
2013-06-25	update aio_threads count when new devices appear
	This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel.
2013-05-06	favor "struct mog_fd" for acceptors over int FDs
	There's no reason to be referencing FDs for these acceptors since they're infrequently accessed by svc, so this should make our internals more consistent. This also removes our use of mog_fd_get (outside of test code).
2013-02-18	document/reserve SIGWINCH/SIGHUP for future use (tag: v1.2.0)
	Despite having an extensive test suite and minimal room for user error, giving users the option to back out of a hot upgrade may be worth supporting.
2013-02-18	move cmogstored_exit() prototype to cmogstored.h
	This fixes a missing prototype warning for cmogstored_exit() when checking exit.c with sparse.
2013-02-15	avoid racy sleep on fork failure in master process
	We need to atomically enable interrupts and sleep with the same syscall. Fortunately, using pselect (through mog_sleep) allows that and is POSIX-compliant, so use that.
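	A minimal sketch of sleeping with pselect so that unblocking signals and going to sleep happen in one syscall. sleep_unblocked is a hypothetical helper and does not reproduce the signature of mog_sleep; the demo function and its use of SIGUSR1 are likewise assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void note_signal(int sig)
{
	(void)sig;
	got_signal = 1;
}

/*
 * Sleep for up to `sec` seconds with the signal mask replaced by
 * `unblocked` for the duration of the call only. Returns 0 on timeout,
 * or -1 with errno set to EINTR if a signal handler ran.
 */
static int sleep_unblocked(time_t sec, const sigset_t *unblocked)
{
	struct timespec ts = { .tv_sec = sec, .tv_nsec = 0 };

	return pselect(0, NULL, NULL, NULL, &ts, unblocked);
}

/*
 * Demo: a signal that arrives while blocked still cuts the sleep short.
 * Returns 1 if the sleep was interrupted, 0 if not, -1 on setup failure.
 */
static int demo_pending_signal_wakes_sleep(void)
{
	struct sigaction sa;
	sigset_t block, old, unblocked;
	int rc, interrupted;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
		return -1;

	raise(SIGUSR1); /* stays pending because SIGUSR1 is blocked */

	unblocked = old;
	sigdelset(&unblocked, SIGUSR1);
	rc = sleep_unblocked(5, &unblocked);
	interrupted = (rc == -1 && errno == EINTR && got_signal);

	sigprocmask(SIG_SETMASK, &old, NULL);
	return interrupted;
}
```

	With a separate "unblock, then sleep" sequence, a signal delivered between the two calls would run its handler before the sleep starts, and the sleep would then last its full duration.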
2013-02-11	fix signal races when master process is used
	In the absence of a pselect/ppoll-like version of waitpid, we must use a selfwake descriptor (pipe or eventfd) to wake the master up whenever a signal is received. So wait on the selfwake descriptor and always run waitpid with WNOHANG in a loop to ensure all children are reaped. The mog_intr_disable(); waitpid(); mog_intr_enable() sequence was completely stupid; I can't believe I wrote it.
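	A minimal sketch of the non-blocking reap loop only; waiting on the selfwake descriptor is omitted. reap_all_children is a hypothetical name, and a real caller would act on each exit status where this sketch only counts.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped by this call.
 */
static int reap_all_children(void)
{
	int reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, WNOHANG);

		if (pid > 0) {
			reaped++; /* a real caller would inspect status here */
			continue;
		}
		if (pid < 0 && errno == EINTR)
			continue; /* interrupted: try again */
		break; /* 0: children still running; -1: no children left */
	}
	return reaped;
}
```

	Looping until waitpid stops returning a PID matters because several children may exit before the master wakes up, and a single wakeup does not say how many.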
2013-02-11	prioritize upgrade before exit in main loop
	If we receive both SIGUSR2 and SIGQUIT in a short time period, we should trigger the upgrade before graceful exit, as no user will (intentionally) send SIGQUIT before SIGUSR2.
2013-01-31	cmogstored: initialize syslog before inheriting
	This ensures the "inherited $ADDRESS:$PORT on fd=..." messages are prefixed with the PID in logs.
2013-01-31	cfg: daemonize is a boolean, not an integer
	This project uses C99 features (and some GNU extensions), so bool is usable.
2013-01-31	minimize interrupt windows for master process
	Code is easier to follow when interrupts occur at well-defined points. The worker processes (and master-less standalone) already follow this.
2013-01-31	implement nginx-style binary upgrade via SIGUSR2
	USR2 now forks a new cmogstored process which inherits listener file descriptors from the parent. The parent renames its pidfile with a ".oldbin" suffix so the new child can use the new PID file. Clusters may now upgrade to future versions of cmogstored without needing to mark hosts down via mogadm. The behavior of this process should match that of nginx: http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly
2013-01-31	refactor process management
	To support transparent upgrades, we need to be able to reap child processes regardless of what the child process was. So we must do away with the iostat/worker-specific waitpid() calls and use waitpid(-1) to cast a wide net to reap anything and everything. When we support transparent upgrades, the fork+exec-ed child process may die, so the main process (master if --worker-processes are used) needs to be capable of reaping that new process.
2013-01-31	inherit: preliminary FD inheritance over exec()
	This lets us inherit listen sockets from the parent process; in the future this will allow transparent upgrades.
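	A minimal sketch of one prerequisite for inheritance: a descriptor only survives exec() if its close-on-exec flag is clear. fd_keep_across_exec is a hypothetical helper; how the child learns which descriptor numbers it inherited is not shown here.

```c
#include <fcntl.h>

/*
 * Clear FD_CLOEXEC on `fd` so it stays open across exec().
 * Returns 0 on success, -1 on error (e.g. `fd` is not open).
 */
static int fd_keep_across_exec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```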
2013-01-31	move graceful exit functionality into its own file
	No need to clutter up the main file with graceful exit functionality.
2013-01-31	move pidfile preparation function out
	cmogstored.c is too big; we can move pidfile functionality out to pidfile.c easily.
2013-01-25	limit --worker-processes to UINT_MAX
	UINT_MAX worker processes should be more than enough for anyone.
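	A minimal sketch of parsing a count with an upper bound of UINT_MAX, assuming the value arrives as a decimal string. parse_worker_count is a hypothetical name and is not the option parser cmogstored uses.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a decimal count into *out.
 * Rejects empty strings, signs, trailing junk and values above UINT_MAX.
 */
static bool parse_worker_count(const char *s, unsigned *out)
{
	char *end;
	unsigned long v;

	if (*s < '0' || *s > '9')
		return false; /* strtoul alone would accept "-1" and " 4" */
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT_MAX)
		return false;
	*out = (unsigned)v;
	return true;
}
```

	The explicit v > UINT_MAX test matters on platforms where unsigned long is wider than unsigned int; elsewhere strtoul reports the overflow through errno.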
2013-01-17	copyright comment updates for 2013
	gnulib did it for us in m4/gnulib-cache.m4, we'll match.
2013-01-02	epoll: avoid EPOLL_CTL_MOD bug in Linux <= 3.7.1
	On SMP machines, EPOLL_CTL_MOD had a race condition under Linux <= 3.7.1. This allowed events to be missed if they arrived near the time the EPOLL_CTL_MOD request was issued. ref: linux.git commit 128dd1759d96ad36c379240f8b9463e8acfd37a1
2012-12-08	retry if iostat spawn hit out-of-FD
	Creating the iostat pipe may fail when we're under FD pressure. Ensure iostat can recover in the future once FD pressure is reduced. iostat retries are governed by the usage file generation interval (10 seconds).
2012-12-05	cleanup and simplify interrupt/shutdown handling
	We now assume any non-zero timeout (not just infinite timeout) is cancellable as well as interruptible. This means mog_idleq_wait will no longer retry blindly on EINTR. Handling of explicit device refreshes and aio_threads is now easier to understand and follow.
2012-11-14	mgmt: support "shutdown" command (from Perlbal)
	This allows admins to reuse management scripts originally written for Perl mogstored with cmogstored.
2012-11-13	cmogstored: add a no-op --skipconfig switch
	Unlike Perl mogstored, we currently do not load a default configuration. However, this switch makes it easier to use cmogstored as a drop-in replacement for mogstored when testing MogileFS::Server.
2012-11-12	import progname from gnulib for error messages
	This avoids segfaulting on error messages with non-glibc systems (e.g. FreeBSD 9.0).
2012-10-30	die() using error() from glibc/gnulib
	Compared to our previous function, this:
	* prefixes the message with the program name
	* can show the strerror() message more consistently
	* automatically adds a trailing newline for us
2012-08-04	scale acceptor threads to number of CPUs available
	acceptor threads no longer touch the filesystem (and rarely ever did so in practice before this), so there's no need to scale them up based on device count.
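	A minimal sketch of sizing a thread pool from the CPU count with an upper limit. acceptor_count and its cap parameter are hypothetical, and _SC_NPROCESSORS_ONLN is a widely available extension rather than a POSIX requirement; the source does not say how cmogstored obtains the CPU count.

```c
#include <unistd.h>

/*
 * Number of acceptor threads to start: one per online CPU, at least 1,
 * and never more than `cap` (which must be >= 1).
 */
static unsigned long acceptor_count(unsigned long cap)
{
	long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
	unsigned long n = ncpu > 0 ? (unsigned long)ncpu : 1;

	return n > cap ? cap : n;
}
```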
2012-05-02	kqueue: rely on EINTR instead of EVFILT_USER to shutdown
	Using pthread_cancel() and pthread_kill() allows us to do shutdowns of individual threads in the future. EVFILT_USER will just spam the kernel and the thread-specific "dying" hack won't work if we only want to shut down a single thread. kevent() is not a cancellation point in FreeBSD and will not be in libkqueue, either. However libkqueue will set errno==EINTR if it is interrupted, allowing cancellation requests to go through.
2012-04-21	queue: rework kevent cancellation handling
	The kevent() function as implemented by libkqueue does not support thread cancellation the same way a real kevent() (on FreeBSD) appears to. So pretend no implementation of kevent() is cancelable and handle cancellation ourselves using pthread_testcancel(). This allows us to support any platform where kevent() may work, since it's unclear if other *BSDs implement kevent() as a cancellation point.
2012-04-18	avoid usage file if mgmt sidechannel is inactive
	Similarly, if folks continue to rely on the Perl mogstored daemon for whatever reason, avoid potentially conflicting and having unnecessary wakeups/activity for usage file changes.
2012-04-18	do not spawn iostat if mgmt sidechannel is inactive
	Some folks may want to test cmogstored as a GET-only HTTP server and leave certain functionality to the original Perl mogstored.
2012-03-26	cleanup interrupt disabling/enabling
	We disable interrupts earlier so folks/scripts that are trigger-happy with sending signals won't trigger the default signal handlers on us before we're ready.
2012-03-19	kill some unused function parameters
	No need to waste stack and registers for things we don't use.
2012-03-17	setproctitle: avoid __attribute__((constructor)) when using gcov
	I really hate supporting this hack, especially since the issue is fixed for newer gcc/gcov users. However, many systems are on older gcc and it takes a while for folks to upgrade, so it'd be nice to encourage more coverage testing. This isn't needed for newer gcov + gcc in Debian testing/unstable, but gcov/gcc 4.4.5-8 on Debian squeeze fails to pass argc/argv/envp to ((constructor)) functions when using gcov. We'll drop this hack when support for Debian squeeze is terminated (probably 2014-2015).
2012-03-16	cmogstored: disable short non-standard CLI switches
	Forcing ourselves to be more descriptive...
2012-03-15	graceful quit closes listen sockets ASAP
	This allows another server to be started without waiting for connected clients to disconnect. While we're at it, refactor this to avoid redundant code.
2012-03-15	set process title at graceful shutdown
	On GNU/Linux, this adds the setproctitle() function as implemented by the libnostd project: http://www.25thandclement.com/~william/projects/libnostd.html
2012-03-15	ensure graceful quit and memory release of GET-only HTTP
	Found by valgrind.
2012-03-15	limit accept() threads per-process if workers are used
	We don't want too many threads running accept() on one process because it can lead to unfair load balancing. This unfairness is difficult to avoid due to process/thread run ordering at startup and the wake-one behavior we rely on. So we just cut down on acceptors to minimize contention for the listen queues in this case.
2012-03-14	support for httpgetlisten config directive
	This makes it easy to support read-only HTTP traffic on a different listen port. This reduces listen queue contention and allows using iptables to block off DAV traffic from certain hosts while serving freely.
2012-03-14	test handling of the "server" configuration option
	We'll ensure "server=none" setups will disable HTTP support entirely.
2012-03-14	optional --worker-processes=NUM feature
	Setting this to a positive value ensures we stay running if there are any remotely triggerable crashes. Hopefully users will still send (good) bug reports in this case so we can fix them. We may also be able to use this feature to reduce unavoidable contention in some places, too:
	* kernel FD table
	* epoll/kqueue descriptor
	* global active queue
	* malloc()
2012-03-12	change thread count based on number of dev* entries
	This matches the behavior of Perl mogstored. Some systems (like one of mine) may have many major devices and fewer devices dedicated to MogileFS storage. This really should be tunable, though...
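	A minimal sketch of counting device entries under a docroot. It assumes device directories are named "dev" followed by decimal digits, which is stricter than the "dev*" wording above; is_dev_name and count_dev_entries are hypothetical names. Each call opens its own DIR handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Returns 1 if `name` is "dev" followed by one or more digits, else 0. */
static int is_dev_name(const char *name)
{
	const char *p;

	if (strncmp(name, "dev", 3) != 0 || name[3] == '\0')
		return 0;
	for (p = name + 3; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return 0;
	}
	return 1;
}

/*
 * Count dev<N> entries directly under `docroot`.
 * Returns -1 if the directory cannot be opened.
 */
static long count_dev_entries(const char *docroot)
{
	DIR *dir = opendir(docroot);
	struct dirent *ent;
	long n = 0;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (is_dev_name(ent->d_name))
			n++;
	}
	closedir(dir);
	return n;
}
```

	The returned count could then feed whatever rule maps devices to threads; the source does not state that rule, so none is assumed here.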
2012-02-25	implement graceful shutdown for outstanding requests
	By going into single-threaded mode, we can drastically simplify our shutdown sequence to avoid race conditions. This also allows us to not have additional overhead during normal runtime, as all the shutdown-specific logic is isolated to only a few portions of the code. Like all graceful shutdown schemes, this one is still vulnerable to race conditions due to network latency, but this one should be no worse than any other server. Fortunately all requests we service are idempotent.