cmogstored.git - alternative mogstored implementation for MogileFS

Date	Commit message
2013-12-09	thrpool: sleep instead of yield when poking thread
	This unfortunate loop burned too much CPU on FreeBSD and caused shutdown to take too long when using sched_yield. Call nanosleep for 10ms instead, hopefully giving the system a chance to accomplish some disk I/O and other tasks before we poke the thread again. Reported-by: Mikolaj Golub
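	A minimal sketch of the poke-and-sleep pattern, assuming a worker that blocks in pause() (standing in for the real event wait) and exits once a hypothetical `quit` flag is set. The names `poke_until_done()` and `run_poke_demo()`, the use of SIGUSR1, and the flags are illustrative assumptions, not cmogstored's code. Because a signal can land between the flag check and pause(), a single poke may be lost, which is why the shutdown side keeps poking, sleeping 10ms between attempts, until the worker reports it is done.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_bool quit; /* set by the shutdown path (hypothetical) */
static atomic_bool done; /* set by the worker right before it exits */

/* No-op handler: its only job is to make pause() return with EINTR. */
static void poke_handler(int sig) { (void)sig; }

static void *worker(void *arg)
{
	(void)arg;
	/*
	 * Racy on purpose: if a poke lands between the check and pause(),
	 * pause() sleeps until the *next* poke, so one poke is not enough.
	 */
	while (!atomic_load(&quit))
		pause();
	atomic_store(&done, true);
	return NULL;
}

/* Keep poking until the worker notices; returns the number of pokes. */
static unsigned poke_until_done(pthread_t thr)
{
	const struct timespec ten_ms = { 0, 10 * 1000 * 1000 };
	unsigned pokes = 0;

	atomic_store(&quit, true);
	while (!atomic_load(&done)) {
		/* ESRCH: the thread is already gone, nothing left to poke */
		if (pthread_kill(thr, SIGUSR1) == ESRCH)
			break;
		pokes++;
		/* sleep instead of sched_yield(): give the system real time */
		nanosleep(&ten_ms, NULL);
	}
	return pokes;
}

/* Returns true once the worker has observed the shutdown request. */
bool run_poke_demo(unsigned *pokes)
{
	struct sigaction sa;
	pthread_t thr;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = poke_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	if (pthread_create(&thr, NULL, worker, NULL) != 0)
		return false;
	*pokes = poke_until_done(thr);
	pthread_join(thr, NULL);
	return atomic_load(&done);
}
```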
2013-07-14	downgrade thread/device-count fields to unsigned int
	It's unlikely we'll come close to seeing 2-4 billion devices in a MogileFS instance any time soon. Meanwhile, it's also unlikely the kernel will ever run that many threads, either. So make it easier to pack and shrink data structures to save a few bytes and perhaps get better memory alignment. For reference, the POSIX semaphore API specifies initial values as unsigned (int), too. This leads to a minor size reduction (and we're not even packing):
	$ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored
	add/remove: 0/0 grow/shrink: 0/13 up/down: 0/-86 (-86)
	function                      old   new  delta
	mog_svc_dev_quit_prepare       13    12     -1
	mog_mgmt_fn_aio_threads       147   146     -1
	mog_dev_user_rescale_i         27    26     -1
	mog_ioq_requeue_prepare        52    50     -2
	mog_ioq_init                   80    78     -2
	mog_thrpool_start             101    96     -5
	mog_svc_dev_user_rescale      143   137     -6
	mog_svc_start_each            264   256     -8
	mog_svc_aio_threads_handler   257   249     -8
	mog_ioq_ready                 263   255     -8
	mog_ioq_next                  303   295     -8
	mog_svc_thrpool_rescale       206   197     -9
	mog_thrpool_set_size         1028  1001    -27
2013-07-11	mgmt: checksumming is interruptible during thread shutdown
	We want to yield dying threads as soon as possible during thread shutdown, so we check the quit flag and yield the running thread to trigger a MOG_NEXT_ACTIVE.
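	A sketch of a checksum loop that checks a quit flag between chunks, assuming an in-memory buffer and using Adler-32 as a stand-in digest (cmogstored's real digest and file I/O are not shown). `ck_run()`, `CK_INTERRUPTED`, and the chunk size are hypothetical; in the real code the interrupted state would presumably map to something like MOG_NEXT_ACTIVE so the work is requeued.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ck_status { CK_DONE, CK_INTERRUPTED };

struct ck_state {
	const unsigned char *buf;
	size_t len;
	size_t off;    /* progress survives an interruption */
	uint32_t a, b; /* running Adler-32 state (stand-in digest) */
};

void ck_init(struct ck_state *st, const void *buf, size_t len)
{
	st->buf = buf;
	st->len = len;
	st->off = 0;
	st->a = 1;
	st->b = 0;
}

uint32_t ck_value(const struct ck_state *st)
{
	return (st->b << 16) | st->a;
}

/* Hash up to `chunk` bytes at a time, checking `quit` before each chunk. */
enum ck_status ck_run(struct ck_state *st, atomic_bool *quit, size_t chunk)
{
	assert(chunk > 0 && "a zero chunk would never make progress");
	while (st->off < st->len) {
		size_t end;

		if (atomic_load(quit)) {
			sched_yield();         /* step aside so shutdown can proceed */
			return CK_INTERRUPTED; /* caller requeues; resumes at st->off */
		}
		end = st->off + chunk;
		if (end > st->len)
			end = st->len;
		for (; st->off < end; st->off++) {
			st->a = (st->a + st->buf[st->off]) % 65521;
			st->b = (st->b + st->a) % 65521;
		}
	}
	return CK_DONE;
}
```

Progress is kept in `st->off`, so an interrupted checksum resumes where it stopped instead of starting over.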
2013-06-25	introduce mog_yield wrapper around sched_yield/pthread_yield
	While pthread_yield is non-standard, it is relatively common and preferable for systems where pthreads are _not_ 1:1 mapped to kernel threads. This also provides a stronger yield to weaken the priority of the calling thread wherever we previously used sched_yield.
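	One way such a wrapper might look, assuming a configure-time macro `HAVE_PTHREAD_YIELD` (hypothetical here) decides whether pthread_yield() is available. Only the name mog_yield comes from the commit; the int return type and the macro are choices made for this sketch.

```c
#include <sched.h>
#if defined(HAVE_PTHREAD_YIELD)
#  include <pthread.h> /* on glibc, pthread_yield() also needs _GNU_SOURCE */
#endif

/*
 * Sketch of a yield wrapper.  HAVE_PTHREAD_YIELD is an assumed configure
 * result; it is not defined here, so sched_yield() is used.
 */
static inline int mog_yield(void)
{
#if defined(HAVE_PTHREAD_YIELD)
	return pthread_yield(); /* non-standard; useful when threads are not 1:1 */
#else
	return sched_yield();   /* POSIX; returns 0 on success */
#endif
}
```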
2013-06-25	call sched_yield repeatedly when terminating threads
	This should allow the threads we're terminating to more quickly enter a safe state where they're allowed to exit. On SMP systems, we need to yield the signalling thread more times to increase the probability the interrupted thread can run (and exit).
2013-06-25	replace pthreads cancellation with explicit checks
	Due to data/event loss, we cannot rely on normal syscalls (accept/epoll_wait) being cancellation points. The benefits of using a standardized API to terminate threads asynchronously are lost when toggling cancellation flags. This implementation allows us to be more explicit and obvious at the few points where our worker threads may exit and reduces the amount of code we have. By avoiding the calls to pthread_setcancelstate, we should halve the number of atomic operations required in the common case (where the thread is not marked for termination).
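	A minimal sketch of explicit termination checks, assuming each worker owns an atomic `quit` flag (the struct and function names are hypothetical). The loop body stands in for handling one event; in a real server the blocking call (accept or epoll_wait) must also be interrupted, for example by a signal, before the check can run.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

struct worker {
	pthread_t thr;
	atomic_bool quit;   /* set once by the controlling thread */
	atomic_ulong units; /* work counter, for demonstration only */
};

/* The explicit exit check: a single atomic load per iteration. */
static bool worker_should_quit(struct worker *w)
{
	return atomic_load_explicit(&w->quit, memory_order_acquire);
}

static void *worker_loop(void *arg)
{
	struct worker *w = arg;

	while (!worker_should_quit(w)) {
		/* stand-in for handling one ready event */
		atomic_fetch_add_explicit(&w->units, 1, memory_order_relaxed);
		sched_yield();
	}
	return w; /* the only place this thread exits */
}

int worker_start(struct worker *w)
{
	atomic_init(&w->quit, false);
	atomic_init(&w->units, 0);
	return pthread_create(&w->thr, NULL, worker_loop, w);
}

/* Request termination and wait; returns the thread's exit value. */
void *worker_stop(struct worker *w)
{
	void *ret = NULL;

	atomic_store_explicit(&w->quit, true, memory_order_release);
	pthread_join(w->thr, &ret);
	return ret;
}
```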
2013-06-25	refactor handling of "server aio_threads = " command
	We're using per-svc-based thread pools, so different MogileFS instances we serve no longer affect each other. This means changing the aio_threads count only affects the svc of the sidechannel port which triggered the change.
2013-06-25	switch to per-svc (per-docroot) queues
	This simplifies code, reduces contention, and reduces the chance of independent MogileFS instances served by a single cmogstored process stepping on each other. Most cmogstored deployments use a single docroot (for a single MogileFS instance); however, cmogstored supports multiple docroots for some rare configurations, and we support them here.
2013-06-25	thrpool: add comment explaining minimum thread count
	I forgot why this bound was necessary, so add a comment ensuring I do not forget again.
2013-06-25	update aio_threads count when new devices appear
	This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel.
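	A sketch of how the pool size might follow the device count, under an assumed policy: a hypothetical fixed number of threads per device, a floor of one thread, and automatic growth only (a larger, manually set value is kept). None of these constants come from cmogstored.

```c
#include <limits.h>

/* Hypothetical tunables for illustration; not cmogstored's real values. */
#define SKETCH_THREADS_PER_DEV 10U
#define SKETCH_MIN_THREADS 1U

/* Return the new pool size after a device scan finds `ndev` usable devices. */
unsigned aio_threads_for_devices(unsigned current, unsigned ndev)
{
	unsigned want;

	if (ndev > UINT_MAX / SKETCH_THREADS_PER_DEV)
		want = UINT_MAX; /* saturate instead of wrapping */
	else
		want = ndev * SKETCH_THREADS_PER_DEV;
	if (want < SKETCH_MIN_THREADS)
		want = SKETCH_MIN_THREADS;
	/* grow automatically, never shrink: a larger manual setting is kept */
	return want > current ? want : current;
}
```

With these constants, aio_threads_for_devices(10, 3) yields 30, while aio_threads_for_devices(50, 3) keeps 50.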
2013-02-16	handle pthread_create returning ENOMEM on old glibc
	Older glibc returns ENOMEM on mprotect() failures. This bug was only fixed in 2011, so long-term-support distros and old installations may lack the necessary backports. ref: http://www.sourceware.org/bugzilla/show_bug.cgi?id=386
2013-02-16	graceful handling of pthread_create EAGAIN failure
	pthread_create may return EAGAIN as a temporary failure; do not abort a running process if this is the case. For the initial mountlist scan, we must retry indefinitely for cmogstored to be usable. However, with our thread pools, we can always run fewer threads (as long as there is at least one thread per pool).
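	A sketch of the "run fewer threads" policy, treating EAGAIN (and ENOMEM, per the old-glibc entry in this log) as transient. pool_start(), noop_thread(), and the 10ms backoff are hypothetical. With zero threads running it keeps retrying, since a pool needs at least one thread; otherwise it settles for a smaller pool.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* EAGAIN is transient; old glibc may report the same condition as ENOMEM. */
static bool create_err_is_transient(int err)
{
	return err == EAGAIN || err == ENOMEM;
}

/*
 * Start up to `want` threads.  A transient failure after at least one
 * thread is running shrinks the pool; with zero threads we retry, since a
 * pool needs at least one.  Returns the number started; 0 means a
 * non-transient error occurred before any thread started.
 */
unsigned pool_start(pthread_t *thr, unsigned want,
                    void *(*fn)(void *), void *arg)
{
	const struct timespec backoff = { 0, 10 * 1000 * 1000 }; /* arbitrary */
	unsigned n = 0;

	while (n < want) {
		int err = pthread_create(&thr[n], NULL, fn, arg);

		if (err == 0)
			n++;
		else if (!create_err_is_transient(err))
			break;                     /* EINVAL, EPERM, ...: caller decides */
		else if (n > 0)
			break;                     /* run with fewer threads */
		else
			nanosleep(&backoff, NULL); /* need one thread: retry */
	}
	return n;
}

/* Trivial thread body for trying pool_start(). */
static void *noop_thread(void *arg) { return arg; }
```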
2013-01-17	copyright comment updates for 2013
	gnulib did it for us in m4/gnulib-cache.m4, we'll match.
2012-12-08	thrpool: signal threads concurrently at shutdown
	This speeds up shutdown for kqueue users, as kevent() is not a cancellation point. While we're at it, remove the unnecessary check for mog_queue. before pthread_kill(). This check was a remnant of the old, NOTE_TRIGGER-based implementation.
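	A sketch of signalling every thread before waiting on any of them, so their wake-ups overlap instead of happening one kill/join at a time. pause() stands in for kevent(); the struct, SIGUSR2, the 10ms re-poke interval, and all function names are assumptions, not cmogstored's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define DEMO_MAX_THREADS 8U

struct pool_thr {
	pthread_t id;
	atomic_bool quit; /* shutdown requested */
	atomic_bool done; /* worker has left its loop */
};

static void wake_handler(int sig) { (void)sig; }

static void *pool_worker(void *arg)
{
	struct pool_thr *t = arg;

	/* pause() stands in for kevent(), which is not a cancellation point */
	while (!atomic_load(&t->quit))
		pause();
	atomic_store(&t->done, true);
	return NULL;
}

/* Mark and signal every thread first, then join them all. */
static void pool_shutdown(struct pool_thr *t, unsigned n)
{
	const struct timespec ten_ms = { 0, 10 * 1000 * 1000 };
	bool all_done;
	unsigned i;

	for (i = 0; i < n; i++)
		atomic_store(&t[i].quit, true);
	do {
		all_done = true;
		for (i = 0; i < n; i++) {
			if (atomic_load(&t[i].done))
				continue;
			all_done = false;
			/* ESRCH (thread just exited) is harmless, so ignore it */
			(void)pthread_kill(t[i].id, SIGUSR2);
		}
		if (!all_done)
			nanosleep(&ten_ms, NULL);
	} while (!all_done);
	for (i = 0; i < n; i++)
		pthread_join(t[i].id, NULL);
}

/* Start n workers (at most DEMO_MAX_THREADS), shut down, count exits. */
unsigned pool_demo(unsigned n)
{
	static struct pool_thr pool[DEMO_MAX_THREADS];
	struct sigaction sa;
	unsigned i, exited = 0;

	if (n > DEMO_MAX_THREADS)
		n = DEMO_MAX_THREADS;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wake_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR2, &sa, NULL);

	for (i = 0; i < n; i++) {
		atomic_init(&pool[i].quit, false);
		atomic_init(&pool[i].done, false);
		if (pthread_create(&pool[i].id, NULL, pool_worker, &pool[i]) != 0)
			break;
	}
	n = i; /* only shut down what actually started */
	pool_shutdown(pool, n);
	for (i = 0; i < n; i++)
		exited += atomic_load(&pool[i].done);
	return exited;
}
```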
2012-11-12	mgmt: support "server aio_threads = <digit>"
	This allows tunable thread counts at runtime like regular mogstored (using Perlbal).
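	A sketch of a parser for this sidechannel command, assuming the line arrives as `server aio_threads = N`, optionally followed by CR/LF. parse_aio_threads() is a hypothetical name, and rejecting 0 is an assumption made here because a pool needs at least one thread.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 and stores the count in *out on success, -1 on a bad line. */
int parse_aio_threads(const char *line, unsigned *out)
{
	static const char prefix[] = "server aio_threads";
	const char *p;
	char *end;
	unsigned long val;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	p = line + sizeof(prefix) - 1;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p++ != '=')
		return -1;
	while (*p == ' ' || *p == '\t')
		p++;
	if (!isdigit((unsigned char)*p)) /* rejects signs and empty values */
		return -1;
	errno = 0;
	val = strtoul(p, &end, 10);
	if (errno == ERANGE || val == 0 || val > UINT_MAX)
		return -1;
	while (*end == '\r' || *end == '\n') /* allow only a trailing newline */
		end++;
	if (*end != '\0')
		return -1;
	*out = (unsigned)val;
	return 0;
}
```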
2012-05-02	kqueue: rely on EINTR instead of EVFILT_USER to shutdown
	Using pthread_cancel() and pthread_kill() allows us to shut down individual threads in the future. EVFILT_USER will just spam the kernel, and the thread-specific "dying" hack won't work if we only want to shut down a single thread. kevent() is not a cancellation point in FreeBSD and will not be in libkqueue, either. However, libkqueue will set errno==EINTR if it is interrupted, allowing cancellation requests to go through.
2012-04-21	kqueue: schedule wakeup of sleepers during shutdown
	By explicitly giving kevent() sleepers a chance to wake up and run, we can reduce the number of times we need to trigger wakeups via NOTE_TRIGGER.
2012-04-21	queue: rework kevent cancellation handling
	The kevent() function as implemented by libkqueue does not support thread cancellation the same way a real kevent() (on FreeBSD) appears to. So pretend no implementation of kevent() is cancelable and handle cancellation ourselves using pthread_testcancel(). This allows us to support any platform where kevent() may work, since it's unclear if other *BSDs implement kevent() as a cancellation point.
2012-04-20	only set explicit stack size on GNU/libc and FreeBSD
	We don't know enough about the libc on other platforms to make an intelligent choice about stack size, so just use the default to avoid potential problems.
2012-04-19	thrpool: use default stack size for libkqueue users
	libkqueue appears to use a lot of stack, so just use the default stack size to avoid unexplained segfaults.
2012-02-11	do not rely on BUFSIZ=8192
	BUFSIZ is only 1024 on FreeBSD, which is too small to be optimal for large I/O operations.
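	A sketch of sizing an I/O buffer explicitly instead of trusting BUFSIZ (8K on glibc and 1024 on FreeBSD, per this log). SKETCH_IO_BUFSIZE and copy_fd() are hypothetical, and 8192 is an assumed value rather than the one cmogstored picked.

```c
#include <errno.h>
#include <unistd.h>

/* Explicit I/O chunk size, independent of the platform's BUFSIZ. */
#define SKETCH_IO_BUFSIZE 8192
_Static_assert(SKETCH_IO_BUFSIZE >= 8192, "keep large-I/O chunks large");

/* Copy everything from src to dst; returns bytes copied or -1 on error. */
long long copy_fd(int dst, int src)
{
	char buf[SKETCH_IO_BUFSIZE];
	long long total = 0;

	for (;;) {
		ssize_t r = read(src, buf, sizeof(buf));

		if (r == 0)
			return total; /* EOF */
		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (ssize_t off = 0; off < r;) { /* handle short writes */
			ssize_t w = write(dst, buf + off, (size_t)(r - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += r;
	}
}
```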
2012-02-10	threads based on the number of usable major devices
	This should really be tunable, but we can do that later.
2012-01-18	thrpool: add BUFSIZ (8K on glibc) to thread stack
	The *printf() family of functions may allocate BUFSIZ on the stack. We'll need those functions (including syslog(3)) in various places, so it's safer to have more stack (and it can give more meaningful assert() messages).
2012-01-11	initial commit
	Nuked old history since it was missing copyright/GPLv3 notices.