All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] Fix race on thread shutdown causing deadlock
@ 2014-04-29  1:51 Andy Grover
  2014-04-29  1:51 ` [PATCH 2/3] Remove startup_lock Andy Grover
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Andy Grover @ 2014-04-29  1:51 UTC (permalink / raw
  To: stgt

This patch and the next are somewhat a revert of 318e9f2, but the previous
fix didn't quite close the race. This only happens when we create threads
for a backstore that turns out to be invalid, which we then tear down.

See https://bugzilla.redhat.com/show_bug.cgi?id=848585 .

This is occurring because there's still a window where a thread misses
seeing info->stop == 1 but is not yet in cond_wait so it misses the
broadcast:

thread_close:              thread_worker_fn:
                           info->stop is seen as 0
info->stop = 1
pthread_cond_broadcast     -- misses broadcast
                           pthread_cond_wait
pthread_join (hangs)

I believe the solution is to go back to using pthread_cancel. We can call
it before pthread_cond_wait is called (or after) and it will do the right
thing: pop out and exit. The only tricky bit is we need to use the
pthread_cleanup_push mechanism to properly release info->pending_lock.

Signed-off-by: Andy Grover <agrover@redhat.com>
---
 usr/bs.c        | 25 ++++++++++++++-----------
 usr/bs_thread.h |  2 --
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/usr/bs.c b/usr/bs.c
index 13d3b4e..d81aaee 100644
--- a/usr/bs.c
+++ b/usr/bs.c
@@ -213,6 +213,12 @@ static void bs_sig_request_done(int fd, int events, void *data)
 	}
 }
 
+/* Unlock mutex even if thread is cancelled */
+static void mutex_cleanup(void *mutex)
+{
+	pthread_mutex_unlock(mutex);
+}
+
 static void *bs_thread_worker_fn(void *arg)
 {
 	struct bs_thread_info *info = arg;
@@ -226,15 +232,13 @@ static void *bs_thread_worker_fn(void *arg)
 	dprintf("started this thread\n");
 	pthread_mutex_unlock(&info->startup_lock);
 
-	while (!info->stop) {
+	while (1) {
 		pthread_mutex_lock(&info->pending_lock);
+		pthread_cleanup_push(mutex_cleanup, &info->pending_lock);
+
 	retest:
 		if (list_empty(&info->pending_list)) {
 			pthread_cond_wait(&info->pending_cond, &info->pending_lock);
-			if (info->stop) {
-				pthread_mutex_unlock(&info->pending_lock);
-				pthread_exit(NULL);
-			}
 			goto retest;
 		}
 
@@ -242,7 +246,7 @@ static void *bs_thread_worker_fn(void *arg)
 				       struct scsi_cmd, bs_list);
 
 		list_del(&cmd->bs_list);
-		pthread_mutex_unlock(&info->pending_lock);
+		pthread_cleanup_pop(1); /* Unlock pending_lock mutex */
 
 		info->request_fn(cmd);
 
@@ -435,10 +439,10 @@ tgtadm_err bs_thread_open(struct bs_thread_info *info, request_func_t *rfn,
 
 	return TGTADM_SUCCESS;
 destroy_threads:
-	info->stop = 1;
 
 	pthread_mutex_unlock(&info->startup_lock);
 	for (; i > 0; i--) {
+		pthread_cancel(info->worker_thread[i - 1]);
 		pthread_join(info->worker_thread[i - 1], NULL);
 		eprintf("stopped the worker thread %d\n", i - 1);
 	}
@@ -455,18 +459,17 @@ void bs_thread_close(struct bs_thread_info *info)
 {
 	int i;
 
-	info->stop = 1;
 	pthread_cond_broadcast(&info->pending_cond);
 
-	for (i = 0; i < info->nr_worker_threads && info->worker_thread[i]; i++)
+	for (i = 0; i < info->nr_worker_threads && info->worker_thread[i]; i++) {
+		pthread_cancel(info->worker_thread[i]);
 		pthread_join(info->worker_thread[i], NULL);
+	}
 
 	pthread_cond_destroy(&info->pending_cond);
 	pthread_mutex_destroy(&info->pending_lock);
 	pthread_mutex_destroy(&info->startup_lock);
 	free(info->worker_thread);
-
-	info->stop = 0;
 }
 
 int bs_thread_cmd_submit(struct scsi_cmd *cmd)
diff --git a/usr/bs_thread.h b/usr/bs_thread.h
index a7e4063..a3ac551 100644
--- a/usr/bs_thread.h
+++ b/usr/bs_thread.h
@@ -13,8 +13,6 @@ struct bs_thread_info {
 
 	pthread_mutex_t startup_lock;
 
-	int stop;
-
 	request_func_t *request_fn;
 };
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/3] Remove startup_lock
  2014-04-29  1:51 [PATCH 1/3] Fix race on thread shutdown causing deadlock Andy Grover
@ 2014-04-29  1:51 ` Andy Grover
  2014-04-29  1:51 ` [PATCH 3/3] Replace if()/goto with while() Andy Grover
  2014-04-30 14:29 ` [PATCH 1/3] Fix race on thread shutdown causing deadlock FUJITA Tomonori
  2 siblings, 0 replies; 4+ messages in thread
From: Andy Grover @ 2014-04-29  1:51 UTC (permalink / raw
  To: stgt

The purpose of the startup lock was to hold threads at the startup_lock
so if a thread creation fails, it's still possible to have them
end themselves by looking at info->stop. Since we are back to using
pthread_cancel instead of info->stop, we don't need this any more.
The only wrinkle is we need to only send pthread_cancels to
successfully-created threads, so now check this in the destroy_threads
reverse-loop.

Signed-off-by: Andy Grover <agrover@redhat.com>
---
 usr/bs.c        | 17 +++++------------
 usr/bs_thread.h |  2 --
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/usr/bs.c b/usr/bs.c
index d81aaee..d6b8015 100644
--- a/usr/bs.c
+++ b/usr/bs.c
@@ -228,9 +228,6 @@ static void *bs_thread_worker_fn(void *arg)
 	sigfillset(&set);
 	sigprocmask(SIG_BLOCK, &set, NULL);
 
-	pthread_mutex_lock(&info->startup_lock);
-	dprintf("started this thread\n");
-	pthread_mutex_unlock(&info->startup_lock);
 
 	while (1) {
 		pthread_mutex_lock(&info->pending_lock);
@@ -420,9 +417,7 @@ tgtadm_err bs_thread_open(struct bs_thread_info *info, request_func_t *rfn,
 
 	pthread_cond_init(&info->pending_cond, NULL);
 	pthread_mutex_init(&info->pending_lock, NULL);
-	pthread_mutex_init(&info->startup_lock, NULL);
 
-	pthread_mutex_lock(&info->startup_lock);
 	for (i = 0; i < nr_threads; i++) {
 		ret = pthread_create(&info->worker_thread[i], NULL,
 				     bs_thread_worker_fn, info);
@@ -434,22 +429,21 @@ tgtadm_err bs_thread_open(struct bs_thread_info *info, request_func_t *rfn,
 				goto destroy_threads;
 		}
 	}
-	pthread_mutex_unlock(&info->startup_lock);
 	info->nr_worker_threads = nr_threads;
 
 	return TGTADM_SUCCESS;
 destroy_threads:
 
-	pthread_mutex_unlock(&info->startup_lock);
 	for (; i > 0; i--) {
-		pthread_cancel(info->worker_thread[i - 1]);
-		pthread_join(info->worker_thread[i - 1], NULL);
-		eprintf("stopped the worker thread %d\n", i - 1);
+		if (info->worker_thread[i - 1]) {
+			pthread_cancel(info->worker_thread[i - 1]);
+			pthread_join(info->worker_thread[i - 1], NULL);
+			eprintf("stopped the worker thread %d\n", i - 1);
+		}
 	}
 
 	pthread_cond_destroy(&info->pending_cond);
 	pthread_mutex_destroy(&info->pending_lock);
-	pthread_mutex_destroy(&info->startup_lock);
 	free(info->worker_thread);
 
 	return TGTADM_NOMEM;
@@ -468,7 +462,6 @@ void bs_thread_close(struct bs_thread_info *info)
 
 	pthread_cond_destroy(&info->pending_cond);
 	pthread_mutex_destroy(&info->pending_lock);
-	pthread_mutex_destroy(&info->startup_lock);
 	free(info->worker_thread);
 }
 
diff --git a/usr/bs_thread.h b/usr/bs_thread.h
index a3ac551..260f73f 100644
--- a/usr/bs_thread.h
+++ b/usr/bs_thread.h
@@ -11,8 +11,6 @@ struct bs_thread_info {
 	/* protected by pending_lock */
 	struct list_head pending_list;
 
-	pthread_mutex_t startup_lock;
-
 	request_func_t *request_fn;
 };
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 3/3] Replace if()/goto with while()
  2014-04-29  1:51 [PATCH 1/3] Fix race on thread shutdown causing deadlock Andy Grover
  2014-04-29  1:51 ` [PATCH 2/3] Remove startup_lock Andy Grover
@ 2014-04-29  1:51 ` Andy Grover
  2014-04-30 14:29 ` [PATCH 1/3] Fix race on thread shutdown causing deadlock FUJITA Tomonori
  2 siblings, 0 replies; 4+ messages in thread
From: Andy Grover @ 2014-04-29  1:51 UTC (permalink / raw
  To: stgt

Is equivalent and nicer-looking.

Signed-off-by: Andy Grover <agrover@redhat.com>
---
 usr/bs.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/usr/bs.c b/usr/bs.c
index d6b8015..1ab86f7 100644
--- a/usr/bs.c
+++ b/usr/bs.c
@@ -122,11 +122,9 @@ retry:
 	}
 
 	pthread_mutex_lock(&finished_lock);
-retest:
-	if (list_empty(&finished_list)) {
+
+	while (list_empty(&finished_list))
 		pthread_cond_wait(&finished_cond, &finished_lock);
-		goto retest;
-	}
 
 	while (!list_empty(&finished_list)) {
 		cmd = list_first_entry(&finished_list,
@@ -228,16 +226,13 @@ static void *bs_thread_worker_fn(void *arg)
 	sigfillset(&set);
 	sigprocmask(SIG_BLOCK, &set, NULL);
 
-
 	while (1) {
 		pthread_mutex_lock(&info->pending_lock);
 		pthread_cleanup_push(mutex_cleanup, &info->pending_lock);
 
-	retest:
-		if (list_empty(&info->pending_list)) {
-			pthread_cond_wait(&info->pending_cond, &info->pending_lock);
-			goto retest;
-		}
+		while (list_empty(&info->pending_list))
+			pthread_cond_wait(&info->pending_cond,
+					  &info->pending_lock);
 
 		cmd = list_first_entry(&info->pending_list,
 				       struct scsi_cmd, bs_list);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] Fix race on thread shutdown causing deadlock
  2014-04-29  1:51 [PATCH 1/3] Fix race on thread shutdown causing deadlock Andy Grover
  2014-04-29  1:51 ` [PATCH 2/3] Remove startup_lock Andy Grover
  2014-04-29  1:51 ` [PATCH 3/3] Replace if()/goto with while() Andy Grover
@ 2014-04-30 14:29 ` FUJITA Tomonori
  2 siblings, 0 replies; 4+ messages in thread
From: FUJITA Tomonori @ 2014-04-30 14:29 UTC (permalink / raw
  To: agrover; +Cc: stgt

On Mon, 28 Apr 2014 18:51:20 -0700
Andy Grover <agrover@redhat.com> wrote:

> This patch and the next are somewhat a revert of 318e9f2, but the previous
> fix didn't quite close the race. This only happens when we create threads
> for a backstore that turns out to be invalid, which we then tear down.
> 
> See https://bugzilla.redhat.com/show_bug.cgi?id=848585 .
> 
> This is occurring because there's still a window where a thread misses
> seeing info->stop == 1 but is not yet in cond_wait so it misses the
> broadcast:
> 
> thread_close:              thread_worker_fn:
>                            info->stop is seen as 0
> info->stop = 1
> pthread_cond_broadcast     -- misses broadcast
>                            pthread_cond_wait
> pthread_join (hangs)
> 
> I believe the solution is to go back to using pthread_cancel. We can call
> it before pthread_cond_wait is called (or after) and it will do the right
> thing: pop out and exit. The only tricky bit is we need to use the
> pthread_cleanup_push mechanism to properly release info->pending_lock.
> 
> Signed-off-by: Andy Grover <agrover@redhat.com>
> ---
>  usr/bs.c        | 25 ++++++++++++++-----------
>  usr/bs_thread.h |  2 --
>  2 files changed, 14 insertions(+), 13 deletions(-)

Thanks a lot for the fixes and detailed explanation. Surely, looks
like there is a race. The whole patchset looks good. Applied, thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-04-30 14:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-29  1:51 [PATCH 1/3] Fix race on thread shutdown causing deadlock Andy Grover
2014-04-29  1:51 ` [PATCH 2/3] Remove startup_lock Andy Grover
2014-04-29  1:51 ` [PATCH 3/3] Replace if()/goto with while() Andy Grover
2014-04-30 14:29 ` [PATCH 1/3] Fix race on thread shutdown causing deadlock FUJITA Tomonori

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.