All the mail mirrored from lore.kernel.org
* [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear
@ 2015-07-18 20:21 Paolo Bonzini
  2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini
  2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini
  0 siblings, 2 replies; 8+ messages in thread
From: Paolo Bonzini @ 2015-07-18 20:21 UTC
  To: qemu-devel; +Cc: kwolf, famz, lersek, rjones, stefanha

This series fixes the remaining case where aio_poll() could hang
I/O on the main thread due to a missing wakeup.  It consists of
a bugfix and an optimization, both of which have survived hundreds
of tests on aarch64.

Both the bugfix and the optimization come with a formal model of the
interactions between the main thread, the VCPU thread doing aio_poll,
and the worker thread doing qemu_bh_schedule.  The models can test
the code both with and without ctx->notify_me, showing that this
bug is independent of the other.

The patches apply on top of the ctx->notify_me v3.  The code changes
are really pretty small; the second patch has a good deal of comments too.

Paolo

Paolo Bonzini (2):
  AioContext: fix broken placement of event_notifier_test_and_clear
  AioContext: optimize clearing the EventNotifier

 aio-posix.c                    |   2 +
 aio-win32.c                    |   2 +
 async.c                        |  16 ++++-
 docs/aio_notify_accept.promela | 152 +++++++++++++++++++++++++++++++++++++++++
 docs/aio_notify_bug.promela    | 140 +++++++++++++++++++++++++++++++++++++
 include/block/aio.h            |  32 ++++++++-
 6 files changed, 342 insertions(+), 2 deletions(-)
 create mode 100644 docs/aio_notify_accept.promela
 create mode 100644 docs/aio_notify_bug.promela

-- 
2.4.3


* [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear
  2015-07-18 20:21 [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear Paolo Bonzini
@ 2015-07-18 20:21 ` Paolo Bonzini
  2015-07-20  3:55   ` Fam Zheng
  2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2015-07-18 20:21 UTC
  To: qemu-devel; +Cc: kwolf, famz, lersek, rjones, stefanha

event_notifier_test_and_clear must be called before processing events.
Otherwise, an aio_poll could "eat" the notification before the main
I/O thread invokes ppoll().  The main I/O thread then never wakes up.
This is an example of what could happen:

   i/o thread         vcpu thread                   worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                                                     bh->scheduled = 1
                                                     event_notifier_set
                      event_notifier_test_and_clear
   ppoll
   *** hang ***

In contrast with the previous bug, there wasn't really much debugging
to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
same behavior as in the previous patch, so the only way to find the
bug is staring at the code.

One could also use a formal model, of course.  The included one shows
this with three processes: notifier corresponds to a QEMU thread pool
worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
waiter to the main I/O thread.  I would be happy to say that the
formal model found the bug for me, but actually I wrote it after the
fact.

This patch is a bit of a big hammer.  The next one optimizes it,
with help (this time for real rather than a posteriori :)) from
another, similar formal model.

Reported-by: Richard W. M. Jones <rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio-posix.c                 |   2 +
 aio-win32.c                 |   2 +
 async.c                     |   8 ++-
 docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 docs/aio_notify_bug.promela

diff --git a/aio-posix.c b/aio-posix.c
index 249889f..5c8b266 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
+    event_notifier_test_and_clear(&ctx->notifier);
+
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
diff --git a/aio-win32.c b/aio-win32.c
index ea655b0..3e0db20 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
             aio_context_acquire(ctx);
         }
 
+        event_notifier_test_and_clear(&ctx->notifier);
+
         if (first && aio_bh_poll(ctx)) {
             progress = true;
         }
diff --git a/async.c b/async.c
index a232192..d625e8a 100644
--- a/async.c
+++ b/async.c
@@ -203,6 +203,8 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     atomic_and(&ctx->notify_me, ~1);
+    event_notifier_test_and_clear(&ctx->notifier);
+
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
             return true;
@@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque)
     aio_notify(opaque);
 }
 
+static void event_notifier_dummy_cb(EventNotifier *e)
+{
+}
+
 AioContext *aio_context_new(Error **errp)
 {
     int ret;
@@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp)
     g_source_set_can_recurse(&ctx->source, true);
     aio_set_event_notifier(ctx, &ctx->notifier,
                            (EventNotifierHandler *)
-                           event_notifier_test_and_clear);
+                           event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela
new file mode 100644
index 0000000..b3bfca1
--- /dev/null
+++ b/docs/aio_notify_bug.promela
@@ -0,0 +1,140 @@
+/*
+ * This model describes a bug in aio_notify.  If ctx->notifier is
+ * cleared too late, a wakeup could be lost.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify the buggy version:
+ *     spin -a -DBUG docs/aio_notify_bug.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * To verify the working version:
+ *     spin -a docs/aio_notify_bug.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * Add -DCHECK_REQ to test an alternative invariant and the
+ * "notify_me" optimization.
+ */
+
+int notify_me;
+bool event;
+bool req;
+bool notifier_done;
+
+#ifdef CHECK_REQ
+#define USE_NOTIFY_ME 1
+#else
+#define USE_NOTIFY_ME 0
+#endif
+
+active proctype notifier()
+{
+    do
+        :: true -> {
+            req = 1;
+            if
+               :: !USE_NOTIFY_ME || notify_me -> event = 1;
+               :: else -> skip;
+            fi
+        }
+        :: true -> break;
+    od;
+    notifier_done = 1;
+}
+
+#ifdef BUG
+#define AIO_POLL                                                    \
+    notify_me++;                                                    \
+    if                                                              \
+        :: !req -> {                                                \
+            if                                                      \
+                :: event -> skip;                                   \
+            fi;                                                     \
+        }                                                           \
+        :: else -> skip;                                            \
+    fi;                                                             \
+    notify_me--;                                                    \
+                                                                    \
+    req = 0;                                                        \
+    event = 0;
+#else
+#define AIO_POLL                                                    \
+    notify_me++;                                                    \
+    if                                                              \
+        :: !req -> {                                                \
+            if                                                      \
+                :: event -> skip;                                   \
+            fi;                                                     \
+        }                                                           \
+        :: else -> skip;                                            \
+    fi;                                                             \
+    notify_me--;                                                    \
+                                                                    \
+    event = 0;                                                      \
+    req = 0;
+#endif
+
+active proctype waiter()
+{
+    do
+       :: true -> AIO_POLL;
+    od;
+}
+
+/* Same as waiter(), but disappears after a while.  */
+active proctype temporary_waiter()
+{
+    do
+       :: true -> AIO_POLL;
+       :: true -> break;
+    od;
+}
+
+#ifdef CHECK_REQ
+never {
+    do
+        :: req -> goto accept_if_req_not_eventually_false;
+        :: true -> skip;
+    od;
+
+accept_if_req_not_eventually_false:
+    if
+        :: req -> goto accept_if_req_not_eventually_false;
+    fi;
+    assert(0);
+}
+
+#else
+/* There must be infinitely many transitions of event as long
+ * as the notifier does not exit.
+ *
+ * If event stayed always true, the waiters would be busy looping.
+ * If event stayed always false, the waiters would be sleeping
+ * forever.
+ */
+never {
+    do
+        :: !event    -> goto accept_if_event_not_eventually_true;
+        :: event     -> goto accept_if_event_not_eventually_false;
+        :: true      -> skip;
+    od;
+
+accept_if_event_not_eventually_true:
+    if
+        :: !event && notifier_done  -> do :: true -> skip; od;
+        :: !event && !notifier_done -> goto accept_if_event_not_eventually_true;
+    fi;
+    assert(0);
+
+accept_if_event_not_eventually_false:
+    if
+        :: event     -> goto accept_if_event_not_eventually_false;
+    fi;
+    assert(0);
+}
+#endif
-- 
2.4.3


* [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-18 20:21 [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear Paolo Bonzini
  2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini
@ 2015-07-18 20:21 ` Paolo Bonzini
  2015-07-20  2:27   ` Fam Zheng
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2015-07-18 20:21 UTC
  To: qemu-devel; +Cc: kwolf, famz, lersek, rjones, stefanha

It is pretty rare for aio_notify to actually set the EventNotifier.  It
can happen with worker threads such as thread-pool.c's, but otherwise it
should never be set thanks to the ctx->notify_me optimization.  The
previous patch, unfortunately, added an unconditional call to
event_notifier_test_and_clear; now add a userspace fast path that
avoids the call.

Note that it is not possible to do the same with event_notifier_set;
it would break, as proved (again) by the included formal model.

This patch survived over 800 reboots on aarch64 KVM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio-posix.c                    |   2 +-
 aio-win32.c                    |   2 +-
 async.c                        |  10 ++-
 docs/aio_notify_accept.promela | 152 +++++++++++++++++++++++++++++++++++++++++
 include/block/aio.h            |  32 ++++++++-
 5 files changed, 194 insertions(+), 4 deletions(-)
 create mode 100644 docs/aio_notify_accept.promela

diff --git a/aio-posix.c b/aio-posix.c
index 5c8b266..d477033 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -276,7 +276,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
-    event_notifier_test_and_clear(&ctx->notifier);
+    aio_notify_accept(ctx);
 
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
diff --git a/aio-win32.c b/aio-win32.c
index 3e0db20..9e6eb71 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -337,7 +337,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
             aio_context_acquire(ctx);
         }
 
-        event_notifier_test_and_clear(&ctx->notifier);
+        aio_notify_accept(ctx);
 
         if (first && aio_bh_poll(ctx)) {
             progress = true;
diff --git a/async.c b/async.c
index d625e8a..9a98a74 100644
--- a/async.c
+++ b/async.c
@@ -203,7 +203,7 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     atomic_and(&ctx->notify_me, ~1);
-    event_notifier_test_and_clear(&ctx->notifier);
+    aio_notify_accept(ctx);
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
@@ -267,6 +267,14 @@ void aio_notify(AioContext *ctx)
     smp_mb();
     if (ctx->notify_me) {
         event_notifier_set(&ctx->notifier);
+        atomic_mb_set(&ctx->notified, true);
+    }
+}
+
+void aio_notify_accept(AioContext *ctx)
+{
+    if (atomic_xchg(&ctx->notified, false)) {
+        event_notifier_test_and_clear(&ctx->notifier);
     }
 }
 
diff --git a/docs/aio_notify_accept.promela b/docs/aio_notify_accept.promela
new file mode 100644
index 0000000..9cef2c9
--- /dev/null
+++ b/docs/aio_notify_accept.promela
@@ -0,0 +1,152 @@
+/*
+ * This model describes the interaction between ctx->notified
+ * and ctx->notifier.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify the buggy version:
+ *     spin -a -DBUG1 docs/aio_notify_accept.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ * (or -DBUG2)
+ *
+ * To verify the fixed version:
+ *     spin -a docs/aio_notify_accept.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * Add -DCHECK_REQ to test an alternative invariant and the
+ * "notify_me" optimization.
+ */
+
+int notify_me;
+bool notified;
+bool event;
+bool req;
+bool notifier_done;
+
+#ifdef CHECK_REQ
+#define USE_NOTIFY_ME 1
+#else
+#define USE_NOTIFY_ME 0
+#endif
+
+#ifdef BUG
+#error Please define BUG1 or BUG2 instead.
+#endif
+
+active proctype notifier()
+{
+    do
+        :: true -> {
+            req = 1;
+            if
+               :: !USE_NOTIFY_ME || notify_me ->
+#if defined BUG1
+                   /* CHECK_REQ does not detect this bug! */
+                   notified = 1;
+                   event = 1;
+#elif defined BUG2
+                   if
+                      :: !notified -> event = 1;
+                      :: else -> skip;
+                   fi;
+                   notified = 1;
+#else
+                   event = 1;
+                   notified = 1;
+#endif
+               :: else -> skip;
+            fi
+        }
+        :: true -> break;
+    od;
+    notifier_done = 1;
+}
+
+#define AIO_POLL                                                    \
+    notify_me++;                                                    \
+    if                                                              \
+        :: !req -> {                                                \
+            if                                                      \
+                :: event -> skip;                                   \
+            fi;                                                     \
+        }                                                           \
+        :: else -> skip;                                            \
+    fi;                                                             \
+    notify_me--;                                                    \
+                                                                    \
+    atomic { old = notified; notified = 0; }                        \
+    if                                                              \
+       :: old -> event = 0;                                         \
+       :: else -> skip;                                             \
+    fi;                                                             \
+                                                                    \
+    req = 0;
+
+active proctype waiter()
+{
+    bool old;
+
+    do
+       :: true -> AIO_POLL;
+    od;
+}
+
+/* Same as waiter(), but disappears after a while.  */
+active proctype temporary_waiter()
+{
+    bool old;
+
+    do
+       :: true -> AIO_POLL;
+       :: true -> break;
+    od;
+}
+
+#ifdef CHECK_REQ
+never {
+    do
+        :: req -> goto accept_if_req_not_eventually_false;
+        :: true -> skip;
+    od;
+
+accept_if_req_not_eventually_false:
+    if
+        :: req -> goto accept_if_req_not_eventually_false;
+    fi;
+    assert(0);
+}
+
+#else
+/* There must be infinitely many transitions of event as long
+ * as the notifier does not exit.
+ *
+ * If event stayed always true, the waiters would be busy looping.
+ * If event stayed always false, the waiters would be sleeping
+ * forever.
+ */
+never {
+    do
+        :: !event    -> goto accept_if_event_not_eventually_true;
+        :: event     -> goto accept_if_event_not_eventually_false;
+        :: true      -> skip;
+    od;
+
+accept_if_event_not_eventually_true:
+    if
+        :: !event && notifier_done  -> do :: true -> skip; od;
+        :: !event && !notifier_done -> goto accept_if_event_not_eventually_true;
+    fi;
+    assert(0);
+
+accept_if_event_not_eventually_false:
+    if
+        :: event     -> goto accept_if_event_not_eventually_false;
+    fi;
+    assert(0);
+}
+#endif
diff --git a/include/block/aio.h b/include/block/aio.h
index be91e3f..9dd32e0 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -99,7 +99,19 @@ struct AioContext {
      */
     int walking_bh;
 
-    /* Used for aio_notify.  */
+    /* Used by aio_notify.
+     *
+     * "notified" is used to avoid expensive event_notifier_test_and_clear
+     * calls.  When it is clear, the EventNotifier is clear, or one thread
+     * is going to clear "notified" before processing more events.  False
+     * positives are possible, i.e. "notified" could be set even though the
+     * EventNotifier is clear.
+     *
+     * Note that event_notifier_set *cannot* be optimized the same way.  For
+     * more information on the problem that would result, see "#ifdef BUG2"
+     * in the docs/aio_notify_accept.promela formal model.
+     */
+    bool notified;
     EventNotifier notifier;
 
     /* Thread pool for performing work and receiving completion callbacks */
@@ -174,6 +186,24 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
 void aio_notify(AioContext *ctx);
 
 /**
+ * aio_notify_accept: Acknowledge receiving an aio_notify.
+ *
+ * aio_notify() uses an EventNotifier in order to wake up a sleeping
+ * aio_poll() or g_main_context_iteration().  Calls to aio_notify() are
+ * usually rare, but the AioContext has to clear the EventNotifier on
+ * every aio_poll() or g_main_context_iteration() in order to avoid
+ * busy waiting.  This event_notifier_test_and_clear() cannot be done
+ * using the usual aio_set_event_notifier() callback, because it must
+ * be done before processing all events (file descriptors, bottom halves,
+ * timers).
+ *
+ * aio_notify_accept() is an optimized event_notifier_test_and_clear()
+ * that is specific to an AioContext's notifier; it is used internally
+ * to clear the EventNotifier only if aio_notify() had been called.
+ */
+void aio_notify_accept(AioContext *ctx);
+
+/**
  * aio_bh_poll: Poll bottom halves for an AioContext.
  *
  * These are internal functions used by the QEMU main loop.
-- 
2.4.3


* Re: [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini
@ 2015-07-20  2:27   ` Fam Zheng
  2015-07-20  5:25     ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Fam Zheng @ 2015-07-20  2:27 UTC
  To: Paolo Bonzini; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones

On Sat, 07/18 22:21, Paolo Bonzini wrote:
> It is pretty rare for aio_notify to actually set the EventNotifier.  It
> can happen with worker threads such as thread-pool.c's, but otherwise it
> should never be set thanks to the ctx->notify_me optimization.  The
> previous patch, unfortunately, added an unconditional call to
> event_notifier_test_and_clear; now add a userspace fast path that
> avoids the call.
> 
> Note that it is not possible to do the same with event_notifier_set;
> it would break, as proved (again) by the included formal model.
> 
> This patch survived over 800 reboots on aarch64 KVM.

For aio-posix, how about keeping the optimization local, which doesn't need
an atomic operation? (no idea for win32 :)

diff --git a/aio-posix.c b/aio-posix.c
index 5c8b266..7e98123 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -236,6 +236,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     int i, ret;
     bool progress;
     int64_t timeout;
+    int aio_notifier_idx = -1;
 
     aio_context_acquire(ctx);
     progress = false;
@@ -256,11 +257,18 @@ bool aio_poll(AioContext *ctx, bool blocking)
     assert(npfd == 0);
 
     /* fill pollfds */
+    i = 0;
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->pfd.events) {
             add_pollfd(node);
+            if (node->pfd.fd == event_notifier_get_fd(&ctx->notifier)) {
+                assert(aio_notifier_idx == -1);
+                aio_notifier_idx = i;
+            }
+            i++;
         }
     }
+    assert(aio_notifier_idx != -1);
 
     timeout = blocking ? aio_compute_timeout(ctx) : 0;
 
@@ -276,7 +284,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
-    event_notifier_test_and_clear(&ctx->notifier);
+    if (pollfds[aio_notifier_idx].revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
+        event_notifier_test_and_clear(&ctx->notifier);
+    }
 
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {


* Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear
  2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini
@ 2015-07-20  3:55   ` Fam Zheng
  2015-07-20  5:32     ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Fam Zheng @ 2015-07-20  3:55 UTC
  To: Paolo Bonzini; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones

On Sat, 07/18 22:21, Paolo Bonzini wrote:
> event_notifier_test_and_clear must be called before processing events.
> Otherwise, an aio_poll could "eat" the notification before the main
> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
> This is an example of what could happen:
> 
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                      bh->scheduled = 1
>                                                      event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>    *** hang ***

It still seems possible for event_notifier_test_and_clear to happen before the
main thread's ppoll; will that be a problem? How does the main thread get woken
up in that case?

Fam


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-20  2:27   ` Fam Zheng
@ 2015-07-20  5:25     ` Paolo Bonzini
  2015-07-20  5:34       ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2015-07-20  5:25 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones



On 20/07/2015 04:27, Fam Zheng wrote:
> For aio-posix, how about keeping the optimization local, so that it doesn't
> need an atomic operation? (no idea for win32 :)
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index 5c8b266..7e98123 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -236,6 +236,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      int i, ret;
>      bool progress;
>      int64_t timeout;
> +    int aio_notifier_idx = -1;
>  
>      aio_context_acquire(ctx);
>      progress = false;
> @@ -256,11 +257,18 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      assert(npfd == 0);
>  
>      /* fill pollfds */
> +    i = 0;
>      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->pfd.events) {
>              add_pollfd(node);
> +            if (node->pfd.fd == event_notifier_get_fd(&ctx->notifier)) {
> +                assert(aio_notifier_idx == -1);
> +                aio_notifier_idx = i;
> +            }
> +            i++;
>          }
>      }

That's a good idea.  Since aio_set_fd_handler uses QLIST_INSERT_HEAD,
perhaps we can be sure that aio_notifier_idx is always the last one
(i.e. i-1)?  And the same can be done on Windows, I think.

> +    if (pollfds[aio_notifier_idx].revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
> +        event_notifier_test_and_clear(&ctx->notifier);
> +    }

Might as well zero pollfds[aio_notifier_idx].revents after clearing it.

The atomic operation is not too expensive, so it would really be more
about simplicity than about speed.  Not that simplicity is a bad thing!

I'll send v2 of the first patch; you can look at doing the optimization.
Thanks!

Paolo


* Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear
  2015-07-20  3:55   ` Fam Zheng
@ 2015-07-20  5:32     ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2015-07-20  5:32 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones



On 20/07/2015 05:55, Fam Zheng wrote:
> On Sat, 07/18 22:21, Paolo Bonzini wrote:
>> event_notifier_test_and_clear must be called before processing events.
>> Otherwise, an aio_poll could "eat" the notification before the main
>> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
>> This is an example of what could happen:
>>
>>    i/o thread         vcpu thread                   worker thread
>>    ---------------------------------------------------------------------
>>    lock_iothread
>>    notify_me = 1
>>    ...
>>    unlock_iothread
>>                       lock_iothread
>>                       notify_me = 3
>>                       ppoll
>>                       notify_me = 1
>>                                                      bh->scheduled = 1
>>                                                      event_notifier_set
>>                       event_notifier_test_and_clear
>>    ppoll
>>    *** hang ***
> 
> It still seems possible for event_notifier_test_and_clear to happen before the
> main thread's ppoll; will that be a problem? How does the main thread get woken
> up in that case?

You could have this:


   i/o thread         vcpu thread                   worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                                                    bh->scheduled = 1
                                                    event_notifier_set
                      event_notifier_test_and_clear
   ppoll

But then the bottom half *will* be processed by the VCPU thread.  It's a
similar "rule" to the one that you use with lock-free algorithms: the
consumer, 99% of the time, uses the variables in the opposite order of
the producer.

Here the producer calls event_notifier_set after bh->scheduled (of
course...); the consumer *must* clear the EventNotifier before reading
bh->scheduled.

Paolo


> 
>>
>> In contrast with the previous bug, there wasn't really much debugging
>> to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
>> same behavior as in the previous patch, so the only way to find the
>> bug is staring at the code.
>>
>> One could also use a formal model, of course.  The included one shows
>> this with three processes: notifier corresponds to a QEMU thread pool
>> worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
>> waiter to the main I/O thread.  I would be happy to say that the
>> formal model found the bug for me, but actually I wrote it after the
>> fact.
>>
>> This patch is a bit of a big hammer.  The next one optimizes it,
>> with help (this time for real rather than a posteriori :)) from
>> another, similar formal model.
>>
>> Reported-by: Richard W. M. Jones <rjones@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  aio-posix.c                 |   2 +
>>  aio-win32.c                 |   2 +
>>  async.c                     |   8 ++-
>>  docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 151 insertions(+), 1 deletion(-)
>>  create mode 100644 docs/aio_notify_bug.promela
>>
>> diff --git a/aio-posix.c b/aio-posix.c
>> index 249889f..5c8b266 100644
>> --- a/aio-posix.c
>> +++ b/aio-posix.c
>> @@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>>          aio_context_acquire(ctx);
>>      }
>>  
>> +    event_notifier_test_and_clear(&ctx->notifier);
>> +
>>      /* if we have any readable fds, dispatch event */
>>      if (ret > 0) {
>>          for (i = 0; i < npfd; i++) {
>> diff --git a/aio-win32.c b/aio-win32.c
>> index ea655b0..3e0db20 100644
>> --- a/aio-win32.c
>> +++ b/aio-win32.c
>> @@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>>              aio_context_acquire(ctx);
>>          }
>>  
>> +        event_notifier_test_and_clear(&ctx->notifier);
>> +
>>          if (first && aio_bh_poll(ctx)) {
>>              progress = true;
>>          }
>> diff --git a/async.c b/async.c
>> index a232192..d625e8a 100644
>> --- a/async.c
>> +++ b/async.c
>> @@ -203,6 +203,8 @@ aio_ctx_check(GSource *source)
>>      QEMUBH *bh;
>>  
>>      atomic_and(&ctx->notify_me, ~1);
>> +    event_notifier_test_and_clear(&ctx->notifier);
>> +
>>      for (bh = ctx->first_bh; bh; bh = bh->next) {
>>          if (!bh->deleted && bh->scheduled) {
>>              return true;
>> @@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque)
>>      aio_notify(opaque);
>>  }
>>  
>> +static void event_notifier_dummy_cb(EventNotifier *e)
>> +{
>> +}
>> +
>>  AioContext *aio_context_new(Error **errp)
>>  {
>>      int ret;
>> @@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp)
>>      g_source_set_can_recurse(&ctx->source, true);
>>      aio_set_event_notifier(ctx, &ctx->notifier,
>>                             (EventNotifierHandler *)
>> -                           event_notifier_test_and_clear);
>> +                           event_notifier_dummy_cb);
>>      ctx->thread_pool = NULL;
>>      qemu_mutex_init(&ctx->bh_lock);
>>      rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
>> diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela
>> new file mode 100644
>> index 0000000..b3bfca1
>> --- /dev/null
>> +++ b/docs/aio_notify_bug.promela
>> @@ -0,0 +1,140 @@
>> +/*
>> + * This model describes a bug in aio_notify.  If ctx->notifier is
>> + * cleared too late, a wakeup could be lost.
>> + *
>> + * Author: Paolo Bonzini <pbonzini@redhat.com>
>> + *
>> + * This file is in the public domain.  If you really want a license,
>> + * the WTFPL will do.
>> + *
>> + * To verify the buggy version:
>> + *     spin -a -DBUG docs/aio_notify_bug.promela
>> + *     gcc -O2 pan.c
>> + *     ./a.out -a -f
>> + *
>> + * To verify the working version:
>> + *     spin -a docs/aio_notify_bug.promela
>> + *     gcc -O2 pan.c
>> + *     ./a.out -a -f
>> + *
>> + * Add -DCHECK_REQ to test an alternative invariant and the
>> + * "notify_me" optimization.
>> + */
>> +
>> +int notify_me;
>> +bool event;
>> +bool req;
>> +bool notifier_done;
>> +
>> +#ifdef CHECK_REQ
>> +#define USE_NOTIFY_ME 1
>> +#else
>> +#define USE_NOTIFY_ME 0
>> +#endif
>> +
>> +active proctype notifier()
>> +{
>> +    do
>> +        :: true -> {
>> +            req = 1;
>> +            if
>> +               :: !USE_NOTIFY_ME || notify_me -> event = 1;
>> +               :: else -> skip;
>> +            fi
>> +        }
>> +        :: true -> break;
>> +    od;
>> +    notifier_done = 1;
>> +}
>> +
>> +#ifdef BUG
>> +#define AIO_POLL                                                    \
>> +    notify_me++;                                                    \
>> +    if                                                              \
>> +        :: !req -> {                                                \
>> +            if                                                      \
>> +                :: event -> skip;                                   \
>> +            fi;                                                     \
>> +        }                                                           \
>> +        :: else -> skip;                                            \
>> +    fi;                                                             \
>> +    notify_me--;                                                    \
>> +                                                                    \
>> +    req = 0;                                                        \
>> +    event = 0;
>> +#else
>> +#define AIO_POLL                                                    \
>> +    notify_me++;                                                    \
>> +    if                                                              \
>> +        :: !req -> {                                                \
>> +            if                                                      \
>> +                :: event -> skip;                                   \
>> +            fi;                                                     \
>> +        }                                                           \
>> +        :: else -> skip;                                            \
>> +    fi;                                                             \
>> +    notify_me--;                                                    \
>> +                                                                    \
>> +    event = 0;                                                      \
>> +    req = 0;
>> +#endif
>> +
>> +active proctype waiter()
>> +{
>> +    do
>> +       :: true -> AIO_POLL;
>> +    od;
>> +}
>> +
>> +/* Same as waiter(), but disappears after a while.  */
>> +active proctype temporary_waiter()
>> +{
>> +    do
>> +       :: true -> AIO_POLL;
>> +       :: true -> break;
>> +    od;
>> +}
>> +
>> +#ifdef CHECK_REQ
>> +never {
>> +    do
>> +        :: req -> goto accept_if_req_not_eventually_false;
>> +        :: true -> skip;
>> +    od;
>> +
>> +accept_if_req_not_eventually_false:
>> +    if
>> +        :: req -> goto accept_if_req_not_eventually_false;
>> +    fi;
>> +    assert(0);
>> +}
>> +
>> +#else
>> +/* There must be infinitely many transitions of event as long
>> + * as the notifier does not exit.
>> + *
>> + * If event stayed always true, the waiters would be busy looping.
>> + * If event stayed always false, the waiters would be sleeping
>> + * forever.
>> + */
>> +never {
>> +    do
>> +        :: !event    -> goto accept_if_event_not_eventually_true;
>> +        :: event     -> goto accept_if_event_not_eventually_false;
>> +        :: true      -> skip;
>> +    od;
>> +
>> +accept_if_event_not_eventually_true:
>> +    if
>> +        :: !event && notifier_done  -> do :: true -> skip; od;
>> +        :: !event && !notifier_done -> goto accept_if_event_not_eventually_true;
>> +    fi;
>> +    assert(0);
>> +
>> +accept_if_event_not_eventually_false:
>> +    if
>> +        :: event     -> goto accept_if_event_not_eventually_false;
>> +    fi;
>> +    assert(0);
>> +}
>> +#endif
>> -- 
>> 2.4.3
>>
>>
> 
> 


* Re: [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-20  5:25     ` Paolo Bonzini
@ 2015-07-20  5:34       ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2015-07-20  5:34 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones



On 20/07/2015 07:25, Paolo Bonzini wrote:
> >      /* fill pollfds */
> > +    i = 0;
> >      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> >          if (!node->deleted && node->pfd.events) {
> >              add_pollfd(node);
> > +            if (node->pfd.fd == event_notifier_get_fd(&ctx->notifier)) {
> > +                assert(aio_notifier_idx == -1);
> > +                aio_notifier_idx = i;
> > +            }
> > +            i++;
> >          }
> >      }
>
> That's a good idea.  Since aio_set_fd_handler uses QLIST_INSERT_HEAD,
> perhaps we can be sure that aio_notifier_idx is always the last one
> (i.e. i-1)?  And the same can be done on Windows, I think.

BTW, if this is not true, I think I'd prefer the optimization with atomics.
But if it's true that we can just use i-1, this one is better.

Paolo


end of thread, other threads:[~2015-07-20  5:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-18 20:21 [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear Paolo Bonzini
2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini
2015-07-20  3:55   ` Fam Zheng
2015-07-20  5:32     ` Paolo Bonzini
2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini
2015-07-20  2:27   ` Fam Zheng
2015-07-20  5:25     ` Paolo Bonzini
2015-07-20  5:34       ` Paolo Bonzini
