From: Hillf Danton
To: Paolo Bonzini
Cc: Hillf Danton, Thomas Gleixner, Sebastian Andrzej Siewior, "Michael S. Tsirkin", linux-mm@kvack.org, LKML, Al Viro
Subject: Re: 5.13-rt1 + KVM = WARNING: at fs/eventfd.c:74 eventfd_signal()
Date: Fri, 23 Jul 2021 10:23:56 +0800
Message-Id: <20210723022356.1301-1-hdanton@sina.com>
In-Reply-To: <2b4aea8d-a038-e347-7f6f-10476d771b7e@redhat.com>

On Wed, 21 Jul 2021 12:59:39 +0200 Paolo Bonzini wrote:
>On 21/07/21 12:11, Hillf Danton wrote:
>> On Wed, 21 Jul 2021 09:25:32 +0200 Thomas Gleixner wrote:
>>> On Wed, Jul 21 2021 at 15:04, Hillf Danton wrote:
>>>>
>>>> But the preempting waker can not make sense without the waiter who is bloody
>>>> special. Why is it so in the first place? Or it is not at all but the race
>>>> existing from Monday to Friday.
>>>
>>> See the large comment in eventfd_poll().
>>
>> Is it likely for a reader to make eventfd_poll() return 0?
>>
>>    read                 *     poll                               write
>>    ----                 *     -----------------                  ------------
>>                         *     count = ctx->count (INVALID!)
>>                         *                                        lock ctx->qwh.lock
>>                         *                                        ctx->count += n
>>                         *                                        **waitqueue_active is false**
>>                         *                                        **no wake_up_locked_poll!**
>>                         *                                        unlock ctx->qwh.lock
>>
>>    lock ctx->qwh.lock
>>    *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
>>    ctx->count -= *cnt;
>>    **waitqueue_active is false**
>>    unlock ctx->qwh.lock
>>
>>                         *     lock ctx->wqh.lock (in poll_wait)
>>                         *     __add_wait_queue
>>                         *     unlock ctx->wqh.lock
>>                         *     eventfd_poll returns 0
>>                         */
>>        count = READ_ONCE(ctx->count);
>>
>
>No, it's simply impossible.  The same comment explains why: "count =
>ctx->count" cannot move above poll_wait's locking of ctx->wqh.lock.

Detect a concurrent reader and writer by reading the event counter before
and after poll_wait(), and determine the returned events with the case of
an unstable counter taken into account.

Cut the big comment as the added barriers speak for themselves.

+++ x/fs/eventfd.c
@@ -131,49 +131,20 @@ static __poll_t eventfd_poll(struct file
 {
 	struct eventfd_ctx *ctx = file->private_data;
 	__poll_t events = 0;
-	u64 count;
+	u64 c0, count;
+
+	c0 = ctx->count;
+	smp_rmb();
 
 	poll_wait(file, &ctx->wqh, wait);
 
-	/*
-	 * All writes to ctx->count occur within ctx->wqh.lock.  This read
-	 * can be done outside ctx->wqh.lock because we know that poll_wait
-	 * takes that lock (through add_wait_queue) if our caller will sleep.
-	 *
-	 * The read _can_ therefore seep into add_wait_queue's critical
-	 * section, but cannot move above it!  add_wait_queue's spin_lock acts
-	 * as an acquire barrier and ensures that the read be ordered properly
-	 * against the writes.  The following CAN happen and is safe:
-	 *
-	 *     poll                               write
-	 *     -----------------                  ------------
-	 *     lock ctx->wqh.lock (in poll_wait)
-	 *     count = ctx->count
-	 *     __add_wait_queue
-	 *     unlock ctx->wqh.lock
-	 *                                        lock ctx->qwh.lock
-	 *                                        ctx->count += n
-	 *                                        if (waitqueue_active)
-	 *                                          wake_up_locked_poll
-	 *                                        unlock ctx->qwh.lock
-	 *     eventfd_poll returns 0
-	 *
-	 * but the following, which would miss a wakeup, cannot happen:
-	 *
-	 *     poll                               write
-	 *     -----------------                  ------------
-	 *     count = ctx->count (INVALID!)
-	 *                                        lock ctx->qwh.lock
-	 *                                        ctx->count += n
-	 *                                        **waitqueue_active is false**
-	 *                                        **no wake_up_locked_poll!**
-	 *                                        unlock ctx->qwh.lock
-	 *     lock ctx->wqh.lock (in poll_wait)
-	 *     __add_wait_queue
-	 *     unlock ctx->wqh.lock
-	 *     eventfd_poll returns 0
-	 */
-	count = READ_ONCE(ctx->count);
+	smp_rmb();
+	count = ctx->count;
+
+	if (c0 < count)
+		return EPOLLIN;
+	if (c0 > count)
+		return EPOLLOUT;
 
 	if (count > 0)
 		events |= EPOLLIN;
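
For illustration only, the decision made by the hunk above can be modelled in userspace as a pure function of the two counter snapshots. This is a sketch, not kernel code: sketch_poll_mask(), sketch_self_check(), SKETCH_IN and SKETCH_OUT are hypothetical names standing in for the logic and the EPOLLIN/EPOLLOUT bits, and the barriers and poll_wait() are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for EPOLLIN/EPOLLOUT; the values are illustrative. */
#define SKETCH_IN  0x1u
#define SKETCH_OUT 0x4u

/*
 * c0:    counter snapshot taken before poll_wait()
 * count: counter snapshot taken after poll_wait()
 */
static unsigned int sketch_poll_mask(uint64_t c0, uint64_t count)
{
	unsigned int events = 0;

	if (c0 < count)		/* net increase between the two snapshots */
		return SKETCH_IN;
	if (c0 > count)		/* net decrease between the two snapshots */
		return SKETCH_OUT;

	/* Equal snapshots: same first check as the unchanged tail of the hunk. */
	if (count > 0)
		events |= SKETCH_IN;
	/* Checks of eventfd_poll() that the hunk does not show are omitted here. */
	return events;
}

/* Hypothetical self-check covering the three cases. */
static void sketch_self_check(void)
{
	assert(sketch_poll_mask(0, 3) == SKETCH_IN);
	assert(sketch_poll_mask(3, 0) == SKETCH_OUT);
	assert(sketch_poll_mask(0, 0) == 0);
	assert(sketch_poll_mask(5, 5) == SKETCH_IN);
}
```

Note that equal snapshots do not by themselves show that the counter was untouched: a write followed by a read that brings the counter back to its earlier value also yields c0 == count.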