From: Elizabeth Figura <zfigura@codeweavers.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Arnd Bergmann" <arnd@arndb.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Shuah Khan" <shuah@kernel.org>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	wine-devel@winehq.org, "André Almeida" <andrealmeid@igalia.com>,
	"Wolfram Sang" <wsa@kernel.org>,
	"Arkadiusz Hiler" <ahiler@codeweavers.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>, "Will Deacon" <will@kernel.org>,
	"Waiman Long" <longman@redhat.com>,
	"Boqun Feng" <boqun.feng@gmail.com>
Subject: Re: [PATCH v4 00/30] NT synchronization primitive driver
Date: Tue, 16 Apr 2024 16:18:17 -0500
Message-ID: <23472492.6Emhk5qWAg@terabithia>
In-Reply-To: <20240416161917.GD12673@noisy.programming.kicks-ass.net>

On Tuesday, 16 April 2024 11:19:17 CDT Peter Zijlstra wrote:
> On Tue, Apr 16, 2024 at 05:53:45PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 16, 2024 at 05:50:14PM +0200, Peter Zijlstra wrote:
> > > On Tue, Apr 16, 2024 at 10:14:21AM +0200, Peter Zijlstra wrote:
> > > > > Some aspects of the implementation may deserve particular comment:
> > > > > 
> > > > > * In the interest of performance, each object is governed only by a
> > > > >   single spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the
> > > > >   state of multiple objects be changed as a single atomic operation.
> > > > >   In order to achieve this, we first take a device-wide lock
> > > > >   ("wait_all_lock") any time we are going to lock more than one
> > > > >   object at a time.
> > > > > 
> > > > >   The maximum number of objects that can be used in a vectored wait,
> > > > >   and therefore the maximum that can be locked simultaneously, is 64.
> > > > >   This number is NT's own limit.
> > > 
> > > AFAICT:
> > > 	spin_lock(&dev->wait_all_lock);
> > > 	  list_for_each_entry(entry, &obj->all_waiters, node)
> > > 	    for (i=0; i<count; i++)
> > > 	      spin_lock_nest_lock(q->entries[i].obj->lock,
> > > 	                          &dev->wait_all_lock);
> > > 
> > > Where @count <= NTSYNC_MAX_WAIT_COUNT.
> > > 
> > > So while this nests at most 65 spinlocks, there is no actual bound on
> > > the amount of nested lock sections in total. That is, all_waiters list
> > > can be grown without limits.
> > > 
> > > Can we pretty please make wait_all_lock a mutex ?

That should be fine, at least.

> > Hurmph, it's worse, you do that list walk while holding some obj->lock
> > spinlock too. Still need to figure out how all that works....
> 
> So the point of having that other lock around is so that things like:
> 
> 	try_wake_all_obj(dev, sem)
> 	try_wake_any_sem(sem)
> 
> are done under the same lock?

The point of having the other lock around is that try_wake_all() needs to lock 
multiple objects at the same time. It's a way of avoiding lock inversion.

Consider: task A does a wait-for-all on objects X, Y, Z. Then task B
signals Y, so we do try_wake_all_obj() on Y, which does try_wake_all() on
A's queue entry; that needs to check X and Z and consume the state of all
three objects atomically. Meanwhile another task could be signaling Z at
the same time and hit a task waiting on Z, Y, X; locking the objects in
those two different orders is what causes the inversion.
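
Concretely, without the outer lock, the two signalers could take the
per-object locks in opposite orders (a hypothetical interleaving, using
the objects above):

	task B (signals Y, checks A's wait on X, Y, Z):
		lock(X); lock(Y); ... wants lock(Z)
	task C (signals Z, checks a wait on Z, Y, X):
		lock(Z); ... wants lock(Y)

If B holds Y's lock while waiting for Z's, and C holds Z's lock while
waiting for Y's, neither can make progress -- a classic ABBA deadlock.
Taking wait_all_lock before locking more than one object serializes B
and C, so the order of the inner locks stops mattering.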

The simple and easy way to implement everything is just to have a global lock 
on the whole device, but this is kind of known to be a performance bottleneck 
(this was NT's BKL, and they ditched it starting with Vista or 7 or 
something).

Instead we use a lock per object. In the normal wait-for-any case we only
ever need to grab one lock at a time, but a wait-for-all has to lock
multiple objects at once, so in that case we grab the outer lock first to
avoid potential lock inversion.
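
In code, the overall shape is roughly this (a sketch reconstructed from
the pseudocode quoted above; the struct names and the q->count field are
my assumptions, not necessarily the driver's exact code):

	/* Sketch: try to wake one wait-for-all entry. The caller takes
	 * dev->wait_all_lock first, so every path that locks more than
	 * one object at a time is serialized. */
	static void try_wake_all(struct ntsync_device *dev, struct ntsync_q *q)
	{
		__u32 i;

		lockdep_assert_held(&dev->wait_all_lock);

		/* With all multi-object lockers serialized by the outer
		 * lock, the per-object locks can be taken in any order;
		 * spin_lock_nest_lock() tells lockdep the nesting is
		 * sanctioned by wait_all_lock. */
		for (i = 0; i < q->count; i++)
			spin_lock_nest_lock(&q->entries[i].obj->lock,
					    &dev->wait_all_lock);

		/* ... check that every object in the set is signaled, and
		 * if so consume the state of all of them and wake the
		 * waiter, atomically with respect to other signalers ... */

		for (i = 0; i < q->count; i++)
			spin_unlock(&q->entries[i].obj->lock);
	}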

> Where I seem to note that both those functions do that same list
> iteration.

Over different lists. I don't know if there's a better way to name things to 
make that clearer.

There's the "any" wait queue, which tasks doing a wait-for-any add
themselves to, and the "all" wait queue, which tasks doing a wait-for-all
add themselves to. Signaling an object can potentially wake up either
kind, but the check for whether a given waiter is eligible to be woken is
different for each.
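
For illustration, each object ends up carrying both queues -- something
like this (a simplified sketch; the field names are a guess at a natural
layout, not quoted from the driver):

	struct ntsync_obj {
		spinlock_t lock;              /* per-object state lock */
		struct list_head any_waiters; /* wait-for-any entries */
		struct list_head all_waiters; /* wait-for-all entries */
		/* ... object type and state ... */
	};

Waking an "any" waiter needs only this object's lock and state; waking an
"all" waiter additionally means locking and checking every other object
in that waiter's set, which is where wait_all_lock and try_wake_all()
come in.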



