On Thu, Mar 28, 2024 at 07:06:21PM -0500, Elizabeth Figura wrote:
> diff --git a/Documentation/userspace-api/ntsync.rst b/Documentation/userspace-api/ntsync.rst
> new file mode 100644
> index 000000000000..202c2350d3af
> --- /dev/null
> +++ b/Documentation/userspace-api/ntsync.rst
> @@ -0,0 +1,399 @@
> +===================================
> +NT synchronization primitive driver
> +===================================
> +
> +This page documents the user-space API for the ntsync driver.
> +
> +ntsync is a support driver for emulation of NT synchronization
> +primitives by user-space NT emulators. It exists because implementation
> +in user-space, using existing tools, cannot match Windows performance
> +while offering accurate semantics. It is implemented entirely in
> +software, and does not drive any hardware device.
> +
> +This interface is meant as a compatibility tool only, and should not
> +be used for general synchronization. Instead use generic, versatile
> +interfaces such as futex(2) and poll(2).
> +
> +Synchronization primitives
> +==========================
> +
> +The ntsync driver exposes three types of synchronization primitives:
> +semaphores, mutexes, and events.
> +
> +A semaphore holds a single volatile 32-bit counter, and a static 32-bit
> +integer denoting the maximum value. It is considered signaled when the
> +counter is nonzero. The counter is decremented by one when a wait is
> +satisfied. Both the initial and maximum count are established when the
> +semaphore is created.
> +
> +A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
> +identifier denoting its owner. A mutex is considered signaled when its
> +owner is zero (indicating that it is not owned). The recursion count is
> +incremented when a wait is satisfied, and ownership is set to the given
> +identifier.
> +
> +A mutex also holds an internal flag denoting whether its previous owner
> +has died; such a mutex is said to be abandoned. Owner death is not
> +tracked automatically based on thread death, but rather must be
> +communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is
> +inherently considered unowned.
> +
> +Except for the "unowned" semantics of zero, the actual value of the
> +owner identifier is not interpreted by the ntsync driver at all. The
> +intended use is to store a thread identifier; however, the ntsync
> +driver does not actually validate that a calling thread provides
> +consistent or unique identifiers.
> +
> +An event holds a volatile boolean state denoting whether it is signaled
> +or not. There are two types of events, auto-reset and manual-reset. An
> +auto-reset event is designaled when a wait is satisfied; a manual-reset
> +event is not. The event type is specified when the event is created.
> +
> +Unless specified otherwise, all operations on an object are atomic and
> +totally ordered with respect to other operations on the same object.
> +
> +Objects are represented by files. When all file descriptors to an
> +object are closed, that object is deleted.
> +
> +Char device
> +===========
> +
> +The ntsync driver creates a single char device /dev/ntsync. Each file
> +description opened on the device represents a unique instance intended
> +to back an individual NT virtual machine. Objects created by one ntsync
> +instance may only be used with other objects created by the same
> +instance.
> +
> +ioctl reference
> +===============
> +
> +All operations on the device are done through ioctls. There are four
> +structures used in ioctl calls::
> +
> +   struct ntsync_sem_args {
> +       __u32 sem;
> +       __u32 count;
> +       __u32 max;
> +   };
> +
> +   struct ntsync_mutex_args {
> +       __u32 mutex;
> +       __u32 owner;
> +       __u32 count;
> +   };
> +
> +   struct ntsync_event_args {
> +       __u32 event;
> +       __u32 signaled;
> +       __u32 manual;
> +   };
> +
> +   struct ntsync_wait_args {
> +       __u64 timeout;
> +       __u64 objs;
> +       __u32 count;
> +       __u32 owner;
> +       __u32 index;
> +       __u32 alert;
> +       __u32 flags;
> +       __u32 pad;
> +   };
> +
> +Depending on the ioctl, members of the structure may be used as input,
> +output, or not at all. All ioctls return 0 on success.
> +
> +The ioctls on the device file are as follows:
> +
> +.. c:macro:: NTSYNC_IOC_CREATE_SEM
> +
> +  Create a semaphore object. Takes a pointer to struct
> +  :c:type:`ntsync_sem_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``sem``
> +       - On output, contains a file descriptor to the created semaphore.
> +     * - ``count``
> +       - Initial count of the semaphore.
> +     * - ``max``
> +       - Maximum count of the semaphore.
> +
> +  Fails with ``EINVAL`` if ``count`` is greater than ``max``.
> +
> +.. c:macro:: NTSYNC_IOC_CREATE_MUTEX
> +
> +  Create a mutex object. Takes a pointer to struct
> +  :c:type:`ntsync_mutex_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``mutex``
> +       - On output, contains a file descriptor to the created mutex.
> +     * - ``count``
> +       - Initial recursion count of the mutex.
> +     * - ``owner``
> +       - Initial owner of the mutex.
> +
> +  If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is
> +  zero and ``count`` is nonzero, the function fails with ``EINVAL``.
> +
> +.. c:macro:: NTSYNC_IOC_CREATE_EVENT
> +
> +  Create an event object. Takes a pointer to struct
> +  :c:type:`ntsync_event_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``event``
> +       - On output, contains a file descriptor to the created event.
> +     * - ``signaled``
> +       - If nonzero, the event is initially signaled, otherwise
> +         nonsignaled.
> +     * - ``manual``
> +       - If nonzero, the event is a manual-reset event, otherwise
> +         auto-reset.
> +
> +The ioctls on the individual objects are as follows:
> +
> +.. c:macro:: NTSYNC_IOC_SEM_POST
> +
> +  Post to a semaphore object. Takes a pointer to a 32-bit integer,
> +  which on input holds the count to be added to the semaphore, and on
> +  output contains its previous count.
> +
> +  If adding to the semaphore's current count would raise the latter
> +  past the semaphore's maximum count, the ioctl fails with
> +  ``EOVERFLOW`` and the semaphore is not affected. If raising the
> +  semaphore's count causes it to become signaled, eligible threads
> +  waiting on this semaphore will be woken and the semaphore's count
> +  decremented appropriately.
> +
> +.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK
> +
> +  Release a mutex object. Takes a pointer to struct
> +  :c:type:`ntsync_mutex_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``mutex``
> +       - Ignored.
> +     * - ``owner``
> +       - Specifies the owner trying to release this mutex.
> +     * - ``count``
> +       - On output, contains the previous recursion count.
> +
> +  If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner``
> +  is not the current owner of the mutex, the ioctl fails with
> +  ``EPERM``.
> +
> +  The mutex's count will be decremented by one. If decrementing the
> +  mutex's count causes it to become zero, the mutex is marked as
> +  unowned and signaled, and eligible threads waiting on it will be
> +  woken as appropriate.
> +
> +.. c:macro:: NTSYNC_IOC_SET_EVENT
> +
> +  Signal an event object. Takes a pointer to a 32-bit integer, which on
> +  output contains the previous state of the event.
> +
> +  Eligible threads will be woken, and auto-reset events will be
> +  designaled appropriately.
> +
> +.. c:macro:: NTSYNC_IOC_RESET_EVENT
> +
> +  Designal an event object. Takes a pointer to a 32-bit integer, which
> +  on output contains the previous state of the event.
> +
> +.. c:macro:: NTSYNC_IOC_PULSE_EVENT
> +
> +  Wake threads waiting on an event object while leaving it in an
> +  unsignaled state. Takes a pointer to a 32-bit integer, which on
> +  output contains the previous state of the event.
> +
> +  A pulse operation can be thought of as a set followed by a reset,
> +  performed as a single atomic operation. If two threads are waiting on
> +  an auto-reset event which is pulsed, only one will be woken. If two
> +  threads are waiting on a manual-reset event which is pulsed, both will
> +  be woken. However, in both cases, the event will be unsignaled
> +  afterwards, and a simultaneous read operation will always report the
> +  event as unsignaled.
> +
> +.. c:macro:: NTSYNC_IOC_READ_SEM
> +
> +  Read the current state of a semaphore object. Takes a pointer to
> +  struct :c:type:`ntsync_sem_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``sem``
> +       - Ignored.
> +     * - ``count``
> +       - On output, contains the current count of the semaphore.
> +     * - ``max``
> +       - On output, contains the maximum count of the semaphore.
> +
> +.. c:macro:: NTSYNC_IOC_READ_MUTEX
> +
> +  Read the current state of a mutex object. Takes a pointer to struct
> +  :c:type:`ntsync_mutex_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``mutex``
> +       - Ignored.
> +     * - ``owner``
> +       - On output, contains the current owner of the mutex, or zero
> +         if the mutex is not currently owned.
> +     * - ``count``
> +       - On output, contains the current recursion count of the mutex.
> +
> +  If the mutex is marked as abandoned, the function fails with
> +  ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to
> +  zero.
> +
> +.. c:macro:: NTSYNC_IOC_READ_EVENT
> +
> +  Read the current state of an event object. Takes a pointer to struct
> +  :c:type:`ntsync_event_args`, which is used as follows:
> +
> +  .. list-table::
> +
> +     * - ``event``
> +       - Ignored.
> +     * - ``signaled``
> +       - On output, contains the current state of the event.
> +     * - ``manual``
> +       - On output, contains 1 if the event is a manual-reset event,
> +         and 0 otherwise.
> +
> +.. c:macro:: NTSYNC_IOC_KILL_OWNER
> +
> +  Mark a mutex as unowned and abandoned if it is owned by the given
> +  owner. Takes an input-only pointer to a 32-bit integer denoting the
> +  owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the
> +  owner does not own the mutex, the function fails with ``EPERM``.
> +
> +  Eligible threads waiting on the mutex will be woken as appropriate
> +  (and such waits will fail with ``EOWNERDEAD``, as described below).
> +
> +.. c:macro:: NTSYNC_IOC_WAIT_ANY
> +
> +  Poll on any of a list of objects, atomically acquiring at most one.
> +  Takes a pointer to struct :c:type:`ntsync_wait_args`, which is
> +  used as follows:
> +
> +  .. list-table::
> +
> +     * - ``timeout``
> +       - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME``
> +         is set, the timeout is measured against the REALTIME clock;
> +         otherwise it is measured against the MONOTONIC clock. If the
> +         timeout is equal to or earlier than the current time, the
> +         function returns immediately without sleeping. If ``timeout``
> +         is U64_MAX, the function will sleep until an object is
> +         signaled, and will not fail with ``ETIMEDOUT``.
> +     * - ``objs``
> +       - Pointer to an array of ``count`` file descriptors
> +         (specified as an integer so that the structure has the same
> +         size regardless of architecture). If any object is
> +         invalid, the function fails with ``EINVAL``.
> +     * - ``count``
> +       - Number of objects specified in the ``objs`` array.
> +         If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails
> +         with ``EINVAL``.
> +     * - ``owner``
> +       - Mutex owner identifier. If any object in ``objs`` is a mutex,
> +         the ioctl will attempt to acquire that mutex on behalf of
> +         ``owner``. If ``owner`` is zero, the ioctl fails with
> +         ``EINVAL``.
> +     * - ``index``
> +       - On success, contains the index (into ``objs``) of the object
> +         which was signaled. If ``alert`` was signaled instead,
> +         this contains ``count``.
> +     * - ``alert``
> +       - Optional event object file descriptor. If nonzero, this
> +         specifies an "alert" event object which, if signaled, will
> +         terminate the wait. If nonzero, the identifier must point to a
> +         valid event.
> +     * - ``flags``
> +       - Zero or more flags. Currently the only flag is
> +         ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be
> +         measured against the REALTIME clock instead of MONOTONIC.
> +     * - ``pad``
> +       - Unused, must be set to zero.
> +
> +  This function attempts to acquire one of the given objects. If unable
> +  to do so, it sleeps until an object becomes signaled, subsequently
> +  acquiring it, or the timeout expires. In the latter case the ioctl
> +  fails with ``ETIMEDOUT``. The function only acquires one object, even
> +  if multiple objects are signaled.
> +
> +  A semaphore is considered to be signaled if its count is nonzero, and
> +  is acquired by decrementing its count by one. A mutex is considered
> +  to be signaled if it is unowned or if its owner matches the ``owner``
> +  argument, and is acquired by incrementing its recursion count by one
> +  and setting its owner to the ``owner`` argument. An auto-reset event
> +  is acquired by designaling it; a manual-reset event is not affected
> +  by acquisition.
> +
> +  Acquisition is atomic and totally ordered with respect to other
> +  operations on the same object. If two wait operations (with different
> +  ``owner`` identifiers) are queued on the same mutex, only one is
> +  signaled. If two wait operations are queued on the same semaphore,
> +  and a value of one is posted to it, only one is signaled. The order
> +  in which threads are signaled is not specified.
> +
> +  If an abandoned mutex is acquired, the ioctl fails with
> +  ``EOWNERDEAD``. Although this is a failure return, the function may
> +  otherwise be considered successful. The mutex is marked as owned by
> +  the given owner (with a recursion count of 1) and as no longer
> +  abandoned, and ``index`` is still set to the index of the mutex.
> +
> +  The ``alert`` argument is an "extra" event which can terminate the
> +  wait, independently of all other objects. If members of ``objs`` and
> +  ``alert`` are both simultaneously signaled, a member of ``objs`` will
> +  always be given priority and acquired first.
> +
> +  It is valid to pass the same object more than once, including by
> +  passing the same event in the ``objs`` array and in ``alert``. If a
> +  wakeup occurs due to that object being signaled, ``index`` is set to
> +  the lowest index corresponding to that object.
> +
> +  The function may fail with ``EINTR`` if a signal is received.
> +
> +.. c:macro:: NTSYNC_IOC_WAIT_ALL
> +
> +  Poll on a list of objects, atomically acquiring all of them. Takes a
> +  pointer to struct :c:type:`ntsync_wait_args`, which is used
> +  identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is
> +  always filled with zero on success if not woken via alert.
> +
> +  This function attempts to simultaneously acquire all of the given
> +  objects. If unable to do so, it sleeps until all objects become
> +  simultaneously signaled, subsequently acquiring them, or the timeout
> +  expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no
> +  objects are modified.
> +
> +  Objects may become signaled and subsequently designaled (through
> +  acquisition by other threads) while this thread is sleeping. Only
> +  once all objects are simultaneously signaled does the ioctl acquire
> +  them and return. The entire acquisition is atomic and totally ordered
> +  with respect to other operations on any of the given objects.
> +
> +  If an abandoned mutex is acquired, the ioctl fails with
> +  ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are
> +  nevertheless marked as acquired. Note that if multiple mutex objects
> +  are specified, there is no way to know which were marked as
> +  abandoned.
> +
> +  As with "any" waits, the ``alert`` argument is an "extra" event which
> +  can terminate the wait. Critically, however, an "all" wait will
> +  succeed if all members in ``objs`` are signaled, *or* if ``alert`` is
> +  signaled. In the latter case ``index`` will be set to ``count``. As
> +  with "any" waits, if both conditions are filled, the former takes
> +  priority, and objects in ``objs`` will be acquired.
> +
> +  Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same
> +  object more than once, nor is it valid to pass the same object in
> +  ``objs`` and in ``alert``. If this is attempted, the function fails
> +  with ``EINVAL``.

The doc LGTM, thanks!

Reviewed-by: Bagas Sanjaya

--
An old man doll... just what I always wanted! - Clara
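
As an aside for anyone reading the quoted semantics, the semaphore and
mutex rules can be modeled in a few lines of plain C. This is only a
sketch under stated assumptions: the model_* names are hypothetical,
nothing here talks to /dev/ntsync, and waiters, abandonment and
atomicity are deliberately left out. It assumes count <= max always
holds, as the create-time EINVAL check implies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space models of the documented object state.
 * These are NOT the driver's structures, only an illustration. */
struct model_sem   { uint32_t count, max; };
struct model_mutex { uint32_t owner, count; };

/* NTSYNC_IOC_SEM_POST rule: add to the count unless that would raise it
 * past max; on failure the semaphore is left untouched. */
static int model_sem_post(struct model_sem *s, uint32_t add, uint32_t *prev)
{
    assert(s && prev);
    /* Compare against the remaining headroom to avoid 32-bit wraparound. */
    if (add > s->max - s->count)
        return -EOVERFLOW;
    *prev = s->count;
    s->count += add;
    return 0;
}

/* Wait rule for a mutex: it is signaled if unowned or if already owned
 * by `owner`. Returns 1 if acquired, 0 if a real wait would sleep. */
static int model_mutex_try_acquire(struct model_mutex *m, uint32_t owner)
{
    assert(m);
    if (owner == 0)
        return -EINVAL;
    if (m->owner != 0 && m->owner != owner)
        return 0;
    m->owner = owner;
    m->count++;
    return 1;
}

/* NTSYNC_IOC_MUTEX_UNLOCK rule: only the current owner may release;
 * the mutex becomes unowned when the recursion count reaches zero. */
static int model_mutex_unlock(struct model_mutex *m, uint32_t owner,
                              uint32_t *prev)
{
    assert(m && prev);
    if (owner == 0)
        return -EINVAL;
    if (m->owner != owner)
        return -EPERM;
    *prev = m->count;
    if (--m->count == 0)
        m->owner = 0;
    return 0;
}
```

The negative-errno return convention is a choice made for the model; the
real ioctls report errors through errno in the usual way.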