On 2024-04-27, Stas Sergeev wrote:
> This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
> It is needed to perform an open operation with the creds that were in
> effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
> flag. This allows the process to pre-open some dirs and switch eUID
> (and other UIDs/GIDs) to the less-privileged user, while still retaining
> the possibility to open/create files within the pre-opened directory set.
>
> The sand-boxing is security-oriented: symlinks leading outside of a
> sand-box are rejected. /proc magic links are rejected. fds opened with
> O_CRED_ALLOW are always closed on exec() and cannot be passed via unix
> socket.
> The more detailed description (including security considerations)
> is available in the log messages of individual patches.

(I meant to reply last week but I couldn't get my mail server to send mail...)

It seems to me that this can already be implemented using MOUNT_ATTR_IDMAP, without creating a new form of credential overriding within the filesystem (and with such a deceptively simple implementation...)

If you are a privileged process which plans to change users, you can create a detached tree with a user mapping that gives that user access to only that tree. This is far more effective at restricting possible attacks because id-mapped mounts don't override credentials during VFS operations (with credential overriding, if you miss something, you have a big problem); instead, they only affect uid-related operations within the filesystem for that mount. Since this implementation does not inherit CAP_DAC_OVERRIDE, being able to rewrite uids/gids is all you need.

A new attack I just thought of while writing this mail: because there is no RESOLVE_NO_XDEV requirement, it should be possible for the process to get an arbitrary write primitive by creating a new userns+mountns and then bind-mounting / underneath the directory.
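For reference, RESOLVE_NO_XDEV is an existing openat2() resolve flag. Below is a minimal userspace sketch (not code from the patch-set) of a lookup confined beneath a pre-opened directory that also refuses to cross mount points, which is the restriction a bind-mount-underneath attack would run into. The helper open_beneath_no_xdev() and struct open_how_compat are hypothetical names; the struct layout and RESOLVE_* values are assumed to mirror <linux/openat2.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumed: 437 on every architecture except alpha */
#endif

/* Hypothetical local copy of struct open_how from <linux/openat2.h>,
 * in case the installed kernel headers predate openat2(). */
struct open_how_compat {
    uint64_t flags;   /* O_* flags, as for open(2) */
    uint64_t mode;    /* must be 0 unless O_CREAT or O_TMPFILE is set */
    uint64_t resolve; /* RESOLVE_* flags restricting path resolution */
};

#ifndef RESOLVE_NO_XDEV
#define RESOLVE_NO_XDEV 0x01       /* fail if a component crosses a mount */
#endif
#ifndef RESOLVE_NO_MAGICLINKS
#define RESOLVE_NO_MAGICLINKS 0x02 /* refuse /proc-style magic links */
#endif
#ifndef RESOLVE_BENEATH
#define RESOLVE_BENEATH 0x08       /* refuse to escape dirfd ("..", "/") */
#endif

/* Hypothetical helper: open `path` relative to `dirfd` without escaping
 * the directory, crossing a mount point, or following magic links.
 * Returns an fd, or -1 with errno set (EXDEV for an escape or a mount
 * crossing, ELOOP for a magic link). */
int open_beneath_no_xdev(int dirfd, const char *path, int flags)
{
    struct open_how_compat how;

    memset(&how, 0, sizeof(how));
    how.flags = (uint64_t)flags;
    how.resolve = RESOLVE_BENEATH | RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS;
    return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

Note that these flags only restrict path resolution at open time; they say nothing about which credentials the resulting operation is performed with.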
Since OA2_CRED_INHERIT uses override_creds, it doesn't care whether anything about the O_CRED_ALLOW directory changed afterwards. Yes, you can "just fix this" by adding a RESOLVE_NO_XDEV requirement too, but given that 2-3 security issues with this design have already been found, it makes me feel really uneasy. Using id-mapped mounts avoids this issue because the new mount will not have the id-mapping applied, and thus there is no security issue.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH