* cgroup2 labeling status
@ 2024-05-02 18:37 Chris PeBenito
2024-05-02 18:53 ` Stephen Smalley
0 siblings, 1 reply; 5+ messages in thread
From: Chris PeBenito @ 2024-05-02 18:37 UTC (permalink / raw
To: SElinux mailing list
The state of cgroup2 labeling and memory.pressure came up for me again.
This was discussed March last year[1]. To summarize, refpolicy has a
type_transition for the memory.pressure file in cgroup2 to a default of
memory_pressure_t. For example this file:
/sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
with the idea that we allow daemons to write to this without allowing
writes to all cgroup_t. Unfortunately, the thread ended and I haven't
seen any improvement.
The conclusion was[3]:
> Ah, now I remembered that we made it such that the transitions would
> only apply if the parent directory has a label explicitly set by
> userspace (via setxattr). Not sure if we can improve it easily, since
> we can't use the normal inode-based logic for cgroupfs (the xattrs are
> stored in kernfs nodes, each of which can be exposed via multiple
> inodes if there is more than one cgroupfs mount).
Testing on a 6.6 kernel and systemd 255, I still see the same issues,
where most are stuck at cgroup_t, with user.slice entries get
memory_pressure_t[2]. Based on my investigations, the user.slice works
because systemd sets the user.invocation_id xattr on these dirs.
Next, I tried modifying systemd to use it's version of
setfscreatecon()+mkdir() when it creates the cgroup directories. This
did not change the labeling behavior. Next I changed the code to a
post-mkdir setfilecon() and then all the memory.pressures finally had
expected labeling.
This setxattr() requirement is unfortunate, and the fact the
setfscreatecon() doesn't work makes it more unfortunate. Is there any
improvement being worked?
[1] https://lore.kernel.org/selinux/87mt47ga29.fsf@defensec.nl/
[2]
https://lore.kernel.org/selinux/CAEjxPJ77ZiWTwJ=hj2DFoNCg4XZMfiU6VNSNAnyCKc0Rd+nM6Q@mail.gmail.com/
[3]
https://lore.kernel.org/selinux/CAFqZXNtLFsmb3n+H=7Jcp1g_sLEFdRL75fzvjMvTU1rXvaQXMA@mail.gmail.com/
--
Chris PeBenito
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cgroup2 labeling status
2024-05-02 18:37 cgroup2 labeling status Chris PeBenito
@ 2024-05-02 18:53 ` Stephen Smalley
2024-05-02 19:16 ` Chris PeBenito
0 siblings, 1 reply; 5+ messages in thread
From: Stephen Smalley @ 2024-05-02 18:53 UTC (permalink / raw
To: Chris PeBenito, Paul Moore, Ondrej Mosnacek; +Cc: SElinux mailing list
On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
>
> The state of cgroup2 labeling and memory.pressure came up for me again.
> This was discussed March last year[1]. To summarize, refpolicy has a
> type_transition for the memory.pressure file in cgroup2 to a default of
> memory_pressure_t. For example this file:
>
> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>
> with the idea that we allow daemons to write to this without allowing
> writes to all cgroup_t. Unfortunately, the thread ended and I haven't
> seen any improvement.
>
> The conclusion was[3]:
>
> > Ah, now I remembered that we made it such that the transitions would
> > only apply if the parent directory has a label explicitly set by
> > userspace (via setxattr). Not sure if we can improve it easily, since
> > we can't use the normal inode-based logic for cgroupfs (the xattrs are
> > stored in kernfs nodes, each of which can be exposed via multiple
> > inodes if there is more than one cgroupfs mount).
>
> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
> where most are stuck at cgroup_t, with user.slice entries get
> memory_pressure_t[2]. Based on my investigations, the user.slice works
> because systemd sets the user.invocation_id xattr on these dirs.
>
> Next, I tried modifying systemd to use it's version of
> setfscreatecon()+mkdir() when it creates the cgroup directories. This
> did not change the labeling behavior. Next I changed the code to a
> post-mkdir setfilecon() and then all the memory.pressures finally had
> expected labeling.
>
> This setxattr() requirement is unfortunate, and the fact the
> setfscreatecon() doesn't work makes it more unfortunate. Is there any
> improvement being worked?
Possibly I misunderstand, but selinux_kernfs_init_security() appears
to honor the create_sid (setfscreatecon) if set, so I would have
expected that to work.
> [1] https://lore.kernel.org/selinux/87mt47ga29.fsf@defensec.nl/
> [2]
> https://lore.kernel.org/selinux/CAEjxPJ77ZiWTwJ=hj2DFoNCg4XZMfiU6VNSNAnyCKc0Rd+nM6Q@mail.gmail.com/
> [3]
> https://lore.kernel.org/selinux/CAFqZXNtLFsmb3n+H=7Jcp1g_sLEFdRL75fzvjMvTU1rXvaQXMA@mail.gmail.com/
>
> --
> Chris PeBenito
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cgroup2 labeling status
2024-05-02 18:53 ` Stephen Smalley
@ 2024-05-02 19:16 ` Chris PeBenito
2024-05-03 12:00 ` Stephen Smalley
0 siblings, 1 reply; 5+ messages in thread
From: Chris PeBenito @ 2024-05-02 19:16 UTC (permalink / raw
To: Stephen Smalley, Paul Moore, Ondrej Mosnacek; +Cc: SElinux mailing list
On 5/2/2024 2:53 PM, Stephen Smalley wrote:
> On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
>>
>> The state of cgroup2 labeling and memory.pressure came up for me again.
>> This was discussed March last year[1]. To summarize, refpolicy has a
>> type_transition for the memory.pressure file in cgroup2 to a default of
>> memory_pressure_t. For example this file:
>>
>> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>>
>> with the idea that we allow daemons to write to this without allowing
>> writes to all cgroup_t. Unfortunately, the thread ended and I haven't
>> seen any improvement.
>>
>> The conclusion was[3]:
>>
>>> Ah, now I remembered that we made it such that the transitions would
>>> only apply if the parent directory has a label explicitly set by
>>> userspace (via setxattr). Not sure if we can improve it easily, since
>>> we can't use the normal inode-based logic for cgroupfs (the xattrs are
>>> stored in kernfs nodes, each of which can be exposed via multiple
>>> inodes if there is more than one cgroupfs mount).
>>
>> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
>> where most are stuck at cgroup_t, with user.slice entries get
>> memory_pressure_t[2]. Based on my investigations, the user.slice works
>> because systemd sets the user.invocation_id xattr on these dirs.
>>
>> Next, I tried modifying systemd to use it's version of
>> setfscreatecon()+mkdir() when it creates the cgroup directories. This
>> did not change the labeling behavior. Next I changed the code to a
>> post-mkdir setfilecon() and then all the memory.pressures finally had
>> expected labeling.
>>
>> This setxattr() requirement is unfortunate, and the fact the
>> setfscreatecon() doesn't work makes it more unfortunate. Is there any
>> improvement being worked?
>
> Possibly I misunderstand, but selinux_kernfs_init_security() appears
> to honor the create_sid (setfscreatecon) if set, so I would have
> expected that to work.
Does there need to be an xattr on the cgroup2 fs root directory for this
to work? Based on the tracing I did on the systemd code, the post-mkdir
setfilecon() would have happened on the root dir, but the
setfscreatcon() version of the code change obviously wouldn't have
changed anything when it ran on the cgroup2 root dir.
--
Chris PeBenito
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cgroup2 labeling status
2024-05-02 19:16 ` Chris PeBenito
@ 2024-05-03 12:00 ` Stephen Smalley
2024-05-06 13:32 ` Chris PeBenito
0 siblings, 1 reply; 5+ messages in thread
From: Stephen Smalley @ 2024-05-03 12:00 UTC (permalink / raw
To: Chris PeBenito; +Cc: Paul Moore, Ondrej Mosnacek, SElinux mailing list
On Thu, May 2, 2024 at 3:16 PM Chris PeBenito <pebenito@ieee.org> wrote:
>
> On 5/2/2024 2:53 PM, Stephen Smalley wrote:
> > On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
> >>
> >> The state of cgroup2 labeling and memory.pressure came up for me again.
> >> This was discussed March last year[1]. To summarize, refpolicy has a
> >> type_transition for the memory.pressure file in cgroup2 to a default of
> >> memory_pressure_t. For example this file:
> >>
> >> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
> >>
> >> with the idea that we allow daemons to write to this without allowing
> >> writes to all cgroup_t. Unfortunately, the thread ended and I haven't
> >> seen any improvement.
> >>
> >> The conclusion was[3]:
> >>
> >>> Ah, now I remembered that we made it such that the transitions would
> >>> only apply if the parent directory has a label explicitly set by
> >>> userspace (via setxattr). Not sure if we can improve it easily, since
> >>> we can't use the normal inode-based logic for cgroupfs (the xattrs are
> >>> stored in kernfs nodes, each of which can be exposed via multiple
> >>> inodes if there is more than one cgroupfs mount).
> >>
> >> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
> >> where most are stuck at cgroup_t, with user.slice entries get
> >> memory_pressure_t[2]. Based on my investigations, the user.slice works
> >> because systemd sets the user.invocation_id xattr on these dirs.
> >>
> >> Next, I tried modifying systemd to use it's version of
> >> setfscreatecon()+mkdir() when it creates the cgroup directories. This
> >> did not change the labeling behavior. Next I changed the code to a
> >> post-mkdir setfilecon() and then all the memory.pressures finally had
> >> expected labeling.
> >>
> >> This setxattr() requirement is unfortunate, and the fact the
> >> setfscreatecon() doesn't work makes it more unfortunate. Is there any
> >> improvement being worked?
> >
> > Possibly I misunderstand, but selinux_kernfs_init_security() appears
> > to honor the create_sid (setfscreatecon) if set, so I would have
> > expected that to work.
>
> Does there need to be an xattr on the cgroup2 fs root directory for this
> to work? Based on the tracing I did on the systemd code, the post-mkdir
> setfilecon() would have happened on the root dir, but the
> setfscreatcon() version of the code change obviously wouldn't have
> changed anything when it ran on the cgroup2 root dir.
That could be the case, based on Ondrej's statement on the earlier
thread. So it isn't a limitation of the SELinux code per se but rather
the cgroup2/kernfs code.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cgroup2 labeling status
2024-05-03 12:00 ` Stephen Smalley
@ 2024-05-06 13:32 ` Chris PeBenito
0 siblings, 0 replies; 5+ messages in thread
From: Chris PeBenito @ 2024-05-06 13:32 UTC (permalink / raw
To: Stephen Smalley; +Cc: Paul Moore, Ondrej Mosnacek, SElinux mailing list
On 5/3/2024 8:00 AM, Stephen Smalley wrote:
> On Thu, May 2, 2024 at 3:16 PM Chris PeBenito <pebenito@ieee.org> wrote:
>>
>> On 5/2/2024 2:53 PM, Stephen Smalley wrote:
>>> On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
>>>>
>>>> The state of cgroup2 labeling and memory.pressure came up for me again.
>>>> This was discussed March last year[1]. To summarize, refpolicy has a
>>>> type_transition for the memory.pressure file in cgroup2 to a default of
>>>> memory_pressure_t. For example this file:
>>>>
>>>> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>>>>
>>>> with the idea that we allow daemons to write to this without allowing
>>>> writes to all cgroup_t. Unfortunately, the thread ended and I haven't
>>>> seen any improvement.
>>>>
>>>> The conclusion was[3]:
>>>>
>>>>> Ah, now I remembered that we made it such that the transitions would
>>>>> only apply if the parent directory has a label explicitly set by
>>>>> userspace (via setxattr). Not sure if we can improve it easily, since
>>>>> we can't use the normal inode-based logic for cgroupfs (the xattrs are
>>>>> stored in kernfs nodes, each of which can be exposed via multiple
>>>>> inodes if there is more than one cgroupfs mount).
>>>>
>>>> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
>>>> where most are stuck at cgroup_t, with user.slice entries get
>>>> memory_pressure_t[2]. Based on my investigations, the user.slice works
>>>> because systemd sets the user.invocation_id xattr on these dirs.
>>>>
>>>> Next, I tried modifying systemd to use it's version of
>>>> setfscreatecon()+mkdir() when it creates the cgroup directories. This
>>>> did not change the labeling behavior. Next I changed the code to a
>>>> post-mkdir setfilecon() and then all the memory.pressures finally had
>>>> expected labeling.
>>>>
>>>> This setxattr() requirement is unfortunate, and the fact the
>>>> setfscreatecon() doesn't work makes it more unfortunate. Is there any
>>>> improvement being worked?
>>>
>>> Possibly I misunderstand, but selinux_kernfs_init_security() appears
>>> to honor the create_sid (setfscreatecon) if set, so I would have
>>> expected that to work.
>>
>> Does there need to be an xattr on the cgroup2 fs root directory for this
>> to work? Based on the tracing I did on the systemd code, the post-mkdir
>> setfilecon() would have happened on the root dir, but the
>> setfscreatcon() version of the code change obviously wouldn't have
>> changed anything when it ran on the cgroup2 root dir.
>
> That could be the case, based on Ondrej's statement on the earlier
> thread. So it isn't a limitation of the SELinux code per se but rather
> the cgroup2/kernfs code.
I think I've reached the end of what I can debug from userspace. I
changed the genfscon to no_access_t so it would be obvious where the
genfs label was still in use on a file. It indicated a relabel is
needed due to entries being created during initramfs, despite systemd
supposedly relabeling /sys/fs/cgroup (still looking into this) just
after loading the policy. I added a tmpfiles.d entry to get the fs
relabeled and retried the setfscreatecon() version and the results were
quite weird:
root [ /home/pebenito ]# ls -lZ /sys/fs/cgroup/*/*/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:21
/sys/fs/cgroup/system.slice/auditd.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:memory_pressure_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/boot-efi.mount/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/chronyd.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/crond.service/memory.pressure
-rw-r--r--. 1 messagebus messagebus
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/dbus.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/hypervkvpd.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/hypervvssd.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/irqbalance.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:cgroup_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/sshd.service/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:memory_pressure_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/sysroot.mount/memory.pressure
-rw-r--r--. 1 root root
system_u:object_r:memory_pressure_t:s0 0 May 6 12:19
/sys/fs/cgroup/system.slice/system-getty.slice/memory.pressure
[...]
In case it was due to entries created in the initramfs, I tried
restarting auditd and still got cgroup_t on the memory.pressure. I
added a type_transition for all domains, but still get cgroup_t. I
can't explain why some memory.pressures would get the expected label,
but others not.
--
Chris PeBenito
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-05-06 13:32 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-02 18:37 cgroup2 labeling status Chris PeBenito
2024-05-02 18:53 ` Stephen Smalley
2024-05-02 19:16 ` Chris PeBenito
2024-05-03 12:00 ` Stephen Smalley
2024-05-06 13:32 ` Chris PeBenito
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).