cgroup2 labeling status

SELinux Archive mirror
 help / color / mirror / Atom feed

* cgroup2 labeling status
@ 2024-05-02 18:37 Chris PeBenito
  2024-05-02 18:53 ` Stephen Smalley
  0 siblings, 1 reply; 5+ messages in thread
From: Chris PeBenito @ 2024-05-02 18:37 UTC (permalink / raw
  To: SElinux mailing list

The state of cgroup2 labeling and memory.pressure came up for me again. 
This was discussed March last year[1]. To summarize, refpolicy has a 
type_transition for the memory.pressure file in cgroup2 to a default of 
memory_pressure_t. For example this file:

/sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure

with the idea that we allow daemons to write to this without allowing 
writes to all cgroup_t.  Unfortunately, the thread ended and I haven't 
seen any improvement.

The conclusion was[3]:

> Ah, now I remembered that we made it such that the transitions would
> only apply if the parent directory has a label explicitly set by
> userspace (via setxattr). Not sure if we can improve it easily, since
> we can't use the normal inode-based logic for cgroupfs (the xattrs are
> stored in kernfs nodes, each of which can be exposed via multiple
> inodes if there is more than one cgroupfs mount).

Testing on a 6.6 kernel and systemd 255, I still see the same issues, 
where most are stuck at cgroup_t, with user.slice entries get 
memory_pressure_t[2].  Based on my investigations, the user.slice works 
because systemd sets the user.invocation_id xattr on these dirs.

Next, I tried modifying systemd to use it's version of 
setfscreatecon()+mkdir() when it creates the cgroup directories.  This 
did not change the labeling behavior.  Next I changed the code to a 
post-mkdir setfilecon() and then all the memory.pressures finally had 
expected labeling.

This setxattr() requirement is unfortunate, and the fact the 
setfscreatecon() doesn't work makes it more unfortunate.  Is there any 
improvement being worked?

[1] https://lore.kernel.org/selinux/87mt47ga29.fsf@defensec.nl/
[2] 
https://lore.kernel.org/selinux/CAEjxPJ77ZiWTwJ=hj2DFoNCg4XZMfiU6VNSNAnyCKc0Rd+nM6Q@mail.gmail.com/
[3] 
https://lore.kernel.org/selinux/CAFqZXNtLFsmb3n+H=7Jcp1g_sLEFdRL75fzvjMvTU1rXvaQXMA@mail.gmail.com/

-- 
Chris PeBenito

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cgroup2 labeling status
  2024-05-02 18:37 cgroup2 labeling status Chris PeBenito
@ 2024-05-02 18:53 ` Stephen Smalley
  2024-05-02 19:16   ` Chris PeBenito
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Smalley @ 2024-05-02 18:53 UTC (permalink / raw
  To: Chris PeBenito, Paul Moore, Ondrej Mosnacek; +Cc: SElinux mailing list

On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
>
> The state of cgroup2 labeling and memory.pressure came up for me again.
> This was discussed March last year[1]. To summarize, refpolicy has a
> type_transition for the memory.pressure file in cgroup2 to a default of
> memory_pressure_t. For example this file:
>
> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>
> with the idea that we allow daemons to write to this without allowing
> writes to all cgroup_t.  Unfortunately, the thread ended and I haven't
> seen any improvement.
>
> The conclusion was[3]:
>
> > Ah, now I remembered that we made it such that the transitions would
> > only apply if the parent directory has a label explicitly set by
> > userspace (via setxattr). Not sure if we can improve it easily, since
> > we can't use the normal inode-based logic for cgroupfs (the xattrs are
> > stored in kernfs nodes, each of which can be exposed via multiple
> > inodes if there is more than one cgroupfs mount).
>
> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
> where most are stuck at cgroup_t, with user.slice entries get
> memory_pressure_t[2].  Based on my investigations, the user.slice works
> because systemd sets the user.invocation_id xattr on these dirs.
>
> Next, I tried modifying systemd to use it's version of
> setfscreatecon()+mkdir() when it creates the cgroup directories.  This
> did not change the labeling behavior.  Next I changed the code to a
> post-mkdir setfilecon() and then all the memory.pressures finally had
> expected labeling.
>
> This setxattr() requirement is unfortunate, and the fact the
> setfscreatecon() doesn't work makes it more unfortunate.  Is there any
> improvement being worked?

Possibly I misunderstand, but selinux_kernfs_init_security() appears
to honor the create_sid (setfscreatecon) if set, so I would have
expected that to work.

> [1] https://lore.kernel.org/selinux/87mt47ga29.fsf@defensec.nl/
> [2]
> https://lore.kernel.org/selinux/CAEjxPJ77ZiWTwJ=hj2DFoNCg4XZMfiU6VNSNAnyCKc0Rd+nM6Q@mail.gmail.com/
> [3]
> https://lore.kernel.org/selinux/CAFqZXNtLFsmb3n+H=7Jcp1g_sLEFdRL75fzvjMvTU1rXvaQXMA@mail.gmail.com/
>
> --
> Chris PeBenito
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cgroup2 labeling status
  2024-05-02 18:53 ` Stephen Smalley
@ 2024-05-02 19:16   ` Chris PeBenito
  2024-05-03 12:00     ` Stephen Smalley
  0 siblings, 1 reply; 5+ messages in thread
From: Chris PeBenito @ 2024-05-02 19:16 UTC (permalink / raw
  To: Stephen Smalley, Paul Moore, Ondrej Mosnacek; +Cc: SElinux mailing list

On 5/2/2024 2:53 PM, Stephen Smalley wrote:
> On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
>>
>> The state of cgroup2 labeling and memory.pressure came up for me again.
>> This was discussed March last year[1]. To summarize, refpolicy has a
>> type_transition for the memory.pressure file in cgroup2 to a default of
>> memory_pressure_t. For example this file:
>>
>> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>>
>> with the idea that we allow daemons to write to this without allowing
>> writes to all cgroup_t.  Unfortunately, the thread ended and I haven't
>> seen any improvement.
>>
>> The conclusion was[3]:
>>
>>> Ah, now I remembered that we made it such that the transitions would
>>> only apply if the parent directory has a label explicitly set by
>>> userspace (via setxattr). Not sure if we can improve it easily, since
>>> we can't use the normal inode-based logic for cgroupfs (the xattrs are
>>> stored in kernfs nodes, each of which can be exposed via multiple
>>> inodes if there is more than one cgroupfs mount).
>>
>> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
>> where most are stuck at cgroup_t, with user.slice entries get
>> memory_pressure_t[2].  Based on my investigations, the user.slice works
>> because systemd sets the user.invocation_id xattr on these dirs.
>>
>> Next, I tried modifying systemd to use it's version of
>> setfscreatecon()+mkdir() when it creates the cgroup directories.  This
>> did not change the labeling behavior.  Next I changed the code to a
>> post-mkdir setfilecon() and then all the memory.pressures finally had
>> expected labeling.
>>
>> This setxattr() requirement is unfortunate, and the fact the
>> setfscreatecon() doesn't work makes it more unfortunate.  Is there any
>> improvement being worked?
> 
> Possibly I misunderstand, but selinux_kernfs_init_security() appears
> to honor the create_sid (setfscreatecon) if set, so I would have
> expected that to work.

Does there need to be an xattr on the cgroup2 fs root directory for this 
to work?  Based on the tracing I did on the systemd code, the post-mkdir 
setfilecon() would have happened on the root dir, but the 
setfscreatcon() version of the code change obviously wouldn't have 
changed anything when it ran on the cgroup2 root dir.


-- 
Chris PeBenito


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cgroup2 labeling status
  2024-05-02 19:16   ` Chris PeBenito
@ 2024-05-03 12:00     ` Stephen Smalley
  2024-05-06 13:32       ` Chris PeBenito
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Smalley @ 2024-05-03 12:00 UTC (permalink / raw
  To: Chris PeBenito; +Cc: Paul Moore, Ondrej Mosnacek, SElinux mailing list

On Thu, May 2, 2024 at 3:16 PM Chris PeBenito <pebenito@ieee.org> wrote:
>
> On 5/2/2024 2:53 PM, Stephen Smalley wrote:
> > On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
> >>
> >> The state of cgroup2 labeling and memory.pressure came up for me again.
> >> This was discussed March last year[1]. To summarize, refpolicy has a
> >> type_transition for the memory.pressure file in cgroup2 to a default of
> >> memory_pressure_t. For example this file:
> >>
> >> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
> >>
> >> with the idea that we allow daemons to write to this without allowing
> >> writes to all cgroup_t.  Unfortunately, the thread ended and I haven't
> >> seen any improvement.
> >>
> >> The conclusion was[3]:
> >>
> >>> Ah, now I remembered that we made it such that the transitions would
> >>> only apply if the parent directory has a label explicitly set by
> >>> userspace (via setxattr). Not sure if we can improve it easily, since
> >>> we can't use the normal inode-based logic for cgroupfs (the xattrs are
> >>> stored in kernfs nodes, each of which can be exposed via multiple
> >>> inodes if there is more than one cgroupfs mount).
> >>
> >> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
> >> where most are stuck at cgroup_t, with user.slice entries get
> >> memory_pressure_t[2].  Based on my investigations, the user.slice works
> >> because systemd sets the user.invocation_id xattr on these dirs.
> >>
> >> Next, I tried modifying systemd to use it's version of
> >> setfscreatecon()+mkdir() when it creates the cgroup directories.  This
> >> did not change the labeling behavior.  Next I changed the code to a
> >> post-mkdir setfilecon() and then all the memory.pressures finally had
> >> expected labeling.
> >>
> >> This setxattr() requirement is unfortunate, and the fact the
> >> setfscreatecon() doesn't work makes it more unfortunate.  Is there any
> >> improvement being worked?
> >
> > Possibly I misunderstand, but selinux_kernfs_init_security() appears
> > to honor the create_sid (setfscreatecon) if set, so I would have
> > expected that to work.
>
> Does there need to be an xattr on the cgroup2 fs root directory for this
> to work?  Based on the tracing I did on the systemd code, the post-mkdir
> setfilecon() would have happened on the root dir, but the
> setfscreatcon() version of the code change obviously wouldn't have
> changed anything when it ran on the cgroup2 root dir.

That could be the case, based on Ondrej's statement on the earlier
thread. So it isn't a limitation of the SELinux code per se but rather
the cgroup2/kernfs code.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cgroup2 labeling status
  2024-05-03 12:00     ` Stephen Smalley
@ 2024-05-06 13:32       ` Chris PeBenito
  0 siblings, 0 replies; 5+ messages in thread
From: Chris PeBenito @ 2024-05-06 13:32 UTC (permalink / raw
  To: Stephen Smalley; +Cc: Paul Moore, Ondrej Mosnacek, SElinux mailing list

On 5/3/2024 8:00 AM, Stephen Smalley wrote:
> On Thu, May 2, 2024 at 3:16 PM Chris PeBenito <pebenito@ieee.org> wrote:
>>
>> On 5/2/2024 2:53 PM, Stephen Smalley wrote:
>>> On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@ieee.org> wrote:
>>>>
>>>> The state of cgroup2 labeling and memory.pressure came up for me again.
>>>> This was discussed March last year[1]. To summarize, refpolicy has a
>>>> type_transition for the memory.pressure file in cgroup2 to a default of
>>>> memory_pressure_t. For example this file:
>>>>
>>>> /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>>>>
>>>> with the idea that we allow daemons to write to this without allowing
>>>> writes to all cgroup_t.  Unfortunately, the thread ended and I haven't
>>>> seen any improvement.
>>>>
>>>> The conclusion was[3]:
>>>>
>>>>> Ah, now I remembered that we made it such that the transitions would
>>>>> only apply if the parent directory has a label explicitly set by
>>>>> userspace (via setxattr). Not sure if we can improve it easily, since
>>>>> we can't use the normal inode-based logic for cgroupfs (the xattrs are
>>>>> stored in kernfs nodes, each of which can be exposed via multiple
>>>>> inodes if there is more than one cgroupfs mount).
>>>>
>>>> Testing on a 6.6 kernel and systemd 255, I still see the same issues,
>>>> where most are stuck at cgroup_t, with user.slice entries get
>>>> memory_pressure_t[2].  Based on my investigations, the user.slice works
>>>> because systemd sets the user.invocation_id xattr on these dirs.
>>>>
>>>> Next, I tried modifying systemd to use it's version of
>>>> setfscreatecon()+mkdir() when it creates the cgroup directories.  This
>>>> did not change the labeling behavior.  Next I changed the code to a
>>>> post-mkdir setfilecon() and then all the memory.pressures finally had
>>>> expected labeling.
>>>>
>>>> This setxattr() requirement is unfortunate, and the fact the
>>>> setfscreatecon() doesn't work makes it more unfortunate.  Is there any
>>>> improvement being worked?
>>>
>>> Possibly I misunderstand, but selinux_kernfs_init_security() appears
>>> to honor the create_sid (setfscreatecon) if set, so I would have
>>> expected that to work.
>>
>> Does there need to be an xattr on the cgroup2 fs root directory for this
>> to work?  Based on the tracing I did on the systemd code, the post-mkdir
>> setfilecon() would have happened on the root dir, but the
>> setfscreatcon() version of the code change obviously wouldn't have
>> changed anything when it ran on the cgroup2 root dir.
> 
> That could be the case, based on Ondrej's statement on the earlier
> thread. So it isn't a limitation of the SELinux code per se but rather
> the cgroup2/kernfs code.

I think I've reached the end of what I can debug from userspace.  I 
changed the genfscon to no_access_t so it would be obvious where the 
genfs label was still in use on a file.  It indicated a relabel is 
needed due to entries being created during initramfs, despite systemd 
supposedly relabeling /sys/fs/cgroup (still looking into this) just 
after loading the policy.  I added a tmpfiles.d entry to get the fs 
relabeled and retried the setfscreatecon() version and the results were 
quite weird:

root [ /home/pebenito ]# ls -lZ /sys/fs/cgroup/*/*/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:21 
/sys/fs/cgroup/system.slice/auditd.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:memory_pressure_t:s0 0 May  6 12:19 
/sys/fs/cgroup/system.slice/boot-efi.mount/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/chronyd.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/crond.service/memory.pressure
-rw-r--r--. 1 messagebus      messagebus 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/dbus.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/hypervkvpd.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/hypervvssd.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/irqbalance.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:cgroup_t:s0          0 May  6 12:19 
/sys/fs/cgroup/system.slice/sshd.service/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:memory_pressure_t:s0 0 May  6 12:19 
/sys/fs/cgroup/system.slice/sysroot.mount/memory.pressure
-rw-r--r--. 1 root            root 
system_u:object_r:memory_pressure_t:s0 0 May  6 12:19 
/sys/fs/cgroup/system.slice/system-getty.slice/memory.pressure
[...]

In case it was due to entries created in the initramfs, I tried 
restarting auditd and still got cgroup_t on the memory.pressure.  I 
added a type_transition for all domains, but still get cgroup_t.  I 
can't explain why some memory.pressures would get the expected label, 
but others not.

-- 
Chris PeBenito


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-05-06 13:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-02 18:37 cgroup2 labeling status Chris PeBenito
2024-05-02 18:53 ` Stephen Smalley
2024-05-02 19:16   ` Chris PeBenito
2024-05-03 12:00     ` Stephen Smalley
2024-05-06 13:32       ` Chris PeBenito

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).