LKML Archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: joel@joelfernandes.org, chris.hyser@oracle.com,
	joshdon@google.com, mingo@kernel.org, vincent.guittot@linaro.org,
	valentin.schneider@arm.com, mgorman@suse.de
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	tj@kernel.org, tglx@linutronix.de
Subject: [PATCH 5/9] sched: prctl() core-scheduling interface
Date: Thu, 01 Apr 2021 15:10:17 +0200	[thread overview]
Message-ID: <20210401133917.350276562@infradead.org> (raw)
In-Reply-To: 20210401131012.395311786@infradead.org

From: Chris Hyser <chris.hyser@oracle.com>

This patch provides support for setting, clearing and copying core
scheduling 'task cookies' between threads (PID), processes (TGID), and
process groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.

>From a security perspective (and there are others), the thread,
process and process group distinction is an existent hierarchal
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CLEAR/CREATE/SHARE (TO/FROM) provide
a mechanism to create, clear and share cookies. CLEAR/CREATE/SHARE_TO
specify a target pid with enum pidtype used to specify the scope of
the targeted tasks. For example, PIDTYPE_TGID will share the cookie
with the process and all of it's threads as typically desired in a
security scenario.

API:

  prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CLEAR, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.
PIDTYPE_SID, sharing a cookie with an entire session, was considered
less useful given the choice to create a new cookie on task exec().

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. EACCES indicates that a task in the target
pidtype group was not updated due to permission.

Current hard-coded policies are:

 - a user can clear the cookie of any process they can set a cookie for.
   Lack of a cookie *might* be a security issue if cookies are being used
   for that.

[peterz: complete rewrite]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210324214020.34142-4-joel@joelfernandes.org
---
 include/linux/sched.h            |    2 
 include/uapi/linux/prctl.h       |    9 +++
 kernel/sched/core_sched.c        |  117 +++++++++++++++++++++++++++++++++++++++
 kernel/sys.c                     |    5 +
 tools/include/uapi/linux/prctl.h |    9 +++
 5 files changed, 142 insertions(+)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2173,6 +2173,8 @@ const struct cpumask *sched_trace_rd_spa
 extern void sched_core_free(struct task_struct *tsk);
 extern int sched_core_exec(void);
 extern void sched_core_fork(struct task_struct *p);
+extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
+				unsigned long uaddr);
 #else
 static inline void sched_core_free(struct task_struct *tsk) { }
 static inline int sched_core_exec(void) { return 0; }
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -255,4 +255,13 @@ struct prctl_mm_map {
 # define SYSCALL_DISPATCH_FILTER_ALLOW	0
 # define SYSCALL_DISPATCH_FILTER_BLOCK	1
 
+/* Request the scheduler to share a core */
+#define PR_SCHED_CORE			60
+# define PR_SCHED_CORE_GET		0
+# define PR_SCHED_CORE_CLEAR		1 /* clear core_sched cookie of pid */
+# define PR_SCHED_CORE_CREATE		2 /* create unique core_sched cookie */
+# define PR_SCHED_CORE_SHARE_TO		3 /* push core_sched cookie to pid */
+# define PR_SCHED_CORE_SHARE_FROM	4 /* pull core_sched cookie to pid */
+# define PR_SCHED_CORE_MAX		5
+
 #endif /* _LINUX_PRCTL_H */
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include <linux/prctl.h>
 #include "sched.h"
 
 /*
@@ -125,3 +126,119 @@ int sched_core_exec(void)
 	return 0;
 }
 
+static void __sched_core_set(struct task_struct *p, unsigned long cookie)
+{
+	cookie = sched_core_get_cookie(cookie);
+	cookie = sched_core_update_cookie(p, cookie);
+	sched_core_put_cookie(cookie);
+}
+
+/* Called from prctl interface: PR_SCHED_CORE */
+int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
+			 unsigned long uaddr)
+{
+	unsigned long cookie = 0, id = 0;
+	struct task_struct *task, *p;
+	struct pid *grp;
+	int err = 0;
+
+	if (!static_branch_likely(&sched_smt_present))
+		return -ENODEV;
+
+	if (type > PIDTYPE_PGID || cmd >= PR_SCHED_CORE_MAX || pid < 0 ||
+	    (cmd != PR_SCHED_CORE_GET && uaddr))
+		return -EINVAL;
+
+	rcu_read_lock();
+	if (pid == 0) {
+		task = current;
+	} else {
+		task = find_task_by_vpid(pid);
+		if (!task) {
+			rcu_read_unlock();
+			return -ESRCH;
+		}
+	}
+	get_task_struct(task);
+	rcu_read_unlock();
+
+	/*
+	 * Check if this process has the right to modify the specified
+	 * process. Use the regular "ptrace_may_access()" checks.
+	 */
+	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
+		err = -EPERM;
+		goto out;
+	}
+
+	switch (cmd) {
+	case PR_SCHED_CORE_GET:
+		if (type != PIDTYPE_PID || uaddr & 7) {
+			err = -EINVAL;
+			goto out;
+		}
+		cookie = sched_core_clone_cookie(task);
+		if (cookie) {
+			/* XXX improve ? */
+			ptr_to_hashval((void *)cookie, &id);
+		}
+		err = put_user(id, (u64 __user *)uaddr);
+		goto out;
+
+	case PR_SCHED_CORE_CLEAR:
+		cookie = 0;
+		break;
+
+	case PR_SCHED_CORE_CREATE:
+		cookie = sched_core_alloc_cookie();
+		if (!cookie) {
+			err = -ENOMEM;
+			goto out;
+		}
+		break;
+
+	case PR_SCHED_CORE_SHARE_TO:
+		cookie = sched_core_clone_cookie(current);
+		break;
+
+	case PR_SCHED_CORE_SHARE_FROM:
+		if (type != PIDTYPE_PID) {
+			err = -EINVAL;
+			goto out;
+		}
+		cookie = sched_core_clone_cookie(task);
+		__sched_core_set(current, cookie);
+		goto out;
+
+	default:
+		err = -EINVAL;
+		goto out;
+	};
+
+	if (type == PIDTYPE_PID) {
+		__sched_core_set(task, cookie);
+		goto out;
+	}
+
+	read_lock(&tasklist_lock);
+	grp = task_pid_type(task, type);
+
+	do_each_pid_thread(grp, type, p) {
+		if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS)) {
+			err = -EPERM;
+			goto out_tasklist;
+		}
+	} while_each_pid_thread(grp, type, p);
+
+	do_each_pid_thread(grp, type, p) {
+		__sched_core_set(p, cookie);
+	} while_each_pid_thread(grp, type, p);
+out_tasklist:
+	read_unlock(&tasklist_lock);
+
+out:
+	sched_core_put_cookie(cookie);
+	put_task_struct(task);
+	return err;
+}
+
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2534,6 +2534,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 		error = set_syscall_user_dispatch(arg2, arg3, arg4,
 						  (char __user *) arg5);
 		break;
+#ifdef CONFIG_SCHED_CORE
+	case PR_SCHED_CORE:
+		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
+		break;
+#endif
 	default:
 		error = -EINVAL;
 		break;
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -255,4 +255,13 @@ struct prctl_mm_map {
 # define SYSCALL_DISPATCH_FILTER_ALLOW	0
 # define SYSCALL_DISPATCH_FILTER_BLOCK	1
 
+/* Request the scheduler to share a core */
+#define PR_SCHED_CORE			60
+# define PR_SCHED_CORE_GET		0
+# define PR_SCHED_CORE_CLEAR		1 /* clear core_sched cookie of pid */
+# define PR_SCHED_CORE_CREATE		2 /* create unique core_sched cookie */
+# define PR_SCHED_CORE_SHARE_TO		3 /* push core_sched cookie to pid */
+# define PR_SCHED_CORE_SHARE_FROM	4 /* pull core_sched cookie to pid */
+# define PR_SCHED_CORE_MAX		5
+
 #endif /* _LINUX_PRCTL_H */



  parent reply	other threads:[~2021-04-01 17:51 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 13:10 [PATCH 0/9] sched: Core scheduling interfaces Peter Zijlstra
2021-04-01 13:10 ` [PATCH 1/9] sched: Allow sched_core_put() from atomic context Peter Zijlstra
2021-04-01 13:10 ` [PATCH 2/9] sched: Implement core-sched assertions Peter Zijlstra
2021-04-01 13:10 ` [PATCH 3/9] sched: Trivial core scheduling cookie management Peter Zijlstra
2021-04-01 20:04   ` Josh Don
2021-04-02  7:13     ` Peter Zijlstra
2021-04-01 13:10 ` [PATCH 4/9] sched: Default core-sched policy Peter Zijlstra
2021-04-21 13:33   ` Peter Zijlstra
2021-04-21 14:31     ` Chris Hyser
2021-04-01 13:10 ` Peter Zijlstra [this message]
2021-04-07 17:00   ` [PATCH 5/9] sched: prctl() core-scheduling interface Peter Zijlstra
2021-04-18  3:52     ` Joel Fernandes
2021-04-01 13:10 ` [PATCH 6/9] kselftest: Add test for core sched prctl interface Peter Zijlstra
2021-04-01 13:10 ` [PATCH 7/9] sched: Cgroup core-scheduling interface Peter Zijlstra
2021-04-02  0:34   ` Josh Don
2021-04-01 13:10 ` [PATCH 8/9] rbtree: Remove const from the rb_find_add() comparator Peter Zijlstra
2021-04-01 13:10 ` [PATCH 9/9] sched: prctl() and cgroup interaction Peter Zijlstra
2021-04-03  1:30   ` Josh Don
2021-04-06 15:12     ` Peter Zijlstra
2021-04-04 23:39 ` [PATCH 0/9] sched: Core scheduling interfaces Tejun Heo
2021-04-05 18:46   ` Joel Fernandes
2021-04-06 14:16     ` Tejun Heo
2021-04-18  1:35       ` Joel Fernandes
2021-04-19  9:00         ` Peter Zijlstra
2021-04-21 13:35           ` Peter Zijlstra
2021-04-21 14:45             ` Chris Hyser
2021-04-06 15:32   ` Peter Zijlstra
2021-04-06 16:08     ` Tejun Heo
2021-04-07 18:39       ` Peter Zijlstra
2021-04-07 16:50   ` Michal Koutný
2021-04-07 18:34     ` Peter Zijlstra
2021-04-08 13:25       ` Michal Koutný
2021-04-08 15:02         ` Peter Zijlstra
2021-04-09  0:16           ` Josh Don
2021-04-19 11:30       ` Tejun Heo
2021-04-20  1:17         ` Josh Don

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210401133917.350276562@infradead.org \
    --to=peterz@infradead.org \
    --cc=chris.hyser@oracle.com \
    --cc=joel@joelfernandes.org \
    --cc=joshdon@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).