LKML Archive mirror
* [PATCH v2 0/2] Lock and Pointer guards
@ 2023-05-26 20:52 Peter Zijlstra
  2023-05-26 20:52 ` [PATCH v2 1/2] locking: Introduce __cleanup__ based guards Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-26 20:52 UTC (permalink / raw)
  To: torvalds, keescook, gregkh, pbonzini
  Cc: linux-kernel, ojeda, ndesaulniers, peterz, mingo, will, longman,
	boqun.feng, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, paulmck, frederic,
	quic_neeraju, joel, josh, mathieu.desnoyers, jiangshanlai, rcu,
	tj, tglx

By popular demand, a new and improved version :-)

New since -v1 ( https://lkml.kernel.org/r/20230526150549.250372621@infradead.org )

 - much improved interface for lock guards: guard() and scoped () { }
   as suggested by Linus.
 - moved the ';' for the 'last' additional guard members into the definition
   as suggested by Kees.
 - put __cleanup in the right place with the suggested comment
   as suggested by Kees and Miguel
 - renamed the definition macros DEFINE_LOCK_GUARD_[012] to indicate
   the number of arguments the lock function takes
 - added comments to guards.h
 - renamed pretty much all guard types

Again, compile- and boot-tested with x86_64-defconfig
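
For reference, the two lock-guard forms look roughly like this (a minimal
sketch against the guards defined in patch 1; 'my_mutex' is just a
placeholder lock, not something introduced by this series):

	/* named guard: my_mutex is unlocked when 'G' goes out of scope */
	guard(mutex, G, &my_mutex);

	/* explicit scope: the RCU read lock is held only inside the braces */
	scoped (rcu) {
		/* read-side critical section */
	}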




* [PATCH v2 1/2] locking: Introduce __cleanup__ based guards
  2023-05-26 20:52 [PATCH v2 0/2] Lock and Pointer guards Peter Zijlstra
@ 2023-05-26 20:52 ` Peter Zijlstra
  2023-05-26 21:24   ` Peter Zijlstra
  2023-05-26 20:52 ` [PATCH v2 2/2] sched: Use fancy new guards Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-26 20:52 UTC (permalink / raw)
  To: torvalds, keescook, gregkh, pbonzini
  Cc: linux-kernel, ojeda, ndesaulniers, peterz, mingo, will, longman,
	boqun.feng, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, paulmck, frederic,
	quic_neeraju, joel, josh, mathieu.desnoyers, jiangshanlai, rcu,
	tj, tglx

Use __attribute__((__cleanup__(func))) to build pointer and lock
guards.

Actual usage is in the next patch.
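
As an illustration, a guard type is declared once next to the locking or
refcounting API it wraps; these two sketches mirror the mutex and put_task
definitions added in the hunks below:

	/* lock guard: the single 'struct mutex *' argument is _G->lock */
	DEFINE_LOCK_GUARD_1(mutex, struct mutex,
			    mutex_lock(_G->lock),
			    mutex_unlock(_G->lock))

	/* pointer guard: put_task_struct() runs on non-NULL pointers at scope exit */
	DEFINE_PTR_GUARD(put_task, struct task_struct, put_task_struct)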

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/compiler_attributes.h |    6 +
 include/linux/guards.h              |  142 ++++++++++++++++++++++++++++++++++++
 include/linux/irqflags.h            |    7 +
 include/linux/mutex.h               |    5 +
 include/linux/preempt.h             |    4 +
 include/linux/rcupdate.h            |    3 
 include/linux/sched/task.h          |    2 
 include/linux/spinlock.h            |   27 ++++++
 scripts/checkpatch.pl               |    2 
 9 files changed, 197 insertions(+), 1 deletion(-)

--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -77,6 +77,12 @@
 #define __attribute_const__             __attribute__((__const__))
 
 /*
+ *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-cleanup-variable-attribute
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#cleanup
+ */
+#define __cleanup(func)			__attribute__((__cleanup__(func)))
+
+/*
  * Optional: only supported since gcc >= 9
  * Optional: not supported by clang
  *
--- /dev/null
+++ b/include/linux/guards.h
@@ -0,0 +1,142 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_GUARDS_H
+#define __LINUX_GUARDS_H
+
+#include <linux/compiler_attributes.h>
+
+/*
+ * Pointer Guards are special pointers (variables) with a scope-bound cleanup
+ * function.
+ *
+ * Various guard types can be created using:
+ *
+ *   DEFINE_PTR_GUARD(guard_type, typename, cleanup-exp)
+ *
+ * After which they can be used like so:
+ *
+ *   ptr_guard(guard_type, name) = find_get_object(foo);
+ *
+ * Where the return type of find_get_object() should match the guard_type's
+ * 'typename *'. When @name goes out of scope, cleanup-exp is run (inserted
+ * by the compiler) if the pointer is !NULL at that time. Also see the
+ * __cleanup attribute.
+ */
+
+#define DEFINE_PTR_GUARD(_type, _Type, _Put)					\
+typedef _Type *ptr_guard_##_type##_t;						\
+										\
+static inline void ptr_guard_##_type##_cleanup(_Type **_ptr)			\
+{										\
+	_Type *_G = *_ptr;							\
+	if (_G)									\
+		_Put(_G);							\
+}
+
+#define ptr_guard(_type, _name)							\
+	ptr_guard_##_type##_t _name __cleanup(ptr_guard_##_type##_cleanup)
+
+
+/*
+ * Lock Guards are like the pointer guards above except they also have
+ * a fixed initializer to cover both the Lock and Unlock of the lock type.
+ *
+ * Lock guards types can be created using:
+ *
+ *   DEFINE_LOCK_GUARD_0(guard_type, Lock, Unlock, [extra guard members])
+ *   DEFINE_LOCK_GUARD_1(guard_type, typename, Lock, Unlock, ...)
+ *   DEFINE_LOCK_GUARD_2(guard_type, typename, Lock, Unlock, ...)
+ *
+ * Where the _n suffix indicates the number of 'typename *' arguments the
+ * Lock function requires.
+ *
+ * Once defined, the lock guards can be used in one of two ways:
+ *
+ *	guard(guard_type, name, var...);
+ *
+ * or:
+ *
+ *	scoped (guard_type, var...) {
+ *		...
+ *	}
+ *
+ * The first creates a named variable that is initialized with the Lock
+ * function and will call the Unlock function when it goes out of scope.
+ *
+ * The second creates an explicit scope, using a for-loop with an implicit
+ * named _scope variable. Again, Lock is called before the scope is entered and
+ * Unlock will be called when the scope is left.
+ *
+ * Both Lock and Unlock are expressions and can access the guard object through
+ * the _G pointer. The guard object will have _n implicit members of type
+ * 'typename *', named 'lock' and 'lock2', as well as any additional
+ * members specified in the definition.
+ */
+
+#define DEFINE_LOCK_GUARD_0(_type, _Lock, _Unlock, ...)				\
+typedef struct {								\
+	__VA_ARGS__;								\
+} lock_guard_##_type##_t;							\
+										\
+static inline void lock_guard_##_type##_cleanup(lock_guard_##_type##_t *_G)	\
+{										\
+	_Unlock;								\
+}										\
+										\
+static inline lock_guard_##_type##_t lock_guard_##_type##_init(void)		\
+{										\
+	lock_guard_##_type##_t _g = { }, *_G __maybe_unused = &_g;		\
+	_Lock;									\
+	return _g;								\
+}
+
+#define DEFINE_LOCK_GUARD_1(_type, _Type, _Lock, _Unlock, ...)			\
+typedef struct {								\
+	_Type *lock;								\
+	__VA_ARGS__;								\
+} lock_guard_##_type##_t;							\
+										\
+static inline void lock_guard_##_type##_cleanup(lock_guard_##_type##_t *_G)	\
+{										\
+	_Unlock;								\
+}										\
+										\
+static inline lock_guard_##_type##_t lock_guard_##_type##_init(_Type *lock)	\
+{										\
+	lock_guard_##_type##_t _g = { .lock = lock }, *_G = &_g;		\
+	_Lock;									\
+	return _g;								\
+}
+
+#define DEFINE_LOCK_GUARD_2(_type, _Type, _Lock, _Unlock, ...)			\
+typedef struct {								\
+	_Type *lock;								\
+	_Type *lock2;								\
+	__VA_ARGS__;								\
+} lock_guard_##_type##_t;							\
+										\
+static inline void lock_guard_##_type##_cleanup(lock_guard_##_type##_t *_G)	\
+{										\
+	_Unlock;								\
+}										\
+										\
+static inline lock_guard_##_type##_t						\
+lock_guard_##_type##_init(_Type *lock, _Type *lock2)				\
+{										\
+	lock_guard_##_type##_t _g = { .lock = lock, .lock2 = lock2 }, *_G = &_g;\
+	_Lock;									\
+	return _g;								\
+}
+
+#define variable_scope(_type, _enter, _exit)					\
+	for (_type *_done = NULL, _scope __cleanup(_exit) = _enter;		\
+	     !_done; _done = (void *)8)
+
+#define scoped(_type, _var...)							\
+	variable_scope(lock_guard_##_type##_t,					\
+		       lock_guard_##_type##_init(_var),				\
+		       lock_guard_##_type##_cleanup)
+
+#define guard(_type, _name, _var...)						\
+	lock_guard_##_type##_t __cleanup(lock_guard_##_type##_cleanup) _name =	\
+		lock_guard_##_type##_init(_var)
+
+#endif /* __LINUX_GUARDS_H */
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -13,6 +13,7 @@
 #define _LINUX_TRACE_IRQFLAGS_H
 
 #include <linux/typecheck.h>
+#include <linux/guards.h>
 #include <asm/irqflags.h>
 #include <asm/percpu.h>
 
@@ -267,4 +268,10 @@ extern void warn_bogus_irq_restore(void)
 
 #define irqs_disabled_flags(flags) raw_irqs_disabled_flags(flags)
 
+DEFINE_LOCK_GUARD_0(irq, local_irq_disable(), local_irq_enable())
+DEFINE_LOCK_GUARD_0(irqsave,
+		    local_irq_save(_G->flags),
+		    local_irq_restore(_G->flags),
+		    unsigned long flags)
+
 #endif
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -19,6 +19,7 @@
 #include <asm/processor.h>
 #include <linux/osq_lock.h>
 #include <linux/debug_locks.h>
+#include <linux/guards.h>
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
@@ -219,4 +220,8 @@ extern void mutex_unlock(struct mutex *l
 
 extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
 
+DEFINE_LOCK_GUARD_1(mutex, struct mutex,
+		    mutex_lock(_G->lock),
+		    mutex_unlock(_G->lock))
+
 #endif /* __LINUX_MUTEX_H */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -8,6 +8,7 @@
  */
 
 #include <linux/linkage.h>
+#include <linux/guards.h>
 #include <linux/list.h>
 
 /*
@@ -463,4 +464,7 @@ static __always_inline void preempt_enab
 		preempt_enable();
 }
 
+DEFINE_LOCK_GUARD_0(preempt, preempt_disable(), preempt_enable())
+DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
+
 #endif /* __LINUX_PREEMPT_H */
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -27,6 +27,7 @@
 #include <linux/preempt.h>
 #include <linux/bottom_half.h>
 #include <linux/lockdep.h>
+#include <linux/guards.h>
 #include <asm/processor.h>
 #include <linux/cpumask.h>
 #include <linux/context_tracking_irq.h>
@@ -1095,4 +1096,6 @@ rcu_head_after_call_rcu(struct rcu_head
 extern int rcu_expedited;
 extern int rcu_normal;
 
+DEFINE_LOCK_GUARD_0(rcu, rcu_read_lock(), rcu_read_unlock())
+
 #endif /* __LINUX_RCUPDATE_H */
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -126,6 +126,8 @@ static inline void put_task_struct(struc
 		__put_task_struct(t);
 }
 
+DEFINE_PTR_GUARD(put_task, struct task_struct, put_task_struct)
+
 static inline void put_task_struct_many(struct task_struct *t, int nr)
 {
 	if (refcount_sub_and_test(nr, &t->usage))
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -61,6 +61,7 @@
 #include <linux/stringify.h>
 #include <linux/bottom_half.h>
 #include <linux/lockdep.h>
+#include <linux/guards.h>
 #include <asm/barrier.h>
 #include <asm/mmiowb.h>
 
@@ -502,5 +503,31 @@ int __alloc_bucket_spinlocks(spinlock_t
 
 void free_bucket_spinlocks(spinlock_t *locks);
 
+DEFINE_LOCK_GUARD_1(raw_spinlock, raw_spinlock_t,
+		    raw_spin_lock(_G->lock),
+		    raw_spin_unlock(_G->lock))
+
+DEFINE_LOCK_GUARD_1(raw_spinlock_irq, raw_spinlock_t,
+		    raw_spin_lock_irq(_G->lock),
+		    raw_spin_unlock_irq(_G->lock))
+
+DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
+		    raw_spin_lock_irqsave(_G->lock, _G->flags),
+		    raw_spin_unlock_irqrestore(_G->lock, _G->flags),
+		    unsigned long flags)
+
+DEFINE_LOCK_GUARD_1(spinlock, spinlock_t,
+		    spin_lock(_G->lock),
+		    spin_unlock(_G->lock))
+
+DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t,
+		    spin_lock_irq(_G->lock),
+		    spin_unlock_irq(_G->lock))
+
+DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t,
+		    spin_lock_irqsave(_G->lock, _G->flags),
+		    spin_unlock_irqrestore(_G->lock, _G->flags),
+		    unsigned long flags)
+
 #undef __LINUX_INSIDE_SPINLOCK_H
 #endif /* __LINUX_SPINLOCK_H */
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5046,7 +5046,7 @@ sub process {
 				if|for|while|switch|return|case|
 				volatile|__volatile__|
 				__attribute__|format|__extension__|
-				asm|__asm__)$/x)
+				asm|__asm__|scoped)$/x)
 			{
 			# cpp #define statements have non-optional spaces, ie
 			# if there is a space between the name and the open




* [PATCH v2 2/2] sched: Use fancy new guards
  2023-05-26 20:52 [PATCH v2 0/2] Lock and Pointer guards Peter Zijlstra
  2023-05-26 20:52 ` [PATCH v2 1/2] locking: Introduce __cleanup__ based guards Peter Zijlstra
@ 2023-05-26 20:52 ` Peter Zijlstra
  2023-05-27 17:21 ` [PATCH v2 0/2] Lock and Pointer guards Mathieu Desnoyers
  2023-05-27 19:18 ` Linus Torvalds
  3 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-26 20:52 UTC (permalink / raw)
  To: torvalds, keescook, gregkh, pbonzini
  Cc: linux-kernel, ojeda, ndesaulniers, peterz, mingo, will, longman,
	boqun.feng, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, paulmck, frederic,
	quic_neeraju, joel, josh, mathieu.desnoyers, jiangshanlai, rcu,
	tj, tglx

Convert kernel/sched/core.c to use the fancy new guards to simplify
the error paths.
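
The recurring pattern is that a lock/unlock pair with goto-based error
handling, e.g.:

	rcu_read_lock();
	p = find_process_by_pid(pid);
	if (!p)
		goto out_unlock;
	...
out_unlock:
	rcu_read_unlock();

becomes a scope with early returns (condensed from the hunks below; the
cleanup runs on every exit path):

	scoped (rcu) {
		p = find_process_by_pid(pid);
		if (!p)
			return -ESRCH;
		...
	}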

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  | 1227 +++++++++++++++++++++++----------------------------
 kernel/sched/sched.h |   34 +
 2 files changed, 590 insertions(+), 671 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1097,24 +1097,21 @@ int get_nohz_timer_target(void)
 
 	hk_mask = housekeeping_cpumask(HK_TYPE_TIMER);
 
-	rcu_read_lock();
-	for_each_domain(cpu, sd) {
-		for_each_cpu_and(i, sched_domain_span(sd), hk_mask) {
-			if (cpu == i)
-				continue;
+	scoped (rcu) {
+		for_each_domain(cpu, sd) {
+			for_each_cpu_and(i, sched_domain_span(sd), hk_mask) {
+				if (cpu == i)
+					continue;
 
-			if (!idle_cpu(i)) {
-				cpu = i;
-				goto unlock;
+				if (!idle_cpu(i))
+					return i;
 			}
 		}
-	}
 
-	if (default_cpu == -1)
-		default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
-	cpu = default_cpu;
-unlock:
-	rcu_read_unlock();
+		if (default_cpu == -1)
+			default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
+		cpu = default_cpu;
+	}
 	return cpu;
 }
 
@@ -1457,16 +1454,12 @@ static void __uclamp_update_util_min_rt_
 
 static void uclamp_update_util_min_rt_default(struct task_struct *p)
 {
-	struct rq_flags rf;
-	struct rq *rq;
-
 	if (!rt_task(p))
 		return;
 
 	/* Protect updates to p->uclamp_* */
-	rq = task_rq_lock(p, &rf);
-	__uclamp_update_util_min_rt_default(p);
-	task_rq_unlock(rq, p, &rf);
+	scoped (task_rq_lock, p)
+		__uclamp_update_util_min_rt_default(p);
 }
 
 static inline struct uclamp_se
@@ -1762,9 +1755,8 @@ static void uclamp_update_root_tg(void)
 	uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
 		      sysctl_sched_uclamp_util_max, false);
 
-	rcu_read_lock();
-	cpu_util_update_eff(&root_task_group.css);
-	rcu_read_unlock();
+	scoped (rcu)
+		cpu_util_update_eff(&root_task_group.css);
 }
 #else
 static void uclamp_update_root_tg(void) { }
@@ -1791,10 +1783,10 @@ static void uclamp_sync_util_min_rt_defa
 	smp_mb__after_spinlock();
 	read_unlock(&tasklist_lock);
 
-	rcu_read_lock();
-	for_each_process_thread(g, p)
-		uclamp_update_util_min_rt_default(p);
-	rcu_read_unlock();
+	scoped (rcu) {
+		for_each_process_thread(g, p)
+			uclamp_update_util_min_rt_default(p);
+	}
 }
 
 static int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
@@ -1804,7 +1796,8 @@ static int sysctl_sched_uclamp_handler(s
 	int old_min, old_max, old_min_rt;
 	int result;
 
-	mutex_lock(&uclamp_mutex);
+	guard(mutex, guard, &uclamp_mutex);
+
 	old_min = sysctl_sched_uclamp_util_min;
 	old_max = sysctl_sched_uclamp_util_max;
 	old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
@@ -1813,7 +1806,7 @@ static int sysctl_sched_uclamp_handler(s
 	if (result)
 		goto undo;
 	if (!write)
-		goto done;
+		return result;
 
 	if (sysctl_sched_uclamp_util_min > sysctl_sched_uclamp_util_max ||
 	    sysctl_sched_uclamp_util_max > SCHED_CAPACITY_SCALE	||
@@ -1849,16 +1842,12 @@ static int sysctl_sched_uclamp_handler(s
 	 * Otherwise, keep it simple and do just a lazy update at each next
 	 * task enqueue time.
 	 */
-
-	goto done;
+	return result;
 
 undo:
 	sysctl_sched_uclamp_util_min = old_min;
 	sysctl_sched_uclamp_util_max = old_max;
 	sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
-done:
-	mutex_unlock(&uclamp_mutex);
-
 	return result;
 }
 #endif
@@ -2249,10 +2238,10 @@ void migrate_disable(void)
 		return;
 	}
 
-	preempt_disable();
-	this_rq()->nr_pinned++;
-	p->migration_disabled = 1;
-	preempt_enable();
+	scoped (preempt) {
+		this_rq()->nr_pinned++;
+		p->migration_disabled = 1;
+	}
 }
 EXPORT_SYMBOL_GPL(migrate_disable);
 
@@ -2276,18 +2265,18 @@ void migrate_enable(void)
 	 * Ensure stop_task runs either before or after this, and that
 	 * __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule().
 	 */
-	preempt_disable();
-	if (p->cpus_ptr != &p->cpus_mask)
-		__set_cpus_allowed_ptr(p, &ac);
-	/*
-	 * Mustn't clear migration_disabled() until cpus_ptr points back at the
-	 * regular cpus_mask, otherwise things that race (eg.
-	 * select_fallback_rq) get confused.
-	 */
-	barrier();
-	p->migration_disabled = 0;
-	this_rq()->nr_pinned--;
-	preempt_enable();
+	scoped (preempt) {
+		if (p->cpus_ptr != &p->cpus_mask)
+			__set_cpus_allowed_ptr(p, &ac);
+		/*
+		 * Mustn't clear migration_disabled() until cpus_ptr points back at the
+		 * regular cpus_mask, otherwise things that race (eg.
+		 * select_fallback_rq) get confused.
+		 */
+		barrier();
+		p->migration_disabled = 0;
+		this_rq()->nr_pinned--;
+	}
 }
 EXPORT_SYMBOL_GPL(migrate_enable);
 
@@ -3272,31 +3261,26 @@ static int migrate_swap_stop(void *data)
 	src_rq = cpu_rq(arg->src_cpu);
 	dst_rq = cpu_rq(arg->dst_cpu);
 
-	double_raw_lock(&arg->src_task->pi_lock,
-			&arg->dst_task->pi_lock);
-	double_rq_lock(src_rq, dst_rq);
-
-	if (task_cpu(arg->dst_task) != arg->dst_cpu)
-		goto unlock;
+	scoped (double_raw_spinlock, &arg->src_task->pi_lock, &arg->dst_task->pi_lock) {
+		guard(double_rq_lock, guard, src_rq, dst_rq);
 
-	if (task_cpu(arg->src_task) != arg->src_cpu)
-		goto unlock;
+		if (task_cpu(arg->dst_task) != arg->dst_cpu)
+			break;
 
-	if (!cpumask_test_cpu(arg->dst_cpu, arg->src_task->cpus_ptr))
-		goto unlock;
+		if (task_cpu(arg->src_task) != arg->src_cpu)
+			break;
 
-	if (!cpumask_test_cpu(arg->src_cpu, arg->dst_task->cpus_ptr))
-		goto unlock;
+		if (!cpumask_test_cpu(arg->dst_cpu, arg->src_task->cpus_ptr))
+			break;
 
-	__migrate_swap_task(arg->src_task, arg->dst_cpu);
-	__migrate_swap_task(arg->dst_task, arg->src_cpu);
+		if (!cpumask_test_cpu(arg->src_cpu, arg->dst_task->cpus_ptr))
+			break;
 
-	ret = 0;
+		__migrate_swap_task(arg->src_task, arg->dst_cpu);
+		__migrate_swap_task(arg->dst_task, arg->src_cpu);
 
-unlock:
-	double_rq_unlock(src_rq, dst_rq);
-	raw_spin_unlock(&arg->dst_task->pi_lock);
-	raw_spin_unlock(&arg->src_task->pi_lock);
+		ret = 0;
+	}
 
 	return ret;
 }
@@ -3464,13 +3448,11 @@ unsigned long wait_task_inactive(struct
  */
 void kick_process(struct task_struct *p)
 {
-	int cpu;
+	guard(preempt, guard);
+	int cpu = task_cpu(p);
 
-	preempt_disable();
-	cpu = task_cpu(p);
 	if ((cpu != smp_processor_id()) && task_curr(p))
 		smp_send_reschedule(cpu);
-	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 
@@ -3679,16 +3661,15 @@ ttwu_stat(struct task_struct *p, int cpu
 		__schedstat_inc(p->stats.nr_wakeups_local);
 	} else {
 		struct sched_domain *sd;
+		guard(rcu, guard);
 
 		__schedstat_inc(p->stats.nr_wakeups_remote);
-		rcu_read_lock();
 		for_each_domain(rq->cpu, sd) {
 			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
 				__schedstat_inc(sd->ttwu_wake_remote);
 				break;
 			}
 		}
-		rcu_read_unlock();
 	}
 
 	if (wake_flags & WF_MIGRATED)
@@ -3887,21 +3868,13 @@ static void __ttwu_queue_wakelist(struct
 void wake_up_if_idle(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct rq_flags rf;
-
-	rcu_read_lock();
-
-	if (!is_idle_task(rcu_dereference(rq->curr)))
-		goto out;
 
-	rq_lock_irqsave(rq, &rf);
-	if (is_idle_task(rq->curr))
-		resched_curr(rq);
-	/* Else CPU is not idle, do nothing here: */
-	rq_unlock_irqrestore(rq, &rf);
-
-out:
-	rcu_read_unlock();
+	guard(rcu, guard);
+	if (is_idle_task(rcu_dereference(rq->curr))) {
+		guard(rq_lock, rq_guard, rq);
+		if (is_idle_task(rq->curr))
+			resched_curr(rq);
+	}
 }
 
 bool cpus_share_cache(int this_cpu, int that_cpu)
@@ -4158,10 +4131,9 @@ bool ttwu_state_match(struct task_struct
 static int
 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
-	unsigned long flags;
+	guard(preempt, guard);
 	int cpu, success = 0;
 
-	preempt_disable();
 	if (p == current) {
 		/*
 		 * We're waking current, this means 'p->on_rq' and 'task_cpu(p)
@@ -4188,129 +4160,127 @@ try_to_wake_up(struct task_struct *p, un
 	 * reordered with p->state check below. This pairs with smp_store_mb()
 	 * in set_current_state() that the waiting thread does.
 	 */
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	smp_mb__after_spinlock();
-	if (!ttwu_state_match(p, state, &success))
-		goto unlock;
+	scoped (raw_spinlock_irqsave, &p->pi_lock) {
+		smp_mb__after_spinlock();
+		if (!ttwu_state_match(p, state, &success))
+			break;
 
-	trace_sched_waking(p);
+		trace_sched_waking(p);
 
-	/*
-	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
-	 * be possible to, falsely, observe p->on_rq == 0 and get stuck
-	 * in smp_cond_load_acquire() below.
-	 *
-	 * sched_ttwu_pending()			try_to_wake_up()
-	 *   STORE p->on_rq = 1			  LOAD p->state
-	 *   UNLOCK rq->lock
-	 *
-	 * __schedule() (switch to task 'p')
-	 *   LOCK rq->lock			  smp_rmb();
-	 *   smp_mb__after_spinlock();
-	 *   UNLOCK rq->lock
-	 *
-	 * [task p]
-	 *   STORE p->state = UNINTERRUPTIBLE	  LOAD p->on_rq
-	 *
-	 * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
-	 * __schedule().  See the comment for smp_mb__after_spinlock().
-	 *
-	 * A similar smb_rmb() lives in try_invoke_on_locked_down_task().
-	 */
-	smp_rmb();
-	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
-		goto unlock;
+		/*
+		 * Ensure we load p->on_rq _after_ p->state, otherwise it would
+		 * be possible to, falsely, observe p->on_rq == 0 and get stuck
+		 * in smp_cond_load_acquire() below.
+		 *
+		 * sched_ttwu_pending()			try_to_wake_up()
+		 *   STORE p->on_rq = 1			  LOAD p->state
+		 *   UNLOCK rq->lock
+		 *
+		 * __schedule() (switch to task 'p')
+		 *   LOCK rq->lock			  smp_rmb();
+		 *   smp_mb__after_spinlock();
+		 *   UNLOCK rq->lock
+		 *
+		 * [task p]
+		 *   STORE p->state = UNINTERRUPTIBLE	  LOAD p->on_rq
+		 *
+		 * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+		 * __schedule().  See the comment for smp_mb__after_spinlock().
+		 *
+		 * A similar smb_rmb() lives in try_invoke_on_locked_down_task().
+		 */
+		smp_rmb();
+		if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
+			break;
 
 #ifdef CONFIG_SMP
-	/*
-	 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
-	 * possible to, falsely, observe p->on_cpu == 0.
-	 *
-	 * One must be running (->on_cpu == 1) in order to remove oneself
-	 * from the runqueue.
-	 *
-	 * __schedule() (switch to task 'p')	try_to_wake_up()
-	 *   STORE p->on_cpu = 1		  LOAD p->on_rq
-	 *   UNLOCK rq->lock
-	 *
-	 * __schedule() (put 'p' to sleep)
-	 *   LOCK rq->lock			  smp_rmb();
-	 *   smp_mb__after_spinlock();
-	 *   STORE p->on_rq = 0			  LOAD p->on_cpu
-	 *
-	 * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
-	 * __schedule().  See the comment for smp_mb__after_spinlock().
-	 *
-	 * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
-	 * schedule()'s deactivate_task() has 'happened' and p will no longer
-	 * care about it's own p->state. See the comment in __schedule().
-	 */
-	smp_acquire__after_ctrl_dep();
+		/*
+		 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
+		 * possible to, falsely, observe p->on_cpu == 0.
+		 *
+		 * One must be running (->on_cpu == 1) in order to remove oneself
+		 * from the runqueue.
+		 *
+		 * __schedule() (switch to task 'p')	try_to_wake_up()
+		 *   STORE p->on_cpu = 1		  LOAD p->on_rq
+		 *   UNLOCK rq->lock
+		 *
+		 * __schedule() (put 'p' to sleep)
+		 *   LOCK rq->lock			  smp_rmb();
+		 *   smp_mb__after_spinlock();
+		 *   STORE p->on_rq = 0			  LOAD p->on_cpu
+		 *
+		 * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+		 * __schedule().  See the comment for smp_mb__after_spinlock().
+		 *
+		 * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
+		 * schedule()'s deactivate_task() has 'happened' and p will no longer
+		 * care about it's own p->state. See the comment in __schedule().
+		 */
+		smp_acquire__after_ctrl_dep();
 
-	/*
-	 * We're doing the wakeup (@success == 1), they did a dequeue (p->on_rq
-	 * == 0), which means we need to do an enqueue, change p->state to
-	 * TASK_WAKING such that we can unlock p->pi_lock before doing the
-	 * enqueue, such as ttwu_queue_wakelist().
-	 */
-	WRITE_ONCE(p->__state, TASK_WAKING);
+		/*
+		 * We're doing the wakeup (@success == 1), they did a dequeue (p->on_rq
+		 * == 0), which means we need to do an enqueue, change p->state to
+		 * TASK_WAKING such that we can unlock p->pi_lock before doing the
+		 * enqueue, such as ttwu_queue_wakelist().
+		 */
+		WRITE_ONCE(p->__state, TASK_WAKING);
 
-	/*
-	 * If the owning (remote) CPU is still in the middle of schedule() with
-	 * this task as prev, considering queueing p on the remote CPUs wake_list
-	 * which potentially sends an IPI instead of spinning on p->on_cpu to
-	 * let the waker make forward progress. This is safe because IRQs are
-	 * disabled and the IPI will deliver after on_cpu is cleared.
-	 *
-	 * Ensure we load task_cpu(p) after p->on_cpu:
-	 *
-	 * set_task_cpu(p, cpu);
-	 *   STORE p->cpu = @cpu
-	 * __schedule() (switch to task 'p')
-	 *   LOCK rq->lock
-	 *   smp_mb__after_spin_lock()		smp_cond_load_acquire(&p->on_cpu)
-	 *   STORE p->on_cpu = 1		LOAD p->cpu
-	 *
-	 * to ensure we observe the correct CPU on which the task is currently
-	 * scheduling.
-	 */
-	if (smp_load_acquire(&p->on_cpu) &&
-	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags))
-		goto unlock;
+		/*
+		 * If the owning (remote) CPU is still in the middle of schedule() with
+		 * this task as prev, considering queueing p on the remote CPUs wake_list
+		 * which potentially sends an IPI instead of spinning on p->on_cpu to
+		 * let the waker make forward progress. This is safe because IRQs are
+		 * disabled and the IPI will deliver after on_cpu is cleared.
+		 *
+		 * Ensure we load task_cpu(p) after p->on_cpu:
+		 *
+		 * set_task_cpu(p, cpu);
+		 *   STORE p->cpu = @cpu
+		 * __schedule() (switch to task 'p')
+		 *   LOCK rq->lock
+		 *   smp_mb__after_spin_lock()		smp_cond_load_acquire(&p->on_cpu)
+		 *   STORE p->on_cpu = 1		LOAD p->cpu
+		 *
+		 * to ensure we observe the correct CPU on which the task is currently
+		 * scheduling.
+		 */
+		if (smp_load_acquire(&p->on_cpu) &&
+		    ttwu_queue_wakelist(p, task_cpu(p), wake_flags))
+			break;
 
-	/*
-	 * If the owning (remote) CPU is still in the middle of schedule() with
-	 * this task as prev, wait until it's done referencing the task.
-	 *
-	 * Pairs with the smp_store_release() in finish_task().
-	 *
-	 * This ensures that tasks getting woken will be fully ordered against
-	 * their previous state and preserve Program Order.
-	 */
-	smp_cond_load_acquire(&p->on_cpu, !VAL);
+		/*
+		 * If the owning (remote) CPU is still in the middle of schedule() with
+		 * this task as prev, wait until it's done referencing the task.
+		 *
+		 * Pairs with the smp_store_release() in finish_task().
+		 *
+		 * This ensures that tasks getting woken will be fully ordered against
+		 * their previous state and preserve Program Order.
+		 */
+		smp_cond_load_acquire(&p->on_cpu, !VAL);
 
-	cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
-	if (task_cpu(p) != cpu) {
-		if (p->in_iowait) {
-			delayacct_blkio_end(p);
-			atomic_dec(&task_rq(p)->nr_iowait);
-		}
+		cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
+		if (task_cpu(p) != cpu) {
+			if (p->in_iowait) {
+				delayacct_blkio_end(p);
+				atomic_dec(&task_rq(p)->nr_iowait);
+			}
 
-		wake_flags |= WF_MIGRATED;
-		psi_ttwu_dequeue(p);
-		set_task_cpu(p, cpu);
-	}
+			wake_flags |= WF_MIGRATED;
+			psi_ttwu_dequeue(p);
+			set_task_cpu(p, cpu);
+		}
 #else
-	cpu = task_cpu(p);
+		cpu = task_cpu(p);
 #endif /* CONFIG_SMP */
 
-	ttwu_queue(p, cpu, wake_flags);
-unlock:
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		ttwu_queue(p, cpu, wake_flags);
+	}
 out:
 	if (success)
 		ttwu_stat(p, task_cpu(p), wake_flags);
-	preempt_enable();
 
 	return success;
 }
@@ -5458,23 +5428,20 @@ unsigned int nr_iowait(void)
 void sched_exec(void)
 {
 	struct task_struct *p = current;
-	unsigned long flags;
+	struct migration_arg arg;
 	int dest_cpu;
 
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), WF_EXEC);
-	if (dest_cpu == smp_processor_id())
-		goto unlock;
+	scoped (raw_spinlock_irqsave, &p->pi_lock) {
+		dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), WF_EXEC);
+		if (dest_cpu == smp_processor_id())
+			return;
 
-	if (likely(cpu_active(dest_cpu))) {
-		struct migration_arg arg = { p, dest_cpu };
+		if (unlikely(!cpu_active(dest_cpu)))
+			return;
 
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
-		return;
+		arg = (struct migration_arg){ p, dest_cpu };
 	}
-unlock:
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 }
 
 #endif
@@ -5681,9 +5648,6 @@ static void sched_tick_remote(struct wor
 	struct tick_work *twork = container_of(dwork, struct tick_work, work);
 	int cpu = twork->cpu;
 	struct rq *rq = cpu_rq(cpu);
-	struct task_struct *curr;
-	struct rq_flags rf;
-	u64 delta;
 	int os;
 
 	/*
@@ -5693,30 +5657,28 @@ static void sched_tick_remote(struct wor
 	 * statistics and checks timeslices in a time-independent way, regardless
 	 * of when exactly it is running.
 	 */
-	if (!tick_nohz_tick_stopped_cpu(cpu))
-		goto out_requeue;
+	if (tick_nohz_tick_stopped_cpu(cpu)) {
+		scoped (rq_lock_irq, rq) {
+			struct task_struct *curr = rq->curr;
 
-	rq_lock_irq(rq, &rf);
-	curr = rq->curr;
-	if (cpu_is_offline(cpu))
-		goto out_unlock;
+			if (cpu_is_offline(cpu))
+				break;
 
-	update_rq_clock(rq);
+			update_rq_clock(rq);
 
-	if (!is_idle_task(curr)) {
-		/*
-		 * Make sure the next tick runs within a reasonable
-		 * amount of time.
-		 */
-		delta = rq_clock_task(rq) - curr->se.exec_start;
-		WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
-	}
-	curr->sched_class->task_tick(rq, curr, 0);
+			if (!is_idle_task(curr)) {
+				/*
+				 * Make sure the next tick runs within a
+				 * reasonable amount of time.
+				 */
+				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
+				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+			}
+			curr->sched_class->task_tick(rq, curr, 0);
 
-	calc_load_nohz_remote(rq);
-out_unlock:
-	rq_unlock_irq(rq, &rf);
-out_requeue:
+			calc_load_nohz_remote(rq);
+		}
+	}
 
 	/*
 	 * Run the remote tick once per second (1Hz). This arbitrary
@@ -6265,53 +6227,51 @@ static bool try_steal_cookie(int this, i
 	unsigned long cookie;
 	bool success = false;
 
-	local_irq_disable();
-	double_rq_lock(dst, src);
+	scoped (irq) {
+		guard(double_rq_lock, guard, dst, src);
 
-	cookie = dst->core->core_cookie;
-	if (!cookie)
-		goto unlock;
+		cookie = dst->core->core_cookie;
+		if (!cookie)
+			break;
 
-	if (dst->curr != dst->idle)
-		goto unlock;
+		if (dst->curr != dst->idle)
+			break;
 
-	p = sched_core_find(src, cookie);
-	if (!p)
-		goto unlock;
+		p = sched_core_find(src, cookie);
+		if (!p)
+			break;
 
-	do {
-		if (p == src->core_pick || p == src->curr)
-			goto next;
+		do {
+			if (p == src->core_pick || p == src->curr)
+				goto next;
 
-		if (!is_cpu_allowed(p, this))
-			goto next;
+			if (!is_cpu_allowed(p, this))
+				goto next;
 
-		if (p->core_occupation > dst->idle->core_occupation)
-			goto next;
-		/*
-		 * sched_core_find() and sched_core_next() will ensure that task @p
-		 * is not throttled now, we also need to check whether the runqueue
-		 * of the destination CPU is being throttled.
-		 */
-		if (sched_task_is_throttled(p, this))
-			goto next;
+			if (p->core_occupation > dst->idle->core_occupation)
+				goto next;
+			/*
+			 * sched_core_find() and sched_core_next() will ensure
+			 * that task @p is not throttled now, we also need to
+			 * check whether the runqueue of the destination CPU is
+			 * being throttled.
+			 */
+			if (sched_task_is_throttled(p, this))
+				goto next;
 
-		deactivate_task(src, p, 0);
-		set_task_cpu(p, this);
-		activate_task(dst, p, 0);
+			deactivate_task(src, p, 0);
+			set_task_cpu(p, this);
+			activate_task(dst, p, 0);
 
-		resched_curr(dst);
+			resched_curr(dst);
 
-		success = true;
-		break;
+			success = true;
+			break;
 
 next:
-		p = sched_core_next(p, cookie);
-	} while (p);
-
-unlock:
-	double_rq_unlock(dst, src);
-	local_irq_enable();
+			p = sched_core_next(p, cookie);
+		} while (p);
+	}
 
 	return success;
 }
@@ -6339,8 +6299,9 @@ static void sched_core_balance(struct rq
 	struct sched_domain *sd;
 	int cpu = cpu_of(rq);
 
-	preempt_disable();
-	rcu_read_lock();
+	guard(preempt, pg);
+	guard(rcu, rg);
+
 	raw_spin_rq_unlock_irq(rq);
 	for_each_domain(cpu, sd) {
 		if (need_resched())
@@ -6350,8 +6311,6 @@ static void sched_core_balance(struct rq
 			break;
 	}
 	raw_spin_rq_lock_irq(rq);
-	rcu_read_unlock();
-	preempt_enable();
 }
 
 static DEFINE_PER_CPU(struct balance_callback, core_balance_head);
@@ -6370,20 +6329,24 @@ static void queue_core_balance(struct rq
 	queue_balance_callback(rq, &per_cpu(core_balance_head, rq->cpu), sched_core_balance);
 }
 
+DEFINE_LOCK_GUARD_1(core_lock, int,
+		    sched_core_lock(*_G->lock, &_G->flags),
+		    sched_core_unlock(*_G->lock, &_G->flags),
+		    unsigned long flags)
+
 static void sched_core_cpu_starting(unsigned int cpu)
 {
 	const struct cpumask *smt_mask = cpu_smt_mask(cpu);
 	struct rq *rq = cpu_rq(cpu), *core_rq = NULL;
-	unsigned long flags;
 	int t;
 
-	sched_core_lock(cpu, &flags);
+	guard(core_lock, guard, &cpu);
 
 	WARN_ON_ONCE(rq->core != rq);
 
 	/* if we're the first, we'll be our own leader */
 	if (cpumask_weight(smt_mask) == 1)
-		goto unlock;
+		return;
 
 	/* find the leader */
 	for_each_cpu(t, smt_mask) {
@@ -6397,7 +6360,7 @@ static void sched_core_cpu_starting(unsi
 	}
 
 	if (WARN_ON_ONCE(!core_rq)) /* whoopsie */
-		goto unlock;
+		return;
 
 	/* install and validate core_rq */
 	for_each_cpu(t, smt_mask) {
@@ -6408,29 +6371,25 @@ static void sched_core_cpu_starting(unsi
 
 		WARN_ON_ONCE(rq->core != core_rq);
 	}
-
-unlock:
-	sched_core_unlock(cpu, &flags);
 }
 
 static void sched_core_cpu_deactivate(unsigned int cpu)
 {
 	const struct cpumask *smt_mask = cpu_smt_mask(cpu);
 	struct rq *rq = cpu_rq(cpu), *core_rq = NULL;
-	unsigned long flags;
 	int t;
 
-	sched_core_lock(cpu, &flags);
+	guard(core_lock, guard, &cpu);
 
 	/* if we're the last man standing, nothing to do */
 	if (cpumask_weight(smt_mask) == 1) {
 		WARN_ON_ONCE(rq->core != rq);
-		goto unlock;
+		return;
 	}
 
 	/* if we're not the leader, nothing to do */
 	if (rq->core != rq)
-		goto unlock;
+		return;
 
 	/* find a new leader */
 	for_each_cpu(t, smt_mask) {
@@ -6441,7 +6400,7 @@ static void sched_core_cpu_deactivate(un
 	}
 
 	if (WARN_ON_ONCE(!core_rq)) /* impossible */
-		goto unlock;
+		return;
 
 	/* copy the shared state to the new leader */
 	core_rq->core_task_seq             = rq->core_task_seq;
@@ -6463,9 +6422,6 @@ static void sched_core_cpu_deactivate(un
 		rq = cpu_rq(t);
 		rq->core = core_rq;
 	}
-
-unlock:
-	sched_core_unlock(cpu, &flags);
 }
 
 static inline void sched_core_cpu_dying(unsigned int cpu)
@@ -7162,8 +7118,6 @@ void set_user_nice(struct task_struct *p
 {
 	bool queued, running;
 	int old_prio;
-	struct rq_flags rf;
-	struct rq *rq;
 
 	if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
 		return;
@@ -7171,44 +7125,45 @@ void set_user_nice(struct task_struct *p
 	 * We have to be careful, if called from sys_setpriority(),
 	 * the task might be in the middle of scheduling on another CPU.
 	 */
-	rq = task_rq_lock(p, &rf);
-	update_rq_clock(rq);
+	scoped (task_rq_lock, p) {
+		struct rq *rq = _scope.rq;
 
-	/*
-	 * The RT priorities are set via sched_setscheduler(), but we still
-	 * allow the 'normal' nice value to be set - but as expected
-	 * it won't have any effect on scheduling until the task is
-	 * SCHED_DEADLINE, SCHED_FIFO or SCHED_RR:
-	 */
-	if (task_has_dl_policy(p) || task_has_rt_policy(p)) {
-		p->static_prio = NICE_TO_PRIO(nice);
-		goto out_unlock;
-	}
-	queued = task_on_rq_queued(p);
-	running = task_current(rq, p);
-	if (queued)
-		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
-	if (running)
-		put_prev_task(rq, p);
+		update_rq_clock(rq);
 
-	p->static_prio = NICE_TO_PRIO(nice);
-	set_load_weight(p, true);
-	old_prio = p->prio;
-	p->prio = effective_prio(p);
+		/*
+		 * The RT priorities are set via sched_setscheduler(), but we still
+		 * allow the 'normal' nice value to be set - but as expected
+		 * it won't have any effect on scheduling until the task is
+		 * SCHED_DEADLINE, SCHED_FIFO or SCHED_RR:
+		 */
+		if (task_has_dl_policy(p) || task_has_rt_policy(p)) {
+			p->static_prio = NICE_TO_PRIO(nice);
+			return;
+		}
 
-	if (queued)
-		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
-	if (running)
-		set_next_task(rq, p);
+		queued = task_on_rq_queued(p);
+		running = task_current(rq, p);
+		if (queued)
+			dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
+		if (running)
+			put_prev_task(rq, p);
 
-	/*
-	 * If the task increased its priority or is running and
-	 * lowered its priority, then reschedule its CPU:
-	 */
-	p->sched_class->prio_changed(rq, p, old_prio);
+		p->static_prio = NICE_TO_PRIO(nice);
+		set_load_weight(p, true);
+		old_prio = p->prio;
+		p->prio = effective_prio(p);
 
-out_unlock:
-	task_rq_unlock(rq, p, &rf);
+		if (queued)
+			enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
+		if (running)
+			set_next_task(rq, p);
+
+		/*
+		 * If the task increased its priority or is running and
+		 * lowered its priority, then reschedule its CPU:
+		 */
+		p->sched_class->prio_changed(rq, p, old_prio);
+	}
 }
 EXPORT_SYMBOL(set_user_nice);
 
@@ -7468,6 +7423,19 @@ static struct task_struct *find_process_
 	return pid ? find_task_by_vpid(pid) : current;
 }
 
+static struct task_struct *find_get_task(pid_t pid)
+{
+	struct task_struct *p;
+
+	scoped (rcu) {
+		p = find_process_by_pid(pid);
+		if (likely(p))
+			get_task_struct(p);
+	}
+
+	return p;
+}
+
 /*
  * sched_setparam() passes in -1 for its policy, to let the functions
  * it calls know not to change it.
@@ -7505,14 +7473,11 @@ static void __setscheduler_params(struct
 static bool check_same_owner(struct task_struct *p)
 {
 	const struct cred *cred = current_cred(), *pcred;
-	bool match;
+	guard(rcu, guard);
 
-	rcu_read_lock();
 	pcred = __task_cred(p);
-	match = (uid_eq(cred->euid, pcred->euid) ||
-		 uid_eq(cred->euid, pcred->uid));
-	rcu_read_unlock();
-	return match;
+	return (uid_eq(cred->euid, pcred->euid) ||
+		uid_eq(cred->euid, pcred->uid));
 }
 
 /*
@@ -7915,28 +7880,19 @@ EXPORT_SYMBOL_GPL(sched_set_normal);
 static int
 do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
 {
+	ptr_guard(put_task, p) = NULL;
 	struct sched_param lparam;
-	struct task_struct *p;
-	int retval;
 
 	if (!param || pid < 0)
 		return -EINVAL;
 	if (copy_from_user(&lparam, param, sizeof(struct sched_param)))
 		return -EFAULT;
 
-	rcu_read_lock();
-	retval = -ESRCH;
-	p = find_process_by_pid(pid);
-	if (likely(p))
-		get_task_struct(p);
-	rcu_read_unlock();
-
-	if (likely(p)) {
-		retval = sched_setscheduler(p, policy, &lparam);
-		put_task_struct(p);
-	}
+	p = find_get_task(pid);
+	if (!p)
+		return -ESRCH;
 
-	return retval;
+	return sched_setscheduler(p, policy, &lparam);
 }
 
 /*
@@ -8031,8 +7987,8 @@ SYSCALL_DEFINE2(sched_setparam, pid_t, p
 SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
 			       unsigned int, flags)
 {
+	ptr_guard(put_task, p) = NULL;
 	struct sched_attr attr;
-	struct task_struct *p;
 	int retval;
 
 	if (!uattr || pid < 0 || flags)
@@ -8047,21 +8003,14 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pi
 	if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY)
 		attr.sched_policy = SETPARAM_POLICY;
 
-	rcu_read_lock();
-	retval = -ESRCH;
-	p = find_process_by_pid(pid);
-	if (likely(p))
-		get_task_struct(p);
-	rcu_read_unlock();
+	p = find_get_task(pid);
+	if (!p)
+		return -ESRCH;
 
-	if (likely(p)) {
-		if (attr.sched_flags & SCHED_FLAG_KEEP_PARAMS)
-			get_params(p, &attr);
-		retval = sched_setattr(p, &attr);
-		put_task_struct(p);
-	}
+	if (attr.sched_flags & SCHED_FLAG_KEEP_PARAMS)
+		get_params(p, &attr);
 
-	return retval;
+	return sched_setattr(p, &attr);
 }
 
 /**
@@ -8079,16 +8028,19 @@ SYSCALL_DEFINE1(sched_getscheduler, pid_
 	if (pid < 0)
 		return -EINVAL;
 
-	retval = -ESRCH;
-	rcu_read_lock();
-	p = find_process_by_pid(pid);
-	if (p) {
+	scoped (rcu) {
+		p = find_process_by_pid(pid);
+		if (!p)
+			return -ESRCH;
+
 		retval = security_task_getscheduler(p);
-		if (!retval)
-			retval = p->policy
-				| (p->sched_reset_on_fork ? SCHED_RESET_ON_FORK : 0);
+		if (!retval) {
+			retval = p->policy;
+			if (p->sched_reset_on_fork)
+				retval |= SCHED_RESET_ON_FORK;
+		}
 	}
-	rcu_read_unlock();
+
 	return retval;
 }
 
@@ -8109,30 +8061,23 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, p
 	if (!param || pid < 0)
 		return -EINVAL;
 
-	rcu_read_lock();
-	p = find_process_by_pid(pid);
-	retval = -ESRCH;
-	if (!p)
-		goto out_unlock;
+	scoped (rcu) {
+		p = find_process_by_pid(pid);
+		if (!p)
+			return -ESRCH;
 
-	retval = security_task_getscheduler(p);
-	if (retval)
-		goto out_unlock;
+		retval = security_task_getscheduler(p);
+		if (retval)
+			return retval;
 
-	if (task_has_rt_policy(p))
-		lp.sched_priority = p->rt_priority;
-	rcu_read_unlock();
+		if (task_has_rt_policy(p))
+			lp.sched_priority = p->rt_priority;
+	}
 
 	/*
 	 * This one might sleep, we cannot do it with a spinlock held ...
 	 */
-	retval = copy_to_user(param, &lp, sizeof(*param)) ? -EFAULT : 0;
-
-	return retval;
-
-out_unlock:
-	rcu_read_unlock();
-	return retval;
+	return copy_to_user(param, &lp, sizeof(*param)) ? -EFAULT : 0;
 }
 
 /*
@@ -8192,39 +8137,33 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pi
 	    usize < SCHED_ATTR_SIZE_VER0 || flags)
 		return -EINVAL;
 
-	rcu_read_lock();
-	p = find_process_by_pid(pid);
-	retval = -ESRCH;
-	if (!p)
-		goto out_unlock;
+	scoped (rcu) {
+		p = find_process_by_pid(pid);
+		if (!p)
+			return -ESRCH;
 
-	retval = security_task_getscheduler(p);
-	if (retval)
-		goto out_unlock;
+		retval = security_task_getscheduler(p);
+		if (retval)
+			return retval;
 
-	kattr.sched_policy = p->policy;
-	if (p->sched_reset_on_fork)
-		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
-	get_params(p, &kattr);
-	kattr.sched_flags &= SCHED_FLAG_ALL;
+		kattr.sched_policy = p->policy;
+		if (p->sched_reset_on_fork)
+			kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
+		get_params(p, &kattr);
+		kattr.sched_flags &= SCHED_FLAG_ALL;
 
 #ifdef CONFIG_UCLAMP_TASK
-	/*
-	 * This could race with another potential updater, but this is fine
-	 * because it'll correctly read the old or the new value. We don't need
-	 * to guarantee who wins the race as long as it doesn't return garbage.
-	 */
-	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+		/*
+		 * This could race with another potential updater, but this is fine
+		 * because it'll correctly read the old or the new value. We don't need
+		 * to guarantee who wins the race as long as it doesn't return garbage.
+		 */
+		kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+		kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
 #endif
-
-	rcu_read_unlock();
+	}
 
 	return sched_attr_copy_to_user(uattr, &kattr, usize);
-
-out_unlock:
-	rcu_read_unlock();
-	return retval;
 }
 
 #ifdef CONFIG_SMP
@@ -8245,10 +8184,10 @@ int dl_task_check_affinity(struct task_s
 	 * tasks allowed to run on all the CPUs in the task's
 	 * root_domain.
 	 */
-	rcu_read_lock();
-	if (!cpumask_subset(task_rq(p)->rd->span, mask))
-		ret = -EBUSY;
-	rcu_read_unlock();
+	scoped (rcu) {
+		if (!cpumask_subset(task_rq(p)->rd->span, mask))
+			ret = -EBUSY;
+	}
 	return ret;
 }
 #endif
@@ -8317,41 +8256,27 @@ __sched_setaffinity(struct task_struct *
 
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
+	ptr_guard(put_task, p) = NULL;
 	struct affinity_context ac;
 	struct cpumask *user_mask;
-	struct task_struct *p;
 	int retval;
 
-	rcu_read_lock();
-
-	p = find_process_by_pid(pid);
-	if (!p) {
-		rcu_read_unlock();
+	p = find_get_task(pid);
+	if (!p)
 		return -ESRCH;
-	}
-
-	/* Prevent p going away */
-	get_task_struct(p);
-	rcu_read_unlock();
 
-	if (p->flags & PF_NO_SETAFFINITY) {
-		retval = -EINVAL;
-		goto out_put_task;
-	}
+	if (p->flags & PF_NO_SETAFFINITY)
+		return -EINVAL;
 
 	if (!check_same_owner(p)) {
-		rcu_read_lock();
-		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
-			rcu_read_unlock();
-			retval = -EPERM;
-			goto out_put_task;
-		}
-		rcu_read_unlock();
+		guard(rcu, rg);
+		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE))
+			return -EPERM;
 	}
 
 	retval = security_task_setscheduler(p);
 	if (retval)
-		goto out_put_task;
+		return retval;
 
 	/*
 	 * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
@@ -8361,8 +8286,7 @@ long sched_setaffinity(pid_t pid, const
 	if (user_mask) {
 		cpumask_copy(user_mask, in_mask);
 	} else if (IS_ENABLED(CONFIG_SMP)) {
-		retval = -ENOMEM;
-		goto out_put_task;
+		return -ENOMEM;
 	}
 
 	ac = (struct affinity_context){
@@ -8374,8 +8298,6 @@ long sched_setaffinity(pid_t pid, const
 	retval = __sched_setaffinity(p, &ac);
 	kfree(ac.user_mask);
 
-out_put_task:
-	put_task_struct(p);
 	return retval;
 }
 
@@ -8417,28 +8339,22 @@ SYSCALL_DEFINE3(sched_setaffinity, pid_t
 long sched_getaffinity(pid_t pid, struct cpumask *mask)
 {
 	struct task_struct *p;
-	unsigned long flags;
 	int retval;
 
-	rcu_read_lock();
+	scoped (rcu) {
+		p = find_process_by_pid(pid);
+		if (!p)
+			return -ESRCH;
 
-	retval = -ESRCH;
-	p = find_process_by_pid(pid);
-	if (!p)
-		goto out_unlock;
-
-	retval = security_task_getscheduler(p);
-	if (retval)
-		goto out_unlock;
-
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	cpumask_and(mask, &p->cpus_mask, cpu_active_mask);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		retval = security_task_getscheduler(p);
+		if (retval)
+			return retval;
 
-out_unlock:
-	rcu_read_unlock();
+		scoped (raw_spinlock_irqsave, &p->pi_lock)
+			cpumask_and(mask, &p->cpus_mask, cpu_active_mask);
+	}
 
-	return retval;
+	return 0;
 }
 
 /**
@@ -8885,55 +8801,47 @@ int __sched yield_to(struct task_struct
 {
 	struct task_struct *curr = current;
 	struct rq *rq, *p_rq;
-	unsigned long flags;
 	int yielded = 0;
 
-	local_irq_save(flags);
-	rq = this_rq();
+	scoped (irqsave) {
+		rq = this_rq();
 
 again:
-	p_rq = task_rq(p);
-	/*
-	 * If we're the only runnable task on the rq and target rq also
-	 * has only one task, there's absolutely no point in yielding.
-	 */
-	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
-		yielded = -ESRCH;
-		goto out_irq;
-	}
+		p_rq = task_rq(p);
+		/*
+		 * If we're the only runnable task on the rq and target rq also
+		 * has only one task, there's absolutely no point in yielding.
+		 */
+		if (rq->nr_running == 1 && p_rq->nr_running == 1)
+			return -ESRCH;
 
-	double_rq_lock(rq, p_rq);
-	if (task_rq(p) != p_rq) {
-		double_rq_unlock(rq, p_rq);
-		goto again;
-	}
+		scoped (double_rq_lock, rq, p_rq) {
+			if (task_rq(p) != p_rq)
+				goto again;
 
-	if (!curr->sched_class->yield_to_task)
-		goto out_unlock;
+			if (!curr->sched_class->yield_to_task)
+				return 0;
 
-	if (curr->sched_class != p->sched_class)
-		goto out_unlock;
+			if (curr->sched_class != p->sched_class)
+				return 0;
 
-	if (task_on_cpu(p_rq, p) || !task_is_running(p))
-		goto out_unlock;
+			if (task_on_cpu(p_rq, p) || !task_is_running(p))
+				return 0;
 
-	yielded = curr->sched_class->yield_to_task(rq, p);
-	if (yielded) {
-		schedstat_inc(rq->yld_count);
-		/*
-		 * Make p's CPU reschedule; pick_next_entity takes care of
-		 * fairness.
-		 */
-		if (preempt && rq != p_rq)
-			resched_curr(p_rq);
+			yielded = curr->sched_class->yield_to_task(rq, p);
+			if (yielded) {
+				schedstat_inc(rq->yld_count);
+				/*
+				 * Make p's CPU reschedule; pick_next_entity
+				 * takes care of fairness.
+				 */
+				if (preempt && rq != p_rq)
+					resched_curr(p_rq);
+			}
+		}
 	}
 
-out_unlock:
-	double_rq_unlock(rq, p_rq);
-out_irq:
-	local_irq_restore(flags);
-
-	if (yielded > 0)
+	if (yielded)
 		schedule();
 
 	return yielded;
@@ -9036,38 +8944,30 @@ SYSCALL_DEFINE1(sched_get_priority_min,
 
 static int sched_rr_get_interval(pid_t pid, struct timespec64 *t)
 {
-	struct task_struct *p;
-	unsigned int time_slice;
-	struct rq_flags rf;
-	struct rq *rq;
+	unsigned int time_slice = 0;
 	int retval;
 
 	if (pid < 0)
 		return -EINVAL;
 
-	retval = -ESRCH;
-	rcu_read_lock();
-	p = find_process_by_pid(pid);
-	if (!p)
-		goto out_unlock;
+	scoped (rcu) {
+		struct task_struct *p = find_process_by_pid(pid);
+		if (!p)
+			return -ESRCH;
 
-	retval = security_task_getscheduler(p);
-	if (retval)
-		goto out_unlock;
+		retval = security_task_getscheduler(p);
+		if (retval)
+			return retval;
 
-	rq = task_rq_lock(p, &rf);
-	time_slice = 0;
-	if (p->sched_class->get_rr_interval)
-		time_slice = p->sched_class->get_rr_interval(rq, p);
-	task_rq_unlock(rq, p, &rf);
+		scoped (task_rq_lock, p) {
+			struct rq *rq = _scope.rq;
+			if (p->sched_class->get_rr_interval)
+				time_slice = p->sched_class->get_rr_interval(rq, p);
+		}
+	}
 
-	rcu_read_unlock();
 	jiffies_to_timespec64(time_slice, t);
 	return 0;
-
-out_unlock:
-	rcu_read_unlock();
-	return retval;
 }
 
 /**
@@ -9300,10 +9200,8 @@ int task_can_attach(struct task_struct *
 	 * success of set_cpus_allowed_ptr() on all attached tasks
 	 * before cpus_mask may be changed.
 	 */
-	if (p->flags & PF_NO_SETAFFINITY) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (p->flags & PF_NO_SETAFFINITY)
+		return -EINVAL;
 
 	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
 					      cs_effective_cpus)) {
@@ -9314,7 +9212,6 @@ int task_can_attach(struct task_struct *
 		ret = dl_cpu_busy(cpu, p);
 	}
 
-out:
 	return ret;
 }
 
@@ -10464,17 +10361,18 @@ void sched_move_task(struct task_struct
 	int queued, running, queue_flags =
 		DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 	struct task_group *group;
-	struct rq_flags rf;
 	struct rq *rq;
 
-	rq = task_rq_lock(tsk, &rf);
+	guard(task_rq_lock, guard, tsk);
+	rq = guard.rq;
+
 	/*
 	 * Esp. with SCHED_AUTOGROUP enabled it is possible to get superfluous
 	 * group changes.
 	 */
 	group = sched_get_task_group(tsk);
 	if (group == tsk->sched_task_group)
-		goto unlock;
+		return;
 
 	update_rq_clock(rq);
 
@@ -10499,9 +10397,6 @@ void sched_move_task(struct task_struct
 		 */
 		resched_curr(rq);
 	}
-
-unlock:
-	task_rq_unlock(rq, tsk, &rf);
 }
 
 static inline struct task_group *css_tg(struct cgroup_subsys_state *css)
@@ -10538,11 +10433,10 @@ static int cpu_cgroup_css_online(struct
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 	/* Propagate the effective uclamp value for the new group */
-	mutex_lock(&uclamp_mutex);
-	rcu_read_lock();
-	cpu_util_update_eff(css);
-	rcu_read_unlock();
-	mutex_unlock(&uclamp_mutex);
+	scoped (mutex, &uclamp_mutex) {
+		guard(rcu, guard);
+		cpu_util_update_eff(css);
+	}
 #endif
 
 	return 0;
@@ -10693,24 +10587,22 @@ static ssize_t cpu_uclamp_write(struct k
 
 	static_branch_enable(&sched_uclamp_used);
 
-	mutex_lock(&uclamp_mutex);
-	rcu_read_lock();
-
-	tg = css_tg(of_css(of));
-	if (tg->uclamp_req[clamp_id].value != req.util)
-		uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
+	scoped (mutex, &uclamp_mutex) {
+		guard(rcu, guard);
 
-	/*
-	 * Because of not recoverable conversion rounding we keep track of the
-	 * exact requested value
-	 */
-	tg->uclamp_pct[clamp_id] = req.percent;
+		tg = css_tg(of_css(of));
+		if (tg->uclamp_req[clamp_id].value != req.util)
+			uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
 
-	/* Update effective clamps to track the most restrictive value */
-	cpu_util_update_eff(of_css(of));
+		/*
+		 * Because of not recoverable conversion rounding we keep track of the
+		 * exact requested value
+		 */
+		tg->uclamp_pct[clamp_id] = req.percent;
 
-	rcu_read_unlock();
-	mutex_unlock(&uclamp_mutex);
+		/* Update effective clamps to track the most restrictive value */
+		cpu_util_update_eff(of_css(of));
+	}
 
 	return nbytes;
 }
@@ -10737,10 +10629,10 @@ static inline void cpu_uclamp_print(stru
 	u64 percent;
 	u32 rem;
 
-	rcu_read_lock();
-	tg = css_tg(seq_css(sf));
-	util_clamp = tg->uclamp_req[clamp_id].value;
-	rcu_read_unlock();
+	scoped (rcu) {
+		tg = css_tg(seq_css(sf));
+		util_clamp = tg->uclamp_req[clamp_id].value;
+	}
 
 	if (util_clamp == SCHED_CAPACITY_SCALE) {
 		seq_puts(sf, "max\n");
@@ -10792,6 +10684,8 @@ static const u64 max_cfs_runtime = MAX_B
 
 static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime);
 
+DEFINE_LOCK_GUARD_0(cpus_read_lock, cpus_read_lock(), cpus_read_unlock())
+
 static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
 				u64 burst)
 {
@@ -10831,53 +10725,54 @@ static int tg_set_cfs_bandwidth(struct t
 	 * Prevent race between setting of cfs_rq->runtime_enabled and
 	 * unthrottle_offline_cfs_rqs().
 	 */
-	cpus_read_lock();
-	mutex_lock(&cfs_constraints_mutex);
-	ret = __cfs_schedulable(tg, period, quota);
-	if (ret)
-		goto out_unlock;
+	scoped (cpus_read_lock) {
+		guard(mutex, guard, &cfs_constraints_mutex);
 
-	runtime_enabled = quota != RUNTIME_INF;
-	runtime_was_enabled = cfs_b->quota != RUNTIME_INF;
-	/*
-	 * If we need to toggle cfs_bandwidth_used, off->on must occur
-	 * before making related changes, and on->off must occur afterwards
-	 */
-	if (runtime_enabled && !runtime_was_enabled)
-		cfs_bandwidth_usage_inc();
-	raw_spin_lock_irq(&cfs_b->lock);
-	cfs_b->period = ns_to_ktime(period);
-	cfs_b->quota = quota;
-	cfs_b->burst = burst;
-
-	__refill_cfs_bandwidth_runtime(cfs_b);
-
-	/* Restart the period timer (if active) to handle new period expiry: */
-	if (runtime_enabled)
-		start_cfs_bandwidth(cfs_b);
-
-	raw_spin_unlock_irq(&cfs_b->lock);
-
-	for_each_online_cpu(i) {
-		struct cfs_rq *cfs_rq = tg->cfs_rq[i];
-		struct rq *rq = cfs_rq->rq;
-		struct rq_flags rf;
-
-		rq_lock_irq(rq, &rf);
-		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
-
-		if (cfs_rq->throttled)
-			unthrottle_cfs_rq(cfs_rq);
-		rq_unlock_irq(rq, &rf);
+		ret = __cfs_schedulable(tg, period, quota);
+		if (ret)
+			return ret;
+
+		runtime_enabled = quota != RUNTIME_INF;
+		runtime_was_enabled = cfs_b->quota != RUNTIME_INF;
+		/*
+		 * If we need to toggle cfs_bandwidth_used, off->on must occur
+		 * before making related changes, and on->off must occur afterwards
+		 */
+		if (runtime_enabled && !runtime_was_enabled)
+			cfs_bandwidth_usage_inc();
+
+		scoped (raw_spinlock_irq, &cfs_b->lock) {
+			cfs_b->period = ns_to_ktime(period);
+			cfs_b->quota = quota;
+			cfs_b->burst = burst;
+
+			__refill_cfs_bandwidth_runtime(cfs_b);
+
+			/*
+			 * Restart the period timer (if active) to handle new
+			 * period expiry:
+			 */
+			if (runtime_enabled)
+				start_cfs_bandwidth(cfs_b);
+		}
+
+		for_each_online_cpu(i) {
+			struct cfs_rq *cfs_rq = tg->cfs_rq[i];
+			struct rq *rq = cfs_rq->rq;
+
+			scoped (rq_lock_irq, rq) {
+				cfs_rq->runtime_enabled = runtime_enabled;
+				cfs_rq->runtime_remaining = 0;
+
+				if (cfs_rq->throttled)
+					unthrottle_cfs_rq(cfs_rq);
+			}
+		}
+		if (runtime_was_enabled && !runtime_enabled)
+			cfs_bandwidth_usage_dec();
 	}
-	if (runtime_was_enabled && !runtime_enabled)
-		cfs_bandwidth_usage_dec();
-out_unlock:
-	mutex_unlock(&cfs_constraints_mutex);
-	cpus_read_unlock();
 
-	return ret;
+	return 0;
 }
 
 static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
@@ -11069,9 +10964,8 @@ static int __cfs_schedulable(struct task
 		do_div(data.quota, NSEC_PER_USEC);
 	}
 
-	rcu_read_lock();
-	ret = walk_tg_tree(tg_cfs_schedulable_down, tg_nop, &data);
-	rcu_read_unlock();
+	scoped (rcu)
+		ret = walk_tg_tree(tg_cfs_schedulable_down, tg_nop, &data);
 
 	return ret;
 }
@@ -11634,14 +11528,13 @@ int __sched_mm_cid_migrate_from_fetch_ci
 	 * are not the last task to be migrated from this cpu for this mm, so
 	 * there is no need to move src_cid to the destination cpu.
 	 */
-	rcu_read_lock();
-	src_task = rcu_dereference(src_rq->curr);
-	if (READ_ONCE(src_task->mm_cid_active) && src_task->mm == mm) {
-		rcu_read_unlock();
-		t->last_mm_cid = -1;
-		return -1;
+	scoped (rcu) {
+		src_task = rcu_dereference(src_rq->curr);
+		if (READ_ONCE(src_task->mm_cid_active) && src_task->mm == mm) {
+			t->last_mm_cid = -1;
+			return -1;
+		}
 	}
-	rcu_read_unlock();
 
 	return src_cid;
 }
@@ -11685,18 +11578,17 @@ int __sched_mm_cid_migrate_from_try_stea
 	 * the lazy-put flag, this task will be responsible for transitioning
 	 * from lazy-put flag set to MM_CID_UNSET.
 	 */
-	rcu_read_lock();
-	src_task = rcu_dereference(src_rq->curr);
-	if (READ_ONCE(src_task->mm_cid_active) && src_task->mm == mm) {
-		rcu_read_unlock();
-		/*
-		 * We observed an active task for this mm, there is therefore
-		 * no point in moving this cid to the destination cpu.
-		 */
-		t->last_mm_cid = -1;
-		return -1;
+	scoped (rcu) {
+		src_task = rcu_dereference(src_rq->curr);
+		if (READ_ONCE(src_task->mm_cid_active) && src_task->mm == mm) {
+			/*
+			 * We observed an active task for this mm, there is therefore
+			 * no point in moving this cid to the destination cpu.
+			 */
+			t->last_mm_cid = -1;
+			return -1;
+		}
 	}
-	rcu_read_unlock();
 
 	/*
 	 * The src_cid is unused, so it can be unset.
@@ -11769,7 +11661,6 @@ static void sched_mm_cid_remote_clear(st
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct task_struct *t;
-	unsigned long flags;
 	int cid, lazy_cid;
 
 	cid = READ_ONCE(pcpu_cid->cid);
@@ -11804,23 +11695,21 @@ static void sched_mm_cid_remote_clear(st
 	 * the lazy-put flag, that task will be responsible for transitioning
 	 * from lazy-put flag set to MM_CID_UNSET.
 	 */
-	rcu_read_lock();
-	t = rcu_dereference(rq->curr);
-	if (READ_ONCE(t->mm_cid_active) && t->mm == mm) {
-		rcu_read_unlock();
-		return;
+	scoped (rcu) {
+		t = rcu_dereference(rq->curr);
+		if (READ_ONCE(t->mm_cid_active) && t->mm == mm)
+			return;
 	}
-	rcu_read_unlock();
 
 	/*
 	 * The cid is unused, so it can be unset.
 	 * Disable interrupts to keep the window of cid ownership without rq
 	 * lock small.
 	 */
-	local_irq_save(flags);
-	if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
-		__mm_cid_put(mm, cid);
-	local_irq_restore(flags);
+	scoped (irqsave) {
+		if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
+			__mm_cid_put(mm, cid);
+	}
 }
 
 static void sched_mm_cid_remote_clear_old(struct mm_struct *mm, int cpu)
@@ -11842,14 +11731,13 @@ static void sched_mm_cid_remote_clear_ol
 	 * snapshot associated with this cid if an active task using the mm is
 	 * observed on this rq.
 	 */
-	rcu_read_lock();
-	curr = rcu_dereference(rq->curr);
-	if (READ_ONCE(curr->mm_cid_active) && curr->mm == mm) {
-		WRITE_ONCE(pcpu_cid->time, rq_clock);
-		rcu_read_unlock();
-		return;
+	scoped (rcu) {
+		curr = rcu_dereference(rq->curr);
+		if (READ_ONCE(curr->mm_cid_active) && curr->mm == mm) {
+			WRITE_ONCE(pcpu_cid->time, rq_clock);
+			return;
+		}
 	}
-	rcu_read_unlock();
 
 	if (rq_clock < pcpu_cid->time + SCHED_MM_CID_PERIOD_NS)
 		return;
@@ -11943,7 +11831,6 @@ void task_tick_mm_cid(struct rq *rq, str
 void sched_mm_cid_exit_signals(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
-	struct rq_flags rf;
 	struct rq *rq;
 
 	if (!mm)
@@ -11951,23 +11838,22 @@ void sched_mm_cid_exit_signals(struct ta
 
 	preempt_disable();
 	rq = this_rq();
-	rq_lock_irqsave(rq, &rf);
-	preempt_enable_no_resched();	/* holding spinlock */
-	WRITE_ONCE(t->mm_cid_active, 0);
-	/*
-	 * Store t->mm_cid_active before loading per-mm/cpu cid.
-	 * Matches barrier in sched_mm_cid_remote_clear_old().
-	 */
-	smp_mb();
-	mm_cid_put(mm);
-	t->last_mm_cid = t->mm_cid = -1;
-	rq_unlock_irqrestore(rq, &rf);
+	scoped (rq_lock_irqsave, rq) {
+		preempt_enable_no_resched();	/* holding spinlock */
+		WRITE_ONCE(t->mm_cid_active, 0);
+		/*
+		 * Store t->mm_cid_active before loading per-mm/cpu cid.
+		 * Matches barrier in sched_mm_cid_remote_clear_old().
+		 */
+		smp_mb();
+		mm_cid_put(mm);
+		t->last_mm_cid = t->mm_cid = -1;
+	}
 }
 
 void sched_mm_cid_before_execve(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
-	struct rq_flags rf;
 	struct rq *rq;
 
 	if (!mm)
@@ -11975,23 +11861,22 @@ void sched_mm_cid_before_execve(struct t
 
 	preempt_disable();
 	rq = this_rq();
-	rq_lock_irqsave(rq, &rf);
-	preempt_enable_no_resched();	/* holding spinlock */
-	WRITE_ONCE(t->mm_cid_active, 0);
-	/*
-	 * Store t->mm_cid_active before loading per-mm/cpu cid.
-	 * Matches barrier in sched_mm_cid_remote_clear_old().
-	 */
-	smp_mb();
-	mm_cid_put(mm);
-	t->last_mm_cid = t->mm_cid = -1;
-	rq_unlock_irqrestore(rq, &rf);
+	scoped (rq_lock_irqsave, rq) {
+		preempt_enable_no_resched();	/* holding spinlock */
+		WRITE_ONCE(t->mm_cid_active, 0);
+		/*
+		 * Store t->mm_cid_active before loading per-mm/cpu cid.
+		 * Matches barrier in sched_mm_cid_remote_clear_old().
+		 */
+		smp_mb();
+		mm_cid_put(mm);
+		t->last_mm_cid = t->mm_cid = -1;
+	}
 }
 
 void sched_mm_cid_after_execve(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
-	struct rq_flags rf;
 	struct rq *rq;
 
 	if (!mm)
@@ -11999,16 +11884,16 @@ void sched_mm_cid_after_execve(struct ta
 
 	preempt_disable();
 	rq = this_rq();
-	rq_lock_irqsave(rq, &rf);
-	preempt_enable_no_resched();	/* holding spinlock */
-	WRITE_ONCE(t->mm_cid_active, 1);
-	/*
-	 * Store t->mm_cid_active before loading per-mm/cpu cid.
-	 * Matches barrier in sched_mm_cid_remote_clear_old().
-	 */
-	smp_mb();
-	t->last_mm_cid = t->mm_cid = mm_cid_get(rq, mm);
-	rq_unlock_irqrestore(rq, &rf);
+	scoped (rq_lock_irqsave, rq) {
+		preempt_enable_no_resched();	/* holding spinlock */
+		WRITE_ONCE(t->mm_cid_active, 1);
+		/*
+		 * Store t->mm_cid_active before loading per-mm/cpu cid.
+		 * Matches barrier in sched_mm_cid_remote_clear_old().
+		 */
+		smp_mb();
+		t->last_mm_cid = t->mm_cid = mm_cid_get(rq, mm);
+	}
 	rseq_set_notify_resume(t);
 }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1630,6 +1630,11 @@ task_rq_unlock(struct rq *rq, struct tas
 	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 }
 
+DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
+		    _G->rq = task_rq_lock(_G->lock, &_G->rf),
+		    task_rq_unlock(_G->rq, _G->lock, &_G->rf),
+		    struct rq *rq; struct rq_flags rf)
+
 static inline void
 rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
 	__acquires(rq->lock)
@@ -1678,6 +1683,21 @@ rq_unlock(struct rq *rq, struct rq_flags
 	raw_spin_rq_unlock(rq);
 }
 
+DEFINE_LOCK_GUARD_1(rq_lock, struct rq,
+		    rq_lock(_G->lock, &_G->rf),
+		    rq_unlock(_G->lock, &_G->rf),
+		    struct rq_flags rf)
+
+DEFINE_LOCK_GUARD_1(rq_lock_irq, struct rq,
+		    rq_lock_irq(_G->lock, &_G->rf),
+		    rq_unlock_irq(_G->lock, &_G->rf),
+		    struct rq_flags rf)
+
+DEFINE_LOCK_GUARD_1(rq_lock_irqsave, struct rq,
+		    rq_lock_irqsave(_G->lock, &_G->rf),
+		    rq_unlock_irqrestore(_G->lock, &_G->rf),
+		    struct rq_flags rf)
+
 static inline struct rq *
 this_rq_lock_irq(struct rq_flags *rf)
 	__acquires(rq->lock)
@@ -2701,6 +2721,16 @@ static inline void double_raw_lock(raw_s
 	raw_spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_raw_unlock(raw_spinlock_t *l1, raw_spinlock_t *l2)
+{
+	raw_spin_unlock(l1);
+	raw_spin_unlock(l2);
+}
+
+DEFINE_LOCK_GUARD_2(double_raw_spinlock, raw_spinlock_t,
+		    double_raw_lock(_G->lock, _G->lock2),
+		    double_raw_unlock(_G->lock, _G->lock2))
+
 /*
  * double_rq_unlock - safely unlock two runqueues
  *
@@ -2758,6 +2788,10 @@ static inline void double_rq_unlock(stru
 
 #endif
 
+DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq,
+		    double_rq_lock(_G->lock, _G->lock2),
+		    double_rq_unlock(_G->lock, _G->lock2))
+
 extern struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq);
 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/2] locking: Introduce __cleanup__ based guards
  2023-05-26 20:52 ` [PATCH v2 1/2] locking: Introduce __cleanup__ based guards Peter Zijlstra
@ 2023-05-26 21:24   ` Peter Zijlstra
  2023-05-26 21:54     ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-26 21:24 UTC (permalink / raw)
  To: torvalds, keescook, gregkh, pbonzini
  Cc: linux-kernel, ojeda, ndesaulniers, mingo, will, longman,
	boqun.feng, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, paulmck, frederic,
	quic_neeraju, joel, josh, mathieu.desnoyers, jiangshanlai, rcu,
	tj, tglx

On Fri, May 26, 2023 at 10:52:05PM +0200, Peter Zijlstra wrote:

> +++ b/include/linux/guards.h
> @@ -0,0 +1,142 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_GUARDS_H
> +#define __LINUX_GUARDS_H
> +
> +#include <linux/compiler_attributes.h>
> +
> +/*
> + * Pointer Guards are special pointers (variables) with a scope bound cleanup
> + * function.
> + *
> + * Various guard types can be created using:
> + *
> + *   DEFINE_PTR_GUARD(guard_type, typename, cleanup-exp)
> + *
> + * After which they can be used like so:
> + *
> + *   ptr_guard(guard_type, name) = find_get_object(foo);
> + *
> + * Where the return type of find_get_object() should match the guard_type's
> + * 'typename *'. And when @name goes out of scope cleanup-exp is run (inserted
> + * by the compiler) when !NULL at that time. Also see the __cleanup attribute.
> + */
> +
> +#define DEFINE_PTR_GUARD(_type, _Type, _Put)					\
> +typedef _Type *ptr_guard_##_type##_t;						\
> +										\
> +static inline void ptr_guard_##_type##_cleanup(_Type **_ptr)			\
> +{										\
> +	_Type *_G = *_ptr;							\
> +	if (_G)									\
> +		_Put(_G);							\
> +}
> +
> +#define ptr_guard(_type, _name)							\
> +	ptr_guard_##_type##_t _name __cleanup(ptr_guard_##_type##_cleanup)
> +

Ha, and then I wanted to create an fdput() guard... :/
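
For a plain pointer type the pattern above works out directly; a minimal
sketch (put_task_struct() is picked here purely as an example of a put
function):

	DEFINE_PTR_GUARD(put_task, struct task_struct, put_task_struct)

	int do_with_task(pid_t pid)
	{
		ptr_guard(put_task, p) = find_get_task_by_vpid(pid);

		if (!p)
			return -ESRCH;
		/* use p; put_task_struct(p) runs automatically on return */
		return 0;
	}

struct fd is the annoying one: it is a by-value wrapper (file pointer
plus flags) rather than a plain pointer, which is what the ":/" above is
about.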

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/2] locking: Introduce __cleanup__ based guards
  2023-05-26 21:24   ` Peter Zijlstra
@ 2023-05-26 21:54     ` Linus Torvalds
  2023-05-27  8:57       ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2023-05-26 21:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On Fri, May 26, 2023 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > +                                                                             \
> > +static inline void ptr_guard_##_type##_cleanup(_Type **_ptr)                 \
> > +{                                                                            \
> > +     _Type *_G = *_ptr;                                                      \
> > +     if (_G)                                                                 \
> > +             _Put(_G);                                                       \
> > +}
> > +
> > +#define ptr_guard(_type, _name)                                                      \
> > +     ptr_guard_##_type##_t _name __cleanup(ptr_guard_##_type##_cleanup)
> > +
>
> Ha, and then I wanted to create an fdput() guard... :/

I have to say, I'm not super-convinced about the macros to create
these guard functions and types.

I suspect that it would be better to literally just declare the guard
types directly.  For the fdget idiom, you really just want the guard
type to *be* the 'struct fd'.

And I think you made those macros just make too many assumptions.

It's not just "struct fd", for example.  If you want to do things like
wrap 'local_irq_save', again it's not a pointer, the context is just
'unsigned long irqflags'.

And I really don't think those type-creation macros buy you anything.

What's wrong with just writing it out:

    typedef struct fd guard_fdget_type_t;
    static inline struct fd guard_fdget_init(int fd)
    { return fdget(fd); }
    static inline void guard_fdget_exit(struct fd fd)
    { fdput(fd); }

and

    typedef struct mutex *guard_mutex_type_t;
    static inline struct mutex *guard_mutex_init(struct mutex *mutex)
    { mutex_lock(mutex); return mutex; }
    static inline void guard_mutex_exit(struct mutex *mutex)
    { mutex_unlock(mutex); }

I don't think the latter is in the *least* less readable than doing

    DEFINE_LOCK_GUARD_1(mutex, struct mutex,
                mutex_lock(_G->lock),
                mutex_unlock(_G->lock))

which is this magic macro that creates those, and makes them
completely ungreppable - and makes the type system very inflexible.

So I really think that it would be a lot easier to write out the
wrappers explicitly for the few types that really want this.

I dunno.

Once again, I have written example code in my MUA that I have not
tested AT ALL, and that may be fundamentally broken for some very
obvious reason, and I'm just too stupid to see it.

             Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/2] locking: Introduce __cleanup__ based guards
  2023-05-26 21:54     ` Linus Torvalds
@ 2023-05-27  8:57       ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-27  8:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On Fri, May 26, 2023 at 02:54:40PM -0700, Linus Torvalds wrote:
> What's wrong with just writing it out:
> 
>     typedef struct fd guard_fdget_type_t;
>     static inline struct fd guard_fdget_init(int fd)
>     { return fdget(fd); }
>     static inline void guard_fdget_exit(struct fd fd)
>     { fdput(fd); }
> 

(wrong guard type, ptr_guard vs lock_guard) but yeah, I had this same
realization during breakfast. Clearly the brain had already left last
night.

Specifically, I think we want ptr_guard() here (and possibly fdnull for
__to_fd(0)) for things like do_sendfile() where a struct fd is
initialized late.

The below seems to compile...

--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -10,6 +10,7 @@
 #include <linux/types.h>
 #include <linux/posix_types.h>
 #include <linux/errno.h>
+#include <linux/guards.h>
 
 struct file;
 
@@ -45,6 +46,13 @@ static inline void fdput(struct fd fd)
 		fput(fd.file);
 }
 
+typedef struct fd ptr_guard_fdput_t;
+
+static inline void ptr_guard_fdput_cleanup(struct fd *fdp)
+{
+	fdput(*fdp);
+}
+
 extern struct file *fget(unsigned int fd);
 extern struct file *fget_raw(unsigned int fd);
 extern struct file *fget_task(struct task_struct *task, unsigned int fd);
@@ -58,6 +66,8 @@ static inline struct fd __to_fd(unsigned
 	return (struct fd){(struct file *)(v & ~3),v & 3};
 }
 
+#define fdnull	__to_fd(0)
+
 static inline struct fd fdget(unsigned int fd)
 {
 	return __to_fd(__fdget(fd));
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1180,7 +1180,8 @@ COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		  	   size_t count, loff_t max)
 {
-	struct fd in, out;
+	ptr_guard(fdput, in) = fdnull;
+	ptr_guard(fdput, out) = fdnull;
 	struct inode *in_inode, *out_inode;
 	struct pipe_inode_info *opipe;
 	loff_t pos;
@@ -1191,35 +1192,35 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	/*
 	 * Get input file, and verify that it is ok..
 	 */
-	retval = -EBADF;
 	in = fdget(in_fd);
 	if (!in.file)
-		goto out;
+		return -EBADF;
 	if (!(in.file->f_mode & FMODE_READ))
-		goto fput_in;
-	retval = -ESPIPE;
+		return -EBADF;
+
 	if (!ppos) {
 		pos = in.file->f_pos;
 	} else {
 		pos = *ppos;
 		if (!(in.file->f_mode & FMODE_PREAD))
-			goto fput_in;
+			return -ESPIPE;
 	}
+
 	retval = rw_verify_area(READ, in.file, &pos, count);
 	if (retval < 0)
-		goto fput_in;
+		return retval;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 
 	/*
 	 * Get output file, and verify that it is ok..
 	 */
-	retval = -EBADF;
 	out = fdget(out_fd);
 	if (!out.file)
-		goto fput_in;
+		return -EBADF;
 	if (!(out.file->f_mode & FMODE_WRITE))
-		goto fput_out;
+		return -EBADF;
+
 	in_inode = file_inode(in.file);
 	out_inode = file_inode(out.file);
 	out_pos = out.file->f_pos;
@@ -1228,9 +1229,8 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
 
 	if (unlikely(pos + count > max)) {
-		retval = -EOVERFLOW;
 		if (pos >= max)
-			goto fput_out;
+			return -EOVERFLOW;
 		count = max - pos;
 	}
 
@@ -1249,7 +1249,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (!opipe) {
 		retval = rw_verify_area(WRITE, out.file, &out_pos, count);
 		if (retval < 0)
-			goto fput_out;
+			return retval;
 		file_start_write(out.file);
 		retval = do_splice_direct(in.file, &pos, out.file, &out_pos,
 					  count, fl);
@@ -1278,11 +1278,6 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (pos > max)
 		retval = -EOVERFLOW;
 
-fput_out:
-	fdput(out);
-fput_in:
-	fdput(in);
-out:
 	return retval;
 }
 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-26 20:52 [PATCH v2 0/2] Lock and Pointer guards Peter Zijlstra
  2023-05-26 20:52 ` [PATCH v2 1/2] locking: Introduce __cleanup__ based guards Peter Zijlstra
  2023-05-26 20:52 ` [PATCH v2 2/2] sched: Use fancy new guards Peter Zijlstra
@ 2023-05-27 17:21 ` Mathieu Desnoyers
  2023-05-27 19:18 ` Linus Torvalds
  3 siblings, 0 replies; 42+ messages in thread
From: Mathieu Desnoyers @ 2023-05-27 17:21 UTC (permalink / raw)
  To: Peter Zijlstra, torvalds, keescook, gregkh, pbonzini
  Cc: linux-kernel, ojeda, ndesaulniers, mingo, will, longman,
	boqun.feng, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, paulmck, frederic,
	quic_neeraju, joel, josh, jiangshanlai, rcu, tj, tglx

On 5/26/23 16:52, Peter Zijlstra wrote:
> By popular demand, a new and improved version :-)
> 
> New since -v1 ( https://lkml.kernel.org/r/20230526150549.250372621@infradead.org )
> 
>   - much improved interface for lock guards: guard() and scoped () { }
>     as suggested by Linus.

<name bikeshedding>

I know I'm the one who hinted at C++ "std::scoped_lock" as a similar 
preexisting API, but I find that "scoped()" is weird in the newly 
proposed form. "scoped_lock" is fine considering that "scoped" is an 
adjective applying to "lock", but in the case of e.g. scoped(rcu) { }, 
then we are really declaring a "scope" of type "rcu". I suspect that in 
this case:

scope(rcu) { }

would be less unexpected than the adjective form:

scoped(rcu) { }

Especially if we go for the name "guard()", rather than the adjective 
guarded(), for its counterpart.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-26 20:52 [PATCH v2 0/2] Lock and Pointer guards Peter Zijlstra
                   ` (2 preceding siblings ...)
  2023-05-27 17:21 ` [PATCH v2 0/2] Lock and Pointer guards Mathieu Desnoyers
@ 2023-05-27 19:18 ` Linus Torvalds
  2023-05-29 12:09   ` Paolo Bonzini
                     ` (2 more replies)
  3 siblings, 3 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-05-27 19:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On Fri, May 26, 2023 at 2:01 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> By popular demand, a new and improved version :-)

Well, I certainly find the improved version more legible, despite my
further comment about the type macros being too complicated.

But now I've slept on it, and I think the problem goes beyond "macros
being too complicated".

I actually think that the whole "create a new scope" is just wrong. I
understand why you did it: the whole "its' going out of scope" is part
of the *idea*.

BUT.

Adding a new scope results in

 (a) pointless indentation

 (b) syntax problems with fighting the limited syntax of for loops

 (c) extra dummy variables - both for the for loop hack and for
the scope guard thing itself

and none of the above is an *advantage*.

So how about we take a step back, and say "what if we don't create a
new scope at all"?

I think it actually improves on everything. The macros become
*trivial*. The code becomes more legible.

And you can have multiple different scopes very naturally, or you can
just share a scope.

Let me build up my argument here. Let's start with this *trivial* helper:

    /* Trivial "generic" auto-release macro */
    #define auto_release_name(type, name, init, exit) \
        type name __attribute__((__cleanup__(exit))) = (init)

it's truly stupid: it creates a new variable of type 'type', with name
'name', and with the initial value 'init', and with the cleanup
function 'exit'.

It looks stupid, but it's the *good* stupid. It's really really
obvious, and there are no games here.

Let me then introduce *one* other helper, because it turns out that
much of the time, you don't really want to  pick a name. So we'll use
the above macro, but make a version of it that just automatically
picks a unique name for you:

    #define auto_release(type, init, exit) \
        auto_release_name(type, \
                __UNIQUE_ID(auto) __maybe_unused, \
                init, exit)

again, there are absolutely no games here, it's all just very very
straightforward. It's so straightforward that it all feels stupid.

But again, it's the 'S' in "KISS". Keep It Simple, Stupid.

And it turns out that the above two trivial macros are actually quite
useful in themselves. You want to do an auto-cleanup version of
'struct fd'? It's trivial:

    /* Trivial "getfd()" wrapper */
    static inline void release_fd(struct fd *fd)
    { fdput(*fd); }

    #define auto_getfd(name, n) \
        auto_release_name(struct fd, name, fdget(n), release_fd)

and now you can write code like this:

    int testfd(int fd)
    {
        auto_getfd(f, fd);
        /* Do something with f.file */
        return f.file ? 0 : -EBADF;
    }

Which is really trivial, and actually pretty legible.

Note that there is no explicit "scope" here. The scope is simply
implicit in the code. The compiler will verify that it's at the
beginning of the block, since we do that
-Wdeclaration-after-statement, so for the kernel this is always at the
beginning of a scope anyway.

And that's actually a big advantage. You want to deal with *two* file
descriptors? Sure. Just do this:

    /* Do two file descriptors point to the same file? */
    int testfd2(int fd1, int fd2)
    {
        auto_getfd(f1, fd1);
        auto_getfd(f2, fd2);
        return f1.file && f1.file == f2.file;
    }

and it JustWorks(tm). Notice how getting rid of the explicit scoping
is actually a *feature*.

Ok, so why did I want that helper that used that auto-generated name?
The above didn't use it, but look here:

    /* Trivial "mutex" auto-release helpers */
    static inline struct mutex *get_mutex(struct mutex *a)
    { mutex_lock(a); return a; }

    static inline void put_mutex(struct mutex **a)
    { mutex_unlock(*a); }

    #define auto_release_mutex_lock(lock) \
        auto_release(struct mutex *, get_mutex(lock), put_mutex)

That's all you need for a nice locked region.
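
For instance, a minimal sketch (my_mutex and do_thing() are just
placeholders):

    int update_thing(void)
    {
        auto_release_mutex_lock(&my_mutex);
        /* my_mutex is held from here until any return */
        return do_thing();
    }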

Or how about RCU, which doesn't have a lock type? Just do this:

    struct rcu_scope;

    static inline struct rcu_scope *get_rcu(void)
    { rcu_read_lock(); return NULL; }

    static void put_rcu(struct rcu_scope **a)
    { rcu_read_unlock(); }

    #define auto_release_rcu_lock() \
        auto_release(struct rcu_scope *, get_rcu(), put_rcu)

and you're done. And how you can *mix* all these:

    int testfd2(int fd1, int fd2)
    {
        auto_release_mutex_lock(&cgroup_mutex);
        auto_release_rcu_lock();
        auto_getfd(f1, fd1);
        auto_getfd(f2, fd2);
        return f1.file && f1.file == f2.file;
    }

Ok. The above is insane. But it's instructive. And it's all SO VERY SIMPLE.

It's also an example of something people need to be aware of: the
auto-releasing is not ordered. That may or may not be a problem. It's
not a problem for two identical locks, but it very much *can* be a
problem if you mix different locks.

So in the crazy above, the RCU lock may be released *after* the
cgroup_mutex is released. Or before.

I'm not convinced it's ordered even if you end up using explicit
scopes, which you can obviously still do:

    int testfd2(int fd1, int fd2)
    {
        auto_release_mutex_lock(&cgroup_mutex);
        do {
                auto_release_rcu_lock();
                auto_getfd(f1, fd1);
                auto_getfd(f2, fd2);
                return f1.file && f1.file == f2.file;
        } while (0);
    }

so I don't think the nasty "scoped()" version is any better in this regard.

And before you say "unlock order doesn't matter" - that's actually not
true. Unlock order does matter when you have things like
"spin_lock_irq()" where the irq-off region then protects other locks
too.

If you do the "unlock_irq" before you've released the other spinlocks
that needed irq protection, you're now screwed.
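
A minimal sketch of the hazard (not from any real code path):

    spin_lock_irq(&a->lock);    /* irqs off, a->lock taken */
    spin_lock(&b->lock);        /* b->lock relies on irqs staying off */
    ...
    spin_unlock_irq(&a->lock);  /* WRONG to run first: irqs back on... */
    spin_unlock(&b->lock);      /* ...while b->lock is still held */

so the cleanup order had better be the reverse of the declaration order.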

Final notes:

 - I think the above is simpler and objectively better in every way
from the explicitly scoped thing

 - I do think that the auto-release can be very dangerous for locking,
and people need to verify me about the ordering. Maybe the freeing
order is well-defined.

 - I also suspect that to get maximum advantage of this all, we would
have to get rid of -Wdeclaration-after-statement

That last point is something that some people will celebrate, but I do
think it has helped keep our code base somewhat controlled, and made
people think more about the variables they declare.

But if you start doing consecutive cleanup operations, you really end
up wanting to do things like this:

    int testfd2(int fd1, int fd2)
    {
        auto_getfd(f1, fd1);
        if (!f1.file)
                return -EBADF;
        auto_getfd(f2, fd2);
        if (!f2.file)
                return -EBADF;
        return do_something (f1, f2);
    }

and you basically *have* to allow those declarations in the middle -
because you don't want to clean up fd2 in that "f1 failed" case, since
f2 doesn't exist yet.

                  Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-27 19:18 ` Linus Torvalds
@ 2023-05-29 12:09   ` Paolo Bonzini
  2023-05-29 19:04     ` Linus Torvalds
  2023-05-30  9:23   ` Peter Zijlstra
  2023-05-30  9:26   ` Peter Zijlstra
  2 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2023-05-29 12:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, keescook, gregkh, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx

On Sat, May 27, 2023 at 9:18 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> It's also an example of something people need to be aware of: the
> auto-releasing is not ordered. That may or may not be a problem. It's
> not a problem for two identical locks, but it very much *can* be a
> problem if you mix different locks.

It is guaranteed. It would be nice to have it documented, but if you
look at the intermediate representation of this simple example:

#include <stdio.h>

void print_on_exit(void **f) {
    if (*f) puts(*f);
    *f = NULL;
}

int main(int argc) {
    // prints "second" then "first"
    void *__attribute__((__cleanup__(print_on_exit))) cleanup1 = "first";
    void *__attribute__((__cleanup__(print_on_exit))) cleanup2 = "second";
}

... the implementation is introducing a scope and reusing the C++
try/finally implementation, and the "optimizations" that are allowed
when functions are not "throwing" anything. This is always the case
for C, so this (from f.c.005t.original, obtained with -c
-fdump-tree-all):

{
  // the duplicated initializers are just an artifact of the AST
  void * cleanup1 = (void *) "first";
  void * cleanup2 = (void *) "second";

    void * cleanup1 = (void *) "first";
  try
    {
            void * cleanup2 = (void *) "second";
      try
        {

        }
      finally
        {
          print_on_exit (&cleanup2);
        }
    }
  finally
    {
      print_on_exit (&cleanup1);
    }
}
return 0;

becomes this as soon as exceptions are lowered (from f.c.013t.eh),
which is before any kind of optimization:

int main (int argc)
{
  void * cleanup2;
  void * cleanup1;
  int D.3187;

  cleanup1 = "first";
  cleanup2 = "second";
  print_on_exit (&cleanup2);
  print_on_exit (&cleanup1);
  cleanup1 = {CLOBBER(eol)};
  cleanup2 = {CLOBBER(eol)};
  D.3187 = 0;
  goto <D.3188>;
  <D.3188>:
  return D.3187;
}

> So in the crazy above, the RCU lock may be released *after* the
> cgroup_mutex is released. Or before.
>
> I'm not convinced it's ordered even if you end up using explicit
> scopes, which you can obviously still do:

It is, see

        void *__attribute__((__cleanup__(print_on_exit))) cleanup1 = "first";
        {
                void *__attribute__((__cleanup__(print_on_exit)))
cleanup2 = "second";
                puts("begin nested scope");
        }
        puts("back to outer scope");

which prints

begin nested scope
second
back to outer scope
first

This is practically required by glibc's nasty
pthread_cleanup_push/pthread_cleanup_pop macros (the macros are nasty
even if you ignore glibc's implementation, but still):

#  define pthread_cleanup_push(routine, arg) \
  do {                                                                        \
    struct __pthread_cleanup_frame __clframe                                  \
      __attribute__ ((__cleanup__ (__pthread_cleanup_routine)))               \
      = { .__cancel_routine = (routine), .__cancel_arg = (arg),               \
          .__do_it = 1 };

/* Remove a cleanup handler installed by the matching pthread_cleanup_push.
   If EXECUTE is non-zero, the handler function is called. */
#  define pthread_cleanup_pop(execute) \
    __clframe.__do_it = (execute);                                            \
  } while (0)

If the scope wasn't respected, pthread_cleanup_pop(1) would be broken
because pthread_cleanup_pop() must immediately execute the function if
its argument is nonzero.
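
For reference, the usual way those macros get used (a sketch in plain
pthread code, nothing kernel-specific):

    static void unlock_mutex(void *arg)
    {
        pthread_mutex_unlock(arg);
    }

    void work(pthread_mutex_t *m)
    {
        pthread_mutex_lock(m);
        pthread_cleanup_push(unlock_mutex, m);
        /* ... cancellable work ... */
        pthread_cleanup_pop(1);   /* must call unlock_mutex(m) right here */
    }

If the __cleanup__ variable inside pthread_cleanup_push() did not respect
the scope opened by the push, pop(1) could not make that guarantee.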

>  - I think the above is simpler and objectively better in every way
> from the explicitly scoped thing

It is simpler indeed, but a scope-based variant is useful too based on
my experience with QEMU. Maybe things like Python's with statement
have spoiled me, but I can't quite get used to the lone braces in

{
    ...
    {
        auto_release(mutex, &mutex);
        ...
    }
    ...
}

So in QEMU we have both of them. In Linux that would be

   auto_release(mutex, &mutex)

and

   scoped(mutex, &mutex) {}

>  - I do think that the auto-release can be very dangerous for locking,
> and people need to verify me about the ordering. Maybe the freeing
> order is well-defined.

It is but having it documented would be better.

>  - I also suspect that to get maximum advantage of this all, we would
> have to get rid of -Wdeclaration-after-statement

Having both a declaration-based and a scope-based variant helps with
that too. You could add _Pragma("GCC diagnostic push/ignore/pop") to
the declaration-based macros but ouch... even without thinking of
whether clang supports it, which I didn't check.

Paolo

> That last point is something that some people will celebrate, but I do
> think it has helped keep our code base somewhat controlled, and made
> people think more about the variables they declare.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-29 12:09   ` Paolo Bonzini
@ 2023-05-29 19:04     ` Linus Torvalds
  2023-05-29 21:27       ` Ian Lance Taylor
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2023-05-29 19:04 UTC (permalink / raw)
  To: Paolo Bonzini, Ian Lance Taylor
  Cc: Peter Zijlstra, keescook, gregkh, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx

On Mon, May 29, 2023 at 8:09 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Sat, May 27, 2023 at 9:18 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > It's also an example of something people need to be aware of: the
> > auto-releasing is not ordered. That may or may not be a problem. It's
> > not a problem for two identical locks, but it very much *can* be a
> > problem if you mix different locks.
>
> It is guaranteed. It would be nice to have it documented, but if you
> look at the intermediate representation of this simple example:

Well, I can see that it might be doing that reverse ordering in
practice, but for the life of me, I can't actually find anything that
says it is guaranteed.

Google did find me one blog post by Ian Lance Taylor from 2008 that
said that yes, each __cleanup__ attribute basically creates its own
little scope, and that the cleanup in reverse declaration order is
thus guaranteed.

Let's add Ian to the cc, partly to confirm it wasn't just a random
implementation detail, but also partly to perhaps ask him to get
somebody to document it.

Because if it's not documented, how do we know that the clang
implementation of that attribute then ends up also guaranteeing the
reverse order cleanup, even if gcc might guarantee it?

I *suspect* - but cannot find any guarantees - that it's going to
match C++ destructors, and you probably end up pretty much always
having to deal with these cleanup functions in reverse order, so it
all sounds very likely to me.

And maybe it's even documented somewhere that neither of us could find.

Anyway, I do like the option to use cleanup functions, but I think
we'd better make sure, since we really may require nesting for locks
(even if in many cases it won't matter).

Ian? Any chance you can either point us at documentation, or maybe
just confirm it, and hopefully make said documentation happen?

            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-29 19:04     ` Linus Torvalds
@ 2023-05-29 21:27       ` Ian Lance Taylor
  2023-05-30  0:06         ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Lance Taylor @ 2023-05-29 21:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paolo Bonzini, Peter Zijlstra, keescook, gregkh, linux-kernel,
	ojeda, ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx

On Mon, May 29, 2023 at 12:04 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, May 29, 2023 at 8:09 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On Sat, May 27, 2023 at 9:18 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > It's also an example of something people need to be aware of: the
> > > auto-releasing is not ordered. That may or may not be a problem. It's
> > > not a problem for two identical locks, but it very much *can* be a
> > > problem if you mix different locks.
> >
> > It is guaranteed. It would be nice to have it documented, but if you
> > look at the intermediate representation of this simple example:
>
> Well, I can see that it might be doing that reverse ordering in
> practice, but for the life of me, I can't actually find anything that
> says it is guaranteed.
>
> Google did find me one blog post by Ian Lance Taylor from 2008 that
> said that yes, each __cleanup__ attribute basically creates its own
> little scope, and that the cleanup in reverse declaration order is
> thus guaranteed.
>
> Let's add Ian to the cc, partly to confirm it wasn't just a random
> implementation detail, but also partly to perhaps ask him to get
> somebody to document it.
>
> Because if it's not documented, how do we know that the clang
> implementation of that attribute then ends up also guaranteeing the
> reverse order cleanup, even if gcc might guarantee it?
>
> I *suspect* - but cannot find any guarantees - that it's going to
> match C++ destructors, and you probably end up pretty much always
> having to deal with these cleanup functions in reverse order, so it
> all sounds very likely to me.
>
> And maybe it's even documented somewhere that neither of us could find.
>
> Anyway, I do like the option to use cleanup functions, but I think
> we'd better make sure, since we really may require nesting for locks
> (even if in many cases it won't matter).
>
> Ian? Any chance you can either point us at documentation, or maybe
> just confirm it, and hopefully make said documentation happen?


It was a while ago, but I expect that I was just thinking of the
implementation.  I agree that the documentation could be clearer.  I
filed https://gcc.gnu.org/PR110029.

Ian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-29 21:27       ` Ian Lance Taylor
@ 2023-05-30  0:06         ` Linus Torvalds
  0 siblings, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-05-30  0:06 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Paolo Bonzini, Peter Zijlstra, keescook, gregkh, linux-kernel,
	ojeda, ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx

On Mon, May 29, 2023 at 5:27 PM Ian Lance Taylor <iant@google.com> wrote:
>
> It was a while ago, but I expect that I was just thinking of the
> implementation.  I agree that the documentation could be clearer.  I
> filed https://gcc.gnu.org/PR110029.

Thanks. I think we can proceed with the assumption that it's all
clearly ordered. Even when the scopes are explicit (ie actual separate
block scopes for the variables and no shared scope), doing a 'return'
(or break out of a loop) will obviously exit multiple scopes at once,
so it would be good to have it documented that the cleanup is always
in that reverse order by scope and declaration order within a scope.

I guess I should check with the clang people too.

              Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-27 19:18 ` Linus Torvalds
  2023-05-29 12:09   ` Paolo Bonzini
@ 2023-05-30  9:23   ` Peter Zijlstra
  2023-05-30  9:34     ` Peter Zijlstra
                       ` (2 more replies)
  2023-05-30  9:26   ` Peter Zijlstra
  2 siblings, 3 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-30  9:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On Sat, May 27, 2023 at 12:18:04PM -0700, Linus Torvalds wrote:

> So how about we take a step back, and say "what if we don't create a
> new scope at all"?

Note that the lock_guard() and ptr_guard() options I have don't require
the new scope thing. The scope thing is optional.

> I think it actually improves on everything. The macros become
> *trivial*. The code becomes more legible.
> 
> And you can have multiple different scopes very naturally, or you can
> just share a scope.
> 
> Let me build up my argument here. Let's start with this *trivial* helper:
> 
>     /* Trivial "generic" auto-release macro */
>     #define auto_release_name(type, name, init, exit) \
>         type name __attribute__((__cleanup__(exit))) = (init)
> 
> it's truly stupid: it creates a new variable of type 'type', with name
> 'name', and with the initial value 'init', and with the cleanup
> function 'exit'.
> 
> It looks stupid, but it's the *good* stupid. It's really really
> obvious, and there are no games here.

I really don't like the auto naming since C++/C23 use auto for type
inference.

> Let me then introduce *one* other helper, because it turns out that
> much of the time, you don't really want to  pick a name. So we'll use
> the above macro, but make a version of it that just automatically
> picks a unique name for you:
> 
>     #define auto_release(type, init, exit) \
>         auto_release_name(type, \
>                 __UNIQUE_ID(auto) __maybe_unused, \
>                 init, exit)

I like that option.

> And it turns out that the above two trivial macros are actually quite
> useful in themselves. You want to do an auto-cleanup version of
> 'struct fd'? It's trivial:
> 
>     /* Trivial "getfd()" wrapper */
>     static inline void release_fd(struct fd *fd)
>     { fdput(*fd); }
> 
>     #define auto_getfd(name, n) \
>         auto_release_name(struct fd, name, fdget(n), release_fd)
> 

>  - I think the above is simpler and objectively better in every way
> from the explicitly scoped thing

Well, I think having that as a option would still be very nice.

>  - I also suspect that to get maximum advantage of this all, we would
> have to get rid of -Wdeclaration-after-statement
> 
> That last point is something that some people will celebrate, but I do
> think it has helped keep our code base somewhat controlled, and made
> people think more about the variables they declare.
> 
> But if you start doing consecutive cleanup operations, you really end
> up wanting to do things like this:
> 
>     int testfd2(int fd1, int fd2)
>     {
>         auto_getfd(f1, fd1);
>         if (!f1.file)
>                 return -EBADF;
>         auto_getfd(f2, fd2);
>         if (!f2.file)
>                 return -EBADF;
>         return do_something (f1, f2);
>     }

If you extend the ptr_guard() idea you don't need to get rid of
-Wdeclaration-after-statement and we could write it like:

	int testfd2(int fd1, int fd2)
	{
		ptr_guard(fdput, f1) = fdget(fd1);
		ptr_guard(fdput, f2) = null_ptr(fdput);
		if (!f1.file)
			return -EBADF;
		f2 = fdget(fd2);
		if (!f2.file)
			return -EBADF;
		return do_something(f1, f2);
	}


Yes, the macros would be a little more involved, but not horribly so I
think.

typedef struct fd guard_fdput_t;

static const struct fd guard_fdput_null = __to_fd(0);

static inline void guard_fdput_cleanup(struct fd *fd)
{
	fdput(*fd);
}

#define ptr_guard(_guard, _name) \
	guard_##_guard##_t _name __cleanup(guard_##_guard##_cleanup)

#define null_ptr(_guard)		guard_##_guard##_null

And for actual pointer types (as opposed to fat pointer wrappers like
struct fd) we can have a regular helper macro like earlier:

#define DEFINE_PTR_GUARD(_guard, _type, _put)		\
typedef _type *guard_##_guard##_t;			\
static const _type *guard_##_guard##_null = NULL;	\
static inline void guard_##_guard##_cleanup(_type **ptr)	\
{ if (*ptr) _put(*ptr); }

[NOTE: value propagation gets rid of the above conditional where
appropriate]

eg.:

  DEFINE_PTR_GUARD(put_task, struct task_struct, put_task_struct);


Possibly with a helper:

#define null_guard(_guard, _name) \
	ptr_guard(_guard, _name) = null_ptr(_guard)



Now, ptr_guard() per the above, is asymmetric in that it only cares
about release, let guard() be the symmetric thing that also cares about
init like so:

#define DEFINE_GUARD(_guard, _type, _put, _get)			\
DEFINE_PTR_GUARD(_guard, _type, _put)				\
static inline guard_##_guard##_t guard_##_guard##_init(guard_##_guard##_t arg)	\
{ _get(arg); return arg; }

#define guard(_guard, _name, _var...)	\
	ptr_guard(_guard, _name) = guard_##_guard##_init(_var)

#define anon_guard(_guard, _var...) \
	guard(_guard, __UNIQUE_ID(guard), _var)

for eg.:

  DEFINE_GUARD(mutex, struct mutex, mutex_unlock, mutex_lock);

which then lets one write:

	int testfd2(int fd1, int fd2)
	{
		anon_guard(mutex, &cgroup_mutex);
		ptr_guard(fdput, f1) = fdget(fd1);
		null_guard(fdput, f2);
		if (!f1.file)
			return -EBADF;
		f2 = fdget(fd2);
		if (!f2.file)
			return -EBADF;
		return do_something(f1, f2);
	}

The RCU thing can then either be manually done like:

struct rcu_guard;

typedef struct rcu_guard *guard_rcu_t;
static const guard_rcu_t guard_rcu_null = NULL;
static inline void guard_rcu_cleanup(struct rcu_guard **ptr)
{ if (*ptr) rcu_read_unlock(); }
static inline struct rcu_guard *guard_rcu_init(void)
{ rcu_read_lock(); return (void*)1; }

(or because we'll need this pattern a few more times, with yet another
DEFINE_foo_GUARD helper)

and:

	anon_guard(rcu);

works.

And at this point the previous scope() things are just one helper macro
away:

#define scope(_guard, _var...) \
	for (ptr_guard(_guard, scope) = guard_##_guard##_init(_var), \
	     *done = NULL; !done; done++)

to be used where appropriate etc..
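
e.g. a block-scoped critical section then reads (sketch):

	scope (mutex, &cgroup_mutex) {
		/* cgroup_mutex held for this block only */
	}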


Yes, it's a wee bit more involved, but I'm thinking it gives a fair
amount of flexibility and we don't need to get rid of
-Wdeclaration-after-statement.

Hmm?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-27 19:18 ` Linus Torvalds
  2023-05-29 12:09   ` Paolo Bonzini
  2023-05-30  9:23   ` Peter Zijlstra
@ 2023-05-30  9:26   ` Peter Zijlstra
  2 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-30  9:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On Sat, May 27, 2023 at 12:18:04PM -0700, Linus Torvalds wrote:

> And before you say "unlock order doesn't matter" - that's actually not
> true. Unlock order does matter when you have things like
> "spin_lock_irq()" where the irq-off region then protects other locks
> too.

So I had already verified that GCC keeps them ordered, I see the
subthread with Ian and the documentation update request -- so all good
there.

Also, lockdep would scream bloody murder if you get that wrong, so I
can certainly add something to the selftests that would detect this.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-30  9:23   ` Peter Zijlstra
@ 2023-05-30  9:34     ` Peter Zijlstra
  2023-05-30 13:58     ` Valentin Schneider
  2023-06-06  9:42     ` Peter Zijlstra
  2 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-05-30  9:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On Tue, May 30, 2023 at 11:23:42AM +0200, Peter Zijlstra wrote:

> Yes, it's a wee bit more involved, but I'm thinking it gives a fair
> amount of flexibility and we don't need to get rid of
> -Wdeclaration-after-statement.

One other thing I forgot to point out; it allows things like:

	int store_fd(int fd)
	{
		ptr_guard(fdput, f) = fdget(fd);
		void *ret;
		if (!f.file)
			return -EBADF;
		ret = xa_store(&xarray, f.file->private, f);
		if (xa_is_err(ret))
			return xa_err(ret);
		f = null_ptr(fdput); // xarray now owns f
		return 0;
	}

Where we can assign null_ptr() to clear the guard and inhibit the
cleanup function to pass ownership around.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-30  9:23   ` Peter Zijlstra
  2023-05-30  9:34     ` Peter Zijlstra
@ 2023-05-30 13:58     ` Valentin Schneider
  2023-06-06  9:42     ` Peter Zijlstra
  2 siblings, 0 replies; 42+ messages in thread
From: Valentin Schneider @ 2023-05-30 13:58 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, paulmck,
	frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx

On 30/05/23 11:23, Peter Zijlstra wrote:
> On Sat, May 27, 2023 at 12:18:04PM -0700, Linus Torvalds wrote:
>
>
>> And it turns out that the above two trivial macros are actually quite
>> useful in themselves. You want to do an auto-cleanup version of
>> 'struct fd'? It's trivial:
>>
>>     /* Trivial "getfd()" wrapper */
>>     static inline void release_fd(struct fd *fd)
>>     { fdput(*fd); }
>>
>>     #define auto_getfd(name, n) \
>>         auto_release_name(struct fd, name, fdget(n), release_fd)
>>
>
>>  - I think the above is simpler and objectively better in every way
>> from the explicitly scoped thing
>
> Well, I think having that as a option would still be very nice.
>

IMO the explicit scoping can help with readability. It gives a clear visual
indication of where critical sections are, and you can break it up with a
scope + guards as in migrate_swap_stop() to stay at sane indentation
levels (with Python context managers, this would all be one scope).

I'd argue that for these, the scope/indentation is beneficial and not just
a byproduct. Even for longer functions like try_to_wake_up(), this works
out alright.

This obviously falls apart when dealing with too many guards
(e.g. copy_process()) or if the resulting indentation is nuts, but I concur
that keeping the explicit scope as an option would be nice.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-05-30  9:23   ` Peter Zijlstra
  2023-05-30  9:34     ` Peter Zijlstra
  2023-05-30 13:58     ` Valentin Schneider
@ 2023-06-06  9:42     ` Peter Zijlstra
  2023-06-06 13:17       ` Linus Torvalds
  2023-06-06 15:31       ` Kees Cook
  2 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-06  9:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, May 30, 2023 at 11:23:42AM +0200, Peter Zijlstra wrote:

> Yes, it's a wee bit more involved, but I'm thinking it gives a fair
> amount of flexibility and we don't need to get rid of
> -Wdeclaration-after-statement.

So I made all that work and .. Yes, you're absolutely right.

Busting -Wdeclaration-after-statement is the right thing to do for
guards.

So then I came up with:

#define __ptr_guard(_guard, _name)                                     \
       guard_##_guard##_t _name __cleanup(guard_##_guard##_cleanup)

#define ptr_guard(_guard, _name)                                       \
       __diag(push)                                                    \
       __diag(ignored "-Wdeclaration-after-statement")                 \
       __ptr_guard(_guard, _name)                                      \
       __diag(pop)

#define guard_init(_guard, _var...)                                    \
       guard_##_guard##_init(_var)

#define named_guard(_guard, _name, _var...)                            \
       ptr_guard(_guard, _name) = guard_init(_guard, _var)

#define guard(_guard, _var...)                                         \
       named_guard(_guard, __UNIQUE_ID(guard), _var)

#define scoped_guard(_guard, _var...)                                  \
       for (__ptr_guard(_guard, scope) = guard_init(_guard, _var),     \
            *done = NULL; !done; done = (void *)1)


And that all (mostly) works on clang, but not GCC :-( GCC refuses to
accept _Pragma() inside an expression.

So I now have that ptr_guard() with push/pop for clang but without for
GCC, which means that only clang has a fighting chance to report
-Wdeclaration-after-statement warnings until such time as we can get
GCC 'fixed'.

  https://godbolt.org/z/5MPeq5W6K


FWIW: the work-in-progress patches I have are here:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=core/guards

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06  9:42     ` Peter Zijlstra
@ 2023-06-06 13:17       ` Linus Torvalds
  2023-06-06 13:40         ` Peter Zijlstra
  2023-06-06 15:31       ` Kees Cook
  1 sibling, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2023-06-06 13:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 6, 2023 at 2:43 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> ( GCC refuses to accept _Pragma() inside an expression.

If we really want this all, I think we'd just stop using
-Wdeclaration-after-statement entirely.

There are other uses for it, and people have asked for mixing
declarations and code before.

I think that particular straightjacket has been a good thing, but I
also think that it's ok to just let it go as a hard rule, and just try
to make it a coding style issue for the common case, but allow mixed
declarations and code when it makes sense.

For the whole "automatic release case it definitely makes sense, but
it's not like it isn't possible useful elsewhere. I just don't want
for it to become some global pattern for everything.

That said, I still don't understand why you lke the name "guard" for
this.  I understand not liking "auto", but "guard" doesn't seem any
better. In fact, much worse. Guarded expressions means something
completely different both in real life and in computer science.

I'm assuming there's some history there, but it makes no sense to me
as a name here.

              Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 13:17       ` Linus Torvalds
@ 2023-06-06 13:40         ` Peter Zijlstra
  2023-06-06 14:50           ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-06 13:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 06, 2023 at 06:17:33AM -0700, Linus Torvalds wrote:

> That said, I still don't understand why you lke the name "guard" for
> this.  I understand not liking "auto", but "guard" doesn't seem any
> better. In fact, much worse. Guarded expressions means something
> completely different both in real life and in computer science.
> 
> I'm assuming there's some history there, but it makes no sense to me
> as a name here.

I know the name from C++ where it is std::lock_guard<> (well, back when
I still did C++ it wasn't std, but whatever), and Rust seems to have
std::sync::MutexGuard<>.

But if that's the sole objection left, lets have a bike-shed party and
pick a colour :-)

'shield' 'sentry' 'sentinel' 'keeper' 'custodian' 'warden' ?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 13:40         ` Peter Zijlstra
@ 2023-06-06 14:50           ` Linus Torvalds
  2023-06-06 16:06             ` Kees Cook
  2023-06-06 18:08             ` Peter Zijlstra
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-06-06 14:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 6, 2023 at 6:40 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> I know the name from C++ where it is std::lock_guard<> (well, back when
> I still did C++ it wasn't std, but whatever), and Rust seems to have
> std::sync::MutexGuard<>.
>
> But if that's the sole objection left, let's have a bike-shed party and
> pick a colour :-)
>
> 'shield' 'sentry' 'sentinel' 'keeper' 'custodian' 'warden' ?

I feel like you seem entirely too fixated on locking.

That's not even _remotely_ the most interesting case, I think.

For the common locking patterns, maybe "guard" makes sense as part of
the name, but it damn well shouldn't be about any "ptr_guard" or
whatever.

I think it should be some trivial

        mutex_scope(mymutex) {
                ..
        }

and yes, in that case scoping makes tons of sense and might even be
required, because you do want to visually see the scope of the
locking.
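
Purely as an illustration of the shape -- not an existing API, and with
made-up names -- such a mutex_scope() could presumably be built on the
__cleanup__ attribute plus a one-shot for loop:

    /* hypothetical helper, sketch only */
    static inline void __mutex_scope_cleanup(struct mutex **m)
    { mutex_unlock(*m); }

    #define mutex_scope(m)                                           \
        for (struct mutex *__scope __cleanup(__mutex_scope_cleanup)  \
                 = ({ mutex_lock(m); m; }), *__once = __scope;       \
             __once; __once = NULL)

so that leaving the block by any path - return, break, goto - drops the
mutex.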

I'm not seeing the point of "guard" in that name either, but I don't
think the above is the complex case for naming, for use, or for
syntax.

In contrast, I think reference counting and allocations are the much
more interesting case, and the one where you are also more likely to
have multiple consecutive ones, and where it's likely less about "this
region is protected" and much more about the RAII patterns and
resources that you just want cleanup for.

So that's why I had that "auto_release" name - to make it clear that
it's about the *cleanup*, not about some "guarded region". See my
argument?

The locking cases can probably be handled with just a couple of macros
(mutex, rcu, maybe spinlock?) and I would violently argue that the
locking cases will have to have unconditional cleanup (ie if you
*ever* have the situation where some path is supposed to be able to
return with the lock held and no cleanup, then you should simply not
use the "mutex_scope()" kind of helper AT ALL).

So locking is, I feel, almost uninteresting. There are only a few
cases, and only trivial patterns for it, because anything non-trivial
had better be done very explicitly, and very much visibly by hand.

But look at the cases where we currently use "goto exit" kind of code,
or have duplicated cleanups because we do "cleanup(x); return -ERRNO"
kinds of patterns.

Some of them are locking, yes. But a lot of them are more generic "do
this before returning" kinds of things: freeing (possibly complex)
data structures, uncharging statistics, or even things like writing
results back to user space.
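
For concreteness, the kind of code meant here is the classic pattern
(purely an illustrative sketch; do_something() and the variables are
made up):

	buf = kmalloc(size, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	err = do_something(buf);
	if (err)
		goto out;

	if (copy_to_user(ubuf, buf, size))
		err = -EFAULT;
out:
	kfree(buf);
	return err;

where the cleanup ends up either duplicated or funneled through a label.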

And, unlike locking, those often (but not always) end up having the
"don't do this in the single success case" situation, so that you need
to have some way to show "ok, don't release it after all". Yes, that
ends up in practice likely being setting that special pointer to NULL,
but I think we need to have a much better syntax for it to show that
"now we're *not* doing the auto-release".

So it's that more complex case that I

 (a) don't think "guard" makes any sense for at all

 (b) think it's where the bigger payoffs are

 (c) think is generally not "nested" (you release the resources
you've allocated, but you don't "nest" the allocations - they are just
serial).

 (d) think that the syntax could be pretty nasty, because the cleanup
is not just a trivial fixed "unlock" function

Just as an example of something that has a bit of everything, look at
kernel/cpu.c and the _cpu_down() function in particular, which has
that "goto out" to do all the cleanup.

That looks like a perfect case for having some nice "associate the
cleanup with the initialization" RAII patterns, but while it does have
one lock (cpus_write_lock/unlock()), even that one is abstracted away
so that it may be a no-op, and might be a percpu rwlock, and so having
to create a special macro for that would be more work than is worth
it.

Because what I *don't* want to happen is that people create some
one-time-use "helper" macro infrastructure to use this all. Again,
that cpus_write_lock/unlock is a good example of this.

End result: to avoid having crazy indentation, and crazy one-time
helper inline functions or whatever, I think you'd really want some
fairly generic model for

  "I am doing X, and I want to to Y as a cleanup when you exit the function"

where both X and Y can be *fairly* complicated things. They might be
as simple (and reasonably common) as "fd = fdget(f)" and "fdput(fd)",
but what about the one-offs like that cpus_write_lock/unlock pattern?

That one only happens in two places (_cpu_down() and _cpu_up()), and
maybe the answer is "we can't reasonably do it for that, because the
complexity is not worth it".

But maybe we *can*.

For example, this is likely the only realistic *good* case for the
horrible "nested function" thing that gcc supports. In my "look, we
could clean up a 'struct fd'" example from last week, I made it do
something like this:

    /* Trivial "getfd()" wrapper */
    static inline void release_fd(struct fd *fd)
    { fdput(*fd); }

    #define auto_getfd(name, n) \
        auto_release_name(struct fd, name, fdget(n), release_fd)

but it would possibly be much more generic, and much more useful, if
that "release_fd()" function was generated by the macro as a local
nested inline function.

End result: you could use any arbitrary local cleanup code.

So you could have something like

  #define RAII(type, var, init, exit) \
        __RAII(type, var, init, exit, __UNIQUE_ID(fn))

  #define __RAII(type, var, init, exit, exitname) \
        void exitname(type *p) { exit } \
        type var __attribute__((__cleanup__(exitname))) = (init)

and do all of the above with

    RAII(struct fd, fd, fdget(f), fdput(fd));

because that macro would literally expand to create a (uniquely named)
nested function that then contains that "fdput(fd)".

I dunno. The syntax looks pretty bad, and I'm not even convinced clang
supports nested functions, so the above may not be an option.

But wouldn't it be nice to just be able to declare that arbitrary
cleanup in-place?

Then the lock cases would really just be trivial helper wrappers. And
you can use "guard" there if you want, but I feel it makes absolutely
*no* sense in the generic case. There is absolutely nothing that is
being "guarded" by having an automatic cleanup of some random
variable.

              Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06  9:42     ` Peter Zijlstra
  2023-06-06 13:17       ` Linus Torvalds
@ 2023-06-06 15:31       ` Kees Cook
  2023-06-06 15:45         ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Kees Cook @ 2023-06-06 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, gregkh, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 06, 2023 at 11:42:51AM +0200, Peter Zijlstra wrote:
> [...]
> #define scoped_guard(_guard, _var...)                                  \
>        for (__ptr_guard(_guard, scope) = guard_init(_guard, _var),     \
>             *done = NULL; !done; done = (void *)1)

nit: Linus's example was "(void *)8" (instead of 1) because we've had
issues in the past with alignment warnings on archs that are sensitive
to it. (e.g. see the __is_constexpr() macro which is doing NULL/!NULL
comparisons.)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 15:31       ` Kees Cook
@ 2023-06-06 15:45         ` Linus Torvalds
  2023-06-06 16:08           ` Kees Cook
  2023-06-08 16:25           ` David Laight
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-06-06 15:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, gregkh, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 6, 2023 at 8:31 AM Kees Cook <keescook@chromium.org> wrote:
>
> nit: Linus's example was "(void *)8" (instead of 1) because we've had
> issues in the past with alignment warnings on archs that are sensitive
> to it. (e.g. see the __is_constexpr() macro which is doing NULL/!NULL
> comparisons.)

Note that I don't think we ever saw such a warning, it was just a
theoretical observation that depending on type, the compiler might
warn about known mis-aligned pointer bits.

So I'm not sure the 1-vs-8 actually matters. We do other things that
assume that low bits in a pointer are retained and valid, even if in
theory the C type system might have issues with it.

But maybe I mis-remember - if you did get an actual warning, maybe we
should document that warning just to keep the memory alive.

            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 14:50           ` Linus Torvalds
@ 2023-06-06 16:06             ` Kees Cook
  2023-06-06 18:08             ` Peter Zijlstra
  1 sibling, 0 replies; 42+ messages in thread
From: Kees Cook @ 2023-06-06 16:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, gregkh, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 06, 2023 at 07:50:47AM -0700, Linus Torvalds wrote:
> So you could have something like
> 
>   #define RAII(type, var, init, exit) \
>         __RAII(type, var, init, exit, __UNIQUE_ID(fn))
> 
>   #define __RAII(type, var, init, exit, exitname) \
>         void exitname(type *p) { exit } \
>         type var __attribute__((__cleanup__(exitname))) = (init)
> 
> and do all of the above with
> 
>     RAII(struct fd, fd, fdget(f), fdput(fd));

"fdput(fd)" needs to be "fdput(*p)", since otherwise "fdput(fd)" is
referencing "fd" before it has been declared.

But regardless, yes, Clang is angry about the nested function. Also,
while my toy[1] example doesn't show it, GCC may also generate code
that requires an executable stack for some instances that need
trampolines (or at least it did historically).

[1] https://godbolt.org/z/WTjx6Gs7x

Also, more nits on naming: isn't this more accurately called Scope-based
Resource Management (SBRM) not RAII? (RAII is technically object lifetime,
and SBRM is scope entry/exit.)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 15:45         ` Linus Torvalds
@ 2023-06-06 16:08           ` Kees Cook
  2023-06-08 16:25           ` David Laight
  1 sibling, 0 replies; 42+ messages in thread
From: Kees Cook @ 2023-06-06 16:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, gregkh, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 06, 2023 at 08:45:49AM -0700, Linus Torvalds wrote:
> On Tue, Jun 6, 2023 at 8:31 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > nit: Linus's example was "(void *)8" (instead of 1) because we've had
> > issues in the past with alignment warnings on archs that are sensitive
> > to it. (e.g. see the __is_constexpr() macro which is doing NULL/!NULL
> > comparisons.)
> 
> Note that I don't think we ever saw such a warning, it was just a
> theoretical observation that depending on type, the compiler might
> warn about known mis-aligned pointer bits.
> 
> So I'm not sure the 1-vs-8 actually matters. We do other things that
> assume that low bits in a pointer are retained and valid, even if in
> theory the C type system might have issues with it.
> 
> But maybe I mis-remember - if you did get an actual warning, maybe we
> should document that warning just to keep the memory alive.

I've never seen a warning, but since this came up in the dissection of
the __is_constexpr() behavior, it's been burned into my mind. ;)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 14:50           ` Linus Torvalds
  2023-06-06 16:06             ` Kees Cook
@ 2023-06-06 18:08             ` Peter Zijlstra
  2023-06-06 23:22               ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-06 18:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 06, 2023 at 07:50:47AM -0700, Linus Torvalds wrote:
> I feel like you seem entirely too fixated on locking.

It's what I started out with... but it's certainly not everything.

The thing I have removes pretty much all the error gotos from
sched/core.c and events/core.c, and while locking is of course a fair
amount of that, there's significant non-locking usage.

>  (c) think that generally are not "nested" (you release the resources
> you've allocated, but you don't "nest" the allocations - they are just
> serial.
> 
>  (d) think that the syntax could be pretty nasty, because the cleanup
> is not just a trivial fixed "unlock" function

So I'm not sure you've seen the actual conversions I've done, but yes,
there's lots of those.

I've just used the guard naming because locks is what I started out
with.

+	ptr_guard(kfree, alloc) = kzalloc(event->read_size, GFP_KERNEL);
+	if (!alloc)
 		return -ENOMEM;


 	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
 				 NULL, NULL, cgroup_fd);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	ptr_guard(free_event, event_guard) = event;


+	ptr_guard(put_task, p) = find_get_task(pid);
+	if (!p)
+		return -ESRCH;


So it does all that... you just hate the naming -- surely we can fix
that.


Would it all be less offensive if I did: s/guard/cleanup/ on the whole
thing? Then we'd have things like:


DEFINE_PTR_CLEANUP(put_task, struct task_struct *,
		   if (_C) put_task_struct(_C))

	ptr_cleanup(put_task, p) = find_get_task(pid);
	if (!p)
		return -ESRCH;


DEFINE_PTR_CLEANUP(free_event, struct perf_event *,
		   if (!IS_ERR_OR_NULL(_C)) free_event(_C))

	ptr_cleanup(free_event, event) = perf_event_alloc(...);
	if (IS_ERR(event))
		return PTR_ERR(event);


DEFINE_PTR_CLEANUP(kfree, void *, kfree(_C))

	ptr_cleanup(kfree, foo) = kzalloc(...);
	if (!foo)
		return -ENOMEM;
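
For reference, a DEFINE_PTR_CLEANUP along these lines could presumably
be spelled something like this (sketch only; the __ptr_cleanup_##name
helper and the typedef are made up, _C names the value being cleaned up
as in the examples above):

#define DEFINE_PTR_CLEANUP(name, type, cleanup)				\
typedef type ptr_cleanup_##name##_t;					\
static inline void __ptr_cleanup_##name(type *p)			\
{ type _C = *p; cleanup; }

#define ptr_cleanup(name, var)						\
	ptr_cleanup_##name##_t var __cleanup(__ptr_cleanup_##name)

so that ptr_cleanup(kfree, foo) declares 'foo' with the kfree cleanup
attached.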


But do we then continue with:

DEFINE_CLEANUP(mutex, struct mutex *, mutex_lock(_C), mutex_unlock(_C))

	cleanup(mutex, &my_mutex);


	scoped_cleanup (mutex, &my_mutex) {
		...
	}

etc.? Or do we keep guard() there, but based internally on
ptr_cleanup() with the cleanup_## naming?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 18:08             ` Peter Zijlstra
@ 2023-06-06 23:22               ` Linus Torvalds
  2023-06-07  9:41                 ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2023-06-06 23:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 6, 2023 at 11:08 AM Peter Zijlstra <peterz@infradead.org> wrote:
> Would it all be less offensive if I did: s/guard/cleanup/ on the whole
> thing?

It's more than "guard" for me.

What is "ptr"? Why? We already know of at least one case where it's
not a pointer at all, ie 'struct fd'.

So I *really* hate the naming. Absolutely none of it makes sense to
me. One part is a nonsensical name apparently based on a special-case
operation, and the other part is a nonsensical type from just one
random - if common - implementation issue.

What you want to do is to have a way to define and name a
"constructor/desctructor" pair for an arbitrary type - *not*
necessarily a pointer - and then optionally a way to say "Oh, don't do
the destructor, because I'm actually going to use it long-term".

I said "cleanup", but that's not right either, since we always have to
have that initializer too.

Maybe just bite the bullet, and call the damn thing a "class", and
have some syntax like

     DEFINE_CLASS(name, type, exit, init, initarg...);

to create the infrastructure for some named 'class'. So you'd have

    DEFINE_CLASS(mutex, struct mutex *,
        mutex_unlock(*_P),
        ({mutex_lock(mutex); mutex;}), struct mutex *mutex)

to define the mutex "class", and do

    DEFINE_CLASS(fd, struct fd,
        fdput(*_P),
        fdget(f), int f)

for the 'struct fd' thing.

The above basically would just create the wrapper functions (and
typedefs) for the constructor and destructor.

So the 'DEFINE_CLASS(mutex ..)' thing would basically just expand to

    typedef struct mutex *class_mutex_type;
    static inline void class_mutex_destructor(class_mutex_type *_C)
    { mutex_unlock(*_C); }
    static inline class_mutex_type class_mutex_constructor(struct mutex *mutex)
    { return ({mutex_lock(mutex); mutex;}); }

Then to _instantiate_ one of those, you'd do

    INSTANTIATE_CLASS(name, var)

which would expand to

    class_name_type var
        __attribute__((__cleanup__(class_name_destructor))) =
class_name_constructor

and the magic of that syntax is that you'd actually use that
"INSTANTIATE_CLASS()" with the argument to the init function
afterwards, so you'd actually do

    INSTANTIATE_CLASS(mutex, n)(&sched_domains_mutex);

to create a variable 'n' of class 'mutex', where the
class_mutex_constructor gets the pointer to 'sched_domain_mutex' as
the argument.

And at THAT point, you can do this:

    #define mutex_guard(m) \
        INSTANTIATE_CLASS(mutex, __UNIQUE_ID(mutex))(m)

and now you can do

       mutex_guard(&sched_domains_mutex);

to basically start a guarded section where you hold 'sched_domains_mutex'.

And in that *very* last situation, 'guard' makes sense in the name.
But no earlier. And there is absolutely no 'ptr' anywhere here.

The above should work also for the 'struct fd' case, so you can do

       INSTANTIATE_CLASS(fd, f)(fd);

to create a 'struct fd' named 'f' that is initialized with
'fdget(fd)', and will DTRT when going out of scope. Modulo any stupid
errors I did.

And ok, I didn't write out the exact macro magic to *get* those
expansions above (I just tried to write out the approximate end
result), but it should be mostly fairly straightforward.

So it would be the usual token pasting tricks to get the 'class type',
the 'class destructor' and the 'class constructor' functions.

Finally, note that the "initarg" part is a macro vararg thing. The
initargs can be empty (for things like RCU), but it could also
possibly be multiple arguments (like a "size and gfp_flags" for an
allocation?).

I'm sure there's something horribly wrong in the above, but my point
is that I'd really like this to make naming and conceptual sense.

And if somebody goes "The above is basically exactly what the original
C++ compiler did back when it was really just a pre-processor for C",
then you'd be 100% right. The above is basically (a part of) what
Bjarne did, except he did it as a separate pass.

And to answer the next question: why not just give up and do C++ -
it's because of all the *other* things Bjarne did.

             Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 23:22               ` Linus Torvalds
@ 2023-06-07  9:41                 ` Peter Zijlstra
  2023-06-08  8:52                   ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-07  9:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Tue, Jun 06, 2023 at 04:22:26PM -0700, Linus Torvalds wrote:
> On Tue, Jun 6, 2023 at 11:08 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > Would it all be less offensive if I did: s/guard/cleanup/ on the whole
> > thing?
> 
> It's more than "guard" for me.
> 
> What is "ptr"? Why? We already know of at least one case where it's
> not a pointer at all, ie 'struct fd'.

(so in my view struct fd is nothing more than a fat pointer)

> So I *really* hate the naming. Absolutely none of it makes sense to
> me. One part is a nonsensical name apparently based on a special-case
> operation, and the other part is a nonsensical type from just one
> random - if common - implementation issue.
> 
> What you want to do is to have a way to define and name a
> "constructor/desctructor" pair for an arbitrary type - *not*
> necessarily a pointer - and then optionally a way to say "Oh, don't do
> the destructor, because I'm actually going to use it long-term".

Yes, so when it's a 'pointer', that part becomes assigning it NULL (or
fdnull in the struct fd case). For example:

DEFINE_PTR_CLEANUP(kfree, void *, kfree(_C))

	ptr_cleanup(kfree, mem) = kzalloc(....);
	if (!mem)
		return -ENOMEM;

	object = mem;

	// build object with more failure cases

	mem = NULL;          // object is a success, we keep it.
	return object;

> I said "cleanup", but that's not right either, since we always have to
> have that initializer too.

I've found that for most things the initializer part isn't actually that
important. Consider that struct fd thing again; perf has a helper:

static inline struct fd perf_fget_light(int fd)
{
	struct fd f = fdget(fd);
	if (!f.file)
		return fdnull;

	if (f.file->f_op != &perf_fops) {
		fdput(f);
		return fdnull;
	}
	return f;
}

So now we have both fdget() and perf_fget_light() to obtain a struct fd;
both need fdput().

The pointer with destructor semantics works for both:

DEFINE_PTR_CLEANUP(fdput, struct fd, fdput(_C))

	ptr_cleanup(fdput, f) = perf_fget_light(fd);

or, somewhere else:

	ptr_cleanup(fdput, f) = fdget(fd);


The same is true for kfree(); we have a giant pile of allocation
functions whose results are all freed with kfree(): kmalloc(), kzalloc(),
kmalloc_node(), kzalloc_node(), krealloc(), kmalloc_array(),
krealloc_array(), kcalloc(), etc..

> Maybe just bite the bullet, and call the damn thing a "class", and
> have some syntax like
> 
>      DEFINE_CLASS(name, type, exit, init, initarg...);
> 
> to create the infrastructure for some named 'class'. So you'd have
> 
>     DEFINE_CLASS(mutex, struct mutex *,
>         mutex_unlock(*_P),
>         ({mutex_lock(mutex); mutex;}), struct mutex *mutex)
> 
> to define the mutex "class", and do
> 
>     DEFINE_CLASS(fd, struct fd,
>         fdput(*_P),
>         fdget(f), int f)
> 
> for the 'struct fd' thing.

Right; that is very close to what I have. And certainly useful --
although as per the above, perhaps not so for the struct fd case.

> Then to _instantiate_ one of those, you'd do
> 
>     INSTANTIATE_CLASS(name, var)
> 
> which would expand to
> 
>     class_name_type var
>         __attribute__((__cleanup__(class_name_destructor))) =
> class_name_constructor
> 
> and the magic of that syntax is that you'd actually use that
> "INSTANTIATE_CLASS()" with the argument to the init function
> afterwards, so you'd actually do
> 
>     INSTANTIATE_CLASS(mutex, n)(&sched_domains_mutex);
> 
> to create a variable 'n' of class 'mutex', where the
> class_mutex_constructor gets the pointer to 'sched_domain_mutex' as
> the argument.

Yes, I had actually considered this syntax, and I really like it. The
only reason I hadn't done that is the for-loop thing; I couldn't make
it work there.

> I'm sure there's something horribly wrong in the above, but my point
> is that I'd really like this to make naming and conceptual sense.

Right, I hear ya. So the asymmetric case (iow destructor only) could be
seen as using the copy-constructor.

#define DEFINE_CLASS(name, type, exit, init, init_args...)		\
typedef type class_##name##_t;						\
static inline void class_##name##_destructor(type *this)		\
{ type THIS = *this; exit; }						\
static inline type class_##name##_constructor(init_args)		\
{ type THIS = init; return THIS; }

#define __INSTANTIATE_VAR(name, var)					\
	class_##name##_t var __cleanup(class_##name##_destructor)

#define INSTANTIATE_CLASS(name, var)					\
	__INSTANTIATE_VAR(name, var) = class_##name##_constructor


DEFINE_CLASS(fd, struct fd, fdput(THIS), f, struct fd f)

	INSTANTIATE_CLASS(fd, f)(perf_fget_light(fd));


Alternatively, would you be OK with exposing INSTANTIATE_VAR() to easily
circumvent the default constructor?

And/or how about EXTEND_CLASS(), something like so?

#define EXTEND_CLASS(name, ext, init, init_args...)			\
typedef class_##name##_t class_##name##ext##_t;				\
static inline void class_##name##ext##_destructor(class_##name##_t *this) \
{ class_##name##_destructor(this); }					\
static inline class_##name##_t class_##name##ext##_constructor(init_args) \
{ class_##name##_t THIS = init; return THIS; }


DEFINE_CLASS(fd, struct fd, fdput(THIS), fdget(fd), int fd)
EXTEND_CLASS(fd, _perf, perf_fget_light(fd), int fd)

	INSTANTIATE_CLASS(fd_perf, f)(fd);


> And at THAT point, you can do this:
> 
>     #define mutex_guard(m) \
>         INSTANTIATE_CLASS(mutex, __UNIQUE_ID(mutex))(m)
> 
> and now you can do
> 
>        mutex_guard(&sched_domains_mutex);

So the 'problem' is the number of different guards I ended up having,
and you can't have macros define more macros :/

Which is how I ended up with the:

	guard(mutex, &sched_domains_mutex);

syntax.

This can of course be achieved using the above CLASS thing like:

DEFINE_CLASS(mutex, struct mutex *, mutex_unlock(THIS),
	     ({ mutex_lock(m); m; }), struct mutex *m)

#define named_guard(name, var, args...)					\
	INSTANTIATE_CLASS(name, var)(args)

#define guard(name, args...)						\
	named_guard(name, __UNIQUE_ID(guard), args)

#define scoped_guard(name, args...)					\
	for (named_guard(name, scope, args),				\
	     *done = NULL; !done; done = (void *)1)


With the understanding they're only to be used for locks.
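
For illustration (sketch only), usage with that mutex class would then
look like:

	guard(mutex, &sched_domains_mutex);
	/* held until the end of the enclosing scope; no name to poke at */

	scoped_guard (mutex, &my_mutex) {
		/* my_mutex is held only inside this block */
	}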

Also, I'm already tired of writing INSTANTIATE.. would:

	CLASS(fd, f)(fd);

	VAR(kfree, mem) = kzalloc_node(...);

be acceptable shorthand?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-07  9:41                 ` Peter Zijlstra
@ 2023-06-08  8:52                   ` Peter Zijlstra
  2023-06-08  9:04                     ` Greg KH
  2023-06-08 15:45                     ` Linus Torvalds
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-08  8:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Wed, Jun 07, 2023 at 11:41:01AM +0200, Peter Zijlstra wrote:


> > I'm sure there's something horribly wrong in the above, but my point
> > is that I'd really like this to make naming and conceptual sense.
> 
> Right, I hear ya. So the asymmetric case (iow destructor only) could be
> seen as using the copy-constructor.
> 
> #define DEFINE_CLASS(name, type, exit, init, init_args...)		\
> typedef type class_##name##_t;						\
> static inline void class_##name##_destructor(type *this)		\
> { type THIS = *this; exit; }						\
> static inline type class_##name##_constructor(init_args)		\
> { type THIS = init; return THIS; }
> 
> #define __INSTANTIATE_VAR(name, var)					\
> 	class_##name##_t var __cleanup(class_##name##_destructor)
> 
> #define INSTANTIATE_CLASS(name, var)					\
> 	__INSTANTIATE_VAR(name, var) = class_##name##_constructor
> 
> 
> DEFINE_CLASS(fd, struct fd, fdput(THIS), f, struct fd f)
> 
> 	INSTANTIATE_CLASS(fd, f)(perf_fget_light(fd));
> 
> 
> Alternatively, would you be OK with exposing INSTANTIATE_VAR() to easily
> circumvent the default constructor?

Or perhaps use the smart-pointer concept applied to our classes like:

#define smart_ptr(name, var) \
	__INSTANTIATE_VAR(name, var)

To mean a pointer that calls the destructor for class 'name'. I think
the nearest thing C++ has is std::unique_ptr<>.


Then we can write:


DEFINE_CLASS(kfree, void *, kfree(THIS), p, void *p)


	smart_ptr(kfree, mem) = kzalloc_node(...);
	if (!mem)
		return -ENOMEM;

	object = mem;

	// further initialize object with error cases etc..

	mem = NULL; // success, we keep it.
	return object;




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08  8:52                   ` Peter Zijlstra
@ 2023-06-08  9:04                     ` Greg KH
  2023-06-08 15:45                     ` Linus Torvalds
  1 sibling, 0 replies; 42+ messages in thread
From: Greg KH @ 2023-06-08  9:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, keescook, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 08, 2023 at 10:52:48AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 07, 2023 at 11:41:01AM +0200, Peter Zijlstra wrote:
> 
> 
> > > I'm sure there's something horribly wrong in the above, but my point
> > > is that I'd really like this to make naming and conceptual sense.
> > 
> > Right, I hear ya. So the asymmetric case (iow destructor only) could be
> > seen as using the copy-constructor.
> > 
> > #define DEFINE_CLASS(name, type, exit, init, init_args...)		\
> > typedef type class_##name##_t;						\
> > static inline void class_##name##_destructor(type *this)		\
> > { type THIS = *this; exit; }						\
> > static inline type class_##name##_constructor(init_args)		\
> > { type THIS = init; return THIS; }
> > 
> > #define __INSTANTIATE_VAR(name, var)					\
> > 	class_##name##_t var __cleanup(class_##name##_destructor)
> > 
> > #define INSTANTIATE_CLASS(name, var)					\
> > 	__INSTANTIATE_VAR(name, var) = class_##name##_constructor
> > 
> > 
> > DEFINE_CLASS(fd, struct fd, fdput(THIS), f, struct fd f)
> > 
> > 	INSTANTIATE_CLASS(fd, f)(perf_fget_light(fd));
> > 
> > 
> > Alternatively, would you be OK with exposing INSTANTIATE_VAR() to easily
> > circumvent the default constructor?
> 
> Or perhaps use the smart-pointer concept applied to our classes like:
> 
> #define smart_ptr(name, var) \
> 	__INSTANTIATE_VAR(name, var)
> 
> To mean a pointer that calls the destructor for class 'name'. I think
> the nearest thing C++ has is std::unique_ptr<>.
> 
> 
> Then we can write:
> 
> 
> DEFINE_CLASS(kfree, void *, kfree(THIS), p, void *p)
> 
> 
> 	smart_ptr(kfree, mem) = kzalloc_node(...);
> 	if (!mem)
> 		return -ENOMEM;
> 
> 	object = mem;
> 
> 	// further initialize object with error cases etc..
> 
> 	mem = NULL; // success, we keep it.
> 	return object;

I like the idea, as we need a way to say "don't clean this up, it was
passed to somewhere else" for these types of allocations, but have it
"automatically" cleaned up on the error paths.

I have no say in the naming, though I always disliked the idea of a
pointer being "smart" as they are just a dumb memory register :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08  8:52                   ` Peter Zijlstra
  2023-06-08  9:04                     ` Greg KH
@ 2023-06-08 15:45                     ` Linus Torvalds
  2023-06-08 16:47                       ` Kees Cook
                                         ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-06-08 15:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 8, 2023 at 1:53 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Or perhaps use the smart-pointer concept applied to our classes like:
>
> #define smart_ptr(name, var) \
>         __INSTANTIATE_VAR(name, var)

So this is the only situation where I think "ptr" makes sense (your
"fat pointer" argument is nonsensical - sure, you can treat anything
as a pointer if you're brave enough, but that just means that
"pointer" becomes a meaningless word).

However, I think that for "smart pointers", we don't need any of this
complexity at all, and we don't need that ugly syntax.

> Then we can write:
>
> DEFINE_CLASS(kfree, void *, kfree(THIS), p, void *p)
>
>         smart_ptr(kfree, mem) = kzalloc_node(...);
>         if (!mem)
>                 return -ENOMEM;

No, the above is broken, and would result in us using "void *" in
situations where we really *really* don't want that.

For automatic freeing, you want something that can handle different
types properly, and without having to constantly declare the types
somewhere else before use.

And no, you do *not* want that "kfree(THIS)" kind of interface,
because you want the compiler to inline the freeing function wrapper,
and notice _statically_ when somebody zeroed the variable and not even
call "kfree()", because otherwise you'd have a pointless call to
kfree(NULL) in the success path too.

So for convenient automatic pointer freeing, you want an interface
much more akin to

        struct whatever *ptr __automatic_kfree = kmalloc(...);

which is much more legible, doesn't have any type mis-use issues, and
is also just trivially dealt with by a

  static inline void automatic_kfree_wrapper(void *pp)
  { void *p = *(void **)pp; if (p) kfree(p); }
  #define __automatic_kfree \
        __attribute__((__cleanup__(automatic_kfree_wrapper)))
  #define no_free_ptr(p) \
        ({ __auto_type __ptr = (p); (p) = NULL; __ptr; })

which I just tested generates the sane code even for the "set the ptr
to NULL and return success" case.

The above allows you to trivially do things like

        struct whatever *p __automatic_kfree = kmalloc(..);

        if (!do_something(p))
                return -ENOENT;

        return no_free_ptr(p);

and it JustWorks(tm).

And yes, it needs a couple of different versions of that
"__automatic_kfree()" wrapper for the different freeing cases (kvfree,
rcu_free, whatever), but those are all trivial one-liners.

And no, I didn't think too much about those names. "__automatic_kfree"
is too damn long to type, but you hated "auto". And "no_free_ptr()" is
not a wonderful name either. But I tried to make the naming at least be
obvious, if not wonderful.

               Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-06 15:45         ` Linus Torvalds
  2023-06-06 16:08           ` Kees Cook
@ 2023-06-08 16:25           ` David Laight
  1 sibling, 0 replies; 42+ messages in thread
From: David Laight @ 2023-06-08 16:25 UTC (permalink / raw)
  To: 'Linus Torvalds', Kees Cook
  Cc: Peter Zijlstra, gregkh@linuxfoundation.org, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, ojeda@kernel.org,
	ndesaulniers@google.com, mingo@redhat.com, will@kernel.org,
	longman@redhat.com, boqun.feng@gmail.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com, paulmck@kernel.org,
	frederic@kernel.org, quic_neeraju@quicinc.com,
	joel@joelfernandes.org, josh@joshtriplett.org,
	mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com,
	rcu@vger.kernel.org, tj@kernel.org, tglx@linutronix.de,
	linux-toolchains@vger.kernel.org

From: Linus Torvalds
> Sent: 06 June 2023 16:46
> 
> On Tue, Jun 6, 2023 at 8:31 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > nit: Linus's example was "(void *)8" (instead of 1) because we've had
> > issues in the past with alignment warnings on archs that are sensitive
> > to it. (e.g. see the __is_constexpr() macro which is doing NULL/!NULL
> > comparisons.)

__is_constexpr() is playing entirely different games.
Basically the type of (x ? (void *)y : (int *)z)
depends on whether 'y' is a compile-time 0 (int *) or not (void *).
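
For reference, the macro in question is roughly (from memory, so
double-check include/linux/const.h):

#define __is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

i.e. the void * operand is a null pointer constant exactly when 'x' is
an integer constant expression, which flips the type of the conditional
between int * and void *.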

...
> So I'm not sure the 1-vs-8 actually matters. We do other things that
> assume that low bits in a pointer are retained and valid, even if in
> theory the C type system might have issues with it.

Yes, given that gcc will assume a pointer is aligned if you try
to memcpy() from it, I'm surprised it doesn't always assume that.
In which case (long)ptr_to_int & 3 can be validly assumed to be zero.

I've found some 'day job' code that passed the address of a
member of a 'packed' structure to a function which then used
host ordered unsigned char[] accesses.
The compiler is certainly allowed to convert that back to
a word write - which would then fault.
(I've not looked to see if any modern compilers do.)
Of course the simple ptr->member would have be fine.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 15:45                     ` Linus Torvalds
@ 2023-06-08 16:47                       ` Kees Cook
  2023-06-08 16:59                         ` Linus Torvalds
  2023-06-08 17:20                         ` Nick Desaulniers
  2023-06-08 20:06                       ` Peter Zijlstra
  2023-06-09  8:27                       ` Rasmus Villemoes
  2 siblings, 2 replies; 42+ messages in thread
From: Kees Cook @ 2023-06-08 16:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, gregkh, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 08, 2023 at 08:45:53AM -0700, Linus Torvalds wrote:
> So for convenient automatic pointer freeing, you want an interface
> much more akin to
> 
>         struct whatever *ptr __automatic_kfree = kmalloc(...);
> 
> which is much more legible, doesn't have any type mis-use issues, and
> is also just trivially dealt with by a
> 
>   static inline void automatic_kfree_wrapper(void *pp)
>   { void *p = *(void **)pp; if (p) kfree(p); }
>   #define __automatic_kfree \
>         __attribute__((__cleanup__(automatic_kfree_wrapper)))
>   #define no_free_ptr(p) \
>         ({ __auto_type __ptr = (p); (p) = NULL; __ptr; })
> 
> which I just tested generates the sane code even for the "set the ptr
> to NULL and return success" case.
> 
> The above allows you to trivially do things like
> 
>         struct whatever *p __automatic_kfree = kmalloc(..);
> 
>         if (!do_something(p))
>                 return -ENOENT;
> 
>         return no_free_ptr(p);

I am a little worried about how (any version so far of) this API could go
wrong, e.g. if someone uses this and does "return p" instead of "return
no_free_ptr(p)", it'll return a freed pointer. I was hoping we could do
something like this to the end of automatic_kfree_wrapper():

	*(void **)pp = NULL;

i.e. if no_free_ptr() goes missing, "return p" will return NULL, which
is much easier to track down than dealing with later use-after-free bugs,
etc. Unfortunately, the __cleanup ordering is _after_ the compiler stores
the return value...

static inline void cleanup_info(struct info **p)
{
	free(*p);
	*p = NULL; /* this is effectively ignored */
}

struct info *do_something(int f)
{
	struct info *var __attribute__((__cleanup__(cleanup_info))) =
		malloc(1024);

	process(var);

	return var; /* oops, forgot to disable cleanup */
}

compile down to:

do_something:
        pushq   %rbx
        movl    $1024, %edi
        call    malloc
        movq    %rax, %rbx
        movq    %rax, %rdi
        call    process
        movq    %rbx, %rdi
        call    free
        movq    %rbx, %rax	; uses saved copy of malloc return
        popq    %rbx
        ret

The point being, if we can proactively make this hard to shoot ourselves in
the foot, that would be nice. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 16:47                       ` Kees Cook
@ 2023-06-08 16:59                         ` Linus Torvalds
  2023-06-08 17:20                         ` Nick Desaulniers
  1 sibling, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-06-08 16:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, gregkh, pbonzini, linux-kernel, ojeda,
	ndesaulniers, mingo, will, longman, boqun.feng, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, frederic, quic_neeraju, joel, josh,
	mathieu.desnoyers, jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 8, 2023 at 9:47 AM Kees Cook <keescook@chromium.org> wrote:
>
> I am a little worried about how (any version so far of) this API could go
> wrong, e.g. if someone uses this and does "return p" instead of "return
> no_free_ptr(p)",

Absolutely. I think the whole "don't always cleanup" is quite
dangerous, and maybe not worth it, but it _is_ a fairly common
pattern.

>     I was hoping we could do
> something like this to the end of automatic_kfree_wrapper():
>
>        *(void **)pp = NULL;

That would be a lovely thing, but as you found out, it fundamentally
cannot work.

The whole point of "cleanup" is that it is done when the variable -
not the *value* - goes out of scope.

So when you have that

        return var; /* oops, forgot to disable cleanup */

situation, by definition "var" hasn't gone out of scope until _after_
you read the value and return that value, so pretty much by definition
it cannot make a difference to assign something to 'pp' in the cleanup
code.

> The point being, if we can proactively make this hard to shoot ourselves in
> the foot, that would be nice. :)

So the good thing is that things like KASAN would immediately notice,
and since this all happens (presumably) for the success case, it's not
about some unlikely error case.

I also think that -fanalyzer might just catch it (as part of
-Wanalyzer-use-after-free) at compile-time, but I didn't check. So
if/when people start using -fanalyzer on the kernel, those things would
be caught early.

So while this "forget to avoid cleanup" is a worry, I think it ends up
likely being one that is fairly easy to avoid with other checks in
place.

Famous last words.

             Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 16:47                       ` Kees Cook
  2023-06-08 16:59                         ` Linus Torvalds
@ 2023-06-08 17:20                         ` Nick Desaulniers
  2023-06-08 18:51                           ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: Nick Desaulniers @ 2023-06-08 17:20 UTC (permalink / raw)
  To: Kees Cook, Linus Torvalds
  Cc: Peter Zijlstra, gregkh, pbonzini, linux-kernel, ojeda, mingo,
	will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 8, 2023 at 9:47 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Jun 08, 2023 at 08:45:53AM -0700, Linus Torvalds wrote:
> > So for convenient automatic pointer freeing, you want an interface
> > much more akin to
> >
> >         struct whatever *ptr __automatic_kfree = kmalloc(...);
> >
> > which is much more legible, doesn't have any type mis-use issues, and
> > is also just trivially dealt with by a
> >
> >   static inline void automatic_kfree_wrapper(void *pp)
> >   { void *p = *(void **)pp; if (p) kfree(p); }
> >   #define __automatic_kfree \
> >         __attribute__((__cleanup__(automatic_kfree_wrapper)))
> >   #define no_free_ptr(p) \
> >         ({ __auto_type __ptr = (p); (p) = NULL; __ptr; })
> >
> > which I just tested generates the sane code even for the "set the ptr
> > to NULL and return success" case.
> >
> > The above allows you to trivially do things like
> >
> >         struct whatever *p __automatic_kfree = kmalloc(..);
> >
> >         if (!do_something(p))
> >                 return -ENOENT;
> >
> >         return no_free_ptr(p);
>
> I am a little worried about how (any version so far of) this API could go
> wrong, e.g. if someone uses this and does "return p" instead of "return
> no_free_ptr(p)", it'll return a freed pointer.

Presumably, one could simply just not use RAII (/SBRM -- someone else
coincidentally corrected me about this recently; I taught them my fun
C++ acronym, IDGAF) when working with a value that conditionally
"escapes" the local scope.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 17:20                         ` Nick Desaulniers
@ 2023-06-08 18:51                           ` Peter Zijlstra
  2023-06-08 20:14                             ` Nick Desaulniers
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-08 18:51 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Kees Cook, Linus Torvalds, gregkh, pbonzini, linux-kernel, ojeda,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 08, 2023 at 10:20:19AM -0700, Nick Desaulniers wrote:
> Presumably, one could simply just not use RAII(/SBRM someone else
> corrected me about this recently coincidentally; I taught them my fun
> C++ acronym IDGAF) when working with a value that conditionally
> "escapes" the local scope.

But then you're back to the error goto :/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 15:45                     ` Linus Torvalds
  2023-06-08 16:47                       ` Kees Cook
@ 2023-06-08 20:06                       ` Peter Zijlstra
  2023-06-09  2:25                         ` Linus Torvalds
  2023-06-09  8:27                       ` Rasmus Villemoes
  2 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-08 20:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 08, 2023 at 08:45:53AM -0700, Linus Torvalds wrote:

> > DEFINE_CLASS(kfree, void *, kfree(THIS), p, void *p)
> >
> >         smart_ptr(kfree, mem) = kzalloc_node(...);
> >         if (!mem)
> >                 return -ENOMEM;
> 
> No, the above is broken, and would result in us using "void *" in
> situations where we really *really* don't want that.
> 
> For automatic freeing, you want something that can handle different
> types properly, and without having to constantly declare the types
> somewhere else before use.

Ah, I see what you did with the no_free_ptr(), that avoids having to
have two pointers around, nice!

> So for convenient automatic pointer freeing, you want an interface
> much more akin to
> 
>         struct whatever *ptr __automatic_kfree = kmalloc(...);
> 
> which is much more legible, doesn't have any type mis-use issues, and
> is also just trivially dealt with by a
> 
>   static inline void automatic_kfree_wrapper(void *pp)
>   { void *p = *(void **)pp; if (p) kfree(p); }
>   #define __automatic_kfree \
>         __attribute__((__cleanup__(automatic_kfree_wrapper)))
>   #define no_free_ptr(p) \
>         ({ __auto_type __ptr = (p); (p) = NULL; __ptr; })
> 
> which I just tested generates the sane code even for the "set the ptr
> to NULL and return success" case.
> 
> The above allows you to trivially do things like
> 
>         struct whatever *p __automatic_kfree = kmalloc(..);
> 
>         if (!do_something(p))
>                 return -ENOENT;
> 
>         return no_free_ptr(p);
> 
> and it JustWorks(tm).

OK, something like so then?


#define DEFINE_FREE(name, type, free) \
	static inline void __free_##name(type *p) { type _P = *p; free; }

#define __free(name)	__cleanup(__free_##name)

#define no_free_ptr(p) \
	({ __auto_type __ptr = (p); (p) = NULL; __ptr; })


DEFINE_FREE(kfree, void *, if (_P) kfree(_P));

	struct obj *p __free(kfree) = kmalloc(...);

	if (!do_something(p))
		return -ENOENT;

	return no_free_ptr(p);




DEFINE_CLASS(find_get_context, struct perf_event_context *,
	     if (!IS_ERR_OR_NULL(_C)) put_ctx(_C),
	     find_get_context(task, event), struct task_struct *task, struct perf_event *event)

DEFINE_FREE(free_event, struct perf_event *,
	    if (!IS_ERR_OR_NULL(_P)) free_event(_P));


	struct perf_event *event __free(free_event) = perf_event_alloc(...)
	if (IS_ERR(event))
		return event;

	class(find_get_context, ctx)(task, event);
	if (IS_ERR(ctx))
		return (void*)ctx;

	if (!task && !container_of(ctx, struct perf_cpu_context, ctx)->online)
		return -ENODEV;

	...

	event->ctx = get_ctx(ctx);

	return no_free_ptr(event);



works for me, I'll go make it happen.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 18:51                           ` Peter Zijlstra
@ 2023-06-08 20:14                             ` Nick Desaulniers
  2023-06-09 10:20                               ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Nick Desaulniers @ 2023-06-08 20:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Linus Torvalds, gregkh, pbonzini, linux-kernel, ojeda,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 8, 2023 at 11:51 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 08, 2023 at 10:20:19AM -0700, Nick Desaulniers wrote:
> > Presumably, one could simply just not use RAII when working with a value that conditionally
> > "escapes" the local scope.
>
> But then you're back to the error goto :/

Thinking more about the expected ergonomics here over lunch...no
meaningful insights, just thoughts...

For something like a mutex/lock, I'd expect those to be acquired then
released within the same function, yeah? In which case
__attribute__((cleanup())) has fewer hazards since the resource
doesn't escape.

For a pointer to a dynamically allocated region that may get returned
to the caller...

I mean, people do this in C++. It is safe and canonical to return a
std::unique_ptr.  When created locally the destructor does the
expected thing regardless of control flow.  IIUC, std::unique_ptr's
move constructor basically does what Kees suggested earlier (trigger
warning: C++): https://github.com/llvm/llvm-project/blob/7a52f79126a59717012d8039ef875f68e3c637fd/libcxx/include/__memory/unique_ptr.h#L429-L430.
example: https://godbolt.org/z/51s49G9f1

A recent commit to clang https://reviews.llvm.org/rG301eb6b68f30
raised an interesting point (deficiency is perhaps too strong a word)
about GNU-style attributes; they generally have no meaning on an ABI.

Consider a function that returns a locally constructed
std::unique_ptr.  The function returns a type for which the caller
knows what destructor functions to run; this is part of the ABI.

Here, we're talking about using __attribute__((cleanup())) to DTR
locally, but then we return a "raw" pointer to a caller. What cleanup
function should the caller run, implicitly, if at all?  If we use
__attribute__((cleanup())) that saves us a few gotos locally, but the
caller perhaps now needs the same treatment.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 20:06                       ` Peter Zijlstra
@ 2023-06-09  2:25                         ` Linus Torvalds
  2023-06-09  8:14                           ` Peter Zijlstra
  2023-06-09 21:18                           ` Kees Cook
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2023-06-09  2:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 8, 2023 at 1:06 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
>         struct obj *p __free(kfree) = kmalloc(...);

Yeah, the above actually looks really good to me - I like the naming
here, and the use looks very logical to me.

Of course, maybe once I see the patches that use this I go "uhh", but
at least for now I think you've hit on a rather legible syntax.

I'm still a bit unsure of the "no_free_ptr(p)" naming, but at least
it's pretty clear about what it does.

So my only worry is that it's not pretty and to the point like your
"__free(kfree)" syntax.

But it feels workable and not misleading, so unless somebody can come
up with a better name, I think it's ok.

           Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-09  2:25                         ` Linus Torvalds
@ 2023-06-09  8:14                           ` Peter Zijlstra
  2023-06-09 21:18                           ` Kees Cook
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2023-06-09  8:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On Thu, Jun 08, 2023 at 07:25:27PM -0700, Linus Torvalds wrote:
> On Thu, Jun 8, 2023 at 1:06 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >         struct obj *p __free(kfree) = kmalloc(...);
> 
> Yeah, the above actually looks really good to me - I like the naming
> here, and the use looks very logical to me.
> 
> Of course, maybe once I see the patches that use this I go "uhh", but
> at least for now I think you've hit on a rather legible syntax.
> 
> I'm still a bit unsure of the "no_free_ptr(p)" naming, but at least
> it's pretty clear about what it does.
> 
> So my only worry is that it's not pretty and to the point like your
> "__free(kfree)" syntax.
> 
> But it feels workable and not misleading, so unless somebody can come
> up with a better name, I think it's ok.

#define return_ptr(p) \
	return no_free_ptr(p)


	struct obj *p __free(kfree) = kmalloc(...);
	if (!p)
		return ERR_PTR(-ENOMEM);

	...

	return_ptr(p);

?
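
For context, one plausible way to build no_free_ptr() (a sketch using
a GNU C statement expression; the definition in the series may well
differ): read the guarded pointer, clear the variable so the __free()
cleanup has nothing left to free, and hand the raw value back:

	/* sketch only, not the actual definition */
	#define no_free_ptr(p) \
		({ __auto_type __ptr = (p); (p) = NULL; __ptr; })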


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 15:45                     ` Linus Torvalds
  2023-06-08 16:47                       ` Kees Cook
  2023-06-08 20:06                       ` Peter Zijlstra
@ 2023-06-09  8:27                       ` Rasmus Villemoes
  2 siblings, 0 replies; 42+ messages in thread
From: Rasmus Villemoes @ 2023-06-09  8:27 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On 08/06/2023 17.45, Linus Torvalds wrote:

> And no, I didn't think too much about those names. "__automatic_kfree"
> is too damn long to type, but you hated "auto". And "no_free_ptr()" is
> not a wonderful name either. But I tried to make the naming at least be
> obvious, if not wonderful.

No opinion either way, just throwing a data point out there:

So the systemd codebase makes quite heavy use of this automatic cleanup
stuff. Their name for no_free_ptr is TAKE_PTR(), and the verb "take"
apparently comes from Rust:

//// src/fundamental/macro-fundamental.h

/* Takes inspiration from Rust's Option::take() method: reads and returns
 * a pointer, but at the same time resets it to NULL. See:
 * https://doc.rust-lang.org/std/option/enum.Option.html#method.take */
#define TAKE_GENERIC(var, type, nullvalue)                       \
        ({                                                       \
                type *_pvar_ = &(var);                           \
                type _var_ = *_pvar_;                            \
                type _nullvalue_ = nullvalue;                    \
                *_pvar_ = _nullvalue_;                           \
                _var_;                                           \
        })
#define TAKE_PTR_TYPE(ptr, type) TAKE_GENERIC(ptr, type, NULL)
#define TAKE_PTR(ptr) TAKE_PTR_TYPE(ptr, typeof(ptr))
#define TAKE_STRUCT_TYPE(s, type) TAKE_GENERIC(s, type, {})
#define TAKE_STRUCT(s) TAKE_STRUCT_TYPE(s, typeof(s))


and then they also have a

/* Like TAKE_PTR() but for file descriptors, resetting them to -EBADF */
#define TAKE_FD(fd) TAKE_GENERIC(fd, int, -EBADF)

with their "auto-close fd" helper of course knowing to ignore any
negative value.
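
For illustration, typical usage in that style pairs TAKE_PTR() with an
auto-cleanup declaration, roughly like this (a sketch; _cleanup_free_
is systemd's wrapper around the cleanup attribute that free()s the
pointer on scope exit):

	int dup_name(const char *name, char **ret) {
		_cleanup_free_ char *n = strdup(name);
		if (!n)
			return -ENOMEM;

		*ret = TAKE_PTR(n);	/* resets n to NULL, so nothing is freed */
		return 0;
	}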

Rasmus


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-08 20:14                             ` Nick Desaulniers
@ 2023-06-09 10:20                               ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2023-06-09 10:20 UTC (permalink / raw)
  To: Nick Desaulniers, Peter Zijlstra
  Cc: Kees Cook, Linus Torvalds, gregkh, linux-kernel, ojeda, mingo,
	will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On 6/8/23 22:14, Nick Desaulniers wrote:
> Here, we're talking about using __attribute__((cleanup())) to do the
> right thing locally, but then we return a "raw" pointer to a caller.
> What cleanup function should the caller run, implicitly, if at all?
> Using __attribute__((cleanup())) saves us a few gotos locally, but the
> caller perhaps now needs the same treatment.

But this is only a problem when you return a void *; in general, in C
you will return a struct more often than a raw pointer (and in C++ you
also have the issue of delete vs. delete[], which does not exist in C).

Returning a struct doesn't protect against use-after-free bugs in the 
way std::unique_ptr<> or Rust lifetimes do, but it at least tries to 
protect against calling the wrong cleanup function if you provide a 
typed "destructor" function that does the right thing---for example by 
handling reference counting or by freeing sub-structs before calling 
kfree/vfree.
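
As a sketch of what such a typed destructor can look like
(hypothetical struct, not from the series):

	struct blob {
		refcount_t refs;
		void *data;
	};

	static void blob_put(struct blob *b)
	{
		if (b && refcount_dec_and_test(&b->refs)) {
			kfree(b->data);		/* release sub-allocations first */
			kfree(b);
		}
	}

Callers then always go through blob_put() instead of a bare kfree(),
so the "wrong cleanup function" class of mistakes is narrowed even
without language-level lifetimes.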

Of course it's not a silver bullet, but then that's why people are 
looking into Rust for Linux.

Paolo


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/2] Lock and Pointer guards
  2023-06-09  2:25                         ` Linus Torvalds
  2023-06-09  8:14                           ` Peter Zijlstra
@ 2023-06-09 21:18                           ` Kees Cook
  1 sibling, 0 replies; 42+ messages in thread
From: Kees Cook @ 2023-06-09 21:18 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: keescook, gregkh, pbonzini, linux-kernel, ojeda, ndesaulniers,
	mingo, will, longman, boqun.feng, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, frederic, quic_neeraju, joel, josh, mathieu.desnoyers,
	jiangshanlai, rcu, tj, tglx, linux-toolchains

On June 8, 2023 7:25:27 PM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Thu, Jun 8, 2023 at 1:06 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>         struct obj *p __free(kfree) = kmalloc(...);
>
>Yeah, the above actually looks really good to me - I like the naming
>here, and the use looks very logical to me.
>
>Of course, maybe once I see the patches that use this I go "uhh", but
>at least for now I think you've hit on a rather legible syntax.
>
>I'm still a bit unsure of the "no_free_ptr(p)" naming, but at least
>it's pretty clear about what it does.
>
>So my only worry is that it's not pretty and to the point like your
>"__free(kfree)" syntax.
>
>But it feels workable and not misleading, so unless somebody can come
>up with a better name, I think it's ok.

I like the proposed "take" naming, and before reading that reply I was going to suggest "keep". *shrug*


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2023-06-09 21:18 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26 20:52 [PATCH v2 0/2] Lock and Pointer guards Peter Zijlstra
2023-05-26 20:52 ` [PATCH v2 1/2] locking: Introduce __cleanup__ based guards Peter Zijlstra
2023-05-26 21:24   ` Peter Zijlstra
2023-05-26 21:54     ` Linus Torvalds
2023-05-27  8:57       ` Peter Zijlstra
2023-05-26 20:52 ` [PATCH v2 2/2] sched: Use fancy new guards Peter Zijlstra
2023-05-27 17:21 ` [PATCH v2 0/2] Lock and Pointer guards Mathieu Desnoyers
2023-05-27 19:18 ` Linus Torvalds
2023-05-29 12:09   ` Paolo Bonzini
2023-05-29 19:04     ` Linus Torvalds
2023-05-29 21:27       ` Ian Lance Taylor
2023-05-30  0:06         ` Linus Torvalds
2023-05-30  9:23   ` Peter Zijlstra
2023-05-30  9:34     ` Peter Zijlstra
2023-05-30 13:58     ` Valentin Schneider
2023-06-06  9:42     ` Peter Zijlstra
2023-06-06 13:17       ` Linus Torvalds
2023-06-06 13:40         ` Peter Zijlstra
2023-06-06 14:50           ` Linus Torvalds
2023-06-06 16:06             ` Kees Cook
2023-06-06 18:08             ` Peter Zijlstra
2023-06-06 23:22               ` Linus Torvalds
2023-06-07  9:41                 ` Peter Zijlstra
2023-06-08  8:52                   ` Peter Zijlstra
2023-06-08  9:04                     ` Greg KH
2023-06-08 15:45                     ` Linus Torvalds
2023-06-08 16:47                       ` Kees Cook
2023-06-08 16:59                         ` Linus Torvalds
2023-06-08 17:20                         ` Nick Desaulniers
2023-06-08 18:51                           ` Peter Zijlstra
2023-06-08 20:14                             ` Nick Desaulniers
2023-06-09 10:20                               ` Paolo Bonzini
2023-06-08 20:06                       ` Peter Zijlstra
2023-06-09  2:25                         ` Linus Torvalds
2023-06-09  8:14                           ` Peter Zijlstra
2023-06-09 21:18                           ` Kees Cook
2023-06-09  8:27                       ` Rasmus Villemoes
2023-06-06 15:31       ` Kees Cook
2023-06-06 15:45         ` Linus Torvalds
2023-06-06 16:08           ` Kees Cook
2023-06-08 16:25           ` David Laight
2023-05-30  9:26   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).