LKML Archive mirror
* [patch 00/12] cpu isolation: infra to block interference to select CPUs
@ 2024-02-06 18:49 Marcelo Tosatti
  2024-02-06 18:49 ` [patch 01/12] cpu isolation: basic block interference infrastructure Marcelo Tosatti
                   ` (11 more replies)
  0 siblings, 12 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner

There are a number of codepaths in the kernel that interrupt
code execution on remote CPUs. A subset of these codepaths are
triggered from userspace and can therefore return errors.

Introduce a cpumask named "block interference", writable from userspace.

This cpumask (and its associated helpers) can be used by code that
runs code on remote CPUs to optionally return an error instead.

Note: the word "interference" was chosen because "interruption" is
often confused with "device interrupt".

To protect readers of this cpumask against writers, SRCU is used.
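
A condensed sketch of the resulting read/write pairing (the helpers
and names are the ones introduced in patch 01; new_mask stands for
the updated mask):

/* reader: sample the cpumask under SRCU */
idx = srcu_read_lock(&block_interf_srcu);
blocked = cpumask_test_cpu(cpu, block_interf_cpumask);
srcu_read_unlock(&block_interf_srcu, idx);

/* writer: publish the new mask, then wait for in-flight readers */
cpumask_copy(block_interf_cpumask, new_mask);
synchronize_srcu(&block_interf_srcu);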

What is proposed is to incrementally modify code that can return
errors, in two ways:

1) Introduce _fail variants of the functions that trigger code
execution on remote CPUs. The modified code then looks like:

idx = block_interf_srcu_read_lock();
ret = smp_call_function_single_fail(cpu, remote_fn, ...);  (or stop_machine_fail)
block_interf_srcu_read_unlock(idx);

This is grep-friendly (one can search for the smp_call_function_*
variants) and reuses code.

2) Use the block interference cpumask helpers directly. For certain
users of the smp_call_func_* and stop_machine_* functions it is
natural to check for block interference CPUs before calling the
functions that execute code remotely.

For example, when it is not desirable to perform error handling at
smp_call_func_* time, or when performing the error handling there
would require unjustified complexity. Then:

idx = block_interf_srcu_read_lock();

if target cpumask intersects with block interference cpumask {
	block_interf_srcu_read_unlock(idx);
	return error;
}

...
ret = smp_call_function_single / stop_machine() / ...
...

block_interf_srcu_read_unlock(idx);

Regarding housekeeping flags, it is usually the case that
initialization requires code execution on interference blocked CPUs
(for example MTRR initialization, resctrl filesystem initialization,
MSR writes, ...). Therefore the CPUs must be tagged after system
initialization, which is not possible with the current housekeeping
flags infrastructure.
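
For illustration, a minimal userspace sketch that tags CPU 2 at
runtime (assumes debugfs is mounted at /sys/kernel/debug; the debugfs
file itself is created in patch 01):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* cpumask_parse_user() takes a hex CPU mask; 0x4 == CPU 2 */
	const char *mask = "4";
	int fd = open("/sys/kernel/debug/block_interf_cpumask", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, mask, strlen(mask)) < 0)
		perror("write");
	close(fd);
	return 0;
}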

This patchset converts a few callers for demonstration purposes.

Sending this second RFC to find out whether folks have objections
(there were no objections to the first release), or have better
ideas.





* [patch 01/12] cpu isolation: basic block interference infrastructure
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 02/12] introduce smp_call_func_single_fail Marcelo Tosatti
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

There are a number of codepaths in the kernel that interrupt
code execution on remote CPUs. A subset of these codepaths are
triggered from userspace and can therefore return errors.

Introduce a cpumask named "block interference", writable from userspace.

This cpumask (and its associated helpers) can be used by code that
runs code on remote CPUs to optionally return an error instead.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/include/linux/sched/isolation.h
===================================================================
--- linux-isolation.orig/include/linux/sched/isolation.h
+++ linux-isolation/include/linux/sched/isolation.h
@@ -72,4 +72,28 @@ static inline bool cpu_is_isolated(int c
 	       cpuset_cpu_is_isolated(cpu);
 }
 
+#ifdef CONFIG_CPU_ISOLATION
+extern cpumask_var_t block_interf_cpumask;
+extern bool block_interf_cpumask_active;
+
+int block_interf_srcu_read_lock(void);
+void block_interf_srcu_read_unlock(int idx);
+
+void block_interf_assert_held(void);
+
+#else
+static inline int block_interf_srcu_read_lock(void) { return 0; }
+static inline void block_interf_srcu_read_unlock(int idx) { }
+static inline void block_interf_assert_held(void) { }
+#endif
+
+static inline bool block_interf_cpu(int cpu)
+{
+#ifdef CONFIG_CPU_ISOLATION
+	if (block_interf_cpumask_active)
+		return cpumask_test_cpu(cpu, block_interf_cpumask);
+#endif
+	return false;
+}
+
 #endif /* _LINUX_SCHED_ISOLATION_H */
Index: linux-isolation/kernel/sched/isolation.c
===================================================================
--- linux-isolation.orig/kernel/sched/isolation.c
+++ linux-isolation/kernel/sched/isolation.c
@@ -239,3 +239,109 @@ static int __init housekeeping_isolcpus_
 	return housekeeping_setup(str, flags);
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
+
+struct srcu_struct block_interf_srcu;
+EXPORT_SYMBOL_GPL(block_interf_srcu);
+
+cpumask_var_t block_interf_cpumask;
+EXPORT_SYMBOL_GPL(block_interf_cpumask);
+
+bool block_interf_cpumask_active;
+EXPORT_SYMBOL_GPL(block_interf_cpumask_active);
+
+int block_interf_srcu_read_lock(void)
+{
+	return srcu_read_lock(&block_interf_srcu);
+}
+EXPORT_SYMBOL(block_interf_srcu_read_lock);
+
+void block_interf_srcu_read_unlock(int idx)
+{
+	srcu_read_unlock(&block_interf_srcu, idx);
+}
+EXPORT_SYMBOL(block_interf_srcu_read_unlock);
+
+void block_interf_assert_held(void)
+{
+	WARN_ON_ONCE(!srcu_read_lock_held(&block_interf_srcu));
+}
+EXPORT_SYMBOL(block_interf_assert_held);
+
+static ssize_t
+block_interf_cpumask_read(struct file *filp, char __user *ubuf,
+		     size_t count, loff_t *ppos)
+{
+	char *mask_str;
+	int len;
+
+	len = snprintf(NULL, 0, "%*pb\n",
+		       cpumask_pr_args(block_interf_cpumask)) + 1;
+	mask_str = kmalloc(len, GFP_KERNEL);
+	if (!mask_str)
+		return -ENOMEM;
+
+	len = snprintf(mask_str, len, "%*pb\n",
+		       cpumask_pr_args(block_interf_cpumask));
+	if (len >= count) {
+		count = -EINVAL;
+		goto out_err;
+	}
+	count = simple_read_from_buffer(ubuf, count, ppos, mask_str, len);
+
+out_err:
+	kfree(mask_str);
+
+	return count;
+}
+
+static ssize_t
+block_interf_cpumask_write(struct file *filp, const char __user *ubuf,
+			   size_t count, loff_t *ppos)
+{
+	cpumask_var_t block_interf_cpumask_new;
+	int err;
+
+	if (!zalloc_cpumask_var(&block_interf_cpumask_new, GFP_KERNEL))
+		return -ENOMEM;
+
+	err = cpumask_parse_user(ubuf, count, block_interf_cpumask_new);
+	if (err)
+		goto err_free;
+
+	cpumask_copy(block_interf_cpumask, block_interf_cpumask_new);
+	synchronize_srcu(&block_interf_srcu);
+	free_cpumask_var(block_interf_cpumask_new);
+
+	return count;
+
+err_free:
+	free_cpumask_var(block_interf_cpumask_new);
+
+	return err;
+}
+
+static const struct file_operations block_interf_cpumask_fops = {
+	.read		= block_interf_cpumask_read,
+	.write		= block_interf_cpumask_write,
+};
+
+static int __init block_interf_cpumask_init(void)
+{
+	int ret;
+
+	ret = init_srcu_struct(&block_interf_srcu);
+	if (ret)
+		return ret;
+
+	if (!zalloc_cpumask_var(&block_interf_cpumask, GFP_KERNEL))
+		return -ENOMEM;
+
+	debugfs_create_file_unsafe("block_interf_cpumask", 0644, NULL, NULL,
+				   &block_interf_cpumask_fops);
+
+	block_interf_cpumask_active = true;
+	return 0;
+}
+
+late_initcall(block_interf_cpumask_init);
+




* [patch 02/12] introduce smp_call_func_single_fail
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
  2024-02-06 18:49 ` [patch 01/12] cpu isolation: basic block interference infrastructure Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 03/12] Introduce _fail variants of stop_machine functions Marcelo Tosatti
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Introduce smp_call_function_single_fail, which checks whether the
target CPU is tagged as a "block interference" CPU, and returns an
error if so.
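
The intended calling convention, sketched (hypothetical caller;
remote_fn and info stand for the caller's arguments):

	int idx, ret;

	idx = block_interf_srcu_read_lock();
	ret = smp_call_function_single_fail(cpu, remote_fn, &info, 1);
	block_interf_srcu_read_unlock(idx);
	if (ret)
		return ret;	/* -EPERM if cpu is interference blocked */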

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/include/linux/smp.h
===================================================================
--- linux-isolation.orig/include/linux/smp.h
+++ linux-isolation/include/linux/smp.h
@@ -50,6 +50,9 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
+int smp_call_function_single_fail(int cpuid, smp_call_func_t func, void *info,
+				  int wait);
+
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask);
 
Index: linux-isolation/kernel/smp.c
===================================================================
--- linux-isolation.orig/kernel/smp.c
+++ linux-isolation/kernel/smp.c
@@ -25,6 +25,7 @@
 #include <linux/nmi.h>
 #include <linux/sched/debug.h>
 #include <linux/jump_label.h>
+#include <linux/sched/isolation.h>
 
 #include <trace/events/ipi.h>
 #define CREATE_TRACE_POINTS
@@ -655,6 +656,30 @@ int smp_call_function_single(int cpu, sm
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
+/*
+ * smp_call_function_single_fail - Run a function on a specific CPU,
+ * failing if the target CPU is tagged as a block interference CPU.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code.
+ */
+int smp_call_function_single_fail(int cpu, smp_call_func_t func, void *info,
+			     int wait)
+{
+	int err;
+
+	block_interf_assert_held();
+	if (block_interf_cpu(cpu))
+		return -EPERM;
+
+	err = smp_call_function_single(cpu, func, info, wait);
+
+	return err;
+}
+EXPORT_SYMBOL(smp_call_function_single_fail);
+
 /**
  * smp_call_function_single_async() - Run an asynchronous function on a
  * 			         specific CPU.




* [patch 03/12] Introduce _fail variants of stop_machine functions
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
  2024-02-06 18:49 ` [patch 01/12] cpu isolation: basic block interference infrastructure Marcelo Tosatti
  2024-02-06 18:49 ` [patch 02/12] introduce smp_call_func_single_fail Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 04/12] clockevent unbind: use smp_call_func_single_fail Marcelo Tosatti
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Introduce stop_machine_fail and stop_machine_cpuslocked_fail, which
check whether any online CPU in the system is tagged as a block
interference CPU, and if so return an error.
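
The intended calling pattern mirrors the smp_call_function_single_fail
one (sketch only; fn and data stand for the caller's arguments):

	idx = block_interf_srcu_read_lock();
	ret = stop_machine_fail(fn, data, NULL);
	block_interf_srcu_read_unlock(idx);
	if (ret)
		return ret;	/* -EPERM if an online CPU is blocked */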

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/include/linux/stop_machine.h
===================================================================
--- linux-isolation.orig/include/linux/stop_machine.h
+++ linux-isolation/include/linux/stop_machine.h
@@ -113,6 +113,9 @@ static inline void print_stop_info(const
  */
 int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);
 
+
+int stop_machine_fail(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);
+
 /**
  * stop_machine_cpuslocked: freeze the machine on all CPUs and run this function
  * @fn: the function to run
@@ -124,6 +127,9 @@ int stop_machine(cpu_stop_fn_t fn, void
  */
 int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);
 
+
+int stop_machine_cpuslocked_fail(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);
+
 /**
  * stop_core_cpuslocked: - stop all threads on just one core
  * @cpu: any cpu in the targeted core
Index: linux-isolation/kernel/stop_machine.c
===================================================================
--- linux-isolation.orig/kernel/stop_machine.c
+++ linux-isolation/kernel/stop_machine.c
@@ -22,6 +22,7 @@
 #include <linux/atomic.h>
 #include <linux/nmi.h>
 #include <linux/sched/wake_q.h>
+#include <linux/sched/isolation.h>
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -619,6 +620,17 @@ int stop_machine_cpuslocked(cpu_stop_fn_
 	return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
 }
 
+int stop_machine_cpuslocked_fail(cpu_stop_fn_t fn, void *data,
+				 const struct cpumask *cpus)
+{
+	block_interf_assert_held();
+
+	if (cpumask_intersects(block_interf_cpumask, cpu_online_mask))
+		return -EPERM;
+
+	return stop_machine_cpuslocked(fn, data, cpus);
+}
+
 int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus)
 {
 	int ret;
@@ -631,6 +643,19 @@ int stop_machine(cpu_stop_fn_t fn, void
 }
 EXPORT_SYMBOL_GPL(stop_machine);
 
+int stop_machine_fail(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus)
+{
+	int ret;
+
+	/* No CPUs can come up or down during this. */
+	cpus_read_lock();
+	ret = stop_machine_cpuslocked_fail(fn, data, cpus);
+	cpus_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(stop_machine_fail);
+
+
 #ifdef CONFIG_SCHED_SMT
 int stop_core_cpuslocked(unsigned int cpu, cpu_stop_fn_t fn, void *data)
 {




* [patch 04/12] clockevent unbind: use smp_call_func_single_fail
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2024-02-06 18:49 ` [patch 03/12] Introduce _fail variants of stop_machine functions Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-07 11:55   ` Thomas Gleixner
  2024-02-06 18:49 ` [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate Marcelo Tosatti
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Convert clockevents_unbind from smp_call_function_single to
smp_call_function_single_fail, which will fail in case the target
CPU is tagged as a block interference CPU.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/kernel/time/clockevents.c
===================================================================
--- linux-isolation.orig/kernel/time/clockevents.c
+++ linux-isolation/kernel/time/clockevents.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/smp.h>
 #include <linux/device.h>
+#include <linux/sched/isolation.h>
 
 #include "tick-internal.h"
 
@@ -416,9 +417,14 @@ static void __clockevents_unbind(void *a
  */
 static int clockevents_unbind(struct clock_event_device *ced, int cpu)
 {
+	int ret, idx;
 	struct ce_unbind cu = { .ce = ced, .res = -ENODEV };
 
-	smp_call_function_single(cpu, __clockevents_unbind, &cu, 1);
+	idx = block_interf_srcu_read_lock();
+	ret = smp_call_function_single_fail(cpu, __clockevents_unbind, &cu, 1);
+	block_interf_srcu_read_unlock(idx);
+	if (ret)
+		return ret;
 	return cu.res;
 }
 




* [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2024-02-06 18:49 ` [patch 04/12] clockevent unbind: use smp_call_func_single_fail Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-07 11:57   ` Thomas Gleixner
  2024-02-06 18:49 ` [patch 06/12] perf_event_open: check for block interference CPUs Marcelo Tosatti
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Change timekeeping_notify to use stop_machine_fail when appropriate,
which will fail in case the target CPU is tagged as a block
interference CPU.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/include/linux/clocksource.h
===================================================================
--- linux-isolation.orig/include/linux/clocksource.h
+++ linux-isolation/include/linux/clocksource.h
@@ -267,7 +267,7 @@ extern void clocksource_arch_init(struct
 static inline void clocksource_arch_init(struct clocksource *cs) { }
 #endif
 
-extern int timekeeping_notify(struct clocksource *clock);
+extern int timekeeping_notify(struct clocksource *clock, bool fail);
 
 extern u64 clocksource_mmio_readl_up(struct clocksource *);
 extern u64 clocksource_mmio_readl_down(struct clocksource *);
Index: linux-isolation/kernel/time/clocksource.c
===================================================================
--- linux-isolation.orig/kernel/time/clocksource.c
+++ linux-isolation/kernel/time/clocksource.c
@@ -125,7 +125,7 @@ static u64 suspend_start;
 
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
-static void clocksource_select(void);
+static int clocksource_select(bool fail);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -679,7 +679,7 @@ static int clocksource_watchdog_kthread(
 {
 	mutex_lock(&clocksource_mutex);
 	if (__clocksource_watchdog_kthread())
-		clocksource_select();
+		clocksource_select(false);
 	mutex_unlock(&clocksource_mutex);
 	return 0;
 }
@@ -976,7 +976,7 @@ static struct clocksource *clocksource_f
 	return NULL;
 }
 
-static void __clocksource_select(bool skipcur)
+static int __clocksource_select(bool skipcur, bool fail)
 {
 	bool oneshot = tick_oneshot_mode_active();
 	struct clocksource *best, *cs;
@@ -984,7 +984,7 @@ static void __clocksource_select(bool sk
 	/* Find the best suitable clocksource */
 	best = clocksource_find_best(oneshot, skipcur);
 	if (!best)
-		return;
+		return 0;
 
 	if (!strlen(override_name))
 		goto found;
@@ -1021,10 +1021,16 @@ static void __clocksource_select(bool sk
 	}
 
 found:
-	if (curr_clocksource != best && !timekeeping_notify(best)) {
+	if (curr_clocksource != best) {
+		int ret;
+
+		ret = timekeeping_notify(best, fail);
+		if (ret)
+			return ret;
 		pr_info("Switched to clocksource %s\n", best->name);
 		curr_clocksource = best;
 	}
+	return 0;
 }
 
 /**
@@ -1035,14 +1041,14 @@ found:
  * Select the clocksource with the best rating, or the clocksource,
  * which is selected by userspace override.
  */
-static void clocksource_select(void)
+static int clocksource_select(bool fail)
 {
-	__clocksource_select(false);
+	return __clocksource_select(false, fail);
 }
 
-static void clocksource_select_fallback(void)
+static int clocksource_select_fallback(void)
 {
-	__clocksource_select(true);
+	return __clocksource_select(true, true);
 }
 
 /*
@@ -1061,7 +1067,7 @@ static int __init clocksource_done_booti
 	 * Run the watchdog first to eliminate unstable clock sources
 	 */
 	__clocksource_watchdog_kthread();
-	clocksource_select();
+	clocksource_select(false);
 	mutex_unlock(&clocksource_mutex);
 	return 0;
 }
@@ -1209,7 +1215,7 @@ int __clocksource_register_scale(struct
 	clocksource_enqueue_watchdog(cs);
 	clocksource_watchdog_unlock(&flags);
 
-	clocksource_select();
+	clocksource_select(false);
 	clocksource_select_watchdog(false);
 	__clocksource_suspend_select(cs);
 	mutex_unlock(&clocksource_mutex);
@@ -1238,7 +1244,7 @@ void clocksource_change_rating(struct cl
 	__clocksource_change_rating(cs, rating);
 	clocksource_watchdog_unlock(&flags);
 
-	clocksource_select();
+	clocksource_select(false);
 	clocksource_select_watchdog(false);
 	clocksource_suspend_select(false);
 	mutex_unlock(&clocksource_mutex);
@@ -1260,8 +1266,12 @@ static int clocksource_unbind(struct clo
 	}
 
 	if (cs == curr_clocksource) {
+		int ret;
+
 		/* Select and try to install a replacement clock source */
-		clocksource_select_fallback();
+		ret = clocksource_select_fallback();
+		if (ret)
+			return ret;
 		if (curr_clocksource == cs)
 			return -EBUSY;
 	}
@@ -1352,17 +1362,17 @@ static ssize_t current_clocksource_store
 					 struct device_attribute *attr,
 					 const char *buf, size_t count)
 {
-	ssize_t ret;
+	ssize_t ret, err = 0;
 
 	mutex_lock(&clocksource_mutex);
 
 	ret = sysfs_get_uname(buf, override_name, count);
 	if (ret >= 0)
-		clocksource_select();
+		err = clocksource_select(true);
 
 	mutex_unlock(&clocksource_mutex);
 
-	return ret;
+	return err ? err : ret;
 }
 static DEVICE_ATTR_RW(current_clocksource);
 
Index: linux-isolation/kernel/time/timekeeping.c
===================================================================
--- linux-isolation.orig/kernel/time/timekeeping.c
+++ linux-isolation/kernel/time/timekeeping.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/sched/loadavg.h>
 #include <linux/sched/clock.h>
+#include <linux/sched/isolation.h>
 #include <linux/syscore_ops.h>
 #include <linux/clocksource.h>
 #include <linux/jiffies.h>
@@ -1497,13 +1498,24 @@ static int change_clocksource(void *data
  * This function is called from clocksource.c after a new, better clock
  * source has been registered. The caller holds the clocksource_mutex.
  */
-int timekeeping_notify(struct clocksource *clock)
+int timekeeping_notify(struct clocksource *clock, bool fail)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 
 	if (tk->tkr_mono.clock == clock)
 		return 0;
-	stop_machine(change_clocksource, clock, NULL);
+
+	if (!fail)
+		stop_machine(change_clocksource, clock, NULL);
+	else {
+		int ret, idx;
+
+		idx = block_interf_srcu_read_lock();
+		ret = stop_machine_fail(change_clocksource, clock, NULL);
+		block_interf_srcu_read_unlock(idx);
+		if (ret)
+			return ret;
+	}
 	tick_clock_notify();
 	return tk->tkr_mono.clock == clock ? 0 : -1;
 }




* [patch 06/12] perf_event_open: check for block interference CPUs
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (4 preceding siblings ...)
  2024-02-06 18:49 ` [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 07/12] mtrr_add_page/mtrr_del_page: " Marcelo Tosatti
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

When creating perf events, return an error rather than interfering
with CPUs tagged as block interference CPUs.

Note: this patch is incomplete; installation of the perf context on
block interference CPUs via the task context is not handled.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/kernel/events/core.c
===================================================================
--- linux-isolation.orig/kernel/events/core.c
+++ linux-isolation/kernel/events/core.c
@@ -55,6 +55,7 @@
 #include <linux/pgtable.h>
 #include <linux/buildid.h>
 #include <linux/task_work.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -12435,6 +12436,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	int err;
 	int f_flags = O_RDWR;
 	int cgroup_fd = -1;
+	int idx;
 
 	/* for future expandability... */
 	if (flags & ~PERF_FLAG_ALL)
@@ -12712,6 +12714,26 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_context;
 	}
 
+	idx = block_interf_srcu_read_lock();
+	if (!task) {
+		if (move_group) {
+			for_each_sibling_event(sibling, group_leader) {
+				if (block_interf_cpu(sibling->cpu)) {
+					err = -EPERM;
+					goto err_block_interf;
+				}
+			}
+			if (block_interf_cpu(group_leader->cpu)) {
+				err = -EPERM;
+				goto err_block_interf;
+			}
+		}
+		if (block_interf_cpu(event->cpu)) {
+			err = -EPERM;
+			goto err_block_interf;
+		}
+	}
+
 	/*
 	 * This is the point on no return; we cannot fail hereafter. This is
 	 * where we start modifying current state.
@@ -12775,6 +12797,8 @@ SYSCALL_DEFINE5(perf_event_open,
 		put_task_struct(task);
 	}
 
+	block_interf_srcu_read_unlock(idx);
+
 	mutex_lock(&current->perf_event_mutex);
 	list_add_tail(&event->owner_entry, &current->perf_event_list);
 	mutex_unlock(&current->perf_event_mutex);
@@ -12789,6 +12813,8 @@ SYSCALL_DEFINE5(perf_event_open,
 	fd_install(event_fd, event_file);
 	return event_fd;
 
+err_block_interf:
+	block_interf_srcu_read_unlock(idx);
 err_context:
 	put_pmu_ctx(event->pmu_ctx);
 	event->pmu_ctx = NULL; /* _free_event() */




* [patch 07/12] mtrr_add_page/mtrr_del_page: check for block interference CPUs
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (5 preceding siblings ...)
  2024-02-06 18:49 ` [patch 06/12] perf_event_open: check for block interference CPUs Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 08/12] arm64 kernel/topology: use smp_call_function_single_fail Marcelo Tosatti
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Check if any online CPU in the system is tagged as a block
interference CPU, and if so return an error from
mtrr_add_page/mtrr_del_page.

This avoids interference with such CPUs (while allowing userspace
to handle the failures).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: linux-isolation/arch/x86/kernel/cpu/mtrr/mtrr.c
===================================================================
--- linux-isolation.orig/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ linux-isolation/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -45,6 +45,7 @@
 #include <linux/smp.h>
 #include <linux/syscore_ops.h>
 #include <linux/rcupdate.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/cacheinfo.h>
 #include <asm/cpufeature.h>
@@ -226,7 +227,7 @@ int mtrr_add_page(unsigned long base, un
 		  unsigned int type, bool increment)
 {
 	unsigned long lbase, lsize;
-	int i, replace, error;
+	int i, replace, error, idx;
 	mtrr_type ltype;
 
 	if (!mtrr_enabled())
@@ -261,6 +262,13 @@ int mtrr_add_page(unsigned long base, un
 	error = -EINVAL;
 	replace = -1;
 
+	idx = block_interf_srcu_read_lock();
+
+	if (cpumask_intersects(block_interf_cpumask, cpu_online_mask)) {
+		block_interf_srcu_read_unlock(idx);
+		return -EPERM;
+	}
+
 	/* No CPU hotplug when we change MTRR entries */
 	cpus_read_lock();
 
@@ -325,6 +333,7 @@ int mtrr_add_page(unsigned long base, un
  out:
 	mutex_unlock(&mtrr_mutex);
 	cpus_read_unlock();
+	block_interf_srcu_read_unlock(idx);
 	return error;
 }
 
@@ -405,11 +414,17 @@ int mtrr_del_page(int reg, unsigned long
 	mtrr_type ltype;
 	unsigned long lbase, lsize;
 	int error = -EINVAL;
+	int idx;
 
 	if (!mtrr_enabled())
 		return -ENODEV;
 
 	max = num_var_ranges;
+	idx = block_interf_srcu_read_lock();
+	if (cpumask_intersects(block_interf_cpumask, cpu_online_mask)) {
+		block_interf_srcu_read_unlock(idx);
+		return -EPERM;
+	}
 	/* No CPU hotplug when we change MTRR entries */
 	cpus_read_lock();
 	mutex_lock(&mtrr_mutex);
@@ -446,6 +461,7 @@ int mtrr_del_page(int reg, unsigned long
  out:
 	mutex_unlock(&mtrr_mutex);
 	cpus_read_unlock();
+	block_interf_srcu_read_unlock(idx);
 	return error;
 }
 




* [patch 08/12] arm64 kernel/topology: use smp_call_function_single_fail
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (6 preceding siblings ...)
  2024-02-06 18:49 ` [patch 07/12] mtrr_add_page/mtrr_del_page: " Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 09/12] AMD MCE: use smp_call_func_single_fail Marcelo Tosatti
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Convert counters_read_on_cpu from smp_call_function_single to
smp_call_function_single_fail, which will fail in case the target
CPU is tagged as a block interference CPU.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/arch/arm64/kernel/topology.c
===================================================================
--- linux-isolation.orig/arch/arm64/kernel/topology.c
+++ linux-isolation/arch/arm64/kernel/topology.c
@@ -17,6 +17,7 @@
 #include <linux/cpufreq.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/cpu.h>
 #include <asm/cputype.h>
@@ -280,6 +281,8 @@ static void cpu_read_constcnt(void *val)
 static inline
 int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
 {
+	int ret, idx;
+
 	/*
 	 * Abort call on counterless CPU or when interrupts are
 	 * disabled - can lead to deadlock in smp sync call.
@@ -290,7 +293,11 @@ int counters_read_on_cpu(int cpu, smp_ca
 	if (WARN_ON_ONCE(irqs_disabled()))
 		return -EPERM;
 
-	smp_call_function_single(cpu, func, val, 1);
+	idx = block_interf_srcu_read_lock();
+	ret = smp_call_function_single_fail(cpu, func, val, 1);
+	block_interf_srcu_read_unlock(idx);
+	if (ret)
+		return ret;
 
 	return 0;
 }




* [patch 09/12] AMD MCE: use smp_call_func_single_fail
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (7 preceding siblings ...)
  2024-02-06 18:49 ` [patch 08/12] arm64 kernel/topology: use smp_call_function_single_fail Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 10/12] x86/mce/inject.c: fail if target cpu is block interference Marcelo Tosatti
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Convert arch/x86/kernel/cpu/mce/amd.c from smp_call_function_single
to smp_call_function_single_fail, which will fail in case the target
CPU is tagged as a block interference CPU.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/arch/x86/kernel/cpu/mce/amd.c
===================================================================
--- linux-isolation.orig/arch/x86/kernel/cpu/mce/amd.c
+++ linux-isolation/arch/x86/kernel/cpu/mce/amd.c
@@ -19,6 +19,7 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/string.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/amd_nb.h>
 #include <asm/traps.h>
@@ -970,6 +971,7 @@ store_interrupt_enable(struct threshold_
 {
 	struct thresh_restart tr;
 	unsigned long new;
+	int ret, idx;
 
 	if (!b->interrupt_capable)
 		return -EINVAL;
@@ -982,8 +984,15 @@ store_interrupt_enable(struct threshold_
 	memset(&tr, 0, sizeof(tr));
 	tr.b		= b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	idx = block_interf_srcu_read_lock();
+	ret = smp_call_function_single_fail(b->cpu, threshold_restart_bank,
+					    &tr, 1);
+	block_interf_srcu_read_unlock(idx);
+	if (ret) {
+		if (ret == -EPERM)
+			return ret;
 		return -ENODEV;
+	}
 
 	return size;
 }
@@ -993,6 +1002,7 @@ store_threshold_limit(struct threshold_b
 {
 	struct thresh_restart tr;
 	unsigned long new;
+	int ret, idx;
 
 	if (kstrtoul(buf, 0, &new) < 0)
 		return -EINVAL;
@@ -1007,8 +1017,14 @@ store_threshold_limit(struct threshold_b
 	b->threshold_limit = new;
 	tr.b = b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	idx = block_interf_srcu_read_lock();
+	ret = smp_call_function_single_fail(b->cpu, threshold_restart_bank, &tr, 1);
+	block_interf_srcu_read_unlock(idx);
+	if (ret) {
+		if (ret == -EPERM)
+			return ret;
 		return -ENODEV;
+	}
 
 	return size;
 }




* [patch 10/12] x86/mce/inject.c: fail if target cpu is block interference
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (8 preceding siblings ...)
  2024-02-06 18:49 ` [patch 09/12] AMD MCE: use smp_call_func_single_fail Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-06 18:49 ` [patch 11/12] x86/resctrl: use smp_call_function_single_fail Marcelo Tosatti
  2024-02-06 18:49 ` [patch 12/12] x86/cacheinfo.c: check for block interference CPUs Marcelo Tosatti
  11 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

In the codepaths leading to smp_call_function (including rdmsrl_on_cpu),
check for, and fail if, the target CPU is marked as "block interference".

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/arch/x86/kernel/cpu/mce/inject.c
===================================================================
--- linux-isolation.orig/arch/x86/kernel/cpu/mce/inject.c
+++ linux-isolation/arch/x86/kernel/cpu/mce/inject.c
@@ -23,6 +23,7 @@
 #include <linux/notifier.h>
 #include <linux/pci.h>
 #include <linux/uaccess.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/amd_nb.h>
 #include <asm/apic.h>
@@ -584,6 +585,13 @@ static int inj_bank_set(void *data, u64
 	struct mce *m = (struct mce *)data;
 	u8 n_banks;
 	u64 cap;
+	int idx, ret = 0;
+
+	idx = block_interf_srcu_read_lock();
+	if (block_interf_cpu(m->extcpu)) {
+		ret = -EPERM;
+		goto err;
+	}
 
 	/* Get bank count on target CPU so we can handle non-uniform values. */
 	rdmsrl_on_cpu(m->extcpu, MSR_IA32_MCG_CAP, &cap);
@@ -591,7 +599,8 @@ static int inj_bank_set(void *data, u64
 
 	if (val >= n_banks) {
 		pr_err("MCA bank %llu non-existent on CPU%d\n", val, m->extcpu);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err;
 	}
 
 	m->bank = val;
@@ -612,12 +621,14 @@ static int inj_bank_set(void *data, u64
 
 		if (rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val), &ipid)) {
 			pr_err("Error reading IPID on CPU%d\n", m->extcpu);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto err;
 		}
 
 		if (!ipid) {
 			pr_err("Cannot inject into unpopulated bank %llu\n", val);
-			return -ENODEV;
+			ret = -ENODEV;
+			goto err;
 		}
 	}
 
@@ -627,7 +638,9 @@ inject:
 	/* Reset injection struct */
 	setup_inj_struct(&i_mce);
 
-	return 0;
+err:
+	block_interf_srcu_read_unlock(idx);
+	return ret;
 }
 
 MCE_INJECT_GET(bank);




* [patch 11/12] x86/resctrl: use smp_call_function_single_fail
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (9 preceding siblings ...)
  2024-02-06 18:49 ` [patch 10/12] x86/mce/inject.c: fail if target cpu is block interference Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-12 15:19   ` Thomas Gleixner
  2024-02-06 18:49 ` [patch 12/12] x86/cacheinfo.c: check for block interference CPUs Marcelo Tosatti
  11 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Convert update_task_closid_rmid from smp_call_function_single to
smp_call_function_single_fail, which will fail in case the target
CPU is tagged as a block interference CPU.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/arch/x86/kernel/cpu/resctrl/rdtgroup.c
===================================================================
--- linux-isolation.orig/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ linux-isolation/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/task_work.h>
 #include <linux/user_namespace.h>
+#include <linux/sched/isolation.h>
 
 #include <uapi/linux/magic.h>
 
@@ -551,12 +552,20 @@ static void _update_task_closid_rmid(voi
 		resctrl_sched_in(task);
 }
 
-static void update_task_closid_rmid(struct task_struct *t)
+static int update_task_closid_rmid(struct task_struct *t)
 {
-	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
-		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
-	else
+	int idx, ret = 0;
+
+	if (IS_ENABLED(CONFIG_SMP) && task_curr(t)) {
+		idx = block_interf_srcu_read_lock();
+		ret = smp_call_function_single_fail(task_cpu(t),
+						    _update_task_closid_rmid,
+						    t, 1);
+		block_interf_srcu_read_unlock(idx);
+	} else
 		_update_task_closid_rmid(t);
+
+	return ret;
 }
 
 static int __rdtgroup_move_task(struct task_struct *tsk,
@@ -604,9 +613,7 @@ static int __rdtgroup_move_task(struct t
 	 * group go into effect. If the task is not current, the MSR will be
 	 * updated when the task is scheduled in.
 	 */
-	update_task_closid_rmid(tsk);
-
-	return 0;
+	return update_task_closid_rmid(tsk);
 }
 
 static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)




* [patch 12/12] x86/cacheinfo.c: check for block interference CPUs
  2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
                   ` (10 preceding siblings ...)
  2024-02-06 18:49 ` [patch 11/12] x86/resctrl: use smp_call_function_single_fail Marcelo Tosatti
@ 2024-02-06 18:49 ` Marcelo Tosatti
  2024-02-07 12:41   ` Thomas Gleixner
  11 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 18:49 UTC
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Thomas Gleixner, Marcelo Tosatti

Change amd_set_l3_disable_slot to check the block interference
cpumask, and avoid the IPI if the target CPU is set in it.

Also change wbinvd_on_cpu to use smp_call_function_single_fail.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-isolation/arch/x86/include/asm/smp.h
===================================================================
--- linux-isolation.orig/arch/x86/include/asm/smp.h
+++ linux-isolation/arch/x86/include/asm/smp.h
@@ -118,7 +118,7 @@ int native_cpu_disable(void);
 void __noreturn hlt_play_dead(void);
 void native_play_dead(void);
 void play_dead_common(void);
-void wbinvd_on_cpu(int cpu);
+int wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
 
 void smp_kick_mwait_play_dead(void);
Index: linux-isolation/arch/x86/kernel/cpu/cacheinfo.c
===================================================================
--- linux-isolation.orig/arch/x86/kernel/cpu/cacheinfo.c
+++ linux-isolation/arch/x86/kernel/cpu/cacheinfo.c
@@ -17,6 +17,7 @@
 #include <linux/sysfs.h>
 #include <linux/pci.h>
 #include <linux/stop_machine.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/cpufeature.h>
 #include <asm/cacheinfo.h>
@@ -396,6 +397,7 @@ static void amd_l3_disable_index(struct
 	 *  disable index in all 4 subcaches
 	 */
 	for (i = 0; i < 4; i++) {
+		int ret;
 		u32 reg = idx | (i << 20);
 
 		if (!nb->l3_cache.subcaches[i])
@@ -409,6 +411,7 @@ static void amd_l3_disable_index(struct
 		 * is not sufficient.
 		 */
 		ret = wbinvd_on_cpu(cpu);
+		WARN_ON(ret == -EPERM);
 
 		reg |= BIT(31);
 		pci_write_config_dword(nb->misc, 0x1BC + slot * 4, reg);
@@ -428,7 +431,7 @@ static void amd_l3_disable_index(struct
 static int amd_set_l3_disable_slot(struct amd_northbridge *nb, int cpu,
 			    unsigned slot, unsigned long index)
 {
-	int ret = 0;
+	int idx, ret = 0;
 
 	/*  check if @slot is already used or the index is already disabled */
 	ret = amd_get_l3_disable_slot(nb, slot);
@@ -442,7 +445,15 @@ static int amd_set_l3_disable_slot(struc
 	if (index == amd_get_l3_disable_slot(nb, !slot))
 		return -EEXIST;
 
-	amd_l3_disable_index(nb, cpu, slot, index);
+	ret = 0;
+	idx = block_interf_srcu_read_lock();
+
+	if (block_interf_cpu(cpu))
+		ret = -EPERM;
+	else
+		amd_l3_disable_index(nb, cpu, slot, index);
+
+	block_interf_srcu_read_unlock(idx);
 
-	return 0;
+	return ret;
 }
Index: linux-isolation/arch/x86/lib/cache-smp.c
===================================================================
--- linux-isolation.orig/arch/x86/lib/cache-smp.c
+++ linux-isolation/arch/x86/lib/cache-smp.c
@@ -7,9 +7,9 @@ static void __wbinvd(void *dummy)
 	wbinvd();
 }
 
-void wbinvd_on_cpu(int cpu)
+int wbinvd_on_cpu(int cpu)
 {
-	smp_call_function_single(cpu, __wbinvd, NULL, 1);
+	return smp_call_function_single_fail(cpu, __wbinvd, NULL, 1);
 }
 EXPORT_SYMBOL(wbinvd_on_cpu);
 




* Re: [patch 04/12] clockevent unbind: use smp_call_func_single_fail
  2024-02-06 18:49 ` [patch 04/12] clockevent unbind: use smp_call_func_single_fail Marcelo Tosatti
@ 2024-02-07 11:55   ` Thomas Gleixner
  2024-02-07 12:51     ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-07 11:55 UTC
  To: Marcelo Tosatti, linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Marcelo Tosatti

On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> Convert clockevents_unbind from smp_call_function_single to
> smp_call_function_single_fail, which will fail in case the target
> CPU is tagged as a block interference CPU.

Seriously? This is a root only operation. So if root wants to interfere
then so be it.


* Re: [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate
  2024-02-06 18:49 ` [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate Marcelo Tosatti
@ 2024-02-07 11:57   ` Thomas Gleixner
  2024-02-07 12:58     ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-07 11:57 UTC
  To: Marcelo Tosatti, linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Marcelo Tosatti

On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> Change timekeeping_notify to use stop_machine_fail when appropriate,
> which will fail in case the target CPU is tagged as a block
> interference CPU.

You completely fail to explain 'appropriate'. There is zero reason for
this churn, really.



* Re: [patch 12/12] x86/cacheinfo.c: check for block interference CPUs
  2024-02-06 18:49 ` [patch 12/12] x86/cacheinfo.c: check for block interference CPUs Marcelo Tosatti
@ 2024-02-07 12:41   ` Thomas Gleixner
  2024-02-07 13:10     ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-07 12:41 UTC
  To: Marcelo Tosatti, linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Marcelo Tosatti

On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> @@ -396,6 +397,7 @@ static void amd_l3_disable_index(struct
>  	 *  disable index in all 4 subcaches
>  	 */
>  	for (i = 0; i < 4; i++) {
> +		int ret;
>  		u32 reg = idx | (i << 20);
>  
>  		if (!nb->l3_cache.subcaches[i])
> @@ -409,6 +411,7 @@ static void amd_l3_disable_index(struct
>  		 * is not sufficient.
>  		 */
>  		ret = wbinvd_on_cpu(cpu);
> +		WARN_ON(ret == -EPERM);

What? You create inconsistent state here.

> -	amd_l3_disable_index(nb, cpu, slot, index);
> +	ret = 0;
> +	idx = block_interf_srcu_read_lock();
> +
> +	if (block_interf_cpu(cpu))
> +		ret = -EPERM;
> +	else
> +		amd_l3_disable_index(nb, cpu, slot, index);
> +
> +	block_interf_srcu_read_unlock(idx);

Again. This is a root only operation.

This whole patch series is just voodoo programming with zero
justification for the mess it creates.

Thanks,

        tglx


* Re: [patch 04/12] clockevent unbind: use smp_call_func_single_fail
  2024-02-07 11:55   ` Thomas Gleixner
@ 2024-02-07 12:51     ` Marcelo Tosatti
  2024-02-11  8:52       ` Thomas Gleixner
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-07 12:51 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Wed, Feb 07, 2024 at 12:55:59PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> > Convert clockevents_unbind from smp_call_function_single to
> > smp_call_function_single_fail, which will fail in case the target
> > CPU is tagged as a block interference CPU.
> 
> Seriously? This is a root only operation. So if root wants to interfere
> then so be it.

Hi Thomas!

OK, so the problem is the following: due to software complexity, one is
often not aware of all operations that might take place.

For example:

https://lore.kernel.org/lkml/Y6mXDUZkII5OnuE8@tpad/T/

[PATCH] hwmon: coretemp: avoid RDMSR interruptions to isolated CPUs

The coretemp driver uses rdmsr_on_cpu calls to read
MSR_IA32_PACKAGE_THERM_STATUS/MSR_IA32_THERM_STATUS registers,
which contain information about current core temperature.

For certain low latency applications, the RDMSR interruption exceeds
the application's latency requirements.

So disable reading of crit_alarm and temp files via /sys, in case
CPU isolation is enabled.

Temperature information from the housekeeping cores should be
sufficient to infer die temperature.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 9bee4d33fbdf..30a35f4130d5 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -27,6 +27,7 @@
 #include <asm/msr.h>
 #include <asm/processor.h>
 #include <asm/cpu_device_id.h>
+#include <linux/sched/isolation.h>
 
 #define DRVNAME	"coretemp"
 
@@ -121,6 +122,10 @@ static ssize_t show_crit_alarm(struct device *dev,
 	struct platform_data *pdata = dev_get_drvdata(dev);
 	struct temp_data *tdata = pdata->core_data[attr->index];
 
+
+	if (!housekeeping_cpu(tdata->cpu, HK_TYPE_MISC))
+		return -EINVAL;
+
 	mutex_lock(&tdata->update_lock);
 	rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
 	mutex_unlock(&tdata->update_lock);
@@ -158,6 +163,8 @@ static ssize_t show_temp(struct device *dev,
 
 	/* Check whether the time interval has elapsed */
 	if (!tdata->valid || time_after(jiffies, tdata->last_updated + HZ)) {
+		if (!housekeeping_cpu(tdata->cpu, HK_TYPE_MISC))
+			return -EINVAL;
 		rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
 		/*
 		 * Ignore the valid bit. In all observed cases the register


In this case, a userspace application (collecting telemetry data) was
responsible for reading the sysfs files.

Now think of all possible paths, from userspace, that lead to kernel
code that ends up in smp_call_function_* variants (or other functions
that cause IPIs to isolated CPUs).

The alternative to blocking this in the kernel would be to validate all
userspace software involved in your application, to ensure it won't end
up in the kernel sending IPIs. Which is impractical, isn't it?
(Or rather, with such an option in the kernel, it becomes possible to run
applications which have not been validated, since the kernel would fail
the operation that results in an IPI to an isolated CPU.)

So the idea would be an additional "isolation mode" which, when
enabled, would disallow the IPIs. It is still possible for the root
user to disable this mode and retry the operation.

So let's say I want to read MSRs on a given CPU, as root.

You'd have to:

1) readmsr on the given CPU (returns -EPERM or similar), since the
"block interference" mode is enabled for that CPU.

2) Disable that CPU in the block interference cpumask.

3) readmsr on the given CPU (success).

4) Re-enable CPU in block interference cpumask, if desired.
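
As a hypothetical userspace illustration of that sequence (the msr
driver is not converted in this series, so /dev/cpu/*/msr returning
-EPERM in step 1 is an assumption):

#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int read_msr(int cpu, uint32_t reg, uint64_t *val)
{
	char path[64];
	int fd, err = 0;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	/* the msr driver reads the register selected by the file offset */
	if (pread(fd, val, sizeof(*val), reg) != sizeof(*val))
		err = -errno;
	close(fd);
	return err;
}

static int set_block_interf_mask(const char *mask)
{
	int fd = open("/sys/kernel/debug/block_interf_cpumask", O_WRONLY);

	if (fd < 0)
		return -errno;
	if (write(fd, mask, strlen(mask)) < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	close(fd);
	return 0;
}

int main(void)
{
	uint64_t val = 0;

	/* 1) fails while CPU 2 is tagged as block interference */
	if (read_msr(2, 0x10 /* IA32_TIME_STAMP_COUNTER */, &val) == -EPERM) {
		set_block_interf_mask("0");	/* 2) untag CPU 2 */
		read_msr(2, 0x10, &val);	/* 3) retry */
		set_block_interf_mask("4");	/* 4) re-tag CPU 2 */
	}
	printf("msr: %" PRIx64 "\n", val);
	return 0;
}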


(BTW, better ideas than the cpumask are welcome, but it seems the
reliability aspect of something similar to this is necessary).

Thanks!



* Re: [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate
  2024-02-07 11:57   ` Thomas Gleixner
@ 2024-02-07 12:58     ` Marcelo Tosatti
  2024-02-08 15:23       ` Thomas Gleixner
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-07 12:58 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Wed, Feb 07, 2024 at 12:57:19PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> > Change timekeeping_notify to use stop_machine_fail when appropriate,
> > which will fail in case the target CPU is tagged as a block
> > interference CPU.
> 
> You completely fail to explain 'appropriate'. There is zero reason for
> this churn, really.

The churn is so that we can return an error to
current_clocksource_store (sysfs handler for writes to
/sys/devices/system/clocksource/clocksource0/current_clocksource).

Yeah, I should probably separate the patches that prepare code for
returning errors from the actual switch from smp_call_function_single
to smp_call_function_single_fail (I started that for the
rdmsr_on_cpu/wrmsr_on_cpu helpers).
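
For instance, such a helper could look like this (a sketch following
the same pattern as smp_call_function_single_fail, not code from this
series):

int rdmsrl_on_cpu_fail(unsigned int cpu, u32 msr_no, u64 *q)
{
	block_interf_assert_held();
	if (block_interf_cpu(cpu))
		return -EPERM;

	return rdmsrl_on_cpu(cpu, msr_no, q);
}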

OK, so I am assuming you are good with the general idea of the patch
(and have no better one), which is:

idx = block_interf_srcu_read_lock();
ret = smp_call_function_single_fail(cpu, remote_fn, ...);  (or stop_machine_fail)
block_interf_srcu_read_unlock(idx);






* Re: [patch 12/12] x86/cacheinfo.c: check for block interference CPUs
  2024-02-07 12:41   ` Thomas Gleixner
@ 2024-02-07 13:10     ` Marcelo Tosatti
  2024-02-07 13:16       ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-07 13:10 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Wed, Feb 07, 2024 at 01:41:36PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> > @@ -396,6 +397,7 @@ static void amd_l3_disable_index(struct
> >  	 *  disable index in all 4 subcaches
> >  	 */
> >  	for (i = 0; i < 4; i++) {
> > +		int ret;
> >  		u32 reg = idx | (i << 20);
> >  
> >  		if (!nb->l3_cache.subcaches[i])
> > @@ -409,6 +411,7 @@ static void amd_l3_disable_index(struct
> >  		 * is not sufficient.
> >  		 */
> >  		ret = wbinvd_on_cpu(cpu);
> > +		WARN_ON(ret == -EPERM);
> 
> What? You create inconsistent state here.

That should not happen, since we checked for 

+   idx = block_interf_srcu_read_lock();
+
+   if (block_interf_cpu(cpu))
+           ret = -EPERM;

Earlier.

Thus the WARN_ON (hmm, could change it to BUG_ON...).

> > -	amd_l3_disable_index(nb, cpu, slot, index);
> > +	ret = 0;
> > +	idx = block_interf_srcu_read_lock();
> > +
> > +	if (block_interf_cpu(cpu))
> > +		ret = -EPERM;
> > +	else
> > +		amd_l3_disable_index(nb, cpu, slot, index);
> > +
> > +	block_interf_srcu_read_unlock(idx);
> 
> Again. This is a root only operation.
> 
> This whole patch series is just voodoo programming with zero
> justification for the mess it creates.
> 
> Thanks,
> 
>         tglx

It's not really voodoo programming: it's simply returning errors to
userspace if a CPU is set in a particular cpumask.

Do you have a better idea? (I am in the process of converting more
users.)
I can improve on the patchset quality.

OK, the justification is as follows. Today, there is a reactive
approach to interruptions to isolated CPUs:

1) Run Linux plus a latency-sensitive application on an isolated CPU.

2) Wait for an IPI or other interruption to happen on that isolated
CPU, which breaks the application.

3) Remove that interruption to the isolated CPU.

This patchset is (for a class of IPIs) a proactive approach, where
those IPIs are not permitted to interrupt the latency-sensitive
applications.

https://iot-analytics.com/soft-plc-industrial-innovators-dilemma/

"Hard PLCs (a market in which incumbent vendors dominate) have
historically addressed most of the needs of the existing / high end
market, such as high reliability, fast cycle times and, perhaps most
importantly, the ability of existing workforce to support and maintain
the systems. Soft PLCs, on the other hand, initially addressed the needs
of new / lower end customers by providing more flexible,
non-deterministic control solutions often at a fraction of the cost of
similar hard PLCs. Since entering the market in the 90’s, soft PLCs have
rapidly become more performant thanks to advances in virtualization
technologies, real-time Linux operating systems and more powerful edge
computing hardware, thus moving up the y-axis (product performance) in
the chart above."





* Re: [patch 12/12] x86/cacheinfo.c: check for block interference CPUs
  2024-02-07 13:10     ` Marcelo Tosatti
@ 2024-02-07 13:16       ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-07 13:16 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Wed, Feb 07, 2024 at 10:10:41AM -0300, Marcelo Tosatti wrote:
> On Wed, Feb 07, 2024 at 01:41:36PM +0100, Thomas Gleixner wrote:
> > On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> > > @@ -396,6 +397,7 @@ static void amd_l3_disable_index(struct
> > >  	 *  disable index in all 4 subcaches
> > >  	 */
> > >  	for (i = 0; i < 4; i++) {
> > > +		int ret;
> > >  		u32 reg = idx | (i << 20);
> > >  
> > >  		if (!nb->l3_cache.subcaches[i])
> > > @@ -409,6 +411,7 @@ static void amd_l3_disable_index(struct
> > >  		 * is not sufficient.
> > >  		 */
> > >  		ret = wbinvd_on_cpu(cpu);
> > > +		WARN_ON(ret == -EPERM);
> > 
> > What? You create inconsistent state here.
> 
> That should not happen, since we checked for 
> 
> +   idx = block_interf_srcu_read_lock();
> +
> +   if (block_interf_cpu(cpu))
> +           ret = -EPERM;
> 
> Earlier.
> 
> Thus the WARN_ON (hum, can change to BUG_ON...).
> 
> > > -	amd_l3_disable_index(nb, cpu, slot, index);
> > > +	ret = 0;
> > > +	idx = block_interf_srcu_read_lock();
> > > +
> > > +	if (block_interf_cpu(cpu))
> > > +		ret = -EPERM;
> > > +	else
> > > +		amd_l3_disable_index(nb, cpu, slot, index);
> > > +
> > > +	block_interf_srcu_read_unlock(idx);
> > 
> > Again. This is a root only operation.
> > 
> > This whole patch series is just voodoo programming with zero
> > justification for the mess it creates.

BTW, this converted less than 17% of the callers, which is why early
feedback is useful.

Thanks!



* Re: [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate
  2024-02-07 12:58     ` Marcelo Tosatti
@ 2024-02-08 15:23       ` Thomas Gleixner
  2024-02-09 15:30         ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-08 15:23 UTC (permalink / raw
  To: Marcelo Tosatti
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Wed, Feb 07 2024 at 09:58, Marcelo Tosatti wrote:
> On Wed, Feb 07, 2024 at 12:57:19PM +0100, Thomas Gleixner wrote:
>> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
>> > Change timekeeping_notify to use stop_machine_fail when appropriate,
>> > which will fail in case the target CPU is tagged as block interference
>> > CPU.
>> 
>> You completely fail to explain 'appropriate'. There is zero reason for
>> this churn, really.
>
> The churn is so that we can return an error to
> current_clocksource_store (sysfs handler for writes to
> /sys/devices/system/clocksource/clocksource0/current_clocksource).

What for? Why?

Writing to that file requires root. Root can rightfully screw up a
system and adding a debugfs based "prevention" mechanism is not making
this any better because root can just clear the CPU mask there and move
on.

So what is the actual real world problem solved by these patches?

All I've seen so far is handwaving about interference prevention and TBH
I can't squint hard enough to believe that.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate
  2024-02-08 15:23       ` Thomas Gleixner
@ 2024-02-09 15:30         ` Marcelo Tosatti
  2024-02-12 15:29           ` Thomas Gleixner
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-09 15:30 UTC (permalink / raw
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Thu, Feb 08, 2024 at 04:23:58PM +0100, Thomas Gleixner wrote:
> On Wed, Feb 07 2024 at 09:58, Marcelo Tosatti wrote:
> > On Wed, Feb 07, 2024 at 12:57:19PM +0100, Thomas Gleixner wrote:
> >> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> >> > Change timekeeping_notify to use stop_machine_fail when appropriate,
> >> > which will fail in case the target CPU is tagged as block interference
> >> > CPU.
> >> 
> >> You completely fail to explain 'appropriate'. There is zero reason for
> >> this churn, really.
> >
> > The churn is so that we can return an error to
> > current_clocksource_store (sysfs handler for writes to
> > /sys/devices/system/clocksource/clocksource0/current_clocksource).
> 
> What for? Why?
> 
> Writing to that file requires root. Root can rightfully screw up a
> system and adding a debugfs based "prevention" mechanism is not making
> this any better because root can just clear the CPU mask there and move
> on.
> 
> So what is the actual real world problem solved by these patches?
> 
> All I've seen so far is handwaving about interference prevention and TBH
> I can't squint hard enough to believe that.
> 
> Thanks,
> 
>         tglx


Thomas,

The problem is that an administrator is not aware of the relationship between

Kernel interface        ->      generation of IPIs.

Even less so of which applications in the system are accessing
which kernel interfaces.

Let me give some examples:

1) Change of trip temperatures from userspace.

static int
sys_set_trip_temp(struct thermal_zone_device *tzd, int trip, int temp)
{
        struct zone_device *zonedev = thermal_zone_device_priv(tzd);
        u32 l, h, mask, shift, intr;
        int tj_max, val, ret;

        tj_max = intel_tcc_get_tjmax(zonedev->cpu);
        if (tj_max < 0)
                return tj_max;
        tj_max *= 1000;

        val = (tj_max - temp)/1000;

        if (trip >= MAX_NUMBER_OF_TRIPS || val < 0 || val > 0x7f)
                return -EINVAL;

        ret = rdmsr_on_cpu(zonedev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
                           &l, &h);
        if (ret < 0)
                return ret;

        if (trip) {
                mask = THERM_MASK_THRESHOLD1;
                shift = THERM_SHIFT_THRESHOLD1;
                intr = THERM_INT_THRESHOLD1_ENABLE;
        } else {
                mask = THERM_MASK_THRESHOLD0;
                shift = THERM_SHIFT_THRESHOLD0;
                intr = THERM_INT_THRESHOLD0_ENABLE;
        }
        l &= ~mask;
        /*
         * When user space sets a trip temperature == 0, it is an
         * indication that it is no longer interested in receiving
         * notifications.
         */
        if (!temp) {
                l &= ~intr;
        } else {
                l |= val << shift;
                l |= intr;
        }

        return wrmsr_on_cpu(zonedev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
                        l, h);
}

It seems plausible that a userspace application might decide to change a
trip temperature.
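
For illustration, a minimal sketch of the same treatment for this driver
(which the posted series does not convert): it wraps the wrmsr_on_cpu()
call above with the helpers from patch 01. The function name and the
-EPERM policy here are assumptions, mirroring the pattern used elsewhere
in the series:

/*
 * Sketch only: fail the thermal-interrupt MSR write instead of
 * IPI'ing an interference-blocked CPU.  Helper names are from
 * patch 01; the error value is an assumed policy.
 */
static int sys_set_trip_temp_checked(struct zone_device *zonedev, u32 l, u32 h)
{
        int idx, ret;

        idx = block_interf_srcu_read_lock();
        if (block_interf_cpu(zonedev->cpu))
                ret = -EPERM;   /* target CPU must not be interrupted */
        else
                ret = wrmsr_on_cpu(zonedev->cpu,
                                   MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
        block_interf_srcu_read_unlock(idx);

        return ret;
}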

2) Read of per-CPU registers (from sysfs).

arch/arm64/kernel/topology.c

static inline
int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
{
        /*
         * Abort call on counterless CPU or when interrupts are
         * disabled - can lead to deadlock in smp sync call.
         */
        if (!cpu_has_amu_feat(cpu))
                return -EOPNOTSUPP;

        if (WARN_ON_ONCE(irqs_disabled()))
                return -EPERM;

        smp_call_function_single(cpu, func, val, 1);

        return 0;
}

sysfs read -> cppc_get_perf_caps -> cpc_read -> cpc_read_ffh -> counters_read_on_cpu 

#define define_one_cppc_ro(_name)               \
static struct kobj_attribute _name =            \
__ATTR(_name, 0444, show_##_name, NULL)

This one does not even require root.
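
Patch 08 converts this function. The conversion presumably ends up along
these lines (a sketch, not the actual patch; it assumes the _fail variant
returns an error such as -EPERM when the target CPU is in the block
interference cpumask):

static inline
int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
{
        int idx, ret;

        /*
         * Abort call on counterless CPU or when interrupts are
         * disabled - can lead to deadlock in smp sync call.
         */
        if (!cpu_has_amu_feat(cpu))
                return -EOPNOTSUPP;

        if (WARN_ON_ONCE(irqs_disabled()))
                return -EPERM;

        idx = block_interf_srcu_read_lock();
        /* Fails instead of IPI'ing an interference-blocked CPU. */
        ret = smp_call_function_single_fail(cpu, func, val, 1);
        block_interf_srcu_read_unlock(idx);

        return ret;
}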

Other interfaces in the same class:

show_pw20_wait_time (PowerPC)
uncore_read_freq (x86)
...

So while I understand your point that root can (and should be)
able to perform any operation on the system, this interface would
be along the lines of:

"I don't want isolated CPUs to be interrupted, but I am not aware of
which kernel interfaces can result in interruptions to isolated CPUs.

Let's indicate through this cpumask, which the kernel can consult,
that we'd like userspace operations to fail if they were going
to interrupt an isolated CPU".

It's the kernel, not userspace, that knows which operations might
interrupt isolated CPUs.

Also https://www.spinics.net/lists/kernel/msg5094328.html. 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 04/12] clockevent unbind: use smp_call_func_single_fail
  2024-02-07 12:51     ` Marcelo Tosatti
@ 2024-02-11  8:52       ` Thomas Gleixner
  2024-02-14 18:58         ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-11  8:52 UTC (permalink / raw
  To: Marcelo Tosatti
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Wed, Feb 07 2024 at 09:51, Marcelo Tosatti wrote:
> On Wed, Feb 07, 2024 at 12:55:59PM +0100, Thomas Gleixner wrote:
>
> OK, so the problem is the following: due to software complexity, one is
> often not aware of all operations that might take place.

The problem is that people throw random crap on their systems, avoid
proper system engineering, and then complain that their realtime
constraints are violated. So you are proliferating bad engineering
practices and encouraging people not to care.

> Now think of all possible paths, from userspace, that lead to kernel
> code that ends up in smp_call_function_* variants (or other functions
> that cause IPIs to isolated CPUs).

So you need to analyze every possible code path and interface and add
your magic functions there after figuring out whether that's valid or
not.

> The alternative to blocking this in the kernel would be to validate all
> userspace software involved in your application, to ensure it won't end
> up in the kernel sending IPIs. Which is impractical, isn't it?

It's absolutely not impractical. It's part of proper system
engineering. The notion that you can run random docker containers and
everything works magically is just a wet dream.

> (or rather, with such an option in the kernel, it would be possible to run
> applications which have not been validated, since the kernel would fail
> the operation that results in an IPI to an isolated CPU).

That's a fallacy because you _cannot_ define with a single CPU mask
which interface is valid in a particular configuration to end up with an
IPI and which one is not. There are legitimate reasons in realtime or
latency constraint systems to invoke selective functionality which
interferes with the overall system constraints.

How do you cover that with your magic CPU mask? You can't.

Aside from that, there is a decent chance that you are subtly breaking user
space that way. Just look at that hwmon/coretemp commit you pointed to:

  "Temperature information from the housekeeping cores should be
   sufficient to infer die temperature."

That's just wishful thinking for various reasons:

  - The die temperature on larger packages is not evenly distributed and
    you can run into situations where the housekeeping cores are sitting
    "far" enough away from the worker core that creates the hot spot

  - Some monitoring applications just stop working when they can't read
    the full data set, which means that they break subtly and you can
    infer exactly nothing.

> So the idea would be an additional "isolation mode" which, when enabled,
> would disallow the IPIs. It's still possible for the root user to disable
> this mode and retry the operation.
>
> So let's say I want to read MSRs on a given CPU, as root.
>
> You'd have to: 
>
> 1) readmsr on given CPU (returns -EPERM or whatever), since the
> "block interference" mode is enabled for that CPU.
>
> 2) Disable that CPU in the block interference cpumask.
>
> 3) readmsr on the given CPU (success).
>
> 4) Re-enable CPU in block interference cpumask, if desired.

That's just wrong. Why?

Once you enable it just to read the MSR, you enable the operation for
_ALL_ other non-validated crap too. So while the single MSR read might
be OK under certain circumstances, the fact that you open up a window for
all other interfaces to do far more interfering operations is a red
flag.

This whole thing is a really badly defined policy mechanism of very
dubious value.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 11/12] x86/resctrl: use smp_call_function_single_fail
  2024-02-06 18:49 ` [patch 11/12] x86/resctrl: use smp_call_function_single_fail Marcelo Tosatti
@ 2024-02-12 15:19   ` Thomas Gleixner
  2024-02-14 18:59     ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-12 15:19 UTC (permalink / raw
  To: Marcelo Tosatti, linux-kernel
  Cc: Daniel Bristot de Oliveira, Juri Lelli, Valentin Schneider,
	Frederic Weisbecker, Leonardo Bras, Peter Zijlstra,
	Marcelo Tosatti

On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> Convert update_task_closid_rmid from smp_call_function_single
> to smp_call_func_single_fail, which will fail in case
> the target CPU is tagged as block interference CPU.

You fail again to provide a rationale for this change.

What's worse is that you fail to explain why you think that creating
inconsistent state is a valid approach.

> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: linux-isolation/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> ===================================================================
> --- linux-isolation.orig/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ linux-isolation/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -26,6 +26,7 @@
>  #include <linux/slab.h>
>  #include <linux/task_work.h>
>  #include <linux/user_namespace.h>
> +#include <linux/sched/isolation.h>
>  
>  #include <uapi/linux/magic.h>
>  
> @@ -551,12 +552,20 @@ static void _update_task_closid_rmid(voi
>  		resctrl_sched_in(task);
>  }
>  
> -static void update_task_closid_rmid(struct task_struct *t)
> +static int update_task_closid_rmid(struct task_struct *t)
>  {
> -	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
> -		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
> -	else
> +	int idx, ret = 0;
> +
> +	if (IS_ENABLED(CONFIG_SMP) && task_curr(t)) {
> +		idx = block_interf_srcu_read_lock();
> +		ret = smp_call_function_single_fail(task_cpu(t),
> +						    _update_task_closid_rmid,
> +						    t, 1);
> +		block_interf_srcu_read_unlock(idx);
> +	} else
>  		_update_task_closid_rmid(t);
> +
> +	return ret;

This is invoked _after_ the change has been committed to the in-memory
state so how is failing here correct?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate
  2024-02-09 15:30         ` Marcelo Tosatti
@ 2024-02-12 15:29           ` Thomas Gleixner
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2024-02-12 15:29 UTC (permalink / raw
  To: Marcelo Tosatti
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Fri, Feb 09 2024 at 12:30, Marcelo Tosatti wrote:
> On Thu, Feb 08, 2024 at 04:23:58PM +0100, Thomas Gleixner wrote:
> So while I understand your point that root can (and should be)
> able to perform any operation on the system, this interface would
> be along the lines of:
>
> "I don't want isolated CPUs to be interrupted, but I am not aware of
> which kernel interfaces can result in interruptions to isolated CPUs.
>
> Let's indicate through this cpumask, which the kernel can consult,
> that we'd like userspace operations to fail if they were going
> to interrupt an isolated CPU".
>
> It's the kernel, not userspace, that knows which operations might
> interrupt isolated CPUs.

Right, but you cannot just throw a CPU mask in and decide that it will
work for everything.

As I explained to you before, these interfaces cannot be treated in the
same way just because they might end up in an SMP function call.

You are defining a binary all-or-nothing policy, which is the worst
thing you can do, because it prevents any fine-grained decision, and in
case an interface is needed for a particular operation it requires
opening up the gates for everything. "Works for me" and "scratches my
itch" are not really valid engineering principles.

Not to mention the blind replacement of the SMP function call, which
causes inconsistent state, as I pointed out to you.

Thanks

        tglx



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 04/12] clockevent unbind: use smp_call_func_single_fail
  2024-02-11  8:52       ` Thomas Gleixner
@ 2024-02-14 18:58         ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-14 18:58 UTC (permalink / raw
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Sun, Feb 11, 2024 at 09:52:35AM +0100, Thomas Gleixner wrote:
> On Wed, Feb 07 2024 at 09:51, Marcelo Tosatti wrote:
> > On Wed, Feb 07, 2024 at 12:55:59PM +0100, Thomas Gleixner wrote:
> >
> > OK, so the problem is the following: due to software complexity, one is
> > often not aware of all operations that might take place.
> 
> The problem is that people throw random crap on their systems, avoid
> proper system engineering, and then complain that their realtime
> constraints are violated. So you are proliferating bad engineering
> practices and encouraging people not to care.

It's more of a practicality and cost concern: one usually does not have
the resources to fully review software before using it.

> > Now think of all possible paths, from userspace, that lead to kernel
> > code that ends up in smp_call_function_* variants (or other functions
> > that cause IPIs to isolated CPUs).
> 
> So you need to analyze every possible code path and interface and add
> your magic functions there after figuring out whether that's valid or
> not.

"A magic function", yes.

> > The alternative to blocking this in the kernel would be to validate all
> > userspace software involved in your application, to ensure it won't end
> > up in the kernel sending IPIs. Which is impractical, isn't it?
> 
> It's absolutely not impractical. It's part of proper system
> engineering. The notion that you can run random docker containers and
> everything works magically is just a wet dream.

Unfortunately that is what people do.

I understand that "full software review" would be the ideal, but in most
situations it does not seem to happen.

> > (or rather, with such an option in the kernel, it would be possible to run
> > applications which have not been validated, since the kernel would fail
> > the operation that results in an IPI to an isolated CPU).
> 
> That's a fallacy because you _cannot_ define with a single CPU mask
> which interface is valid in a particular configuration to end up with an
> IPI and which one is not. There are legitimate reasons in realtime or
> latency constraint systems to invoke selective functionality which
> interferes with the overall system constraints.
> 
> How do you cover that with your magic CPU mask? You can't.
> 
> Aside from that, there is a decent chance that you are subtly breaking user
> space that way. Just look at that hwmon/coretemp commit you pointed to:
> 
>   "Temperature information from the housekeeping cores should be
>    sufficient to infer die temperature."
> 
> That's just wishful thinking for various reasons:
> 
>   - The die temperature on larger packages is not evenly distributed and
>     you can run into situations where the housekeeping cores are sitting
>     "far" enough away from the worker core that creates the hot spot

I know.

>   - Some monitoring applications just stop working when they can't read
>     the full data set, which means that they break subtly and you can
>     infer exactly nothing.
> 
> > So the idea would be an additional "isolation mode" which, when enabled,
> > would disallow the IPIs. It's still possible for the root user to disable
> > this mode and retry the operation.
> >
> > So let's say I want to read MSRs on a given CPU, as root.
> >
> > You'd have to: 
> >
> > 1) readmsr on given CPU (returns -EPERM or whatever), since the
> > "block interference" mode is enabled for that CPU.
> >
> > 2) Disable that CPU in the block interference cpumask.
> >
> > 3) readmsr on the given CPU (success).
> >
> > 4) Re-enable CPU in block interference cpumask, if desired.
> 
> That's just wrong. Why?
> 
> Once you enable it just to read the MSR, you enable the operation for
> _ALL_ other non-validated crap too. So while the single MSR read might
> be OK under certain circumstances, the fact that you open up a window for
> all other interfaces to do far more interfering operations is a red
> flag.
> 
> This whole thing is a really badly defined policy mechanism of very
> dubious value.
> 
> Thanks,

OK, fair enough. From your comments, it seems that per-callsite
toggling would be ideal, for example:

A /sys/kernel/interference_blocking/ directory containing one
sub-directory per call site.

Inside each sub-directory, an "enabled" file, controlling a boolean
that enables or disables interference blocking for that particular
callsite.

Also a "cpumask" file in each directory, by default containing the same
cpumask as the nohz_full CPUs, controlling which CPUs interference is
blocked for.
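
For example (a hypothetical session; the callsite directory name
"clockevent_unbind" is made up for illustration):

# enable interference blocking for a single callsite
cd /sys/kernel/interference_blocking/clockevent_unbind
echo 1 > enabled     # IPIs from this callsite to masked CPUs now fail
cat cpumask          # defaults to the nohz_full cpumask
echo 2-7 > cpumask   # block interference only for CPUs 2-7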

How does that sound?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 11/12] x86/resctrl: use smp_call_function_single_fail
  2024-02-12 15:19   ` Thomas Gleixner
@ 2024-02-14 18:59     ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2024-02-14 18:59 UTC (permalink / raw
  To: Thomas Gleixner
  Cc: linux-kernel, Daniel Bristot de Oliveira, Juri Lelli,
	Valentin Schneider, Frederic Weisbecker, Leonardo Bras,
	Peter Zijlstra

On Mon, Feb 12, 2024 at 04:19:19PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> > Convert update_task_closid_rmid from smp_call_function_single
> > to smp_call_func_single_fail, which will fail in case
> > the target CPU is tagged as block interference CPU.
> 
> You fail again to provide a rationale for this change.
> 
> What's worse is that you fail to explain why you think that creating
> inconsistent state is a valid approach.

Well, the patch is broken and needs fixing.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-02-14 19:06 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
2024-02-06 18:49 ` [patch 01/12] cpu isolation: basic block interference infrastructure Marcelo Tosatti
2024-02-06 18:49 ` [patch 02/12] introduce smp_call_func_single_fail Marcelo Tosatti
2024-02-06 18:49 ` [patch 03/12] Introduce _fail variants of stop_machine functions Marcelo Tosatti
2024-02-06 18:49 ` [patch 04/12] clockevent unbind: use smp_call_func_single_fail Marcelo Tosatti
2024-02-07 11:55   ` Thomas Gleixner
2024-02-07 12:51     ` Marcelo Tosatti
2024-02-11  8:52       ` Thomas Gleixner
2024-02-14 18:58         ` Marcelo Tosatti
2024-02-06 18:49 ` [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate Marcelo Tosatti
2024-02-07 11:57   ` Thomas Gleixner
2024-02-07 12:58     ` Marcelo Tosatti
2024-02-08 15:23       ` Thomas Gleixner
2024-02-09 15:30         ` Marcelo Tosatti
2024-02-12 15:29           ` Thomas Gleixner
2024-02-06 18:49 ` [patch 06/12] perf_event_open: check for block interference CPUs Marcelo Tosatti
2024-02-06 18:49 ` [patch 07/12] mtrr_add_page/mtrr_del_page: " Marcelo Tosatti
2024-02-06 18:49 ` [patch 08/12] arm64 kernel/topology: use smp_call_function_single_fail Marcelo Tosatti
2024-02-06 18:49 ` [patch 09/12] AMD MCE: use smp_call_func_single_fail Marcelo Tosatti
2024-02-06 18:49 ` [patch 10/12] x86/mce/inject.c: fail if target cpu is block interference Marcelo Tosatti
2024-02-06 18:49 ` [patch 11/12] x86/resctrl: use smp_call_function_single_fail Marcelo Tosatti
2024-02-12 15:19   ` Thomas Gleixner
2024-02-14 18:59     ` Marcelo Tosatti
2024-02-06 18:49 ` [patch 12/12] x86/cacheinfo.c: check for block interference CPUs Marcelo Tosatti
2024-02-07 12:41   ` Thomas Gleixner
2024-02-07 13:10     ` Marcelo Tosatti
2024-02-07 13:16       ` Marcelo Tosatti
