* [PATCH 0/5] x86/bugs: more BHI fixes
@ 2024-04-19 21:09 Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
Patch 1 is another iteration of reducing the scope of syscall hardening
in order to improve performance on some CPUs. The feature bit has a new
name, and the commit log and comments are much improved.
The rest of the patches are new:
- Patch 2 fixes the default mitigations for !x86 (reimplementation of
Sean's fix).
- Patch 3 fixes some objtool warnings found by Paul.
- Patch 4 is a documentation cleanup and prep for patch 5.
- Patch 5 adds a requested spectre_bhi=vmexit option.
Josh Poimboeuf (5):
x86/bugs: Only harden syscalls when needed
cpu/speculation: Fix CPU mitigation defaults for !x86
x86/syscall: Mark exit[_group] syscall handlers __noreturn
x86/bugs: Remove duplicate Spectre cmdline option descriptions
x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
Documentation/admin-guide/hw-vuln/spectre.rst | 84 ++-----------------
.../admin-guide/kernel-parameters.txt | 12 ++-
arch/Kconfig | 10 +++
arch/x86/Kconfig | 15 +---
arch/x86/entry/common.c | 15 +++-
arch/x86/entry/syscall_32.c | 11 +--
arch/x86/entry/syscall_64.c | 10 +--
arch/x86/entry/syscall_x32.c | 11 ++-
arch/x86/entry/syscalls/syscall_64.tbl | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/syscall.h | 8 +-
arch/x86/kernel/cpu/bugs.c | 51 +++++++++--
arch/x86/um/sys_call_table_32.c | 1 +
arch/x86/um/sys_call_table_64.c | 1 +
kernel/cpu.c | 4 +-
scripts/syscalltbl.sh | 6 +-
tools/objtool/noreturns.h | 4 +
17 files changed, 126 insertions(+), 124 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-22 8:09 ` Yujie Liu
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
Syscall hardening (converting the syscall indirect branch to a series of
direct branches) has shown some performance regressions:
- Red Hat internal testing showed up to 12% slowdowns in database
benchmark testing on Sapphire Rapids when the DB was stressed with 80+
users to cause contention.
- The kernel test robot's will-it-scale benchmarks showed significant
regressions on Skylake with IBRS:
https://lkml.kernel.org/lkml/202404191333.178a0eed-yujie.liu@intel.com
To fix those slowdowns, only use the syscall direct branches when
indirect branches are considered to be "not OK": meaning Spectre v2+BHI
isn't mitigated by HW and the user hasn't disabled mitigations.
Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/entry/common.c | 15 ++++++++++---
arch/x86/entry/syscall_32.c | 11 +---------
arch/x86/entry/syscall_64.c | 6 -----
arch/x86/entry/syscall_x32.c | 7 +++++-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/syscall.h | 8 ++++++-
arch/x86/kernel/cpu/bugs.c | 35 +++++++++++++++++++++++++++++-
7 files changed, 61 insertions(+), 22 deletions(-)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 51cc9c7cb9bd..db1ef98da3a4 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = sys_call_table[unr](regs);
+ else
+ regs->ax = x64_sys_call(regs, unr);
return true;
}
return false;
@@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = x32_sys_call_table[xnr](regs);
+ else
+ regs->ax = x32_sys_call(regs, xnr);
return true;
}
return false;
@@ -162,7 +168,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = ia32_sys_call_table[unr](regs);
+ else
+ regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235bae17ef..aab31760b4e3 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif
#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09e6f15..96ea1f8a1d3f 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -11,11 +11,6 @@
#include <asm/syscalls_64.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
@@ -23,7 +18,6 @@ const sys_call_ptr_t sys_call_table[] = {
#undef __SYSCALL
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a932131..5aef4230faca 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -11,8 +11,13 @@
#include <asm/syscalls_x32.h>
#undef __SYSCALL
-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#define __SYSCALL(nr, sym) __x64_##sym,
+const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..d64b0a5291f1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff..dfb59521244c 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ab18185894df..5fca46c78daf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;
+ /*
+ * There's no HW mitigation in place. Mark indirect branches as
+ * "not OK".
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ /*
+ * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
+ * "OK" to use due to (at least) one of the following being true:
+ *
+ * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
+ *
+ * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
+ *
+ * - the user disabled mitigations.
+ *
+ * Clearing the bit enables certain indirect branch "easy targets" [*]
+ * to be converted to a series of direct branches.
+ *
+ * Assume innocence until proven guilty: set it now and clear it later
+ * if/when needed.
+ *
+ * [*] The closer the indirect branch is to kernel entry, and the more
+ * user-controlled registers there are, the easier target it may be
+ * for future Spectre v2 variants.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
--
2.44.0
* [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-20 0:09 ` Sean Christopherson
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
` (2 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Stephen Rothwell,
Michael Ellerman, Geert Uytterhoeven
CPU speculative execution mitigations were inadvertently disabled on
non-x86 arches by the following commit:
f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
a separate menu which depends on CONFIG_CPU_MITIGATIONS.
Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/Kconfig | 10 ++++++++++
arch/x86/Kconfig | 15 +++------------
kernel/cpu.c | 4 ++--
3 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71..5c96849eb957 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,6 +11,16 @@ source "arch/$(SRCARCH)/Kconfig"
menu "General architecture-dependent options"
+config CPU_MITIGATIONS
+ bool "Mitigations for CPU speculative execution vulnerabilities"
+ default y
+ help
+ Say Y here to enable mitigations for CPU speculative execution
+ vulnerabilities.
+
+ If you say N, all mitigations will be disabled. You really
+ should know what you are doing to say so.
+
config ARCH_HAS_SUBPAGE_FAULTS
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..85a4d57bce1e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2488,17 +2488,8 @@ config PREFIX_SYMBOLS
def_bool y
depends on CALL_PADDING && !CFI_CLANG
-menuconfig SPECULATION_MITIGATIONS
- bool "Mitigations for speculative execution vulnerabilities"
- default y
- help
- Say Y here to enable options which enable mitigations for
- speculative execution hardware vulnerabilities.
-
- If you say N, all mitigations will be disabled. You really
- should know what you are doing to say so.
-
-if SPECULATION_MITIGATIONS
+menu "CPU speculative execution mitigation defaults"
+ depends on CPU_MITIGATIONS
config MITIGATION_PAGE_TABLE_ISOLATION
bool "Remove the kernel mapping in user mode"
@@ -2643,7 +2634,7 @@ config MITIGATION_SPECTRE_BHI
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>
-endif
+endmenu
config ARCH_HAS_ADD_PAGES
def_bool y
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 07ad53b7f119..bb0ff275fb46 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3207,8 +3207,8 @@ enum cpu_mitigations {
};
static enum cpu_mitigations cpu_mitigations __ro_after_init =
- IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
- CPU_MITIGATIONS_OFF;
+ IS_ENABLED(CONFIG_CPU_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
+ CPU_MITIGATIONS_OFF;
static int __init mitigations_parse_cmdline(char *arg)
{
--
2.44.0
* [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-20 13:58 ` Paul E. McKenney
2024-04-19 21:09 ` [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Paul E. McKenney
The direct-call syscall dispatch functions don't know that the exit()
and exit_group() syscall handlers don't return. As a result the call
sites aren't optimized accordingly.
Fix that by marking those exit syscall declarations as __noreturn.
Fixes the following warnings:
vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/entry/syscall_64.c | 4 ++++
arch/x86/entry/syscall_x32.c | 4 ++++
arch/x86/entry/syscalls/syscall_64.tbl | 6 +++---
arch/x86/um/sys_call_table_32.c | 1 +
arch/x86/um/sys_call_table_64.c | 1 +
scripts/syscalltbl.sh | 6 ++++--
tools/objtool/noreturns.h | 4 ++++
7 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 96ea1f8a1d3f..ff36a993a07e 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,9 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_64.h>
#undef __SYSCALL
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 5aef4230faca..4221ecce6e68 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,9 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_x32.h>
#undef __SYSCALL
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t x32_sys_call_table[] = {
#include <asm/syscalls_x32.h>
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 7e8d46f4147f..6695105d21b5 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -2,7 +2,7 @@
# 64-bit system call numbers and entry vectors
#
# The format is:
-# <number> <abi> <name> <entry point>
+# <number> <abi> <name> <entry point> [0 noreturn]
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
@@ -68,7 +68,7 @@
57 common fork sys_fork
58 common vfork sys_vfork
59 64 execve sys_execve
-60 common exit sys_exit
+60 common exit sys_exit 0 noreturn
61 common wait4 sys_wait4
62 common kill sys_kill
63 common uname sys_newuname
@@ -239,7 +239,7 @@
228 common clock_gettime sys_clock_gettime
229 common clock_getres sys_clock_getres
230 common clock_nanosleep sys_clock_nanosleep
-231 common exit_group sys_exit_group
+231 common exit_group sys_exit_group 0 noreturn
232 common epoll_wait sys_epoll_wait
233 common epoll_ctl sys_epoll_ctl
234 common tgkill sys_tgkill
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 89df5d89d664..c7d4bf955d2b 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -24,6 +24,7 @@
#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_32.h>
#undef __SYSCALL
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b0b4cfd2308c..4760c40ae5cd 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -19,6 +19,7 @@
#define sys_ioperm sys_ni_syscall
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_64.h>
#undef __SYSCALL
diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
index 6abe143889ef..16487d47e06a 100755
--- a/scripts/syscalltbl.sh
+++ b/scripts/syscalltbl.sh
@@ -54,7 +54,7 @@ nxt=0
grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
- while read nr abi name native compat ; do
+ while read nr abi name native compat noreturn; do
if [ $nxt -gt $nr ]; then
echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
@@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
nxt=$((nxt + 1))
done
- if [ -n "$compat" ]; then
+ if [ -n "$noreturn" ]; then
+ echo "__SYSCALL_NORETURN($nr, $native)"
+ elif [ -n "$compat" ]; then
echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
elif [ -n "$native" ]; then
echo "__SYSCALL($nr, $native)"
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c91184..1e8141ef1b15 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -7,12 +7,16 @@
* Yes, this is unfortunate. A better solution is in the works.
*/
NORETURN(__fortify_panic)
+NORETURN(__ia32_sys_exit)
+NORETURN(__ia32_sys_exit_group)
NORETURN(__kunit_abort)
NORETURN(__module_put_and_kthread_exit)
NORETURN(__reiserfs_panic)
NORETURN(__stack_chk_fail)
NORETURN(__tdx_hypercall_failed)
NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(__x64_sys_exit)
+NORETURN(__x64_sys_exit_group)
NORETURN(arch_cpu_idle_dead)
NORETURN(bch2_trans_in_restart_error)
NORETURN(bch2_trans_restart_error)
--
2.44.0
* [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
` (2 preceding siblings ...)
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
4 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
Duplicating the documentation of all the Spectre kernel cmdline options
in two separate places is unwieldy and error-prone. Instead just add a
reference to kernel-parameters.txt from spectre.rst.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
Documentation/admin-guide/hw-vuln/spectre.rst | 84 ++-----------------
1 file changed, 9 insertions(+), 75 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 25a04cda4c2c..f9797ab6b38f 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -592,85 +592,19 @@ Spectre variant 2
Mitigation control on the kernel command line
---------------------------------------------
-Spectre variant 2 mitigation can be disabled or force enabled at the
-kernel command line.
+In general the kernel selects reasonable default mitigations for the
+current CPU.
+
+Spectre default mitigations can be disabled or changed at the kernel
+command line with the following options:
nospectre_v1
-
- [X86,PPC] Disable mitigations for Spectre Variant 1
- (bounds check bypass). With this option data leaks are
- possible in the system.
-
nospectre_v2
+ spectre_v2={option}
+ spectre_v2_user={option}
+ spectre_bhi={option}
- [X86] Disable all mitigations for the Spectre variant 2
- (indirect branch prediction) vulnerability. System may
- allow data leaks with this option, which is equivalent
- to spectre_v2=off.
-
-
- spectre_v2=
-
- [X86] Control mitigation of Spectre variant 2
- (indirect branch speculation) vulnerability.
- The default operation protects the kernel from
- user space attacks.
-
- on
- unconditionally enable, implies
- spectre_v2_user=on
- off
- unconditionally disable, implies
- spectre_v2_user=off
- auto
- kernel detects whether your CPU model is
- vulnerable
-
- Selecting 'on' will, and 'auto' may, choose a
- mitigation method at run time according to the
- CPU, the available microcode, the setting of the
- CONFIG_MITIGATION_RETPOLINE configuration option,
- and the compiler with which the kernel was built.
-
- Selecting 'on' will also enable the mitigation
- against user space to user space task attacks.
-
- Selecting 'off' will disable both the kernel and
- the user space protections.
-
- Specific mitigations can also be selected manually:
-
- retpoline auto pick between generic,lfence
- retpoline,generic Retpolines
- retpoline,lfence LFENCE; indirect branch
- retpoline,amd alias for retpoline,lfence
- eibrs Enhanced/Auto IBRS
- eibrs,retpoline Enhanced/Auto IBRS + Retpolines
- eibrs,lfence Enhanced/Auto IBRS + LFENCE
- ibrs use IBRS to protect kernel
-
- Not specifying this option is equivalent to
- spectre_v2=auto.
-
- In general the kernel by default selects
- reasonable mitigations for the current CPU. To
- disable Spectre variant 2 mitigations, boot with
- spectre_v2=off. Spectre variant 1 mitigations
- cannot be disabled.
-
- spectre_bhi=
-
- [X86] Control mitigation of Branch History Injection
- (BHI) vulnerability. This setting affects the deployment
- of the HW BHI control and the SW BHB clearing sequence.
-
- on
- (default) Enable the HW or SW mitigation as
- needed.
- off
- Disable the mitigation.
-
-For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
+For more details on the available options, refer to Documentation/admin-guide/kernel-parameters.txt
Mitigation selection guide
--------------------------
--
2.44.0
* [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
` (3 preceding siblings ...)
2024-04-19 21:09 ` [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-19 21:46 ` Josh Poimboeuf
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Maksim Davydov
In cloud environments it can be useful to *only* enable the vmexit
mitigation and leave syscalls vulnerable. Add that as an option.
This is similar to the old spectre_v2=auto option which was removed with
the following commit:
36d4fe147c87 ("x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto")
with the main difference being that this has a more descriptive name and
is disabled by default.
Requested-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 12 +++++++++---
arch/x86/kernel/cpu/bugs.c | 16 +++++++++++-----
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 902ecd92a29f..83c4889b88d2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6069,9 +6069,15 @@
deployment of the HW BHI control and the SW BHB
clearing sequence.
- on - (default) Enable the HW or SW mitigation
- as needed.
- off - Disable the mitigation.
+ on - (default) Enable the HW or SW mitigation as
+ needed. This protects the kernel from
+ both syscalls and VMs.
+ vmexit - On systems which don't have the HW mitigation
+ available, enable the SW mitigation on vmexit
+ ONLY. On such systems, the host kernel is
+ protected from VM-originated BHI attacks, but
+ may still be vulnerable to syscall attacks.
+ off - Disable the mitigation.
spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5fca46c78daf..575a4bb5a78d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1625,6 +1625,7 @@ static bool __init spec_ctrl_bhi_dis(void)
enum bhi_mitigations {
BHI_MITIGATION_OFF,
BHI_MITIGATION_ON,
+ BHI_MITIGATION_VMEXIT_ONLY,
};
static enum bhi_mitigations bhi_mitigation __ro_after_init =
@@ -1639,6 +1640,8 @@ static int __init spectre_bhi_parse_cmdline(char *str)
bhi_mitigation = BHI_MITIGATION_OFF;
else if (!strcmp(str, "on"))
bhi_mitigation = BHI_MITIGATION_ON;
+ else if (!strcmp(str, "vmexit"))
+ bhi_mitigation = BHI_MITIGATION_VMEXIT_ONLY;
else
pr_err("Ignoring unknown spectre_bhi option (%s)", str);
@@ -1659,6 +1662,7 @@ static void __init bhi_select_mitigation(void)
return;
}
+ /* Mitigate in hardware if supported */
if (spec_ctrl_bhi_dis())
return;
@@ -1671,13 +1675,15 @@ static void __init bhi_select_mitigation(void)
*/
setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
- /* Mitigate KVM by default */
- setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
- pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
+ if (bhi_mitigation == BHI_MITIGATION_VMEXIT_ONLY) {
+ pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit only\n");
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
+ return;
+ }
- /* Mitigate syscalls when the mitigation is forced =on */
+ pr_info("Spectre BHI mitigation: SW BHB clearing on syscall and vm exit\n");
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
- pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
}
static void __init spectre_v2_select_mitigation(void)
--
2.44.0
* Re: [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
@ 2024-04-19 21:46 ` Josh Poimboeuf
0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:46 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Maksim Davydov
On Fri, Apr 19, 2024 at 02:09:51PM -0700, Josh Poimboeuf wrote:
> In cloud environments it can be useful to *only* enable the vmexit
> mitigation and leave syscalls vulnerable. Add that as an option.
>
> This is similar to the old spectre_v2=auto option which was removed with
As Daniel pointed out this should be /v2/bhi/
--
Josh
* Re: [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
@ 2024-04-20 0:09 ` Sean Christopherson
2024-04-23 14:10 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-04-20 0:09 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Andrew Cooper, Dave Hansen,
Nikolay Borisov, KP Singh, Waiman Long, Borislav Petkov,
Ingo Molnar, Stephen Rothwell, Michael Ellerman,
Geert Uytterhoeven
On Fri, Apr 19, 2024, Josh Poimboeuf wrote:
> CPU speculative execution mitigations were inadvertently disabled on
> non-x86 arches by the following commit:
>
> f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
>
> Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
> CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
> a separate menu which depends on CONFIG_CPU_MITIGATIONS.
Ah drat, I didn't check my mailbox until after Cc'ing Linus my own version[*].
I don't have a strong preference between the two, though I do think it's worth
noting that this will (obviously) allow disabling mitigations at compile time
on all architectures, which may or may not be desirable.
[*] https://lore.kernel.org/all/20240420000556.2645001-2-seanjc@google.com
> Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> ---
> arch/Kconfig | 10 ++++++++++
> arch/x86/Kconfig | 15 +++------------
> kernel/cpu.c | 4 ++--
> 3 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9f066785bb71..5c96849eb957 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -11,6 +11,16 @@ source "arch/$(SRCARCH)/Kconfig"
>
> menu "General architecture-dependent options"
>
> +config CPU_MITIGATIONS
> + bool "Mitigations for CPU speculative execution vulnerabilities"
> + default y
> + help
> + Say Y here to enable mitigations for CPU speculative execution
> + vulnerabilities.
> +
> + If you say N, all mitigations will be disabled. Say N only
> + if you really know what you are doing.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
@ 2024-04-20 13:58 ` Paul E. McKenney
2024-04-21 5:25 ` Josh Poimboeuf
0 siblings, 1 reply; 21+ messages in thread
From: Paul E. McKenney @ 2024-04-20 13:58 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> The direct-call syscall dispatch functions don't know that the exit()
> and exit_group() syscall handlers don't return. As a result the call
> sites aren't optimized accordingly.
>
> Fix that by marking those exit syscall declarations as __noreturn.
>
> Fixes the following warnings:
>
> vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
>
> Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Looks good, but it does not apply on top of current -next and I don't
trust myself to hand-apply it (something about having just got off of
a flight across the big pond).
Could you please let me know what else I need to pull in to be able
to cleanly apply this one?
Thanx, Paul
> ---
> arch/x86/entry/syscall_64.c | 4 ++++
> arch/x86/entry/syscall_x32.c | 4 ++++
> arch/x86/entry/syscalls/syscall_64.tbl | 6 +++---
> arch/x86/um/sys_call_table_32.c | 1 +
> arch/x86/um/sys_call_table_64.c | 1 +
> scripts/syscalltbl.sh | 6 ++++--
> tools/objtool/noreturns.h | 4 ++++
> 7 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
> index 96ea1f8a1d3f..ff36a993a07e 100644
> --- a/arch/x86/entry/syscall_64.c
> +++ b/arch/x86/entry/syscall_64.c
> @@ -8,9 +8,13 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_64.h>
> #undef __SYSCALL
>
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __x64_##sym,
> const sys_call_ptr_t sys_call_table[] = {
> #include <asm/syscalls_64.h>
> diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> index 5aef4230faca..4221ecce6e68 100644
> --- a/arch/x86/entry/syscall_x32.c
> +++ b/arch/x86/entry/syscall_x32.c
> @@ -8,9 +8,13 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_x32.h>
> #undef __SYSCALL
>
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __x64_##sym,
> const sys_call_ptr_t x32_sys_call_table[] = {
> #include <asm/syscalls_x32.h>
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 7e8d46f4147f..6695105d21b5 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -2,7 +2,7 @@
> # 64-bit system call numbers and entry vectors
> #
> # The format is:
> -# <number> <abi> <name> <entry point>
> +# <number> <abi> <name> <entry point> [0 noreturn]
> #
> # The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
> #
> @@ -68,7 +68,7 @@
> 57 common fork sys_fork
> 58 common vfork sys_vfork
> 59 64 execve sys_execve
> -60 common exit sys_exit
> +60 common exit sys_exit 0 noreturn
> 61 common wait4 sys_wait4
> 62 common kill sys_kill
> 63 common uname sys_newuname
> @@ -239,7 +239,7 @@
> 228 common clock_gettime sys_clock_gettime
> 229 common clock_getres sys_clock_getres
> 230 common clock_nanosleep sys_clock_nanosleep
> -231 common exit_group sys_exit_group
> +231 common exit_group sys_exit_group 0 noreturn
> 232 common epoll_wait sys_epoll_wait
> 233 common epoll_ctl sys_epoll_ctl
> 234 common tgkill sys_tgkill
> diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
> index 89df5d89d664..c7d4bf955d2b 100644
> --- a/arch/x86/um/sys_call_table_32.c
> +++ b/arch/x86/um/sys_call_table_32.c
> @@ -24,6 +24,7 @@
> #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_32.h>
>
> #undef __SYSCALL
> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> index b0b4cfd2308c..4760c40ae5cd 100644
> --- a/arch/x86/um/sys_call_table_64.c
> +++ b/arch/x86/um/sys_call_table_64.c
> @@ -19,6 +19,7 @@
> #define sys_ioperm sys_ni_syscall
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_64.h>
>
> #undef __SYSCALL
> diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> index 6abe143889ef..16487d47e06a 100755
> --- a/scripts/syscalltbl.sh
> +++ b/scripts/syscalltbl.sh
> @@ -54,7 +54,7 @@ nxt=0
>
> grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
>
> - while read nr abi name native compat ; do
> + while read nr abi name native compat noreturn; do
>
> if [ $nxt -gt $nr ]; then
> echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
> @@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> nxt=$((nxt + 1))
> done
>
> - if [ -n "$compat" ]; then
> + if [ -n "$noreturn" ]; then
> + echo "__SYSCALL_NORETURN($nr, $native)"
> + elif [ -n "$compat" ]; then
> echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> elif [ -n "$native" ]; then
> echo "__SYSCALL($nr, $native)"
> diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> index 7ebf29c91184..1e8141ef1b15 100644
> --- a/tools/objtool/noreturns.h
> +++ b/tools/objtool/noreturns.h
> @@ -7,12 +7,16 @@
> * Yes, this is unfortunate. A better solution is in the works.
> */
> NORETURN(__fortify_panic)
> +NORETURN(__ia32_sys_exit)
> +NORETURN(__ia32_sys_exit_group)
> NORETURN(__kunit_abort)
> NORETURN(__module_put_and_kthread_exit)
> NORETURN(__reiserfs_panic)
> NORETURN(__stack_chk_fail)
> NORETURN(__tdx_hypercall_failed)
> NORETURN(__ubsan_handle_builtin_unreachable)
> +NORETURN(__x64_sys_exit)
> +NORETURN(__x64_sys_exit_group)
> NORETURN(arch_cpu_idle_dead)
> NORETURN(bch2_trans_in_restart_error)
> NORETURN(bch2_trans_restart_error)
> --
> 2.44.0
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-20 13:58 ` Paul E. McKenney
@ 2024-04-21 5:25 ` Josh Poimboeuf
2024-04-21 20:40 ` Paul McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-21 5:25 UTC (permalink / raw)
To: Paul E. McKenney
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > The direct-call syscall dispatch functions don't know that the exit()
> > and exit_group() syscall handlers don't return. As a result the call
> > sites aren't optimized accordingly.
> >
> > Fix that by marking those exit syscall declarations as __noreturn.
> >
> > Fixes the following warnings:
> >
> > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> >
> > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
>
> Looks good, but it does not apply on top of current -next and I don't
> trust myself to hand-apply it (something about having just got off of
> a flight across the big pond).
>
> Could you please let me know what else I need to pull in to be able
> to cleanly apply this one?
This patch has a dependency on an earlier patch in the set:
https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
Though I think it's not a hard dependency and I could reverse the order
of the patches if needed.
--
Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-21 5:25 ` Josh Poimboeuf
@ 2024-04-21 20:40 ` Paul McKenney
2024-04-21 21:47 ` Paul McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Paul McKenney @ 2024-04-21 20:40 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
They apply fine as is, so I have started tests with that pair of patches.
Thanx, Paul
On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > The direct-call syscall dispatch functions don't know that the exit()
> > > and exit_group() syscall handlers don't return. As a result the call
> > > sites aren't optimized accordingly.
> > >
> > > Fix that by marking those exit syscall declarations as __noreturn.
> > >
> > > Fixes the following warnings:
> > >
> > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > >
> > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> >
> > Looks good, but it does not apply on top of current -next and I don't
> > trust myself to hand-apply it (something about having just got off of
> > a flight across the big pond).
> >
> > Could you please let me know what else I need to pull in to be able
> > to cleanly apply this one?
>
> This patch has a dependency on an earlier patch in the set:
>
> https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
>
> Though I think it's not a hard dependency and I could reverse the order
> of the patches if needed.
>
> --
> Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-21 20:40 ` Paul McKenney
@ 2024-04-21 21:47 ` Paul McKenney
2024-05-02 23:48 ` Paul McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Paul McKenney @ 2024-04-21 21:47 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
And this definitely helped, thank you!
However, this one still remains:
vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
Please see below for my diffs against next-20240419, in case I messed
something up.
I attached a copy as well, given that I am away from mutt, hence using
gmail directly.
Thanx, Paul
-----------------------------------------------------
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6de50b80702e6..9810ba2857a5c 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = sys_call_table[unr](regs);
+ else
+ regs->ax = x64_sys_call(regs, unr);
return true;
}
return false;
@@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = x32_sys_call_table[xnr](regs);
+ else
+ regs->ax = x32_sys_call(regs, xnr);
return true;
}
return false;
@@ -162,7 +168,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = ia32_sys_call_table[unr](regs);
+ else
+ regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235bae17ef6..aab31760b4e3e 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif
#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09e6f151..ff36a993a07e0 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,14 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_64.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
@@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
#undef __SYSCALL
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a9321318..4221ecce6e689 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,11 +8,20 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_x32.h>
#undef __SYSCALL
-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
+#define __SYSCALL(nr, sym) __x64_##sym,
+const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index a396f6e6ab5bf..7ec68d94eb593 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -2,7 +2,7 @@
# 64-bit system call numbers and entry vectors
#
# The format is:
-# <number> <abi> <name> <entry point>
+# <number> <abi> <name> <entry point> [0 noreturn]
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
@@ -68,7 +68,7 @@
57 common fork sys_fork
58 common vfork sys_vfork
59 64 execve sys_execve
-60 common exit sys_exit
+60 common exit sys_exit 0 noreturn
61 common wait4 sys_wait4
62 common kill sys_kill
63 common uname sys_newuname
@@ -239,7 +239,7 @@
228 common clock_gettime sys_clock_gettime
229 common clock_getres sys_clock_getres
230 common clock_nanosleep sys_clock_nanosleep
-231 common exit_group sys_exit_group
+231 common exit_group sys_exit_group 0 noreturn
232 common epoll_wait sys_epoll_wait
233 common epoll_ctl sys_epoll_ctl
234 common tgkill sys_tgkill
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661c..d64b0a5291f10 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff6..dfb59521244c2 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ab18185894dfd..5fca46c78daf2 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;
+ /*
+ * There's no HW mitigation in place. Mark indirect branches as
+ * "not OK".
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ /*
+ * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
+ * "OK" to use due to (at least) one of the following being true:
+ *
+ * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
+ *
+ * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
+ *
+ * - the user disabled mitigations.
+ *
+ * Clearing the bit enables certain indirect branch "easy targets" [*]
+ * to be converted to a series of direct branches.
+ *
+ * Assume innocence until proven guilty: set it now and clear it later
+ * if/when needed.
+ *
+ * [*] The closer the indirect branch is to kernel entry, and the more
+ * user-controlled registers there are, the easier target it may be
+ * for future Spectre v2 variants.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 89df5d89d6640..c7d4bf955d2ba 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -24,6 +24,7 @@
#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_32.h>
#undef __SYSCALL
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b0b4cfd2308c8..4760c40ae5cd0 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -19,6 +19,7 @@
#define sys_ioperm sys_ni_syscall
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_64.h>
#undef __SYSCALL
diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
index 6abe143889ef6..16487d47e06a3 100755
--- a/scripts/syscalltbl.sh
+++ b/scripts/syscalltbl.sh
@@ -54,7 +54,7 @@ nxt=0
grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
- while read nr abi name native compat ; do
+ while read nr abi name native compat noreturn; do
if [ $nxt -gt $nr ]; then
echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
@@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
nxt=$((nxt + 1))
done
- if [ -n "$compat" ]; then
+ if [ -n "$noreturn" ]; then
+ echo "__SYSCALL_NORETURN($nr, $native)"
+ elif [ -n "$compat" ]; then
echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
elif [ -n "$native" ]; then
echo "__SYSCALL($nr, $native)"
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c911849..1e8141ef1b15d 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -7,12 +7,16 @@
* Yes, this is unfortunate. A better solution is in the works.
*/
NORETURN(__fortify_panic)
+NORETURN(__ia32_sys_exit)
+NORETURN(__ia32_sys_exit_group)
NORETURN(__kunit_abort)
NORETURN(__module_put_and_kthread_exit)
NORETURN(__reiserfs_panic)
NORETURN(__stack_chk_fail)
NORETURN(__tdx_hypercall_failed)
NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(__x64_sys_exit)
+NORETURN(__x64_sys_exit_group)
NORETURN(arch_cpu_idle_dead)
NORETURN(bch2_trans_in_restart_error)
NORETURN(bch2_trans_restart_error)
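The syscalltbl.sh change carried in the diff above boils down to one extra field in the read loop. A stand-alone sketch with made-up table lines (not real entries from syscall_64.tbl) shows the new column selecting the __SYSCALL_NORETURN() macro:

```shell
# Hypothetical stand-alone rerun of the new column logic: a fifth
# "noreturn" field routes the entry to __SYSCALL_NORETURN().
out=$(printf '60\tcommon\texit\tsys_exit\t0 noreturn\n61\tcommon\twait4\tsys_wait4\n' |
while read nr abi name native compat noreturn; do
	if [ -n "$noreturn" ]; then
		echo "__SYSCALL_NORETURN($nr, $native)"
	elif [ -n "$compat" ]; then
		echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
	elif [ -n "$native" ]; then
		echo "__SYSCALL($nr, $native)"
	fi
done)
echo "$out"
```

This prints one macro invocation per table line, e.g. __SYSCALL_NORETURN(60, sys_exit) for the exit entry.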
On Sun, Apr 21, 2024 at 1:40 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
>
> They apply fine as is, so I have started tests with that pair of patches.
>
> Thanx, Paul
>
> On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > > The direct-call syscall dispatch functions don't know that the exit()
> > > > and exit_group() syscall handlers don't return. As a result the call
> > > > sites aren't optimized accordingly.
> > > >
> > > > Fix that by marking those exit syscall declarations as __noreturn.
> > > >
> > > > Fixes the following warnings:
> > > >
> > > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > > >
> > > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > >
> > > Looks good, but it does not apply on top of current -next and I don't
> > > trust myself to hand-apply it (something about having just got off of
> > > a flight across the big pond).
> > >
> > > Could you please let me know what else I need to pull in to be able
> > > to cleanly apply this one?
> >
> > This patch has a dependency on an earlier patch in the set:
> >
> > https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
> >
> > Though I think it's not a hard dependency and I could reverse the order
> > of the patches if needed.
> >
> > --
> > Josh
index 3c7434329661c..d64b0a5291f10 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff6..dfb59521244c2 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ab18185894dfd..5fca46c78daf2 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;
+ /*
+ * There's no HW mitigation in place. Mark indirect branches as
+ * "not OK".
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ /*
+ * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
+ * "OK" to use due to (at least) one of the following being true:
+ *
+ * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
+ *
+ * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
+ *
+ * - the user disabled mitigations.
+ *
+ * Clearing the bit enables certain indirect branch "easy targets" [*]
+ * to be converted to a series of direct branches.
+ *
+ * Assume innocence until proven guilty: set it now and clear it later
+ * if/when needed.
+ *
+ * [*] The closer the indirect branch is to kernel entry, and the more
+ * user-controlled registers there are, the easier target it may be
+ * for future Spectre v2 variants.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 89df5d89d6640..c7d4bf955d2ba 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -24,6 +24,7 @@
#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_32.h>
#undef __SYSCALL
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b0b4cfd2308c8..4760c40ae5cd0 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -19,6 +19,7 @@
#define sys_ioperm sys_ni_syscall
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_64.h>
#undef __SYSCALL
diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
index 6abe143889ef6..16487d47e06a3 100755
--- a/scripts/syscalltbl.sh
+++ b/scripts/syscalltbl.sh
@@ -54,7 +54,7 @@ nxt=0
grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
- while read nr abi name native compat ; do
+ while read nr abi name native compat noreturn; do
if [ $nxt -gt $nr ]; then
echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
@@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
nxt=$((nxt + 1))
done
- if [ -n "$compat" ]; then
+ if [ -n "$noreturn" ]; then
+ echo "__SYSCALL_NORETURN($nr, $native)"
+ elif [ -n "$compat" ]; then
echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
elif [ -n "$native" ]; then
echo "__SYSCALL($nr, $native)"
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c911849..1e8141ef1b15d 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -7,12 +7,16 @@
* Yes, this is unfortunate. A better solution is in the works.
*/
NORETURN(__fortify_panic)
+NORETURN(__ia32_sys_exit)
+NORETURN(__ia32_sys_exit_group)
NORETURN(__kunit_abort)
NORETURN(__module_put_and_kthread_exit)
NORETURN(__reiserfs_panic)
NORETURN(__stack_chk_fail)
NORETURN(__tdx_hypercall_failed)
NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(__x64_sys_exit)
+NORETURN(__x64_sys_exit_group)
NORETURN(arch_cpu_idle_dead)
NORETURN(bch2_trans_in_restart_error)
NORETURN(bch2_trans_restart_error)
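
For reference, the scripts/syscalltbl.sh hunk above can be exercised in isolation. Here is a minimal sketch with a made-up three-entry table (the entries are invented for illustration, not taken from the real syscall_64.tbl) showing how the optional trailing column selects the __SYSCALL_NORETURN() macro:

```shell
# Made-up three-entry table in the syscall_64.tbl format; the optional
# trailing "0 noreturn" column selects __SYSCALL_NORETURN() over __SYSCALL().
table='0 common read sys_read
60 common exit sys_exit 0 noreturn
231 common exit_group sys_exit_group 0 noreturn'

# Same field-splitting trick as the patched script: the extra "noreturn"
# variable is empty unless a sixth column is present.
out=$(printf '%s\n' "$table" | while read -r nr abi name native compat noreturn; do
	if [ -n "$noreturn" ]; then
		echo "__SYSCALL_NORETURN($nr, $native)"
	elif [ -n "$compat" ]; then
		echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
	elif [ -n "$native" ]; then
		echo "__SYSCALL($nr, $native)"
	fi
done)

printf '%s\n' "$out"
# __SYSCALL(0, sys_read)
# __SYSCALL_NORETURN(60, sys_exit)
# __SYSCALL_NORETURN(231, sys_exit_group)
```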
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
@ 2024-04-22 8:09 ` Yujie Liu
0 siblings, 0 replies; 21+ messages in thread
From: Yujie Liu @ 2024-04-22 8:09 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, Apr 19, 2024 at 02:09:47PM -0700, Josh Poimboeuf wrote:
> Syscall hardening (converting the syscall indirect branch to a series of
> direct branches) has shown some performance regressions:
>
> - Red Hat internal testing showed up to 12% slowdowns in database
> benchmark testing on Sapphire Rapids when the DB was stressed with 80+
> users to cause contention.
>
> - The kernel test robot's will-it-scale benchmarks showed significant
> regressions on Skylake with IBRS:
> https://lkml.kernel.org/lkml/202404191333.178a0eed-yujie.liu@intel.com
To clarify, we reported a +1.4% improvement (not regression) of
will-it-scale futex4 benchmark on Skylake. Meanwhile we did observe some
regressions by running other benchmarks on Ice Lake, such as:
stress-ng.null.ops_per_sec -4.0% regression on Intel Xeon Gold 6346 (Ice Lake)
unixbench.fsbuffer.throughput -1.4% regression on Intel Xeon Gold 6346 (Ice Lake)
>
> To fix those slowdowns, only use the syscall direct branches when
> indirect branches are considered to be "not OK": meaning Spectre v2+BHI
> isn't mitigated by HW and the user hasn't disabled mitigations.
* Re: [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-20 0:09 ` Sean Christopherson
@ 2024-04-23 14:10 ` Sean Christopherson
2024-04-24 5:35 ` Josh Poimboeuf
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-04-23 14:10 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Andrew Cooper, Dave Hansen,
Nikolay Borisov, KP Singh, Waiman Long, Borislav Petkov,
Ingo Molnar, Stephen Rothwell, Michael Ellerman,
Geert Uytterhoeven
On Fri, Apr 19, 2024, Sean Christopherson wrote:
> On Fri, Apr 19, 2024, Josh Poimboeuf wrote:
> > CPU speculative execution mitigations were inadvertently disabled on
> > non-x86 arches by the following commit:
> >
> > f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
> >
> > Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
> > CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
> > a separate menu which depends on CONFIG_CPU_MITIGATIONS.
>
> Ah drat, I didn't check my mailbox until after Cc'ing Linus my own version[*].
>
> I don't have a strong preference between the two, though I do think it's worth
> noting that this will (obviously) allow disabling mitigations at compile time
> on all architectures, which may or may not be desirable.
>
> [*] https://lore.kernel.org/all/20240420000556.2645001-2-seanjc@google.com
Josh, when you get a chance, can you weigh in on my menu-preserving approach?
I want to get this resolved asap so that we're not scrambling on Friday again :-)
* Re: [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-23 14:10 ` Sean Christopherson
@ 2024-04-24 5:35 ` Josh Poimboeuf
0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-24 5:35 UTC (permalink / raw)
To: Sean Christopherson
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Andrew Cooper, Dave Hansen,
Nikolay Borisov, KP Singh, Waiman Long, Borislav Petkov,
Ingo Molnar, Stephen Rothwell, Michael Ellerman,
Geert Uytterhoeven
On Tue, Apr 23, 2024 at 07:10:23AM -0700, Sean Christopherson wrote:
> On Fri, Apr 19, 2024, Sean Christopherson wrote:
> > On Fri, Apr 19, 2024, Josh Poimboeuf wrote:
> > > CPU speculative execution mitigations were inadvertently disabled on
> > > non-x86 arches by the following commit:
> > >
> > > f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
> > >
> > > Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
> > > CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
> > > a separate menu which depends on CONFIG_CPU_MITIGATIONS.
> >
> > Ah drat, I didn't check my mailbox until after Cc'ing Linus my own version[*].
> >
> > I don't have a strong preference between the two, though I do think it's worth
> > noting that this will (obviously) allow disabling mitigations at compile time
> > on all architectures, which may or may not be desirable.
> >
> > [*] https://lore.kernel.org/all/20240420000556.2645001-2-seanjc@google.com
>
> Josh, when you get a chance, can you weigh in on my menu-preserving approach?
>
> I want to get this resolved asap so that we're not scrambling on Friday again :-)
Yeah, yours looks good. Lemme go ack.
--
Josh
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-21 21:47 ` Paul McKenney
@ 2024-05-02 23:48 ` Paul McKenney
2024-05-03 15:38 ` Paul E. McKenney
2024-05-03 19:56 ` Josh Poimboeuf
0 siblings, 2 replies; 21+ messages in thread
From: Paul McKenney @ 2024-05-02 23:48 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
>
> And this definitely helped, thank you!
>
> However, this one still remains:
>
> vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> __ia32_sys_exit_group() is missing a __noreturn annotation
And looking at the patched code, this function looks to me to be
correctly marked.
No idea... :-/
Thanx, Paul
> Please see below for my diffs against next-20240419, in case I messed
> something up.
>
> I attached a copy as well, given that I am away from mutt, hence using
> gmail directly.
>
> Thanx, Paul
>
> -----------------------------------------------------
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6de50b80702e6..9810ba2857a5c 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
>
> if (likely(unr < NR_syscalls)) {
> unr = array_index_nospec(unr, NR_syscalls);
> - regs->ax = x64_sys_call(regs, unr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> + regs->ax = sys_call_table[unr](regs);
> + else
> + regs->ax = x64_sys_call(regs, unr);
> return true;
> }
> return false;
> @@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
>
> if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
> xnr = array_index_nospec(xnr, X32_NR_syscalls);
> - regs->ax = x32_sys_call(regs, xnr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> + regs->ax = x32_sys_call_table[xnr](regs);
> + else
> + regs->ax = x32_sys_call(regs, xnr);
> return true;
> }
> return false;
> @@ -162,7 +168,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
>
> if (likely(unr < IA32_NR_syscalls)) {
> unr = array_index_nospec(unr, IA32_NR_syscalls);
> - regs->ax = ia32_sys_call(regs, unr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> + regs->ax = ia32_sys_call_table[unr](regs);
> + else
> + regs->ax = ia32_sys_call(regs, unr);
> } else if (nr != -1) {
> regs->ax = __ia32_sys_ni_syscall(regs);
> }
> diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
> index c2235bae17ef6..aab31760b4e3e 100644
> --- a/arch/x86/entry/syscall_32.c
> +++ b/arch/x86/entry/syscall_32.c
> @@ -14,25 +14,16 @@
> #endif
>
> #define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
> -
> #include <asm/syscalls_32.h>
> #undef __SYSCALL
>
> -/*
> - * The sys_call_table[] is no longer used for system calls, but
> - * kernel/trace/trace_syscalls.c still wants to know the system
> - * call address.
> - */
> -#ifdef CONFIG_X86_32
> #define __SYSCALL(nr, sym) __ia32_##sym,
> -const sys_call_ptr_t sys_call_table[] = {
> +const sys_call_ptr_t ia32_sys_call_table[] = {
> #include <asm/syscalls_32.h>
> };
> #undef __SYSCALL
> -#endif
>
> #define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
> -
> long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
> {
> switch (nr) {
> diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
> index 33b3f09e6f151..ff36a993a07e0 100644
> --- a/arch/x86/entry/syscall_64.c
> +++ b/arch/x86/entry/syscall_64.c
> @@ -8,14 +8,13 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_64.h>
> #undef __SYSCALL
>
> -/*
> - * The sys_call_table[] is no longer used for system calls, but
> - * kernel/trace/trace_syscalls.c still wants to know the system
> - * call address.
> - */
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __x64_##sym,
> const sys_call_ptr_t sys_call_table[] = {
> #include <asm/syscalls_64.h>
> @@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
> #undef __SYSCALL
>
> #define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> -
> long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
> {
> switch (nr) {
> diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> index 03de4a9321318..4221ecce6e689 100644
> --- a/arch/x86/entry/syscall_x32.c
> +++ b/arch/x86/entry/syscall_x32.c
> @@ -8,11 +8,20 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_x32.h>
> #undef __SYSCALL
>
> -#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> +#define __SYSCALL(nr, sym) __x64_##sym,
> +const sys_call_ptr_t x32_sys_call_table[] = {
> +#include <asm/syscalls_x32.h>
> +};
> +#undef __SYSCALL
>
> +#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
> {
> switch (nr) {
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index a396f6e6ab5bf..7ec68d94eb593 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -2,7 +2,7 @@
> # 64-bit system call numbers and entry vectors
> #
> # The format is:
> -# <number> <abi> <name> <entry point>
> +# <number> <abi> <name> <entry point> [0 noreturn]
> #
> # The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
> #
> @@ -68,7 +68,7 @@
> 57 common fork sys_fork
> 58 common vfork sys_vfork
> 59 64 execve sys_execve
> -60 common exit sys_exit
> +60 common exit sys_exit 0 noreturn
> 61 common wait4 sys_wait4
> 62 common kill sys_kill
> 63 common uname sys_newuname
> @@ -239,7 +239,7 @@
> 228 common clock_gettime sys_clock_gettime
> 229 common clock_getres sys_clock_getres
> 230 common clock_nanosleep sys_clock_nanosleep
> -231 common exit_group sys_exit_group
> +231 common exit_group sys_exit_group 0 noreturn
> 232 common epoll_wait sys_epoll_wait
> 233 common epoll_ctl sys_epoll_ctl
> 234 common tgkill sys_tgkill
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 3c7434329661c..d64b0a5291f10 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -470,6 +470,7 @@
> #define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
> #define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
> #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
> +#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
>
> /*
> * BUG word(s)
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 2fc7bc3863ff6..dfb59521244c2 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -16,14 +16,20 @@
> #include <asm/thread_info.h> /* for TS_COMPAT */
> #include <asm/unistd.h>
>
> -/* This is used purely for kernel/trace/trace_syscalls.c */
> typedef long (*sys_call_ptr_t)(const struct pt_regs *);
> extern const sys_call_ptr_t sys_call_table[];
>
> +#if defined(CONFIG_X86_32)
> +#define ia32_sys_call_table sys_call_table
> +#else
> /*
> * These may not exist, but still put the prototypes in so we
> * can use IS_ENABLED().
> */
> +extern const sys_call_ptr_t ia32_sys_call_table[];
> +extern const sys_call_ptr_t x32_sys_call_table[];
> +#endif
> +
> extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
> extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
> extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index ab18185894dfd..5fca46c78daf2 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
> if (!IS_ENABLED(CONFIG_X86_64))
> return;
>
> + /*
> + * There's no HW mitigation in place. Mark indirect branches as
> + * "not OK".
> + */
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> +
> /* Mitigate KVM by default */
> setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
> pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
> @@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
> enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
> enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
>
> + /*
> + * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
> + * "OK" to use due to (at least) one of the following being true:
> + *
> + * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
> + *
> + * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
> + *
> + * - the user disabled mitigations.
> + *
> + * Clearing the bit enables certain indirect branch "easy targets" [*]
> + * to be converted to a series of direct branches.
> + *
> + * Assume innocence until proven guilty: set it now and clear it later
> + * if/when needed.
> + *
> + * [*] The closer the indirect branch is to kernel entry, and the more
> + * user-controlled registers there are, the easier target it may be
> + * for future Spectre v2 variants.
> + */
> + setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> +
> /*
> * If the CPU is not affected and the command line mode is NONE or AUTO
> * then nothing to do.
> @@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
> break;
>
> case SPECTRE_V2_LFENCE:
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> + fallthrough;
> case SPECTRE_V2_EIBRS_LFENCE:
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
> - fallthrough;
> + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + break;
>
> case SPECTRE_V2_RETPOLINE:
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> + fallthrough;
> case SPECTRE_V2_EIBRS_RETPOLINE:
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> break;
> diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
> index 89df5d89d6640..c7d4bf955d2ba 100644
> --- a/arch/x86/um/sys_call_table_32.c
> +++ b/arch/x86/um/sys_call_table_32.c
> @@ -24,6 +24,7 @@
> #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_32.h>
>
> #undef __SYSCALL
> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> index b0b4cfd2308c8..4760c40ae5cd0 100644
> --- a/arch/x86/um/sys_call_table_64.c
> +++ b/arch/x86/um/sys_call_table_64.c
> @@ -19,6 +19,7 @@
> #define sys_ioperm sys_ni_syscall
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_64.h>
>
> #undef __SYSCALL
> diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> index 6abe143889ef6..16487d47e06a3 100755
> --- a/scripts/syscalltbl.sh
> +++ b/scripts/syscalltbl.sh
> @@ -54,7 +54,7 @@ nxt=0
>
> grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
>
> - while read nr abi name native compat ; do
> + while read nr abi name native compat noreturn; do
>
> if [ $nxt -gt $nr ]; then
> echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
> @@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> nxt=$((nxt + 1))
> done
>
> - if [ -n "$compat" ]; then
> + if [ -n "$noreturn" ]; then
> + echo "__SYSCALL_NORETURN($nr, $native)"
> + elif [ -n "$compat" ]; then
> echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> elif [ -n "$native" ]; then
> echo "__SYSCALL($nr, $native)"
> diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> index 7ebf29c911849..1e8141ef1b15d 100644
> --- a/tools/objtool/noreturns.h
> +++ b/tools/objtool/noreturns.h
> @@ -7,12 +7,16 @@
> * Yes, this is unfortunate. A better solution is in the works.
> */
> NORETURN(__fortify_panic)
> +NORETURN(__ia32_sys_exit)
> +NORETURN(__ia32_sys_exit_group)
> NORETURN(__kunit_abort)
> NORETURN(__module_put_and_kthread_exit)
> NORETURN(__reiserfs_panic)
> NORETURN(__stack_chk_fail)
> NORETURN(__tdx_hypercall_failed)
> NORETURN(__ubsan_handle_builtin_unreachable)
> +NORETURN(__x64_sys_exit)
> +NORETURN(__x64_sys_exit_group)
> NORETURN(arch_cpu_idle_dead)
> NORETURN(bch2_trans_in_restart_error)
> NORETURN(bch2_trans_restart_error)
>
> On Sun, Apr 21, 2024 at 1:40 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> >
> > They apply fine as is, so I have started tests with that pair of patches.
> >
> > Thanx, Paul
> >
> > On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > > > The direct-call syscall dispatch functions don't know that the exit()
> > > > > and exit_group() syscall handlers don't return. As a result the call
> > > > > sites aren't optimized accordingly.
> > > > >
> > > > > Fix that by marking those exit syscall declarations as __noreturn.
> > > > >
> > > > > Fixes the following warnings:
> > > > >
> > > > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > > > >
> > > > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > > >
> > > > Looks good, but it does not apply on top of current -next and I don't
> > > > trust myself to hand-apply it (something about having just got off of
> > > > a flight across the big pond).
> > > >
> > > > Could you please let me know what else do I need to pull in to be able
> > > > to cleanly apply this one?
> > >
> > > This patch has a dependency on an earlier patch in the set:
> > >
> > > https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
> > >
> > > Though I think it's not a hard dependency and I could reverse the order
> > > of the patches if needed.
> > >
> > > --
> > > Josh
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-02 23:48 ` Paul McKenney
@ 2024-05-03 15:38 ` Paul E. McKenney
2024-05-03 19:56 ` Josh Poimboeuf
1 sibling, 0 replies; 21+ messages in thread
From: Paul E. McKenney @ 2024-05-03 15:38 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> >
> > And this definitely helped, thank you!
> >
> > However, this one still remains:
> >
> > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> > __ia32_sys_exit_group() is missing a __noreturn annotation
>
> And looking at the patched code, this function looks to me to be
> correctly marked.
>
> No idea... :-/
But thank you for getting rid of the other warning:
Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > Please see below for my diffs against next-20240419, in case I messed
> > something up.
> >
> > I attached a copy as well, given that I am away from mutt, hence using
> > gmail directly.
> >
> > Thanx, Paul
> >
> > -----------------------------------------------------
> >
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 6de50b80702e6..9810ba2857a5c 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
> >
> > if (likely(unr < NR_syscalls)) {
> > unr = array_index_nospec(unr, NR_syscalls);
> > - regs->ax = x64_sys_call(regs, unr);
> > + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> > + regs->ax = sys_call_table[unr](regs);
> > + else
> > + regs->ax = x64_sys_call(regs, unr);
> > return true;
> > }
> > return false;
> > @@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
> >
> > if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
> > xnr = array_index_nospec(xnr, X32_NR_syscalls);
> > - regs->ax = x32_sys_call(regs, xnr);
> > + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> > + regs->ax = x32_sys_call_table[xnr](regs);
> > + else
> > + regs->ax = x32_sys_call(regs, xnr);
> > return true;
> > }
> > return false;
> > @@ -162,7 +168,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
> >
> > if (likely(unr < IA32_NR_syscalls)) {
> > unr = array_index_nospec(unr, IA32_NR_syscalls);
> > - regs->ax = ia32_sys_call(regs, unr);
> > + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> > + regs->ax = ia32_sys_call_table[unr](regs);
> > + else
> > + regs->ax = ia32_sys_call(regs, unr);
> > } else if (nr != -1) {
> > regs->ax = __ia32_sys_ni_syscall(regs);
> > }
> > diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
> > index c2235bae17ef6..aab31760b4e3e 100644
> > --- a/arch/x86/entry/syscall_32.c
> > +++ b/arch/x86/entry/syscall_32.c
> > @@ -14,25 +14,16 @@
> > #endif
> >
> > #define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
> > -
> > #include <asm/syscalls_32.h>
> > #undef __SYSCALL
> >
> > -/*
> > - * The sys_call_table[] is no longer used for system calls, but
> > - * kernel/trace/trace_syscalls.c still wants to know the system
> > - * call address.
> > - */
> > -#ifdef CONFIG_X86_32
> > #define __SYSCALL(nr, sym) __ia32_##sym,
> > -const sys_call_ptr_t sys_call_table[] = {
> > +const sys_call_ptr_t ia32_sys_call_table[] = {
> > #include <asm/syscalls_32.h>
> > };
> > #undef __SYSCALL
> > -#endif
> >
> > #define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
> > -
> > long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
> > {
> > switch (nr) {
> > diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
> > index 33b3f09e6f151..ff36a993a07e0 100644
> > --- a/arch/x86/entry/syscall_64.c
> > +++ b/arch/x86/entry/syscall_64.c
> > @@ -8,14 +8,13 @@
> > #include <asm/syscall.h>
> >
> > #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> > +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> > #include <asm/syscalls_64.h>
> > #undef __SYSCALL
> >
> > -/*
> > - * The sys_call_table[] is no longer used for system calls, but
> > - * kernel/trace/trace_syscalls.c still wants to know the system
> > - * call address.
> > - */
> > +#undef __SYSCALL_NORETURN
> > +#define __SYSCALL_NORETURN __SYSCALL
> > +
> > #define __SYSCALL(nr, sym) __x64_##sym,
> > const sys_call_ptr_t sys_call_table[] = {
> > #include <asm/syscalls_64.h>
> > @@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
> > #undef __SYSCALL
> >
> > #define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> > -
> > long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
> > {
> > switch (nr) {
> > diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> > index 03de4a9321318..4221ecce6e689 100644
> > --- a/arch/x86/entry/syscall_x32.c
> > +++ b/arch/x86/entry/syscall_x32.c
> > @@ -8,11 +8,20 @@
> > #include <asm/syscall.h>
> >
> > #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> > +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> > #include <asm/syscalls_x32.h>
> > #undef __SYSCALL
> >
> > -#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> > +#undef __SYSCALL_NORETURN
> > +#define __SYSCALL_NORETURN __SYSCALL
> > +
> > +#define __SYSCALL(nr, sym) __x64_##sym,
> > +const sys_call_ptr_t x32_sys_call_table[] = {
> > +#include <asm/syscalls_x32.h>
> > +};
> > +#undef __SYSCALL
> >
> > +#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> > long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
> > {
> > switch (nr) {
> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > index a396f6e6ab5bf..7ec68d94eb593 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -2,7 +2,7 @@
> > # 64-bit system call numbers and entry vectors
> > #
> > # The format is:
> > -# <number> <abi> <name> <entry point>
> > +# <number> <abi> <name> <entry point> [0 noreturn]
> > #
> > # The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
> > #
> > @@ -68,7 +68,7 @@
> > 57 common fork sys_fork
> > 58 common vfork sys_vfork
> > 59 64 execve sys_execve
> > -60 common exit sys_exit
> > +60 common exit sys_exit 0 noreturn
> > 61 common wait4 sys_wait4
> > 62 common kill sys_kill
> > 63 common uname sys_newuname
> > @@ -239,7 +239,7 @@
> > 228 common clock_gettime sys_clock_gettime
> > 229 common clock_getres sys_clock_getres
> > 230 common clock_nanosleep sys_clock_nanosleep
> > -231 common exit_group sys_exit_group
> > +231 common exit_group sys_exit_group 0 noreturn
> > 232 common epoll_wait sys_epoll_wait
> > 233 common epoll_ctl sys_epoll_ctl
> > 234 common tgkill sys_tgkill
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 3c7434329661c..d64b0a5291f10 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -470,6 +470,7 @@
> > #define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
> > #define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
> > #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
> > +#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
> >
> > /*
> > * BUG word(s)
> > diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> > index 2fc7bc3863ff6..dfb59521244c2 100644
> > --- a/arch/x86/include/asm/syscall.h
> > +++ b/arch/x86/include/asm/syscall.h
> > @@ -16,14 +16,20 @@
> > #include <asm/thread_info.h> /* for TS_COMPAT */
> > #include <asm/unistd.h>
> >
> > -/* This is used purely for kernel/trace/trace_syscalls.c */
> > typedef long (*sys_call_ptr_t)(const struct pt_regs *);
> > extern const sys_call_ptr_t sys_call_table[];
> >
> > +#if defined(CONFIG_X86_32)
> > +#define ia32_sys_call_table sys_call_table
> > +#else
> > /*
> > * These may not exist, but still put the prototypes in so we
> > * can use IS_ENABLED().
> > */
> > +extern const sys_call_ptr_t ia32_sys_call_table[];
> > +extern const sys_call_ptr_t x32_sys_call_table[];
> > +#endif
> > +
> > extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
> > extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
> > extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index ab18185894dfd..5fca46c78daf2 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
> > if (!IS_ENABLED(CONFIG_X86_64))
> > return;
> >
> > + /*
> > + * There's no HW mitigation in place. Mark indirect branches as
> > + * "not OK".
> > + */
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > +
> > /* Mitigate KVM by default */
> > setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
> > pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
> > @@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
> > enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
> > enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
> >
> > + /*
> > + * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
> > + * "OK" to use due to (at least) one of the following being true:
> > + *
> > + * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
> > + *
> > + * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
> > + *
> > + * - the user disabled mitigations.
> > + *
> > + * Clearing the bit enables certain indirect branch "easy targets" [*]
> > + * to be converted to a series of direct branches.
> > + *
> > + * Assume innocence until proven guilty: set it now and clear it later
> > + * if/when needed.
> > + *
> > + * [*] The closer the indirect branch is to kernel entry, and the more
> > + * user-controlled registers there are, the easier target it may be
> > + * for future Spectre v2 variants.
> > + */
> > + setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > +
> > /*
> > * If the CPU is not affected and the command line mode is NONE or AUTO
> > * then nothing to do.
> > @@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
> > break;
> >
> > case SPECTRE_V2_LFENCE:
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > + fallthrough;
> > case SPECTRE_V2_EIBRS_LFENCE:
> > setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
> > - fallthrough;
> > + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> > + break;
> >
> > case SPECTRE_V2_RETPOLINE:
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > + fallthrough;
> > case SPECTRE_V2_EIBRS_RETPOLINE:
> > setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> > break;
> > diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
> > index 89df5d89d6640..c7d4bf955d2ba 100644
> > --- a/arch/x86/um/sys_call_table_32.c
> > +++ b/arch/x86/um/sys_call_table_32.c
> > @@ -24,6 +24,7 @@
> > #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
> >
> > #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> > +#define __SYSCALL_NORETURN __SYSCALL
> > #include <asm/syscalls_32.h>
> >
> > #undef __SYSCALL
> > diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> > index b0b4cfd2308c8..4760c40ae5cd0 100644
> > --- a/arch/x86/um/sys_call_table_64.c
> > +++ b/arch/x86/um/sys_call_table_64.c
> > @@ -19,6 +19,7 @@
> > #define sys_ioperm sys_ni_syscall
> >
> > #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> > +#define __SYSCALL_NORETURN __SYSCALL
> > #include <asm/syscalls_64.h>
> >
> > #undef __SYSCALL
> > diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> > index 6abe143889ef6..16487d47e06a3 100755
> > --- a/scripts/syscalltbl.sh
> > +++ b/scripts/syscalltbl.sh
> > @@ -54,7 +54,7 @@ nxt=0
> >
> > grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> >
> > - while read nr abi name native compat ; do
> > + while read nr abi name native compat noreturn; do
> >
> > if [ $nxt -gt $nr ]; then
> > echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
> > @@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> > nxt=$((nxt + 1))
> > done
> >
> > - if [ -n "$compat" ]; then
> > + if [ -n "$noreturn" ]; then
> > + echo "__SYSCALL_NORETURN($nr, $native)"
> > + elif [ -n "$compat" ]; then
> > echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> > elif [ -n "$native" ]; then
> > echo "__SYSCALL($nr, $native)"
> > diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> > index 7ebf29c911849..1e8141ef1b15d 100644
> > --- a/tools/objtool/noreturns.h
> > +++ b/tools/objtool/noreturns.h
> > @@ -7,12 +7,16 @@
> > * Yes, this is unfortunate. A better solution is in the works.
> > */
> > NORETURN(__fortify_panic)
> > +NORETURN(__ia32_sys_exit)
> > +NORETURN(__ia32_sys_exit_group)
> > NORETURN(__kunit_abort)
> > NORETURN(__module_put_and_kthread_exit)
> > NORETURN(__reiserfs_panic)
> > NORETURN(__stack_chk_fail)
> > NORETURN(__tdx_hypercall_failed)
> > NORETURN(__ubsan_handle_builtin_unreachable)
> > +NORETURN(__x64_sys_exit)
> > +NORETURN(__x64_sys_exit_group)
> > NORETURN(arch_cpu_idle_dead)
> > NORETURN(bch2_trans_in_restart_error)
> > NORETURN(bch2_trans_restart_error)
> >
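[Editor's note: the scripts/syscalltbl.sh hunk in the quoted patch maps a table row's optional sixth field to the new macro. The snippet below is a minimal standalone reproduction of that emit logic, not the actual script; the `emit_row` helper is hypothetical and processes one row instead of the whole table.]

```shell
#!/bin/sh
# Sketch of the per-row expansion in scripts/syscalltbl.sh after the patch:
# fields are <number> <abi> <name> <entry point> [<compat>] [noreturn].
emit_row() {
	echo "$1" | while read nr abi name native compat noreturn; do
		if [ -n "$noreturn" ]; then
			echo "__SYSCALL_NORETURN($nr, $native)"
		elif [ -n "$compat" ]; then
			echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
		elif [ -n "$native" ]; then
			echo "__SYSCALL($nr, $native)"
		fi
	done
}

emit_row "61 common wait4 sys_wait4"              # __SYSCALL(61, sys_wait4)
emit_row "60 common exit sys_exit 0 noreturn"     # __SYSCALL_NORETURN(60, sys_exit)
```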
> > On Sun, Apr 21, 2024 at 1:40 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> > >
> > > They apply fine as is, so I have started tests with that pair of patches.
> > >
> > > Thanx, Paul
> > >
> > > On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > >
> > > > On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > > > > The direct-call syscall dispatch functions don't know that the exit()
> > > > > > and exit_group() syscall handlers don't return. As a result the call
> > > > > > sites aren't optimized accordingly.
> > > > > >
> > > > > > Fix that by marking those exit syscall declarations as __noreturn.
> > > > > >
> > > > > > Fixes the following warnings:
> > > > > >
> > > > > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > > > > >
> > > > > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > > > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > > > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > > > >
> > > > > Looks good, but it does not apply on top of current -next and I don't
> > > > > trust myself to hand-apply it (something about having just got off of
> > > > > a flight across the big pond).
> > > > >
> > > > > Could you please let me know what else I need to pull in to be able
> > > > > to cleanly apply this one?
> > > >
> > > > This patch has a dependency on an earlier patch in the set:
> > > >
> > > > https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
> > > >
> > > > Though I think it's not a hard dependency and I could reverse the order
> > > > of the patches if needed.
> > > >
> > > > --
> > > > Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-02 23:48 ` Paul McKenney
2024-05-03 15:38 ` Paul E. McKenney
@ 2024-05-03 19:56 ` Josh Poimboeuf
2024-05-03 20:44 ` Josh Poimboeuf
1 sibling, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-05-03 19:56 UTC (permalink / raw)
To: Paul McKenney
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> >
> > And this definitely helped, thank you!
> >
> > However, this one still remains:
> >
> > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
>
> And looking at the patched code, this function looks to me to be
> correctly marked.
>
> No idea... :-/
Ah, I think I missed fixing syscall_32.tbl. Lemme see how to do that...
--
Josh
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-03 19:56 ` Josh Poimboeuf
@ 2024-05-03 20:44 ` Josh Poimboeuf
2024-05-03 23:33 ` Paul E. McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-05-03 20:44 UTC (permalink / raw)
To: Paul McKenney
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, May 03, 2024 at 12:57:00PM -0700, Josh Poimboeuf wrote:
> On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> > On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> > >
> > > And this definitely helped, thank you!
> > >
> > > However, this one still remains:
> > >
> > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> >
> > And looking at the patched code, this function looks to me to be
> > correctly marked.
> >
> > No idea... :-/
>
> Ah, I think I missed fixing syscall_32.tbl. Lemme see how to do that...
Can you try adding this on top?
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 5f8591ce7f25..f30b608d14dc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -12,7 +12,7 @@
# The abi is always "i386" for this file.
#
0 i386 restart_syscall sys_restart_syscall
-1 i386 exit sys_exit
+1 i386 exit sys_exit 0 noreturn
2 i386 fork sys_fork
3 i386 read sys_read
4 i386 write sys_write
@@ -263,7 +263,7 @@
249 i386 io_cancel sys_io_cancel
250 i386 fadvise64 sys_ia32_fadvise64
# 251 is available for reuse (was briefly sys_set_zone_reclaim)
-252 i386 exit_group sys_exit_group
+252 i386 exit_group sys_exit_group 0 noreturn
253 i386 lookup_dcookie
254 i386 epoll_create sys_epoll_create
255 i386 epoll_ctl sys_epoll_ctl
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-03 20:44 ` Josh Poimboeuf
@ 2024-05-03 23:33 ` Paul E. McKenney
2024-05-03 23:48 ` Josh Poimboeuf
0 siblings, 1 reply; 21+ messages in thread
From: Paul E. McKenney @ 2024-05-03 23:33 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, May 03, 2024 at 01:44:17PM -0700, Josh Poimboeuf wrote:
> On Fri, May 03, 2024 at 12:57:00PM -0700, Josh Poimboeuf wrote:
> > On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> > > On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> > > >
> > > > And this definitely helped, thank you!
> > > >
> > > > However, this one still remains:
> > > >
> > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > >
> > > And looking at the patched code, this function looks to me to be
> > > correctly marked.
> > >
> > > No idea... :-/
> >
> > Ah, I think I missed fixing syscall_32.tbl. Lemme see how to do that...
>
> Can you try adding this on top?
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 5f8591ce7f25..f30b608d14dc 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -12,7 +12,7 @@
> # The abi is always "i386" for this file.
> #
> 0 i386 restart_syscall sys_restart_syscall
> -1 i386 exit sys_exit
> +1 i386 exit sys_exit 0 noreturn
> 2 i386 fork sys_fork
> 3 i386 read sys_read
> 4 i386 write sys_write
> @@ -263,7 +263,7 @@
> 249 i386 io_cancel sys_io_cancel
> 250 i386 fadvise64 sys_ia32_fadvise64
> # 251 is available for reuse (was briefly sys_set_zone_reclaim)
> -252 i386 exit_group sys_exit_group
> +252 i386 exit_group sys_exit_group 0 noreturn
> 253 i386 lookup_dcookie
> 254 i386 epoll_create sys_epoll_create
> 255 i386 epoll_ctl sys_epoll_ctl
Thank you!
But for non-KCSAN builds, I get the following diagnostics:
In file included from arch/x86/entry/syscall_32.c:17:
./arch/x86/include/generated/asm/syscalls_32.h:2:20: error: expected declaration specifiers or ‘...’ before numeric constant
2 | __SYSCALL_NORETURN(1, sys_exit)
For KCSAN builds, I instead get the following diagnostics:
In file included from arch/x86/entry/syscall_32.c:17:
./arch/x86/include/generated/asm/syscalls_32.h:2:20: error: expected parameter declarator
__SYSCALL_NORETURN(1, sys_exit)
Does arch/x86/entry/syscall_32.c need the following additional patch?
A quick smoke test passes, but perhaps I am just getting lucky...
Thanx, Paul
------------------------------------------------------------------------
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index aab31760b4e3e..d9ae910ea6f33 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,9 +14,13 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __ia32_##sym(const struct pt_regs *);
#include <asm/syscalls_32.h>
#undef __SYSCALL
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __ia32_##sym,
const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-03 23:33 ` Paul E. McKenney
@ 2024-05-03 23:48 ` Josh Poimboeuf
0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-05-03 23:48 UTC (permalink / raw)
To: Paul E. McKenney
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, May 03, 2024 at 04:33:00PM -0700, Paul E. McKenney wrote:
> Does arch/x86/entry/syscall_32.c need the following additional patch?
>
> A quick smoke test passes, but perhaps I am just getting lucky...
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
> index aab31760b4e3e..d9ae910ea6f33 100644
> --- a/arch/x86/entry/syscall_32.c
> +++ b/arch/x86/entry/syscall_32.c
> @@ -14,9 +14,13 @@
> #endif
>
> #define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __ia32_##sym(const struct pt_regs *);
> #include <asm/syscalls_32.h>
> #undef __SYSCALL
>
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __ia32_##sym,
> const sys_call_ptr_t ia32_sys_call_table[] = {
> #include <asm/syscalls_32.h>
Ah, yeah, that looks right.
--
Josh
Thread overview: 21+ messages
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
2024-04-22 8:09 ` Yujie Liu
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
2024-04-20 0:09 ` Sean Christopherson
2024-04-23 14:10 ` Sean Christopherson
2024-04-24 5:35 ` Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
2024-04-20 13:58 ` Paul E. McKenney
2024-04-21 5:25 ` Josh Poimboeuf
2024-04-21 20:40 ` Paul McKenney
2024-04-21 21:47 ` Paul McKenney
2024-05-02 23:48 ` Paul McKenney
2024-05-03 15:38 ` Paul E. McKenney
2024-05-03 19:56 ` Josh Poimboeuf
2024-05-03 20:44 ` Josh Poimboeuf
2024-05-03 23:33 ` Paul E. McKenney
2024-05-03 23:48 ` Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
2024-04-19 21:46 ` Josh Poimboeuf