* [PATCH 0/5] x86/bugs: more BHI fixes
@ 2024-04-19 21:09 Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
Patch 1 is another iteration of reducing the scope of syscall hardening
in order to improve performance on some CPUs. The feature bit has a new
name, and the commit log and comments are much improved.
The rest of the patches are new:
- Patch 2 fixes the default mitigations for !x86 (reimplementation of
Sean's fix).
- Patch 3 fixes some objtool warnings found by Paul.
- Patch 4 is a documentation cleanup and prep for patch 5.
- Patch 5 adds a requested spectre_bhi=vmexit option.
Josh Poimboeuf (5):
x86/bugs: Only harden syscalls when needed
cpu/speculation: Fix CPU mitigation defaults for !x86
x86/syscall: Mark exit[_group] syscall handlers __noreturn
x86/bugs: Remove duplicate Spectre cmdline option descriptions
x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
Documentation/admin-guide/hw-vuln/spectre.rst | 84 ++-----------------
.../admin-guide/kernel-parameters.txt | 12 ++-
arch/Kconfig | 10 +++
arch/x86/Kconfig | 15 +---
arch/x86/entry/common.c | 15 +++-
arch/x86/entry/syscall_32.c | 11 +--
arch/x86/entry/syscall_64.c | 10 +--
arch/x86/entry/syscall_x32.c | 11 ++-
arch/x86/entry/syscalls/syscall_64.tbl | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/syscall.h | 8 +-
arch/x86/kernel/cpu/bugs.c | 51 +++++++++--
arch/x86/um/sys_call_table_32.c | 1 +
arch/x86/um/sys_call_table_64.c | 1 +
kernel/cpu.c | 4 +-
scripts/syscalltbl.sh | 6 +-
tools/objtool/noreturns.h | 4 +
17 files changed, 126 insertions(+), 124 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-22 8:09 ` Yujie Liu
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
Syscall hardening (converting the syscall indirect branch to a series of
direct branches) has shown some performance regressions:
- Red Hat internal testing showed up to 12% slowdowns in database
benchmark testing on Sapphire Rapids when the DB was stressed with 80+
users to cause contention.
- The kernel test robot's will-it-scale benchmarks showed significant
regressions on Skylake with IBRS:
https://lkml.kernel.org/lkml/202404191333.178a0eed-yujie.liu@intel.com
To fix those slowdowns, only use the syscall direct branches when
indirect branches are considered to be "not OK": meaning Spectre v2+BHI
isn't mitigated by HW and the user hasn't disabled mitigations.
Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/entry/common.c | 15 ++++++++++---
arch/x86/entry/syscall_32.c | 11 +---------
arch/x86/entry/syscall_64.c | 6 -----
arch/x86/entry/syscall_x32.c | 7 +++++-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/syscall.h | 8 ++++++-
arch/x86/kernel/cpu/bugs.c | 35 +++++++++++++++++++++++++++++-
7 files changed, 61 insertions(+), 22 deletions(-)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 51cc9c7cb9bd..db1ef98da3a4 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = sys_call_table[unr](regs);
+ else
+ regs->ax = x64_sys_call(regs, unr);
return true;
}
return false;
@@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = x32_sys_call_table[xnr](regs);
+ else
+ regs->ax = x32_sys_call(regs, xnr);
return true;
}
return false;
@@ -162,7 +168,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = ia32_sys_call_table[unr](regs);
+ else
+ regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235bae17ef..aab31760b4e3 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif
#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09e6f15..96ea1f8a1d3f 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -11,11 +11,6 @@
#include <asm/syscalls_64.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
@@ -23,7 +18,6 @@ const sys_call_ptr_t sys_call_table[] = {
#undef __SYSCALL
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a932131..5aef4230faca 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -11,8 +11,13 @@
#include <asm/syscalls_x32.h>
#undef __SYSCALL
-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#define __SYSCALL(nr, sym) __x64_##sym,
+const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..d64b0a5291f1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff..dfb59521244c 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ab18185894df..5fca46c78daf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;
+ /*
+ * There's no HW mitigation in place. Mark indirect branches as
+ * "not OK".
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ /*
+ * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
+ * "OK" to use due to (at least) one of the following being true:
+ *
+ * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
+ *
+ * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
+ *
+ * - the user disabled mitigations.
+ *
+ * Clearing the bit enables certain indirect branch "easy targets" [*]
+ * to be converted to a series of direct branches.
+ *
+ * Assume innocence until proven guilty: set it now and clear it later
+ * if/when needed.
+ *
+ * [*] The closer the indirect branch is to kernel entry, and the more
+ * user-controlled registers there are, the easier target it may be
+ * for future Spectre v2 variants.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
--
2.44.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-20 0:09 ` Sean Christopherson
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
` (2 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Stephen Rothwell,
Michael Ellerman, Geert Uytterhoeven
CPU speculative execution mitigations were inadvertently disabled on
non-x86 arches by the following commit:
f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
a separate menu which depends on CONFIG_CPU_MITIGATIONS.
Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/Kconfig | 10 ++++++++++
arch/x86/Kconfig | 15 +++------------
kernel/cpu.c | 4 ++--
3 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71..5c96849eb957 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,6 +11,16 @@ source "arch/$(SRCARCH)/Kconfig"
menu "General architecture-dependent options"
+config CPU_MITIGATIONS
+ bool "Mitigations for CPU speculative execution vulnerabilities"
+ default y
+ help
+ Say Y here to enable mitigations for CPU speculative execution
+ vulnerabilities.
+
+ If you say N, all mitigations will be disabled. You really
+ should know what you are doing to say so.
+
config ARCH_HAS_SUBPAGE_FAULTS
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..85a4d57bce1e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2488,17 +2488,8 @@ config PREFIX_SYMBOLS
def_bool y
depends on CALL_PADDING && !CFI_CLANG
-menuconfig SPECULATION_MITIGATIONS
- bool "Mitigations for speculative execution vulnerabilities"
- default y
- help
- Say Y here to enable options which enable mitigations for
- speculative execution hardware vulnerabilities.
-
- If you say N, all mitigations will be disabled. You really
- should know what you are doing to say so.
-
-if SPECULATION_MITIGATIONS
+menu "CPU speculative execution mitigation defaults"
+ depends on CPU_MITIGATIONS
config MITIGATION_PAGE_TABLE_ISOLATION
bool "Remove the kernel mapping in user mode"
@@ -2643,7 +2634,7 @@ config MITIGATION_SPECTRE_BHI
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>
-endif
+endmenu
config ARCH_HAS_ADD_PAGES
def_bool y
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 07ad53b7f119..bb0ff275fb46 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3207,8 +3207,8 @@ enum cpu_mitigations {
};
static enum cpu_mitigations cpu_mitigations __ro_after_init =
- IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
- CPU_MITIGATIONS_OFF;
+ IS_ENABLED(CONFIG_CPU_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
+ CPU_MITIGATIONS_OFF;
static int __init mitigations_parse_cmdline(char *arg)
{
--
2.44.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-20 13:58 ` Paul E. McKenney
2024-04-19 21:09 ` [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Paul E. McKenney
The direct-call syscall dispatch functions don't know that the exit()
and exit_group() syscall handlers don't return. As a result the call
sites aren't optimized accordingly.
Fix that by marking those exit syscall declarations as __noreturn.
Fixes the following warnings:
vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/entry/syscall_64.c | 4 ++++
arch/x86/entry/syscall_x32.c | 4 ++++
arch/x86/entry/syscalls/syscall_64.tbl | 6 +++---
arch/x86/um/sys_call_table_32.c | 1 +
arch/x86/um/sys_call_table_64.c | 1 +
scripts/syscalltbl.sh | 6 ++++--
tools/objtool/noreturns.h | 4 ++++
7 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 96ea1f8a1d3f..ff36a993a07e 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,9 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_64.h>
#undef __SYSCALL
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 5aef4230faca..4221ecce6e68 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,9 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_x32.h>
#undef __SYSCALL
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t x32_sys_call_table[] = {
#include <asm/syscalls_x32.h>
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 7e8d46f4147f..6695105d21b5 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -2,7 +2,7 @@
# 64-bit system call numbers and entry vectors
#
# The format is:
-# <number> <abi> <name> <entry point>
+# <number> <abi> <name> <entry point> [0 noreturn]
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
@@ -68,7 +68,7 @@
57 common fork sys_fork
58 common vfork sys_vfork
59 64 execve sys_execve
-60 common exit sys_exit
+60 common exit sys_exit 0 noreturn
61 common wait4 sys_wait4
62 common kill sys_kill
63 common uname sys_newuname
@@ -239,7 +239,7 @@
228 common clock_gettime sys_clock_gettime
229 common clock_getres sys_clock_getres
230 common clock_nanosleep sys_clock_nanosleep
-231 common exit_group sys_exit_group
+231 common exit_group sys_exit_group 0 noreturn
232 common epoll_wait sys_epoll_wait
233 common epoll_ctl sys_epoll_ctl
234 common tgkill sys_tgkill
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 89df5d89d664..c7d4bf955d2b 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -24,6 +24,7 @@
#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_32.h>
#undef __SYSCALL
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b0b4cfd2308c..4760c40ae5cd 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -19,6 +19,7 @@
#define sys_ioperm sys_ni_syscall
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_64.h>
#undef __SYSCALL
diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
index 6abe143889ef..16487d47e06a 100755
--- a/scripts/syscalltbl.sh
+++ b/scripts/syscalltbl.sh
@@ -54,7 +54,7 @@ nxt=0
grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
- while read nr abi name native compat ; do
+ while read nr abi name native compat noreturn; do
if [ $nxt -gt $nr ]; then
echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
@@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
nxt=$((nxt + 1))
done
- if [ -n "$compat" ]; then
+ if [ -n "$noreturn" ]; then
+ echo "__SYSCALL_NORETURN($nr, $native)"
+ elif [ -n "$compat" ]; then
echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
elif [ -n "$native" ]; then
echo "__SYSCALL($nr, $native)"
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c91184..1e8141ef1b15 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -7,12 +7,16 @@
* Yes, this is unfortunate. A better solution is in the works.
*/
NORETURN(__fortify_panic)
+NORETURN(__ia32_sys_exit)
+NORETURN(__ia32_sys_exit_group)
NORETURN(__kunit_abort)
NORETURN(__module_put_and_kthread_exit)
NORETURN(__reiserfs_panic)
NORETURN(__stack_chk_fail)
NORETURN(__tdx_hypercall_failed)
NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(__x64_sys_exit)
+NORETURN(__x64_sys_exit_group)
NORETURN(arch_cpu_idle_dead)
NORETURN(bch2_trans_in_restart_error)
NORETURN(bch2_trans_restart_error)
--
2.44.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
` (2 preceding siblings ...)
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
4 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
Duplicating the documentation of all the Spectre kernel cmdline options
in two separate places is unwieldy and error-prone. Instead just add a
reference to kernel-parameters.txt from spectre.rst.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
Documentation/admin-guide/hw-vuln/spectre.rst | 84 ++-----------------
1 file changed, 9 insertions(+), 75 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 25a04cda4c2c..f9797ab6b38f 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -592,85 +592,19 @@ Spectre variant 2
Mitigation control on the kernel command line
---------------------------------------------
-Spectre variant 2 mitigation can be disabled or force enabled at the
-kernel command line.
+In general the kernel selects reasonable default mitigations for the
+current CPU.
+
+Spectre default mitigations can be disabled or changed at the kernel
+command line with the following options:
nospectre_v1
-
- [X86,PPC] Disable mitigations for Spectre Variant 1
- (bounds check bypass). With this option data leaks are
- possible in the system.
-
nospectre_v2
+ spectre_v2={option}
+ spectre_v2_user={option}
+ spectre_bhi={option}
- [X86] Disable all mitigations for the Spectre variant 2
- (indirect branch prediction) vulnerability. System may
- allow data leaks with this option, which is equivalent
- to spectre_v2=off.
-
-
- spectre_v2=
-
- [X86] Control mitigation of Spectre variant 2
- (indirect branch speculation) vulnerability.
- The default operation protects the kernel from
- user space attacks.
-
- on
- unconditionally enable, implies
- spectre_v2_user=on
- off
- unconditionally disable, implies
- spectre_v2_user=off
- auto
- kernel detects whether your CPU model is
- vulnerable
-
- Selecting 'on' will, and 'auto' may, choose a
- mitigation method at run time according to the
- CPU, the available microcode, the setting of the
- CONFIG_MITIGATION_RETPOLINE configuration option,
- and the compiler with which the kernel was built.
-
- Selecting 'on' will also enable the mitigation
- against user space to user space task attacks.
-
- Selecting 'off' will disable both the kernel and
- the user space protections.
-
- Specific mitigations can also be selected manually:
-
- retpoline auto pick between generic,lfence
- retpoline,generic Retpolines
- retpoline,lfence LFENCE; indirect branch
- retpoline,amd alias for retpoline,lfence
- eibrs Enhanced/Auto IBRS
- eibrs,retpoline Enhanced/Auto IBRS + Retpolines
- eibrs,lfence Enhanced/Auto IBRS + LFENCE
- ibrs use IBRS to protect kernel
-
- Not specifying this option is equivalent to
- spectre_v2=auto.
-
- In general the kernel by default selects
- reasonable mitigations for the current CPU. To
- disable Spectre variant 2 mitigations, boot with
- spectre_v2=off. Spectre variant 1 mitigations
- cannot be disabled.
-
- spectre_bhi=
-
- [X86] Control mitigation of Branch History Injection
- (BHI) vulnerability. This setting affects the deployment
- of the HW BHI control and the SW BHB clearing sequence.
-
- on
- (default) Enable the HW or SW mitigation as
- needed.
- off
- Disable the mitigation.
-
-For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
+For more details on the available options, refer to Documentation/admin-guide/kernel-parameters.txt
Mitigation selection guide
--------------------------
--
2.44.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
` (3 preceding siblings ...)
2024-04-19 21:09 ` [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions Josh Poimboeuf
@ 2024-04-19 21:09 ` Josh Poimboeuf
2024-04-19 21:46 ` Josh Poimboeuf
4 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:09 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Maksim Davydov
In cloud environments it can be useful to *only* enable the vmexit
mitigation and leave syscalls vulnerable. Add that as an option.
This is similar to the old spectre_v2=auto option which was removed with
the following commit:
36d4fe147c87 ("x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto")
with the main difference being that this has a more descriptive name and
is disabled by default.
Requested-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 12 +++++++++---
arch/x86/kernel/cpu/bugs.c | 16 +++++++++++-----
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 902ecd92a29f..83c4889b88d2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6069,9 +6069,15 @@
deployment of the HW BHI control and the SW BHB
clearing sequence.
- on - (default) Enable the HW or SW mitigation
- as needed.
- off - Disable the mitigation.
+ on - (default) Enable the HW or SW mitigation as
+ needed. This protects the kernel from
+ both syscalls and VMs.
+ vmexit - On systems which don't have the HW mitigation
+ available, enable the SW mitigation on vmexit
+ ONLY. On such systems, the host kernel is
+ protected from VM-originated BHI attacks, but
+ may still be vulnerable to syscall attacks.
+ off - Disable the mitigation.
spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5fca46c78daf..575a4bb5a78d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1625,6 +1625,7 @@ static bool __init spec_ctrl_bhi_dis(void)
enum bhi_mitigations {
BHI_MITIGATION_OFF,
BHI_MITIGATION_ON,
+ BHI_MITIGATION_VMEXIT_ONLY,
};
static enum bhi_mitigations bhi_mitigation __ro_after_init =
@@ -1639,6 +1640,8 @@ static int __init spectre_bhi_parse_cmdline(char *str)
bhi_mitigation = BHI_MITIGATION_OFF;
else if (!strcmp(str, "on"))
bhi_mitigation = BHI_MITIGATION_ON;
+ else if (!strcmp(str, "vmexit"))
+ bhi_mitigation = BHI_MITIGATION_VMEXIT_ONLY;
else
pr_err("Ignoring unknown spectre_bhi option (%s)", str);
@@ -1659,6 +1662,7 @@ static void __init bhi_select_mitigation(void)
return;
}
+ /* Mitigate in hardware if supported */
if (spec_ctrl_bhi_dis())
return;
@@ -1671,13 +1675,15 @@ static void __init bhi_select_mitigation(void)
*/
setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
- /* Mitigate KVM by default */
- setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
- pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
+ if (bhi_mitigation == BHI_MITIGATION_VMEXIT_ONLY) {
+ pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit only\n");
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
+ return;
+ }
- /* Mitigate syscalls when the mitigation is forced =on */
+ pr_info("Spectre BHI mitigation: SW BHB clearing on syscall and vm exit\n");
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
- pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
}
static void __init spectre_v2_select_mitigation(void)
--
2.44.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
@ 2024-04-19 21:46 ` Josh Poimboeuf
0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-19 21:46 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar, Maksim Davydov
On Fri, Apr 19, 2024 at 02:09:51PM -0700, Josh Poimboeuf wrote:
> In cloud environments it can be useful to *only* enable the vmexit
> mitigation and leave syscalls vulnerable. Add that as an option.
>
> This is similar to the old spectre_v2=auto option which was removed with
As Daniel pointed out this should be /v2/bhi/
--
Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
@ 2024-04-20 0:09 ` Sean Christopherson
2024-04-23 14:10 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-04-20 0:09 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Andrew Cooper, Dave Hansen,
Nikolay Borisov, KP Singh, Waiman Long, Borislav Petkov,
Ingo Molnar, Stephen Rothwell, Michael Ellerman,
Geert Uytterhoeven
On Fri, Apr 19, 2024, Josh Poimboeuf wrote:
> CPU speculative execution mitigations were inadvertently disabled on
> non-x86 arches by the following commit:
>
> f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
>
> Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
> CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
> a separate menu which depends on CONFIG_CPU_MITIGATIONS.
Ah drat, I didn't check my mailbox until after Cc'ing Linus my own version[*].
I don't have a strong preference between the two, though I do think it's worth
nothing that this will (obvioulsy) allow disabling mitigations at compile time
on all architectures, which may or may not be desirable.
[*] https://lore.kernel.org/all/20240420000556.2645001-2-seanjc@google.com
> Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> ---
> arch/Kconfig | 10 ++++++++++
> arch/x86/Kconfig | 15 +++------------
> kernel/cpu.c | 4 ++--
> 3 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9f066785bb71..5c96849eb957 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -11,6 +11,16 @@ source "arch/$(SRCARCH)/Kconfig"
>
> menu "General architecture-dependent options"
>
> +config CPU_MITIGATIONS
> + bool "Mitigations for CPU speculative execution vulnerabilities"
> + default y
> + help
> + Say Y here to enable mitigations for CPU speculative execution
> + vulnerabilities.
> +
> + If you say N, all mitigations will be disabled. You really
> + should know what you are doing to say so.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
@ 2024-04-20 13:58 ` Paul E. McKenney
2024-04-21 5:25 ` Josh Poimboeuf
0 siblings, 1 reply; 21+ messages in thread
From: Paul E. McKenney @ 2024-04-20 13:58 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> The direct-call syscall dispatch functions don't know that the exit()
> and exit_group() syscall handlers don't return. As a result the call
> sites aren't optimized accordingly.
>
> Fix that by marking those exit syscall declarations as __noreturn.
>
> Fixes the following warnings:
>
> vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
>
> Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Looks good, but it does not apply on top of current -next and I don't
trust myself to hand-apply it (something about having just got off of
a flight across the big pond).
Could you please let me know what else do I need to pull in to be able
to cleanly apply this one?
Thanx, Paul
> ---
> arch/x86/entry/syscall_64.c | 4 ++++
> arch/x86/entry/syscall_x32.c | 4 ++++
> arch/x86/entry/syscalls/syscall_64.tbl | 6 +++---
> arch/x86/um/sys_call_table_32.c | 1 +
> arch/x86/um/sys_call_table_64.c | 1 +
> scripts/syscalltbl.sh | 6 ++++--
> tools/objtool/noreturns.h | 4 ++++
> 7 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
> index 96ea1f8a1d3f..ff36a993a07e 100644
> --- a/arch/x86/entry/syscall_64.c
> +++ b/arch/x86/entry/syscall_64.c
> @@ -8,9 +8,13 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_64.h>
> #undef __SYSCALL
>
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __x64_##sym,
> const sys_call_ptr_t sys_call_table[] = {
> #include <asm/syscalls_64.h>
> diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> index 5aef4230faca..4221ecce6e68 100644
> --- a/arch/x86/entry/syscall_x32.c
> +++ b/arch/x86/entry/syscall_x32.c
> @@ -8,9 +8,13 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_x32.h>
> #undef __SYSCALL
>
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __x64_##sym,
> const sys_call_ptr_t x32_sys_call_table[] = {
> #include <asm/syscalls_x32.h>
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 7e8d46f4147f..6695105d21b5 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -2,7 +2,7 @@
> # 64-bit system call numbers and entry vectors
> #
> # The format is:
> -# <number> <abi> <name> <entry point>
> +# <number> <abi> <name> <entry point> [0 noreturn]
> #
> # The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
> #
> @@ -68,7 +68,7 @@
> 57 common fork sys_fork
> 58 common vfork sys_vfork
> 59 64 execve sys_execve
> -60 common exit sys_exit
> +60 common exit sys_exit 0 noreturn
> 61 common wait4 sys_wait4
> 62 common kill sys_kill
> 63 common uname sys_newuname
> @@ -239,7 +239,7 @@
> 228 common clock_gettime sys_clock_gettime
> 229 common clock_getres sys_clock_getres
> 230 common clock_nanosleep sys_clock_nanosleep
> -231 common exit_group sys_exit_group
> +231 common exit_group sys_exit_group 0 noreturn
> 232 common epoll_wait sys_epoll_wait
> 233 common epoll_ctl sys_epoll_ctl
> 234 common tgkill sys_tgkill
> diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
> index 89df5d89d664..c7d4bf955d2b 100644
> --- a/arch/x86/um/sys_call_table_32.c
> +++ b/arch/x86/um/sys_call_table_32.c
> @@ -24,6 +24,7 @@
> #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_32.h>
>
> #undef __SYSCALL
> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> index b0b4cfd2308c..4760c40ae5cd 100644
> --- a/arch/x86/um/sys_call_table_64.c
> +++ b/arch/x86/um/sys_call_table_64.c
> @@ -19,6 +19,7 @@
> #define sys_ioperm sys_ni_syscall
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_64.h>
>
> #undef __SYSCALL
> diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> index 6abe143889ef..16487d47e06a 100755
> --- a/scripts/syscalltbl.sh
> +++ b/scripts/syscalltbl.sh
> @@ -54,7 +54,7 @@ nxt=0
>
> grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
>
> - while read nr abi name native compat ; do
> + while read nr abi name native compat noreturn; do
>
> if [ $nxt -gt $nr ]; then
> echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
> @@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> nxt=$((nxt + 1))
> done
>
> - if [ -n "$compat" ]; then
> + if [ -n "$noreturn" ]; then
> + echo "__SYSCALL_NORETURN($nr, $native)"
> + elif [ -n "$compat" ]; then
> echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> elif [ -n "$native" ]; then
> echo "__SYSCALL($nr, $native)"
> diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> index 7ebf29c91184..1e8141ef1b15 100644
> --- a/tools/objtool/noreturns.h
> +++ b/tools/objtool/noreturns.h
> @@ -7,12 +7,16 @@
> * Yes, this is unfortunate. A better solution is in the works.
> */
> NORETURN(__fortify_panic)
> +NORETURN(__ia32_sys_exit)
> +NORETURN(__ia32_sys_exit_group)
> NORETURN(__kunit_abort)
> NORETURN(__module_put_and_kthread_exit)
> NORETURN(__reiserfs_panic)
> NORETURN(__stack_chk_fail)
> NORETURN(__tdx_hypercall_failed)
> NORETURN(__ubsan_handle_builtin_unreachable)
> +NORETURN(__x64_sys_exit)
> +NORETURN(__x64_sys_exit_group)
> NORETURN(arch_cpu_idle_dead)
> NORETURN(bch2_trans_in_restart_error)
> NORETURN(bch2_trans_restart_error)
> --
> 2.44.0
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-20 13:58 ` Paul E. McKenney
@ 2024-04-21 5:25 ` Josh Poimboeuf
2024-04-21 20:40 ` Paul McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-21 5:25 UTC (permalink / raw)
To: Paul E. McKenney
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > The direct-call syscall dispatch functions don't know that the exit()
> > and exit_group() syscall handlers don't return. As a result the call
> > sites aren't optimized accordingly.
> >
> > Fix that by marking those exit syscall declarations as __noreturn.
> >
> > Fixes the following warnings:
> >
> > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> >
> > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
>
> Looks good, but it does not apply on top of current -next and I don't
> trust myself to hand-apply it (something about having just got off of
> a flight across the big pond).
>
> Could you please let me know what else do I need to pull in to be able
> to cleanly apply this one?
This patch has a dependency on an earlier patch in the set:
https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
Though I think it's not a hard dependency and I could reverse the order
of the patches if needed.
--
Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-21 5:25 ` Josh Poimboeuf
@ 2024-04-21 20:40 ` Paul McKenney
2024-04-21 21:47 ` Paul McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Paul McKenney @ 2024-04-21 20:40 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
They apply fine as is, so I have started tests with that pair of patches.
Thanx, Paul
On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > The direct-call syscall dispatch functions don't know that the exit()
> > > and exit_group() syscall handlers don't return. As a result the call
> > > sites aren't optimized accordingly.
> > >
> > > Fix that by marking those exit syscall declarations as __noreturn.
> > >
> > > Fixes the following warnings:
> > >
> > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > >
> > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> >
> > Looks good, but it does not apply on top of current -next and I don't
> > trust myself to hand-apply it (something about having just got off of
> > a flight across the big pond).
> >
> > Could you please let me know what else do I need to pull in to be able
> > to cleanly apply this one?
>
> This patch has a dependency on an earlier patch in the set:
>
> https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
>
> Though I think it's not a hard dependency and I could reverse the order
> of the patches if needed.
>
> --
> Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-21 20:40 ` Paul McKenney
@ 2024-04-21 21:47 ` Paul McKenney
2024-05-02 23:48 ` Paul McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Paul McKenney @ 2024-04-21 21:47 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 14430 bytes --]
And this definitely helped, thank you!
However, this one still remains:
vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
__ia32_sys_exit_group() is missing a __noreturn annotation
Please see below for my diffs against next-20240419, in case I messed
something up.
I attached a copy as well, given that I am away from mutt, hence using
gmail directly.
Thanx, Paul
-----------------------------------------------------
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6de50b80702e6..9810ba2857a5c 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct
pt_regs *regs, int nr)
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = sys_call_table[unr](regs);
+ else
+ regs->ax = x64_sys_call(regs, unr);
return true;
}
return false;
@@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct
pt_regs *regs, int nr)
if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = x32_sys_call_table[xnr](regs);
+ else
+ regs->ax = x32_sys_call(regs, xnr);
return true;
}
return false;
@@ -162,7 +168,10 @@ static __always_inline void
do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = ia32_sys_call_table[unr](regs);
+ else
+ regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235bae17ef6..aab31760b4e3e 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif
#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09e6f151..ff36a993a07e0 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,14 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn
__x64_##sym(const struct pt_regs *);
#include <asm/syscalls_64.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
@@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
#undef __SYSCALL
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a9321318..4221ecce6e689 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,11 +8,20 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn
__x64_##sym(const struct pt_regs *);
#include <asm/syscalls_x32.h>
#undef __SYSCALL
-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
+#define __SYSCALL(nr, sym) __x64_##sym,
+const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl
b/arch/x86/entry/syscalls/syscall_64.tbl
index a396f6e6ab5bf..7ec68d94eb593 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -2,7 +2,7 @@
# 64-bit system call numbers and entry vectors
#
# The format is:
-# <number> <abi> <name> <entry point>
+# <number> <abi> <name> <entry point> [0 noreturn]
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
@@ -68,7 +68,7 @@
57 common fork sys_fork
58 common vfork sys_vfork
59 64 execve sys_execve
-60 common exit sys_exit
+60 common exit sys_exit 0 noreturn
61 common wait4 sys_wait4
62 common kill sys_kill
63 common uname sys_newuname
@@ -239,7 +239,7 @@
228 common clock_gettime sys_clock_gettime
229 common clock_getres sys_clock_getres
230 common clock_nanosleep sys_clock_nanosleep
-231 common exit_group sys_exit_group
+231 common exit_group sys_exit_group 0 noreturn
232 common epoll_wait sys_epoll_wait
233 common epoll_ctl sys_epoll_ctl
234 common tgkill sys_tgkill
diff --git a/arch/x86/include/asm/cpufeatures.h
b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661c..d64b0a5291f10 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW
control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear
branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to
use indirect branches */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff6..dfb59521244c2 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ab18185894dfd..5fca46c78daf2 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;
+ /*
+ * There's no HW mitigation in place. Mark indirect branches as
+ * "not OK".
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ /*
+ * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
+ * "OK" to use due to (at least) one of the following being true:
+ *
+ * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
+ *
+ * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
+ *
+ * - the user disabled mitigations.
+ *
+ * Clearing the bit enables certain indirect branch "easy targets" [*]
+ * to be converted to a series of direct branches.
+ *
+ * Assume innocence until proven guilty: set it now and clear it later
+ * if/when needed.
+ *
+ * [*] The closer the indirect branch is to kernel entry, and the more
+ * user-controlled registers there are, the easier target it may be
+ * for future Spectre v2 variants.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 89df5d89d6640..c7d4bf955d2ba 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -24,6 +24,7 @@
#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long,
unsigned long, unsigned long, unsigned long, unsigned long, unsigned
long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_32.h>
#undef __SYSCALL
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b0b4cfd2308c8..4760c40ae5cd0 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -19,6 +19,7 @@
#define sys_ioperm sys_ni_syscall
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long,
unsigned long, unsigned long, unsigned long, unsigned long, unsigned
long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_64.h>
#undef __SYSCALL
diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
index 6abe143889ef6..16487d47e06a3 100755
--- a/scripts/syscalltbl.sh
+++ b/scripts/syscalltbl.sh
@@ -54,7 +54,7 @@ nxt=0
grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
- while read nr abi name native compat ; do
+ while read nr abi name native compat noreturn; do
if [ $nxt -gt $nr ]; then
echo "error: $infile: syscall table is not sorted or duplicates the
same syscall number" >&2
@@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
nxt=$((nxt + 1))
done
- if [ -n "$compat" ]; then
+ if [ -n "$noreturn" ]; then
+ echo "__SYSCALL_NORETURN($nr, $native)"
+ elif [ -n "$compat" ]; then
echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
elif [ -n "$native" ]; then
echo "__SYSCALL($nr, $native)"
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c911849..1e8141ef1b15d 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -7,12 +7,16 @@
* Yes, this is unfortunate. A better solution is in the works.
*/
NORETURN(__fortify_panic)
+NORETURN(__ia32_sys_exit)
+NORETURN(__ia32_sys_exit_group)
NORETURN(__kunit_abort)
NORETURN(__module_put_and_kthread_exit)
NORETURN(__reiserfs_panic)
NORETURN(__stack_chk_fail)
NORETURN(__tdx_hypercall_failed)
NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(__x64_sys_exit)
+NORETURN(__x64_sys_exit_group)
NORETURN(arch_cpu_idle_dead)
NORETURN(bch2_trans_in_restart_error)
NORETURN(bch2_trans_restart_error)
On Sun, Apr 21, 2024 at 1:40 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
>
> They apply fine as is, so I have started tests with that pair of patches.
>
> Thanx, Paul
>
> On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > > The direct-call syscall dispatch functions don't know that the exit()
> > > > and exit_group() syscall handlers don't return. As a result the call
> > > > sites aren't optimized accordingly.
> > > >
> > > > Fix that by marking those exit syscall declarations as __noreturn.
> > > >
> > > > Fixes the following warnings:
> > > >
> > > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > > >
> > > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > >
> > > Looks good, but it does not apply on top of current -next and I don't
> > > trust myself to hand-apply it (something about having just got off of
> > > a flight across the big pond).
> > >
> > > Could you please let me know what else do I need to pull in to be able
> > > to cleanly apply this one?
> >
> > This patch has a dependency on an earlier patch in the set:
> >
> > https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
> >
> > Though I think it's not a hard dependency and I could reverse the order
> > of the patches if needed.
> >
> > --
> > Josh
[-- Attachment #2: diffs --]
[-- Type: application/octet-stream, Size: 11722 bytes --]
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6de50b80702e6..9810ba2857a5c 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = sys_call_table[unr](regs);
+ else
+ regs->ax = x64_sys_call(regs, unr);
return true;
}
return false;
@@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = x32_sys_call_table[xnr](regs);
+ else
+ regs->ax = x32_sys_call(regs, xnr);
return true;
}
return false;
@@ -162,7 +168,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
+ regs->ax = ia32_sys_call_table[unr](regs);
+ else
+ regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235bae17ef6..aab31760b4e3e 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif
#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09e6f151..ff36a993a07e0 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,14 +8,13 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_64.h>
#undef __SYSCALL
-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
@@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
#undef __SYSCALL
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a9321318..4221ecce6e689 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,11 +8,20 @@
#include <asm/syscall.h>
#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_x32.h>
#undef __SYSCALL
-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
+#define __SYSCALL(nr, sym) __x64_##sym,
+const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index a396f6e6ab5bf..7ec68d94eb593 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -2,7 +2,7 @@
# 64-bit system call numbers and entry vectors
#
# The format is:
-# <number> <abi> <name> <entry point>
+# <number> <abi> <name> <entry point> [0 noreturn]
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
@@ -68,7 +68,7 @@
57 common fork sys_fork
58 common vfork sys_vfork
59 64 execve sys_execve
-60 common exit sys_exit
+60 common exit sys_exit 0 noreturn
61 common wait4 sys_wait4
62 common kill sys_kill
63 common uname sys_newuname
@@ -239,7 +239,7 @@
228 common clock_gettime sys_clock_gettime
229 common clock_getres sys_clock_getres
230 common clock_nanosleep sys_clock_nanosleep
-231 common exit_group sys_exit_group
+231 common exit_group sys_exit_group 0 noreturn
232 common epoll_wait sys_epoll_wait
233 common epoll_ctl sys_epoll_ctl
234 common tgkill sys_tgkill
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661c..d64b0a5291f10 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to use indirect branches */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff6..dfb59521244c2 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ab18185894dfd..5fca46c78daf2 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;
+ /*
+ * There's no HW mitigation in place. Mark indirect branches as
+ * "not OK".
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ /*
+ * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
+ * "OK" to use due to (at least) one of the following being true:
+ *
+ * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
+ *
+ * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
+ *
+ * - the user disabled mitigations.
+ *
+ * Clearing the bit enables certain indirect branch "easy targets" [*]
+ * to be converted to a series of direct branches.
+ *
+ * Assume innocence until proven guilty: set it now and clear it later
+ * if/when needed.
+ *
+ * [*] The closer the indirect branch is to kernel entry, and the more
+ * user-controlled registers there are, the easier target it may be
+ * for future Spectre v2 variants.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 89df5d89d6640..c7d4bf955d2ba 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -24,6 +24,7 @@
#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_32.h>
#undef __SYSCALL
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b0b4cfd2308c8..4760c40ae5cd0 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -19,6 +19,7 @@
#define sys_ioperm sys_ni_syscall
#define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_NORETURN __SYSCALL
#include <asm/syscalls_64.h>
#undef __SYSCALL
diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
index 6abe143889ef6..16487d47e06a3 100755
--- a/scripts/syscalltbl.sh
+++ b/scripts/syscalltbl.sh
@@ -54,7 +54,7 @@ nxt=0
grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
- while read nr abi name native compat ; do
+ while read nr abi name native compat noreturn; do
if [ $nxt -gt $nr ]; then
echo "error: $infile: syscall table is not sorted or duplicates the same syscall number" >&2
@@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
nxt=$((nxt + 1))
done
- if [ -n "$compat" ]; then
+ if [ -n "$noreturn" ]; then
+ echo "__SYSCALL_NORETURN($nr, $native)"
+ elif [ -n "$compat" ]; then
echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
elif [ -n "$native" ]; then
echo "__SYSCALL($nr, $native)"
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c911849..1e8141ef1b15d 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -7,12 +7,16 @@
* Yes, this is unfortunate. A better solution is in the works.
*/
NORETURN(__fortify_panic)
+NORETURN(__ia32_sys_exit)
+NORETURN(__ia32_sys_exit_group)
NORETURN(__kunit_abort)
NORETURN(__module_put_and_kthread_exit)
NORETURN(__reiserfs_panic)
NORETURN(__stack_chk_fail)
NORETURN(__tdx_hypercall_failed)
NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(__x64_sys_exit)
+NORETURN(__x64_sys_exit_group)
NORETURN(arch_cpu_idle_dead)
NORETURN(bch2_trans_in_restart_error)
NORETURN(bch2_trans_restart_error)
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
@ 2024-04-22 8:09 ` Yujie Liu
0 siblings, 0 replies; 21+ messages in thread
From: Yujie Liu @ 2024-04-22 8:09 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, Apr 19, 2024 at 02:09:47PM -0700, Josh Poimboeuf wrote:
> Syscall hardening (converting the syscall indirect branch to a series of
> direct branches) has shown some performance regressions:
>
> - Red Hat internal testing showed up to 12% slowdowns in database
> benchmark testing on Sapphire Rapids when the DB was stressed with 80+
> users to cause contention.
>
> - The kernel test robot's will-it-scale benchmarks showed significant
> regressions on Skylake with IBRS:
> https://lkml.kernel.org/lkml/202404191333.178a0eed-yujie.liu@intel.com
To clarify, we reported a +1.4% improvement (not regression) of
will-it-scale futex4 benchmark on Skylake. Meanwhile we did observe some
regressions by running other benchmarks on Ice Lake, such as:
stress-ng.null.ops_per_sec -4.0% regression on Intel Xeon Gold 6346 (Ice Lake)
unixbench.fsbuffer.throughput -1.4% regression on Intel Xeon Gold 6346 (Ice Lake)
>
> To fix those slowdowns, only use the syscall direct branches when
> indirect branches are considered to be "not OK": meaning Spectre v2+BHI
> isn't mitigated by HW and the user hasn't disabled mitigations.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-20 0:09 ` Sean Christopherson
@ 2024-04-23 14:10 ` Sean Christopherson
2024-04-24 5:35 ` Josh Poimboeuf
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-04-23 14:10 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Andrew Cooper, Dave Hansen,
Nikolay Borisov, KP Singh, Waiman Long, Borislav Petkov,
Ingo Molnar, Stephen Rothwell, Michael Ellerman,
Geert Uytterhoeven
On Fri, Apr 19, 2024, Sean Christopherson wrote:
> On Fri, Apr 19, 2024, Josh Poimboeuf wrote:
> > CPU speculative execution mitigations were inadvertently disabled on
> > non-x86 arches by the following commit:
> >
> > f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
> >
> > Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
> > CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
> > a separate menu which depends on CONFIG_CPU_MITIGATIONS.
>
> Ah drat, I didn't check my mailbox until after Cc'ing Linus my own version[*].
>
> I don't have a strong preference between the two, though I do think it's worth
> nothing that this will (obvioulsy) allow disabling mitigations at compile time
> on all architectures, which may or may not be desirable.
>
> [*] https://lore.kernel.org/all/20240420000556.2645001-2-seanjc@google.com
Josh, when you get a chance, can you weigh in on my menu-preserving approach?
I want to get this resolved asap so that we're not scrambing on Friday again :-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86
2024-04-23 14:10 ` Sean Christopherson
@ 2024-04-24 5:35 ` Josh Poimboeuf
0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-04-24 5:35 UTC (permalink / raw)
To: Sean Christopherson
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Andrew Cooper, Dave Hansen,
Nikolay Borisov, KP Singh, Waiman Long, Borislav Petkov,
Ingo Molnar, Stephen Rothwell, Michael Ellerman,
Geert Uytterhoeven
On Tue, Apr 23, 2024 at 07:10:23AM -0700, Sean Christopherson wrote:
> On Fri, Apr 19, 2024, Sean Christopherson wrote:
> > On Fri, Apr 19, 2024, Josh Poimboeuf wrote:
> > > CPU speculative execution mitigations were inadvertently disabled on
> > > non-x86 arches by the following commit:
> > >
> > > f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
> > >
> > > Fix it by replacing CONFIG_SPECULATION_MITIGATIONS with a new generic
> > > CONFIG_CPU_MITIGATIONS option and moving the x86-specific mitigations to
> > > a separate menu which depends on CONFIG_CPU_MITIGATIONS.
> >
> > Ah drat, I didn't check my mailbox until after Cc'ing Linus my own version[*].
> >
> > I don't have a strong preference between the two, though I do think it's worth
> > nothing that this will (obvioulsy) allow disabling mitigations at compile time
> > on all architectures, which may or may not be desirable.
> >
> > [*] https://lore.kernel.org/all/20240420000556.2645001-2-seanjc@google.com
>
> Josh, when you get a chance, can you weigh in on my menu-preserving approach?
>
> I want to get this resolved asap so that we're not scrambing on Friday again :-)
Yeah, yours looks good. Lemme go ack.
--
Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-04-21 21:47 ` Paul McKenney
@ 2024-05-02 23:48 ` Paul McKenney
2024-05-03 15:38 ` Paul E. McKenney
2024-05-03 19:56 ` Josh Poimboeuf
0 siblings, 2 replies; 21+ messages in thread
From: Paul McKenney @ 2024-05-02 23:48 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
>
> And this definitely helped, thank you!
>
> However, this one still remains:
>
> vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> __ia32_sys_exit_group() is missing a __noreturn annotation
And looking at the patched code, this function looks to me to be
correctly marked.
No idea... :-/
Thanx, Paul
> Please see below for my diffs against next-20240419, in case I messed
> something up.
>
> I attached a copy as well, given that I am away from mutt, hence using
> gmail directly.
>
> Thanx, Paul
>
> -----------------------------------------------------
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6de50b80702e6..9810ba2857a5c 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct
> pt_regs *regs, int nr)
>
> if (likely(unr < NR_syscalls)) {
> unr = array_index_nospec(unr, NR_syscalls);
> - regs->ax = x64_sys_call(regs, unr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> + regs->ax = sys_call_table[unr](regs);
> + else
> + regs->ax = x64_sys_call(regs, unr);
> return true;
> }
> return false;
> @@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct
> pt_regs *regs, int nr)
>
> if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
> xnr = array_index_nospec(xnr, X32_NR_syscalls);
> - regs->ax = x32_sys_call(regs, xnr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> + regs->ax = x32_sys_call_table[xnr](regs);
> + else
> + regs->ax = x32_sys_call(regs, xnr);
> return true;
> }
> return false;
> @@ -162,7 +168,10 @@ static __always_inline void
> do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
>
> if (likely(unr < IA32_NR_syscalls)) {
> unr = array_index_nospec(unr, IA32_NR_syscalls);
> - regs->ax = ia32_sys_call(regs, unr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> + regs->ax = ia32_sys_call_table[unr](regs);
> + else
> + regs->ax = ia32_sys_call(regs, unr);
> } else if (nr != -1) {
> regs->ax = __ia32_sys_ni_syscall(regs);
> }
> diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
> index c2235bae17ef6..aab31760b4e3e 100644
> --- a/arch/x86/entry/syscall_32.c
> +++ b/arch/x86/entry/syscall_32.c
> @@ -14,25 +14,16 @@
> #endif
>
> #define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
> -
> #include <asm/syscalls_32.h>
> #undef __SYSCALL
>
> -/*
> - * The sys_call_table[] is no longer used for system calls, but
> - * kernel/trace/trace_syscalls.c still wants to know the system
> - * call address.
> - */
> -#ifdef CONFIG_X86_32
> #define __SYSCALL(nr, sym) __ia32_##sym,
> -const sys_call_ptr_t sys_call_table[] = {
> +const sys_call_ptr_t ia32_sys_call_table[] = {
> #include <asm/syscalls_32.h>
> };
> #undef __SYSCALL
> -#endif
>
> #define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
> -
> long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
> {
> switch (nr) {
> diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
> index 33b3f09e6f151..ff36a993a07e0 100644
> --- a/arch/x86/entry/syscall_64.c
> +++ b/arch/x86/entry/syscall_64.c
> @@ -8,14 +8,13 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn
> __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_64.h>
> #undef __SYSCALL
>
> -/*
> - * The sys_call_table[] is no longer used for system calls, but
> - * kernel/trace/trace_syscalls.c still wants to know the system
> - * call address.
> - */
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __x64_##sym,
> const sys_call_ptr_t sys_call_table[] = {
> #include <asm/syscalls_64.h>
> @@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
> #undef __SYSCALL
>
> #define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> -
> long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
> {
> switch (nr) {
> diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> index 03de4a9321318..4221ecce6e689 100644
> --- a/arch/x86/entry/syscall_x32.c
> +++ b/arch/x86/entry/syscall_x32.c
> @@ -8,11 +8,20 @@
> #include <asm/syscall.h>
>
> #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn
> __x64_##sym(const struct pt_regs *);
> #include <asm/syscalls_x32.h>
> #undef __SYSCALL
>
> -#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> +#define __SYSCALL(nr, sym) __x64_##sym,
> +const sys_call_ptr_t x32_sys_call_table[] = {
> +#include <asm/syscalls_x32.h>
> +};
> +#undef __SYSCALL
>
> +#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
> {
> switch (nr) {
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl
> b/arch/x86/entry/syscalls/syscall_64.tbl
> index a396f6e6ab5bf..7ec68d94eb593 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -2,7 +2,7 @@
> # 64-bit system call numbers and entry vectors
> #
> # The format is:
> -# <number> <abi> <name> <entry point>
> +# <number> <abi> <name> <entry point> [0 noreturn]
> #
> # The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
> #
> @@ -68,7 +68,7 @@
> 57 common fork sys_fork
> 58 common vfork sys_vfork
> 59 64 execve sys_execve
> -60 common exit sys_exit
> +60 common exit sys_exit 0 noreturn
> 61 common wait4 sys_wait4
> 62 common kill sys_kill
> 63 common uname sys_newuname
> @@ -239,7 +239,7 @@
> 228 common clock_gettime sys_clock_gettime
> 229 common clock_getres sys_clock_getres
> 230 common clock_nanosleep sys_clock_nanosleep
> -231 common exit_group sys_exit_group
> +231 common exit_group sys_exit_group 0 noreturn
> 232 common epoll_wait sys_epoll_wait
> 233 common epoll_ctl sys_epoll_ctl
> 234 common tgkill sys_tgkill
> diff --git a/arch/x86/include/asm/cpufeatures.h
> b/arch/x86/include/asm/cpufeatures.h
> index 3c7434329661c..d64b0a5291f10 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -470,6 +470,7 @@
> #define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
> #define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW
> control enabled */
> #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear
> branch history at vmexit using SW loop */
> +#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to
> use indirect branches */
>
> /*
> * BUG word(s)
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 2fc7bc3863ff6..dfb59521244c2 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -16,14 +16,20 @@
> #include <asm/thread_info.h> /* for TS_COMPAT */
> #include <asm/unistd.h>
>
> -/* This is used purely for kernel/trace/trace_syscalls.c */
> typedef long (*sys_call_ptr_t)(const struct pt_regs *);
> extern const sys_call_ptr_t sys_call_table[];
>
> +#if defined(CONFIG_X86_32)
> +#define ia32_sys_call_table sys_call_table
> +#else
> /*
> * These may not exist, but still put the prototypes in so we
> * can use IS_ENABLED().
> */
> +extern const sys_call_ptr_t ia32_sys_call_table[];
> +extern const sys_call_ptr_t x32_sys_call_table[];
> +#endif
> +
> extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
> extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
> extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index ab18185894dfd..5fca46c78daf2 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
> if (!IS_ENABLED(CONFIG_X86_64))
> return;
>
> + /*
> + * There's no HW mitigation in place. Mark indirect branches as
> + * "not OK".
> + */
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> +
> /* Mitigate KVM by default */
> setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
> pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
> @@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
> enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
> enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
>
> + /*
> + * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
> + * "OK" to use due to (at least) one of the following being true:
> + *
> + * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
> + *
> + * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
> + *
> + * - the user disabled mitigations.
> + *
> + * Clearing the bit enables certain indirect branch "easy targets" [*]
> + * to be converted to a series of direct branches.
> + *
> + * Assume innocence until proven guilty: set it now and clear it later
> + * if/when needed.
> + *
> + * [*] The closer the indirect branch is to kernel entry, and the more
> + * user-controlled registers there are, the easier target it may be
> + * for future Spectre v2 variants.
> + */
> + setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> +
> /*
> * If the CPU is not affected and the command line mode is NONE or AUTO
> * then nothing to do.
> @@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
> break;
>
> case SPECTRE_V2_LFENCE:
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> + fallthrough;
> case SPECTRE_V2_EIBRS_LFENCE:
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
> - fallthrough;
> + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + break;
>
> case SPECTRE_V2_RETPOLINE:
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> + fallthrough;
> case SPECTRE_V2_EIBRS_RETPOLINE:
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> break;
> diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
> index 89df5d89d6640..c7d4bf955d2ba 100644
> --- a/arch/x86/um/sys_call_table_32.c
> +++ b/arch/x86/um/sys_call_table_32.c
> @@ -24,6 +24,7 @@
> #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long,
> unsigned long, unsigned long, unsigned long, unsigned long, unsigned
> long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_32.h>
>
> #undef __SYSCALL
> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> index b0b4cfd2308c8..4760c40ae5cd0 100644
> --- a/arch/x86/um/sys_call_table_64.c
> +++ b/arch/x86/um/sys_call_table_64.c
> @@ -19,6 +19,7 @@
> #define sys_ioperm sys_ni_syscall
>
> #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long,
> unsigned long, unsigned long, unsigned long, unsigned long, unsigned
> long);
> +#define __SYSCALL_NORETURN __SYSCALL
> #include <asm/syscalls_64.h>
>
> #undef __SYSCALL
> diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> index 6abe143889ef6..16487d47e06a3 100755
> --- a/scripts/syscalltbl.sh
> +++ b/scripts/syscalltbl.sh
> @@ -54,7 +54,7 @@ nxt=0
>
> grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
>
> - while read nr abi name native compat ; do
> + while read nr abi name native compat noreturn; do
>
> if [ $nxt -gt $nr ]; then
> echo "error: $infile: syscall table is not sorted or duplicates the
> same syscall number" >&2
> @@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> nxt=$((nxt + 1))
> done
>
> - if [ -n "$compat" ]; then
> + if [ -n "$noreturn" ]; then
> + echo "__SYSCALL_NORETURN($nr, $native)"
> + elif [ -n "$compat" ]; then
> echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> elif [ -n "$native" ]; then
> echo "__SYSCALL($nr, $native)"
> diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> index 7ebf29c911849..1e8141ef1b15d 100644
> --- a/tools/objtool/noreturns.h
> +++ b/tools/objtool/noreturns.h
> @@ -7,12 +7,16 @@
> * Yes, this is unfortunate. A better solution is in the works.
> */
> NORETURN(__fortify_panic)
> +NORETURN(__ia32_sys_exit)
> +NORETURN(__ia32_sys_exit_group)
> NORETURN(__kunit_abort)
> NORETURN(__module_put_and_kthread_exit)
> NORETURN(__reiserfs_panic)
> NORETURN(__stack_chk_fail)
> NORETURN(__tdx_hypercall_failed)
> NORETURN(__ubsan_handle_builtin_unreachable)
> +NORETURN(__x64_sys_exit)
> +NORETURN(__x64_sys_exit_group)
> NORETURN(arch_cpu_idle_dead)
> NORETURN(bch2_trans_in_restart_error)
> NORETURN(bch2_trans_restart_error)
>
> On Sun, Apr 21, 2024 at 1:40 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> >
> > They apply fine as is, so I have started tests with that pair of patches.
> >
> > Thanx, Paul
> >
> > On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > > > The direct-call syscall dispatch functions don't know that the exit()
> > > > > and exit_group() syscall handlers don't return. As a result the call
> > > > > sites aren't optimized accordingly.
> > > > >
> > > > > Fix that by marking those exit syscall declarations as __noreturn.
> > > > >
> > > > > Fixes the following warnings:
> > > > >
> > > > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > > > >
> > > > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > > >
> > > > Looks good, but it does not apply on top of current -next and I don't
> > > > trust myself to hand-apply it (something about having just got off of
> > > > a flight across the big pond).
> > > >
> > > > Could you please let me know what else do I need to pull in to be able
> > > > to cleanly apply this one?
> > >
> > > This patch has a dependency on an earlier patch in the set:
> > >
> > > https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
> > >
> > > Though I think it's not a hard dependency and I could reverse the order
> > > of the patches if needed.
> > >
> > > --
> > > Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-02 23:48 ` Paul McKenney
@ 2024-05-03 15:38 ` Paul E. McKenney
2024-05-03 19:56 ` Josh Poimboeuf
1 sibling, 0 replies; 21+ messages in thread
From: Paul E. McKenney @ 2024-05-03 15:38 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> >
> > And this definitely helped, thank you!
> >
> > However, this one still remains:
> >
> > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> > __ia32_sys_exit_group() is missing a __noreturn annotation
>
> And looking at the patched code, this function looks to me to be
> correctly marked.
>
> No idea... :-/
But thank you for getting rid of the other warning:
Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > Please see below for my diffs against next-20240419, in case I messed
> > something up.
> >
> > I attached a copy as well, given that I am away from mutt, hence using
> > gmail directly.
> >
> > Thanx, Paul
> >
> > -----------------------------------------------------
> >
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 6de50b80702e6..9810ba2857a5c 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -49,7 +49,10 @@ static __always_inline bool do_syscall_x64(struct
> > pt_regs *regs, int nr)
> >
> > if (likely(unr < NR_syscalls)) {
> > unr = array_index_nospec(unr, NR_syscalls);
> > - regs->ax = x64_sys_call(regs, unr);
> > + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> > + regs->ax = sys_call_table[unr](regs);
> > + else
> > + regs->ax = x64_sys_call(regs, unr);
> > return true;
> > }
> > return false;
> > @@ -66,7 +69,10 @@ static __always_inline bool do_syscall_x32(struct
> > pt_regs *regs, int nr)
> >
> > if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
> > xnr = array_index_nospec(xnr, X32_NR_syscalls);
> > - regs->ax = x32_sys_call(regs, xnr);
> > + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> > + regs->ax = x32_sys_call_table[xnr](regs);
> > + else
> > + regs->ax = x32_sys_call(regs, xnr);
> > return true;
> > }
> > return false;
> > @@ -162,7 +168,10 @@ static __always_inline void
> > do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
> >
> > if (likely(unr < IA32_NR_syscalls)) {
> > unr = array_index_nospec(unr, IA32_NR_syscalls);
> > - regs->ax = ia32_sys_call(regs, unr);
> > + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_BRANCH_OK)))
> > + regs->ax = ia32_sys_call_table[unr](regs);
> > + else
> > + regs->ax = ia32_sys_call(regs, unr);
> > } else if (nr != -1) {
> > regs->ax = __ia32_sys_ni_syscall(regs);
> > }
> > diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
> > index c2235bae17ef6..aab31760b4e3e 100644
> > --- a/arch/x86/entry/syscall_32.c
> > +++ b/arch/x86/entry/syscall_32.c
> > @@ -14,25 +14,16 @@
> > #endif
> >
> > #define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
> > -
> > #include <asm/syscalls_32.h>
> > #undef __SYSCALL
> >
> > -/*
> > - * The sys_call_table[] is no longer used for system calls, but
> > - * kernel/trace/trace_syscalls.c still wants to know the system
> > - * call address.
> > - */
> > -#ifdef CONFIG_X86_32
> > #define __SYSCALL(nr, sym) __ia32_##sym,
> > -const sys_call_ptr_t sys_call_table[] = {
> > +const sys_call_ptr_t ia32_sys_call_table[] = {
> > #include <asm/syscalls_32.h>
> > };
> > #undef __SYSCALL
> > -#endif
> >
> > #define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
> > -
> > long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
> > {
> > switch (nr) {
> > diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
> > index 33b3f09e6f151..ff36a993a07e0 100644
> > --- a/arch/x86/entry/syscall_64.c
> > +++ b/arch/x86/entry/syscall_64.c
> > @@ -8,14 +8,13 @@
> > #include <asm/syscall.h>
> >
> > #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> > +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn
> > __x64_##sym(const struct pt_regs *);
> > #include <asm/syscalls_64.h>
> > #undef __SYSCALL
> >
> > -/*
> > - * The sys_call_table[] is no longer used for system calls, but
> > - * kernel/trace/trace_syscalls.c still wants to know the system
> > - * call address.
> > - */
> > +#undef __SYSCALL_NORETURN
> > +#define __SYSCALL_NORETURN __SYSCALL
> > +
> > #define __SYSCALL(nr, sym) __x64_##sym,
> > const sys_call_ptr_t sys_call_table[] = {
> > #include <asm/syscalls_64.h>
> > @@ -23,7 +22,6 @@ const sys_call_ptr_t sys_call_table[] = {
> > #undef __SYSCALL
> >
> > #define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> > -
> > long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
> > {
> > switch (nr) {
> > diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> > index 03de4a9321318..4221ecce6e689 100644
> > --- a/arch/x86/entry/syscall_x32.c
> > +++ b/arch/x86/entry/syscall_x32.c
> > @@ -8,11 +8,20 @@
> > #include <asm/syscall.h>
> >
> > #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> > +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn
> > __x64_##sym(const struct pt_regs *);
> > #include <asm/syscalls_x32.h>
> > #undef __SYSCALL
> >
> > -#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> > +#undef __SYSCALL_NORETURN
> > +#define __SYSCALL_NORETURN __SYSCALL
> > +
> > +#define __SYSCALL(nr, sym) __x64_##sym,
> > +const sys_call_ptr_t x32_sys_call_table[] = {
> > +#include <asm/syscalls_x32.h>
> > +};
> > +#undef __SYSCALL
> >
> > +#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
> > long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
> > {
> > switch (nr) {
> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl
> > b/arch/x86/entry/syscalls/syscall_64.tbl
> > index a396f6e6ab5bf..7ec68d94eb593 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -2,7 +2,7 @@
> > # 64-bit system call numbers and entry vectors
> > #
> > # The format is:
> > -# <number> <abi> <name> <entry point>
> > +# <number> <abi> <name> <entry point> [0 noreturn]
> > #
> > # The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
> > #
> > @@ -68,7 +68,7 @@
> > 57 common fork sys_fork
> > 58 common vfork sys_vfork
> > 59 64 execve sys_execve
> > -60 common exit sys_exit
> > +60 common exit sys_exit 0 noreturn
> > 61 common wait4 sys_wait4
> > 62 common kill sys_kill
> > 63 common uname sys_newuname
> > @@ -239,7 +239,7 @@
> > 228 common clock_gettime sys_clock_gettime
> > 229 common clock_getres sys_clock_getres
> > 230 common clock_nanosleep sys_clock_nanosleep
> > -231 common exit_group sys_exit_group
> > +231 common exit_group sys_exit_group 0 noreturn
> > 232 common epoll_wait sys_epoll_wait
> > 233 common epoll_ctl sys_epoll_ctl
> > 234 common tgkill sys_tgkill
> > diff --git a/arch/x86/include/asm/cpufeatures.h
> > b/arch/x86/include/asm/cpufeatures.h
> > index 3c7434329661c..d64b0a5291f10 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -470,6 +470,7 @@
> > #define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
> > #define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW
> > control enabled */
> > #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear
> > branch history at vmexit using SW loop */
> > +#define X86_FEATURE_INDIRECT_BRANCH_OK (21*32+ 5) /* "" It's OK to
> > use indirect branches */
> >
> > /*
> > * BUG word(s)
> > diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> > index 2fc7bc3863ff6..dfb59521244c2 100644
> > --- a/arch/x86/include/asm/syscall.h
> > +++ b/arch/x86/include/asm/syscall.h
> > @@ -16,14 +16,20 @@
> > #include <asm/thread_info.h> /* for TS_COMPAT */
> > #include <asm/unistd.h>
> >
> > -/* This is used purely for kernel/trace/trace_syscalls.c */
> > typedef long (*sys_call_ptr_t)(const struct pt_regs *);
> > extern const sys_call_ptr_t sys_call_table[];
> >
> > +#if defined(CONFIG_X86_32)
> > +#define ia32_sys_call_table sys_call_table
> > +#else
> > /*
> > * These may not exist, but still put the prototypes in so we
> > * can use IS_ENABLED().
> > */
> > +extern const sys_call_ptr_t ia32_sys_call_table[];
> > +extern const sys_call_ptr_t x32_sys_call_table[];
> > +#endif
> > +
> > extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
> > extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
> > extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index ab18185894dfd..5fca46c78daf2 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1665,6 +1665,12 @@ static void __init bhi_select_mitigation(void)
> > if (!IS_ENABLED(CONFIG_X86_64))
> > return;
> >
> > + /*
> > + * There's no HW mitigation in place. Mark indirect branches as
> > + * "not OK".
> > + */
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > +
> > /* Mitigate KVM by default */
> > setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
> > pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
> > @@ -1679,6 +1685,28 @@ static void __init spectre_v2_select_mitigation(void)
> > enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
> > enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
> >
> > + /*
> > + * X86_FEATURE_INDIRECT_BRANCH_OK indicates that indirect calls are
> > + * "OK" to use due to (at least) one of the following being true:
> > + *
> > + * - the CPU isn't vulnerable to Spectre v2, BHI, etc;
> > + *
> > + * - a HW mitigation is in place (e.g., IBRS, eIBRS+BHI_DIS_S); or
> > + *
> > + * - the user disabled mitigations.
> > + *
> > + * Clearing the bit enables certain indirect branch "easy targets" [*]
> > + * to be converted to a series of direct branches.
> > + *
> > + * Assume innocence until proven guilty: set it now and clear it later
> > + * if/when needed.
> > + *
> > + * [*] The closer the indirect branch is to kernel entry, and the more
> > + * user-controlled registers there are, the easier target it may be
> > + * for future Spectre v2 variants.
> > + */
> > + setup_force_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > +
> > /*
> > * If the CPU is not affected and the command line mode is NONE or AUTO
> > * then nothing to do.
> > @@ -1765,11 +1793,16 @@ static void __init spectre_v2_select_mitigation(void)
> > break;
> >
> > case SPECTRE_V2_LFENCE:
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > + fallthrough;
> > case SPECTRE_V2_EIBRS_LFENCE:
> > setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
> > - fallthrough;
> > + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> > + break;
> >
> > case SPECTRE_V2_RETPOLINE:
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_BRANCH_OK);
> > + fallthrough;
> > case SPECTRE_V2_EIBRS_RETPOLINE:
> > setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> > break;
> > diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
> > index 89df5d89d6640..c7d4bf955d2ba 100644
> > --- a/arch/x86/um/sys_call_table_32.c
> > +++ b/arch/x86/um/sys_call_table_32.c
> > @@ -24,6 +24,7 @@
> > #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
> >
> > #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long,
> > unsigned long, unsigned long, unsigned long, unsigned long, unsigned
> > long);
> > +#define __SYSCALL_NORETURN __SYSCALL
> > #include <asm/syscalls_32.h>
> >
> > #undef __SYSCALL
> > diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> > index b0b4cfd2308c8..4760c40ae5cd0 100644
> > --- a/arch/x86/um/sys_call_table_64.c
> > +++ b/arch/x86/um/sys_call_table_64.c
> > @@ -19,6 +19,7 @@
> > #define sys_ioperm sys_ni_syscall
> >
> > #define __SYSCALL(nr, sym) extern asmlinkage long sym(unsigned long,
> > unsigned long, unsigned long, unsigned long, unsigned long, unsigned
> > long);
> > +#define __SYSCALL_NORETURN __SYSCALL
> > #include <asm/syscalls_64.h>
> >
> > #undef __SYSCALL
> > diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> > index 6abe143889ef6..16487d47e06a3 100755
> > --- a/scripts/syscalltbl.sh
> > +++ b/scripts/syscalltbl.sh
> > @@ -54,7 +54,7 @@ nxt=0
> >
> > grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> >
> > - while read nr abi name native compat ; do
> > + while read nr abi name native compat noreturn; do
> >
> > if [ $nxt -gt $nr ]; then
> > echo "error: $infile: syscall table is not sorted or duplicates the
> > same syscall number" >&2
> > @@ -66,7 +66,9 @@ grep -E "^[0-9]+[[:space:]]+$abis" "$infile" | {
> > nxt=$((nxt + 1))
> > done
> >
> > - if [ -n "$compat" ]; then
> > + if [ -n "$noreturn" ]; then
> > + echo "__SYSCALL_NORETURN($nr, $native)"
> > + elif [ -n "$compat" ]; then
> > echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> > elif [ -n "$native" ]; then
> > echo "__SYSCALL($nr, $native)"
> > diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> > index 7ebf29c911849..1e8141ef1b15d 100644
> > --- a/tools/objtool/noreturns.h
> > +++ b/tools/objtool/noreturns.h
> > @@ -7,12 +7,16 @@
> > * Yes, this is unfortunate. A better solution is in the works.
> > */
> > NORETURN(__fortify_panic)
> > +NORETURN(__ia32_sys_exit)
> > +NORETURN(__ia32_sys_exit_group)
> > NORETURN(__kunit_abort)
> > NORETURN(__module_put_and_kthread_exit)
> > NORETURN(__reiserfs_panic)
> > NORETURN(__stack_chk_fail)
> > NORETURN(__tdx_hypercall_failed)
> > NORETURN(__ubsan_handle_builtin_unreachable)
> > +NORETURN(__x64_sys_exit)
> > +NORETURN(__x64_sys_exit_group)
> > NORETURN(arch_cpu_idle_dead)
> > NORETURN(bch2_trans_in_restart_error)
> > NORETURN(bch2_trans_restart_error)
> >
> > On Sun, Apr 21, 2024 at 1:40 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> > >
> > > They apply fine as is, so I have started tests with that pair of patches.
> > >
> > > Thanx, Paul
> > >
> > > On Sat, Apr 20, 2024 at 10:25 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > >
> > > > On Sat, Apr 20, 2024 at 06:58:58AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Apr 19, 2024 at 02:09:49PM -0700, Josh Poimboeuf wrote:
> > > > > > The direct-call syscall dispatch functions don't know that the exit()
> > > > > > and exit_group() syscall handlers don't return. As a result the call
> > > > > > sites aren't optimized accordingly.
> > > > > >
> > > > > > Fix that by marking those exit syscall declarations as __noreturn.
> > > > > >
> > > > > > Fixes the following warnings:
> > > > > >
> > > > > > vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
> > > > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
> > > > > >
> > > > > > Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")
> > > > > > Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
> > > > > > Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > > > >
> > > > > Looks good, but it does not apply on top of current -next and I don't
> > > > > trust myself to hand-apply it (something about having just got off of
> > > > > a flight across the big pond).
> > > > >
> > > > > Could you please let me know what else do I need to pull in to be able
> > > > > to cleanly apply this one?
> > > >
> > > > This patch has a dependency on an earlier patch in the set:
> > > >
> > > > https://lkml.kernel.org/lkml/982d05a2f669140f26500bee643011896d661094.1713559768.git.jpoimboe@kernel.org
> > > >
> > > > Though I think it's not a hard dependency and I could reverse the order
> > > > of the patches if needed.
> > > >
> > > > --
> > > > Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-02 23:48 ` Paul McKenney
2024-05-03 15:38 ` Paul E. McKenney
@ 2024-05-03 19:56 ` Josh Poimboeuf
2024-05-03 20:44 ` Josh Poimboeuf
1 sibling, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-05-03 19:56 UTC (permalink / raw)
To: Paul McKenney
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> >
> > And this definitely helped, thank you!
> >
> > However, this one still remains:
> >
> > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> > __ia32_sys_exit_group() is missing a __noreturn annotation
>
> And looking at the patched code, this function looks to me to be
> correctly marked.
>
> No idea... :-/
Ah, I think I missed fixing syscall_32.tbl. Lemme see how to do that...
--
Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-03 19:56 ` Josh Poimboeuf
@ 2024-05-03 20:44 ` Josh Poimboeuf
2024-05-03 23:33 ` Paul E. McKenney
0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2024-05-03 20:44 UTC (permalink / raw)
To: Paul McKenney
Cc: Paul E. McKenney, x86, linux-kernel, Linus Torvalds,
Daniel Sneddon, Pawan Gupta, Thomas Gleixner, Alexandre Chartre,
Konrad Rzeszutek Wilk, Peter Zijlstra, Greg Kroah-Hartman,
Sean Christopherson, Andrew Cooper, Dave Hansen, Nikolay Borisov,
KP Singh, Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, May 03, 2024 at 12:57:00PM -0700, Josh Poimboeuf wrote:
> On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> > On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> > >
> > > And this definitely helped, thank you!
> > >
> > > However, this one still remains:
> > >
> > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> > > __ia32_sys_exit_group() is missing a __noreturn annotation
> >
> > And looking at the patched code, this function looks to me to be
> > correctly marked.
> >
> > No idea... :-/
>
> Ah, I think I missed fixing syscall_32.tbl. Lemme see how to do that...
Can you try adding this on top?
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 5f8591ce7f25..f30b608d14dc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -12,7 +12,7 @@
# The abi is always "i386" for this file.
#
0 i386 restart_syscall sys_restart_syscall
-1 i386 exit sys_exit
+1 i386 exit sys_exit 0 noreturn
2 i386 fork sys_fork
3 i386 read sys_read
4 i386 write sys_write
@@ -263,7 +263,7 @@
249 i386 io_cancel sys_io_cancel
250 i386 fadvise64 sys_ia32_fadvise64
# 251 is available for reuse (was briefly sys_set_zone_reclaim)
-252 i386 exit_group sys_exit_group
+252 i386 exit_group sys_exit_group 0 noreturn
253 i386 lookup_dcookie
254 i386 epoll_create sys_epoll_create
255 i386 epoll_ctl sys_epoll_ctl
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-03 20:44 ` Josh Poimboeuf
@ 2024-05-03 23:33 ` Paul E. McKenney
2024-05-03 23:48 ` Josh Poimboeuf
0 siblings, 1 reply; 21+ messages in thread
From: Paul E. McKenney @ 2024-05-03 23:33 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, May 03, 2024 at 01:44:17PM -0700, Josh Poimboeuf wrote:
> On Fri, May 03, 2024 at 12:57:00PM -0700, Josh Poimboeuf wrote:
> > On Thu, May 02, 2024 at 04:48:13PM -0700, Paul McKenney wrote:
> > > On Sun, Apr 21, 2024 at 2:47 PM Paul McKenney <paulmckrcu@gmail.com> wrote:
> > > >
> > > > And this definitely helped, thank you!
> > > >
> > > > However, this one still remains:
> > > >
> > > > vmlinux.o: warning: objtool: ia32_sys_call+0x29b6:
> > > > __ia32_sys_exit_group() is missing a __noreturn annotation
> > >
> > > And looking at the patched code, this function looks to me to be
> > > correctly marked.
> > >
> > > No idea... :-/
> >
> > Ah, I think I missed fixing syscall_32.tbl. Lemme see how to do that...
>
> Can you try adding this on top?
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 5f8591ce7f25..f30b608d14dc 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -12,7 +12,7 @@
> # The abi is always "i386" for this file.
> #
> 0 i386 restart_syscall sys_restart_syscall
> -1 i386 exit sys_exit
> +1 i386 exit sys_exit 0 noreturn
> 2 i386 fork sys_fork
> 3 i386 read sys_read
> 4 i386 write sys_write
> @@ -263,7 +263,7 @@
> 249 i386 io_cancel sys_io_cancel
> 250 i386 fadvise64 sys_ia32_fadvise64
> # 251 is available for reuse (was briefly sys_set_zone_reclaim)
> -252 i386 exit_group sys_exit_group
> +252 i386 exit_group sys_exit_group 0 noreturn
> 253 i386 lookup_dcookie
> 254 i386 epoll_create sys_epoll_create
> 255 i386 epoll_ctl sys_epoll_ctl
Thank you!
But for non-KCSAN builds, I get the following diagnostics:
In file included from arch/x86/entry/syscall_32.c:17:
./arch/x86/include/generated/asm/syscalls_32.h:2:20: error: expected declaration specifiers or ‘...’ before numeric constant
2 | __SYSCALL_NORETURN(1, sys_exit)
For KCSAN builds, I instead get the following diagnostics:
In file included from arch/x86/entry/syscall_32.c:17:
./arch/x86/include/generated/asm/syscalls_32.h:2:20: error: expected parameter declarator
__SYSCALL_NORETURN(1, sys_exit)
Does arch/x86/entry/syscall_32.c need the following additional patch?
A quick smoke test passes, but perhaps I am just getting lucky...
Thanx, Paul
------------------------------------------------------------------------
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index aab31760b4e3e..d9ae910ea6f33 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,9 +14,13 @@
#endif
#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
+#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __ia32_##sym(const struct pt_regs *);
#include <asm/syscalls_32.h>
#undef __SYSCALL
+#undef __SYSCALL_NORETURN
+#define __SYSCALL_NORETURN __SYSCALL
+
#define __SYSCALL(nr, sym) __ia32_##sym,
const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-05-03 23:33 ` Paul E. McKenney
@ 2024-05-03 23:48 ` Josh Poimboeuf
0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2024-05-03 23:48 UTC (permalink / raw)
To: Paul E. McKenney
Cc: x86, linux-kernel, Linus Torvalds, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, Nikolay Borisov, KP Singh,
Waiman Long, Borislav Petkov, Ingo Molnar
On Fri, May 03, 2024 at 04:33:00PM -0700, Paul E. McKenney wrote:
> Does arch/x86/entry/syscall_32.c need the following additional patch?
>
> A quick smoke test passes, but perhaps I am just getting lucky...
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
> index aab31760b4e3e..d9ae910ea6f33 100644
> --- a/arch/x86/entry/syscall_32.c
> +++ b/arch/x86/entry/syscall_32.c
> @@ -14,9 +14,13 @@
> #endif
>
> #define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
> +#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __ia32_##sym(const struct pt_regs *);
> #include <asm/syscalls_32.h>
> #undef __SYSCALL
>
> +#undef __SYSCALL_NORETURN
> +#define __SYSCALL_NORETURN __SYSCALL
> +
> #define __SYSCALL(nr, sym) __ia32_##sym,
> const sys_call_ptr_t ia32_sys_call_table[] = {
> #include <asm/syscalls_32.h>
Ah, yeah, that looks right.
--
Josh
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-05-03 23:48 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-19 21:09 [PATCH 0/5] x86/bugs: more BHI fixes Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 1/5] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
2024-04-22 8:09 ` Yujie Liu
2024-04-19 21:09 ` [PATCH v4 2/5] cpu/speculation: Fix CPU mitigation defaults for !x86 Josh Poimboeuf
2024-04-20 0:09 ` Sean Christopherson
2024-04-23 14:10 ` Sean Christopherson
2024-04-24 5:35 ` Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 3/5] x86/syscall: Mark exit[_group] syscall handlers __noreturn Josh Poimboeuf
2024-04-20 13:58 ` Paul E. McKenney
2024-04-21 5:25 ` Josh Poimboeuf
2024-04-21 20:40 ` Paul McKenney
2024-04-21 21:47 ` Paul McKenney
2024-05-02 23:48 ` Paul McKenney
2024-05-03 15:38 ` Paul E. McKenney
2024-05-03 19:56 ` Josh Poimboeuf
2024-05-03 20:44 ` Josh Poimboeuf
2024-05-03 23:33 ` Paul E. McKenney
2024-05-03 23:48 ` Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 4/5] x86/bugs: Remove duplicate Spectre cmdline option descriptions Josh Poimboeuf
2024-04-19 21:09 ` [PATCH v4 5/5] x86/bugs: Add 'spectre_bhi=vmexit' cmdline option Josh Poimboeuf
2024-04-19 21:46 ` Josh Poimboeuf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.