LKML Archive mirror
* [PATCH v3 00/12] Enable Linear Address Space Separation support
@ 2023-06-09 18:36 Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 01/12] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
                   ` (13 more replies)
  0 siblings, 14 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

Changes from v2[5]:
- Added myself to the SoB chain

Changes from v1[1]:
- Emulate vsyscall violations in execute mode in the #GP fault handler
- Use inline memcpy and memset while patching alternatives
- Remove CONFIG_X86_LASS
- Make LASS depend on SMAP
- Dropped the minimal KVM enabling patch

Linear Address Space Separation (LASS) is a security feature that intends to
prevent malicious virtual address space accesses across user/kernel mode.

Such mode-based access protection already exists today with paging and features
such as SMEP and SMAP. However, to enforce these protections, the processor
must traverse the paging structures in memory.  Malicious software can use
timing information resulting from this traversal to determine details about the
paging structures, and these details may also be used to determine the layout
of kernel memory.

The LASS mechanism provides the same mode-based protections as paging but
without traversing the paging structures. Because the protections enforced by
LASS are applied before paging, software will not be able to derive
paging-based timing information from the various caching structures such as the
TLBs, mid-level caches, page walker, data caches, etc. LASS thereby defeats
probing based on double page faults, TLB flush and reload, and software
prefetch instructions.
See [2], [3] and [4] for some research on the related attack vectors.

LASS enforcement relies on the typical kernel implementation to divide the
64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically results in a
#GP fault.
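
Conceptually, the rule reduces to a test on address bit 63. A rough C model
of the check (illustrative only, not how it is implemented; the real check
happens in hardware before any paging structures are consulted):

  /*
   * Illustrative model of the LASS rule for data accesses. Simplification:
   * supervisor-mode instruction fetches from the user half fault
   * regardless of RFLAGS.AC.
   */
  static bool lass_faults(unsigned long vaddr, bool supervisor, bool rflags_ac)
  {
          bool kernel_half = vaddr & (1UL << 63);         /* Addr[63] */

          if (supervisor)
                  /* kernel touching the user half; AC relaxes the check */
                  return !kernel_half && !rflags_ac;

          /* user mode touching the kernel half always faults */
          return kernel_half;
  }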

Kernel accesses usually only happen to the kernel address space. However, there
are valid reasons for the kernel to access memory in the user half. For these
cases (such as text poking and EFI runtime accesses), the kernel can
temporarily suspend LASS enforcement by setting RFLAGS.AC, using the same
stac()/clac() pair that toggles SMAP (Supervisor Mode Access Prevention).
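
In code, the pattern is minimal; a sketch with a hypothetical helper name
(patch 3 below applies the same shape to text poking):

  static void write_to_user_half(void *dst, const void *src, size_t len)
  {
          stac();                         /* set RFLAGS.AC: relaxes SMAP and LASS */
          __inline_memcpy(dst, src, len); /* dst has bit 63 clear */
          clac();                         /* clear RFLAGS.AC */
  }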

User space cannot access any kernel address while LASS is enabled.
Unfortunately, legacy vsyscall functions are located in the address range
0xffffffffff600000 - 0xffffffffff601000 and emulated in the kernel.  To avoid
breaking user applications when LASS is enabled, extend the vsyscall emulation
in execute (XONLY) mode to the #GP fault handler.

In contrast, the vsyscall EMULATE mode is deprecated and not expected to be
used by anyone.  Supporting EMULATE mode with LASS would need complex
intruction decoding in the #GP fault handler and is probably not worth the
hassle. Disable LASS in this rare case when someone absolutely needs and
enables vsyscall=emulate via the command line.

As of now there is no publicly available CPU supporting LASS.  The first one
expected to support it is the Sierra Forest line. The Intel Simics® Simulator
was used as the software development and testing vehicle for this patch set.

[1] https://lore.kernel.org/lkml/20230110055204.3227669-1-yian.chen@intel.com/
[2] “Practical Timing Side Channel Attacks against Kernel Space ASLR”,
https://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[3] “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR”, http://doi.acm.org/10.1145/2976749.2978356
[4] “Harmful prefetch on Intel”, https://ioactive.com/harmful-prefetch-on-intel/ (H/T Anders)
[5] https://lore.kernel.org/all/20230530114247.21821-1-alexander.shishkin@linux.intel.com/

Alexander Shishkin (1):
  x86/vsyscall: Document the fact that vsyscall=emulate disables LASS

Peter Zijlstra (1):
  x86/asm: Introduce inline memcpy and memset

Sohil Mehta (9):
  x86/cpu: Enumerate the LASS feature bits
  x86/alternatives: Disable LASS when patching kernel alternatives
  x86/cpu: Enable LASS during CPU initialization
  x86/cpu: Remove redundant comment during feature setup
  x86/vsyscall: Reorganize the #PF emulation code
  x86/traps: Consolidate user fixups in exc_general_protection()
  x86/vsyscall: Add vsyscall emulation for #GP
  x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  [RFC] x86/efi: Disable LASS enforcement when switching to EFI MM

Yian Chen (1):
  x86/cpu: Set LASS CR4 bit as pinning sensitive

 .../admin-guide/kernel-parameters.txt         |  4 +-
 arch/x86/entry/vsyscall/vsyscall_64.c         | 70 ++++++++++++++-----
 arch/x86/include/asm/cpufeatures.h            |  1 +
 arch/x86/include/asm/disabled-features.h      |  4 +-
 arch/x86/include/asm/smap.h                   |  4 ++
 arch/x86/include/asm/string_32.h              | 21 ++++++
 arch/x86/include/asm/string_64.h              | 21 ++++++
 arch/x86/include/asm/vsyscall.h               | 16 +++--
 arch/x86/include/uapi/asm/processor-flags.h   |  2 +
 arch/x86/kernel/alternative.c                 | 12 +++-
 arch/x86/kernel/cpu/common.c                  | 10 ++-
 arch/x86/kernel/cpu/cpuid-deps.c              |  1 +
 arch/x86/kernel/traps.c                       | 12 ++--
 arch/x86/mm/fault.c                           | 13 +---
 arch/x86/platform/efi/efi_64.c                |  6 ++
 tools/arch/x86/include/asm/cpufeatures.h      |  1 +
 16 files changed, 153 insertions(+), 45 deletions(-)

-- 
2.39.2



* [PATCH v3 01/12] x86/cpu: Enumerate the LASS feature bits
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 02/12] x86/asm: Introduce inline memcpy and memset Alexander Shishkin
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Yian Chen, Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

Linear Address Space Separation (LASS) is a security feature that
intends to prevent malicious virtual address space accesses across
user/kernel mode.

Such mode-based access protection already exists today with paging and
features such as SMEP and SMAP. However, to enforce these protections,
the processor must traverse the paging structures in memory.  Malicious
software can use timing information resulting from this traversal to
determine details about the paging structures, and these details may
also be used to determine the layout of kernel memory.

The LASS mechanism provides the same mode-based protections as paging
but without traversing the paging structures. Because the protections
enforced by LASS are applied before paging, software will not be able to
derive paging-based timing information from the various caching
structures such as the TLBs, mid-level caches, page walker, data caches,
etc.

LASS enforcement relies on the typical kernel implementation to divide
the 64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space

Any data access or code execution across address spaces typically
results in a #GP fault.

The LASS enforcement for kernel data access is dependent on CR4.SMAP
being set. The enforcement can be disabled by toggling the RFLAGS.AC bit,
just as with SMAP.

Define the CPU feature bits to enumerate this feature and add the
corresponding feature dependency.

Co-developed-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 4 +++-
 arch/x86/include/asm/smap.h                 | 4 ++++
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 tools/arch/x86/include/asm/cpufeatures.h    | 1 +
 6 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..47e775144a34 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -312,6 +312,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS		(12*32+ 6) /* Linear Address Space Separation */
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* "" CMPccXADD instructions */
 #define X86_FEATURE_ARCH_PERFMON_EXT	(12*32+ 8) /* "" Intel Architectural PerfMon Extension */
 #define X86_FEATURE_FZRM		(12*32+10) /* "" Fast zero-length REP MOVSB */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..6535e5192082 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -22,12 +22,14 @@
 # define DISABLE_CYRIX_ARR	(1<<(X86_FEATURE_CYRIX_ARR & 31))
 # define DISABLE_CENTAUR_MCR	(1<<(X86_FEATURE_CENTAUR_MCR & 31))
 # define DISABLE_PCID		0
+# define DISABLE_LASS		0
 #else
 # define DISABLE_VME		0
 # define DISABLE_K6_MTRR	0
 # define DISABLE_CYRIX_ARR	0
 # define DISABLE_CENTAUR_MCR	0
 # define DISABLE_PCID		(1<<(X86_FEATURE_PCID & 31))
+# define DISABLE_LASS		(1<<(X86_FEATURE_LASS & 31))
 #endif /* CONFIG_X86_64 */
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@@ -122,7 +124,7 @@
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
 			 DISABLE_CALL_DEPTH_TRACKING)
 #define DISABLED_MASK12	(DISABLE_LAM)
-#define DISABLED_MASK13	0
+#define DISABLED_MASK13	(DISABLE_LASS)
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index bab490379c65..776dce849a58 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -27,6 +27,10 @@
 
 #else /* __ASSEMBLY__ */
 
+/*
+ * The CLAC/STAC instructions toggle enforcement of X86_FEATURE_SMAP as well as
+ * X86_FEATURE_LASS.
+ */
 static __always_inline void clac(void)
 {
 	/* Note: a barrier is implicit in alternative() */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..1d2405869c7a 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -136,6 +136,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LASS_BIT	27 /* enable Linear Address Space Separation support */
+#define X86_CR4_LASS		_BITUL(X86_CR4_LASS_BIT)
 #define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
 #define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
 
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..722020b2e837 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },
 	{}
 };
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index b89005819cd5..59d2880be0e0 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -311,6 +311,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS		(12*32+ 6) /* Linear Address Space Separation */
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* "" CMPccXADD instructions */
 #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
 #define X86_FEATURE_AMX_FP16		(12*32+21) /* "" AMX fp16 Support */
-- 
2.39.2



* [PATCH v3 02/12] x86/asm: Introduce inline memcpy and memset
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 01/12] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Peter Zijlstra, Alexander Shishkin

From: Peter Zijlstra <peterz@infradead.org>

Provide inline memcpy and memset functions that can be used instead of
the GCC builtins whenever necessary.

Code posted by Peter Zijlstra <peterz@infradead.org>.
Link: https://lore.kernel.org/lkml/Y759AJ%2F0N9fqwDED@hirez.programming.kicks-ass.net/
[Missing Signed-off-by from PeterZ]
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/include/asm/string_32.h | 21 +++++++++++++++++++++
 arch/x86/include/asm/string_64.h | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 32c0d981a82a..8896270e5eda 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -151,6 +151,16 @@ extern void *memcpy(void *, const void *, size_t);
 
 #endif /* !CONFIG_FORTIFY_SOURCE */
 
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+	void *ret = to;
+
+	asm volatile("rep movsb"
+		     : "+D" (to), "+S" (from), "+c" (len)
+		     : : "memory");
+	return ret;
+}
+
 #define __HAVE_ARCH_MEMMOVE
 void *memmove(void *dest, const void *src, size_t n);
 
@@ -195,6 +205,17 @@ extern void *memset(void *, int, size_t);
 #define memset(s, c, count) __builtin_memset(s, c, count)
 #endif /* !CONFIG_FORTIFY_SOURCE */
 
+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+	void *ret = s;
+
+	asm volatile("rep stosb"
+		     : "+D" (s), "+c" (n)
+		     : "a" ((uint8_t)v)
+		     : "memory");
+	return ret;
+}
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 857d364b9888..ea51e2d73265 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,10 +18,31 @@
 extern void *memcpy(void *to, const void *from, size_t len);
 extern void *__memcpy(void *to, const void *from, size_t len);
 
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+	void *ret = to;
+
+	asm volatile("rep movsb"
+		     : "+D" (to), "+S" (from), "+c" (len)
+		     : : "memory");
+	return ret;
+}
+
 #define __HAVE_ARCH_MEMSET
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+	void *ret = s;
+
+	asm volatile("rep stosb"
+		     : "+D" (s), "+c" (n)
+		     : "a" ((uint8_t)v)
+		     : "memory");
+	return ret;
+}
+
 /*
  * KMSAN needs to instrument as much code as possible. Use C versions of
  * memsetXX() from lib/string.c under KMSAN.
-- 
2.39.2



* [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 01/12] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 02/12] x86/asm: Introduce inline memcpy and memset Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-07-31 22:41   ` Edgecombe, Rick P
  2023-06-09 18:36 ` [PATCH v3 04/12] x86/cpu: Enable LASS during CPU initialization Alexander Shishkin
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

For patching, the kernel initializes a temporary mm area in the lower
half of the address space. See commit 4fc19708b165 ("x86/alternatives:
Initialize temporary mm for patching").

Disable LASS enforcement during patching using the stac()/clac()
instructions to avoid triggering a #GP fault.

Objtool warns about a call to a non-allowed function that exists outside
of the stac()/clac() guard, or about references to any function via a
dynamic function pointer inside the guard. See the Objtool warnings
section #9 in the document tools/objtool/Documentation/objtool.txt.

Considering that the patched regions are usually small, replace the memcpy
and memset calls in the text poking functions with their inline versions.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/alternative.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f615e0cb6d93..eac6a5406d39 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1526,16 +1526,24 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
+/*
+ * poking_init() initializes the text poking address from the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking address.
+ */
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
-	memcpy(dst, src, len);
+	stac();
+	__inline_memcpy(dst, src, len);
+	clac();
 }
 
 static void text_poke_memset(void *dst, const void *src, size_t len)
 {
 	int c = *(const int *)src;
 
-	memset(dst, c, len);
+	stac();
+	__inline_memset(dst, c, len);
+	clac();
 }
 
 typedef void text_poke_f(void *dst, const void *src, size_t len);
-- 
2.39.2



* [PATCH v3 04/12] x86/cpu: Enable LASS during CPU initialization
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (2 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 05/12] x86/cpu: Remove redundant comment during feature setup Alexander Shishkin
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

LASS is a security feature, so enable it by default if the platform
supports it.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 80710a68ef7d..315cc67ba93a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -413,6 +413,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+static __always_inline void setup_lass(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_LASS))
+		cr4_set_bits(X86_CR4_LASS);
+}
+
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
@@ -1859,6 +1865,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_lass(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-- 
2.39.2



* [PATCH v3 05/12] x86/cpu: Remove redundant comment during feature setup
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (3 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 04/12] x86/cpu: Enable LASS during CPU initialization Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 06/12] x86/vsyscall: Reorganize the #PF emulation code Alexander Shishkin
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

The code below the comment is self-explanatory. Instead of updating the
comment with the newly added LASS feature, it is better to just remove
it.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 315cc67ba93a..f26c56fe9963 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1861,7 +1861,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
-- 
2.39.2



* [PATCH v3 06/12] x86/vsyscall: Reorganize the #PF emulation code
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (4 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 05/12] x86/cpu: Remove redundant comment during feature setup Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 07/12] x86/traps: Consolidate user fixups in exc_general_protection() Alexander Shishkin
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

Separate out the actual vsyscall emulation from the page fault specific
handling in preparation for the upcoming #GP fault emulation.

Export is_vsyscall_vaddr() so that it can be reused later.

No functional change intended.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 51 +++++++++++++++++----------
 arch/x86/include/asm/vsyscall.h       | 10 +++---
 arch/x86/mm/fault.c                   | 13 ++-----
 3 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index e0ca8120aea8..dd112e538992 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -82,6 +82,15 @@ static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
 			   regs->sp, regs->ax, regs->si, regs->di);
 }
 
+/*
+ * The (legacy) vsyscall page is the long page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
+}
+
 static int addr_to_vsyscall_nr(unsigned long addr)
 {
 	int nr;
@@ -117,8 +126,7 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
 	}
 }
 
-bool emulate_vsyscall(unsigned long error_code,
-		      struct pt_regs *regs, unsigned long address)
+static bool __emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	struct task_struct *tsk;
 	unsigned long caller;
@@ -127,22 +135,6 @@ bool emulate_vsyscall(unsigned long error_code,
 	long ret;
 	unsigned long orig_dx;
 
-	/* Write faults or kernel-privilege faults never get fixed up. */
-	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
-		return false;
-
-	if (!(error_code & X86_PF_INSTR)) {
-		/* Failed vsyscall read */
-		if (vsyscall_mode == EMULATE)
-			return false;
-
-		/*
-		 * User code tried and failed to read the vsyscall page.
-		 */
-		warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
-		return false;
-	}
-
 	/*
 	 * No point in checking CS -- the only way to get here is a user mode
 	 * trap to a high address, which means that we're in 64-bit user code.
@@ -294,6 +286,29 @@ bool emulate_vsyscall(unsigned long error_code,
 	return true;
 }
 
+bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
+			 unsigned long address)
+{
+	/* Write faults or kernel-privilege faults never get fixed up. */
+	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
+		return false;
+
+	if (!(error_code & X86_PF_INSTR)) {
+		/* Failed vsyscall read */
+		if (vsyscall_mode == EMULATE)
+			return false;
+
+		/*
+		 * User code tried and failed to read the vsyscall page.
+		 */
+		warn_bad_vsyscall(KERN_INFO, regs,
+				  "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
+		return false;
+	}
+
+	return __emulate_vsyscall(regs, address);
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index ab60a71a8dcb..667b280afc1a 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -5,6 +5,8 @@
 #include <linux/seqlock.h>
 #include <uapi/asm/vsyscall.h>
 
+extern bool is_vsyscall_vaddr(unsigned long vaddr);
+
 #ifdef CONFIG_X86_VSYSCALL_EMULATION
 extern void map_vsyscall(void);
 extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
@@ -13,12 +15,12 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  * Called on instruction fetch fault in vsyscall page.
  * Returns true if handled.
  */
-extern bool emulate_vsyscall(unsigned long error_code,
-			     struct pt_regs *regs, unsigned long address);
+extern bool emulate_vsyscall_pf(unsigned long error_code,
+				struct pt_regs *regs, unsigned long address);
 #else
 static inline void map_vsyscall(void) {}
-static inline bool emulate_vsyscall(unsigned long error_code,
-				    struct pt_regs *regs, unsigned long address)
+static inline bool emulate_vsyscall_pf(unsigned long error_code,
+				       struct pt_regs *regs, unsigned long address)
 {
 	return false;
 }
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e4399983c50c..645eb3323f34 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -730,7 +730,7 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
 		 * Per the above we're !in_interrupt(), aka. task context.
 		 *
 		 * In this case we need to make sure we're not recursively
-		 * faulting through the emulate_vsyscall() logic.
+		 * faulting through the emulate_vsyscall_pf() logic.
 		 */
 		if (current->thread.sig_on_uaccess_err && signal) {
 			sanitize_error_code(address, &error_code);
@@ -798,15 +798,6 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 	show_opcodes(regs, loglvl);
 }
 
-/*
- * The (legacy) vsyscall page is the long page in the kernel portion
- * of the address space that has user-accessible permissions.
- */
-static bool is_vsyscall_vaddr(unsigned long vaddr)
-{
-	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
-}
-
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		       unsigned long address, u32 pkey, int si_code)
@@ -1329,7 +1320,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * to consider the PF_PK bit.
 	 */
 	if (is_vsyscall_vaddr(address)) {
-		if (emulate_vsyscall(error_code, regs, address))
+		if (emulate_vsyscall_pf(error_code, regs, address))
 			return;
 	}
 #endif
-- 
2.39.2



* [PATCH v3 07/12] x86/traps: Consolidate user fixups in exc_general_protection()
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (5 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 06/12] x86/vsyscall: Reorganize the #PF emulation code Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 08/12] x86/vsyscall: Add vsyscall emulation for #GP Alexander Shishkin
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Dave Hansen, Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

Move the UMIP exception fixup along with the other user mode fixups.

No functional change intended.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/traps.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 58b1f208eff5..f3e619ce9fbd 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -736,11 +736,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
 	cond_local_irq_enable(regs);
 
-	if (static_cpu_has(X86_FEATURE_UMIP)) {
-		if (user_mode(regs) && fixup_umip_exception(regs))
-			goto exit;
-	}
-
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
@@ -755,6 +750,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
 			goto exit;
 
+		if (cpu_feature_enabled(X86_FEATURE_UMIP) && fixup_umip_exception(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
-- 
2.39.2



* [PATCH v3 08/12] x86/vsyscall: Add vsyscall emulation for #GP
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (6 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 07/12] x86/traps: Consolidate user fixups in exc_general_protection() Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 09/12] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Alexander Shishkin
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

The legacy vsyscall page is mapped at a fixed address in the kernel
address range 0xffffffffff600000-0xffffffffff601000. Prior to LASS being
introduced, a legacy vsyscall page access from userspace would always
generate a page fault. The kernel emulates the execute (XONLY) accesses
in the page fault handler and returns to userspace with the
appropriate register values.

Since LASS intercepts these accesses before the paging structures are
traversed, it generates a general protection fault instead of a page
fault. The #GP fault doesn't provide much information in terms of the
error code. So, use the faulting RIP, which is preserved in the user
registers, to emulate the vsyscall access without going through complex
instruction decoding.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 11 ++++++++++-
 arch/x86/include/asm/vsyscall.h       |  6 ++++++
 arch/x86/kernel/traps.c               |  4 ++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index dd112e538992..76e1344997d2 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -23,7 +23,7 @@
  * soon be no new userspace code that will ever use a vsyscall.
  *
  * The code in this file emulates vsyscalls when notified of a page
- * fault to a vsyscall address.
+ * fault or a general protection fault to a vsyscall address.
  */
 
 #include <linux/kernel.h>
@@ -309,6 +309,15 @@ bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
 	return __emulate_vsyscall(regs, address);
 }
 
+bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	/* Emulate only if the RIP points to the vsyscall address */
+	if (!is_vsyscall_vaddr(regs->ip))
+		return false;
+
+	return __emulate_vsyscall(regs, regs->ip);
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index 667b280afc1a..7180a849143f 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -17,6 +17,7 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  */
 extern bool emulate_vsyscall_pf(unsigned long error_code,
 				struct pt_regs *regs, unsigned long address);
+extern bool emulate_vsyscall_gp(struct pt_regs *regs);
 #else
 static inline void map_vsyscall(void) {}
 static inline bool emulate_vsyscall_pf(unsigned long error_code,
@@ -24,6 +25,11 @@ static inline bool emulate_vsyscall_pf(unsigned long error_code,
 {
 	return false;
 }
+
+static inline bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	return false;
+}
 #endif
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f3e619ce9fbd..42d13e17e068 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include <asm/vdso.h>
 #include <asm/tdx.h>
 #include <asm/cfi.h>
+#include <asm/vsyscall.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -753,6 +754,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (cpu_feature_enabled(X86_FEATURE_UMIP) && fixup_umip_exception(regs))
 			goto exit;
 
+		if (cpu_feature_enabled(X86_FEATURE_LASS) && emulate_vsyscall_gp(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
-- 
2.39.2



* [PATCH v3 09/12] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (7 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 08/12] x86/vsyscall: Add vsyscall emulation for #GP Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 10/12] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS Alexander Shishkin
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

The EMULATE mode of vsyscall maps the vsyscall page into the user address
space, where it can be read directly by the user application. This mode
has been deprecated recently and can only be enabled with the special
command line parameter vsyscall=emulate. See commit bf00745e7791
("x86/vsyscall: Remove CONFIG_LEGACY_VSYSCALL_EMULATE").

Fixing the LASS violations in EMULATE mode would need complex
instruction decoding since the resulting #GP fault does not include any
useful error information and the vsyscall address is not readily
available in the RIP.

At this point, no one is expected to be using the insecure and
deprecated EMULATE mode. The rare usages that need support probably
don't care much about security anyway. Disable LASS when EMULATE mode is
requested during command line parsing to avoid breaking user software.
LASS will be supported if vsyscall mode is set to XONLY or NONE.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 76e1344997d2..edd58eda8f50 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -36,6 +36,7 @@
 #include <asm/vsyscall.h>
 #include <asm/unistd.h>
 #include <asm/fixmap.h>
+#include <asm/tlbflush.h>
 #include <asm/traps.h>
 #include <asm/paravirt.h>
 
@@ -63,6 +64,13 @@ static int __init vsyscall_setup(char *str)
 		else
 			return -EINVAL;
 
+		if (cpu_feature_enabled(X86_FEATURE_LASS) &&
+		    vsyscall_mode == EMULATE) {
+			cr4_clear_bits(X86_CR4_LASS);
+			setup_clear_cpu_cap(X86_FEATURE_LASS);
+			pr_info_once("x86/cpu: Disabling LASS support due to vsyscall=emulate\n");
+		}
+
 		return 0;
 	}
 
-- 
2.39.2



* [PATCH v3 10/12] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (8 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 09/12] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [PATCH v3 11/12] x86/cpu: Set LASS CR4 bit as pinning sensitive Alexander Shishkin
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin, Dave Hansen

The EMULATE mode of vsyscall disables LASS, since fixing the LASS
violations in EMULATE mode would require complex instruction decoding.
Document this fact in kernel-parameters.txt.

Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..efed9193107e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6850,7 +6850,9 @@
 
 			emulate     Vsyscalls turn into traps and are emulated
 			            reasonably safely.  The vsyscall page is
-				    readable.
+				    readable.  This also disables the LASS
+				    feature to allow userspace to poke around
+				    the vsyscall page.
 
 			xonly       [default] Vsyscalls turn into traps and are
 			            emulated reasonably safely.  The vsyscall
-- 
2.39.2



* [PATCH v3 11/12] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (9 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 10/12] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-06-09 18:36 ` [RFC v3 12/12] x86/efi: Disable LASS enforcement when switching to EFI MM Alexander Shishkin
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Yian Chen, Alexander Shishkin

From: Yian Chen <yian.chen@intel.com>

Security features such as LASS are not expected to be disabled once
initialized. Add LASS to the CR4 pinned mask.

Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f26c56fe9963..9ddc19c8832d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -422,7 +422,7 @@ static __always_inline void setup_lass(struct cpuinfo_x86 *c)
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
-	X86_CR4_FSGSBASE | X86_CR4_CET;
+	X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_LASS;
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.39.2



* [RFC v3 12/12] x86/efi: Disable LASS enforcement when switching to EFI MM
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (10 preceding siblings ...)
  2023-06-09 18:36 ` [PATCH v3 11/12] x86/cpu: Set LASS CR4 bit as pinning sensitive Alexander Shishkin
@ 2023-06-09 18:36 ` Alexander Shishkin
  2023-07-31 18:00 ` [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
  2023-07-31 22:36 ` Edgecombe, Rick P
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-06-09 18:36 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta
  Cc: Alexander Shishkin

From: Sohil Mehta <sohil.mehta@intel.com>

[Code is experimental and not yet ready to be merged upstream]

PeterZ suggested that EFI memory can be mapped in the user virtual address
space, which would trigger a LASS violation upon access. It isn't exactly
clear how and when these user address mappings happen. It may be related
to EFI mixed mode.
Link: https://lore.kernel.org/lkml/Y73S56t%2FwDIGEPlK@hirez.programming.kicks-ass.net/

stac()/clac() calls in the EFI MM enter and exit functions trigger
objtool warnings due to switch_mm() not being classified as
func_uaccess_safe. Refer to the Objtool warnings section #9 in the document
tools/objtool/Documentation/objtool.txt. This would need to be resolved
before even considering merging.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/platform/efi/efi_64.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 232acf418cfb..20966efcd87a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -473,9 +473,14 @@ void __init efi_dump_pagetable(void)
  * while the EFI-mm is borrowed. mmgrab()/mmdrop() is not used because the mm
  * can not change under us.
  * It should be ensured that there are no concurrent calls to this function.
+ *
+ * Disable LASS enforcement temporarily when switching to EFI MM since it could
+ * be mapped into the low 64-bit virtual address space with address bit 63 set
+ * to 0.
  */
 void efi_enter_mm(void)
 {
+	stac();
 	efi_prev_mm = current->active_mm;
 	current->active_mm = &efi_mm;
 	switch_mm(efi_prev_mm, &efi_mm, NULL);
@@ -485,6 +490,7 @@ void efi_leave_mm(void)
 {
 	current->active_mm = efi_prev_mm;
 	switch_mm(&efi_mm, efi_prev_mm, NULL);
+	clac();
 }
 
 static DEFINE_SPINLOCK(efi_runtime_lock);
-- 
2.39.2



* Re: [PATCH v3 00/12] Enable Linear Address Space Separation support
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (11 preceding siblings ...)
  2023-06-09 18:36 ` [RFC v3 12/12] x86/efi: Disable LASS enforcement when switching to EFI MM Alexander Shishkin
@ 2023-07-31 18:00 ` Alexander Shishkin
  2023-07-31 22:36 ` Edgecombe, Rick P
  13 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2023-07-31 18:00 UTC (permalink / raw)
  To: linux-kernel, x86, Andy Lutomirski, Dave Hansen, Ravi Shankar,
	Tony Luck, Sohil Mehta, alexander.shishkin

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> Changes from v2[5]:
> - Added myself to the SoB chain

Gentle ping.

Regards,
--
Alex


* Re: [PATCH v3 00/12] Enable Linear Address Space Separation support
  2023-06-09 18:36 [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
                   ` (12 preceding siblings ...)
  2023-07-31 18:00 ` [PATCH v3 00/12] Enable Linear Address Space Separation support Alexander Shishkin
@ 2023-07-31 22:36 ` Edgecombe, Rick P
  2023-08-01 19:50   ` Sohil Mehta
  2023-08-04 23:52   ` Edgecombe, Rick P
  13 siblings, 2 replies; 21+ messages in thread
From: Edgecombe, Rick P @ 2023-07-31 22:36 UTC (permalink / raw)
  To: Lutomirski, Andy, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, x86@kernel.org, Shankar, Ravi V,
	linux-kernel@vger.kernel.org, Luck, Tony, Mehta, Sohil

On Fri, 2023-06-09 at 21:36 +0300, Alexander Shishkin wrote:


What do NULL pointer de-references look like with LASS enabled? They
will be a #GP instead of a #PF, right? Currently the kernel prints out
several types of helpful messages:
 - "BUG: kernel NULL pointer dereference, address: %lx"
 - "BUG: unable to handle page fault for address: %px
 - "unable to execute userspace code (SMEP?) (uid: %d)"
 - etc

These will go away I guess, and you will get a more opaque "general
protection fault" message?

Assuming that is all right, I don't know if it might be worth tweaking
that #GP message, so people aren't confused when debugging. Something
that explains to turn off LASS to get more debugging info.

> Kernel accesses usually only happen to the kernel address space.
> However, there
> are valid reasons for kernel to access memory in the user half. For
> these cases
> (such as text poking and EFI runtime accesses), the kernel can
> temporarily
> suspend the enforcement of LASS by toggling SMAP (Supervisor Mode
> Access
> Prevention) using the stac()/clac() instructions.

CET introduces this unusual instruction called WRUSS. It allows you to
make user memory accesses while executing in the kernel. Because of
this special property, the CET shadow stack patches don't toggle
stac/clac while executing this instruction. So I think LASS will need
it to behave more like a normal userspace access from the kernel.
Shadow stack is not upstream yet, so just something to keep in mind for
the future.

Also, what is this series based on? I wasn't able to apply it.


* Re: [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives
  2023-06-09 18:36 ` [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
@ 2023-07-31 22:41   ` Edgecombe, Rick P
  2023-08-01 21:10     ` Sohil Mehta
  0 siblings, 1 reply; 21+ messages in thread
From: Edgecombe, Rick P @ 2023-07-31 22:41 UTC (permalink / raw)
  To: Lutomirski, Andy, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, x86@kernel.org, Shankar, Ravi V,
	linux-kernel@vger.kernel.org, Luck, Tony, Mehta, Sohil

On Fri, 2023-06-09 at 21:36 +0300, Alexander Shishkin wrote:
> +/*
> + * poking_init() initializes the text poking address from the lower half of the
> + * address space. Relax LASS enforcement when accessing the poking address.
> + */
>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>  {
> -       memcpy(dst, src, len);
> +       stac();
> +       __inline_memcpy(dst, src, len);
> +       clac();
>  }
>  
>  static void text_poke_memset(void *dst, const void *src, size_t len)
>  {
>         int c = *(const int *)src;
>  
> -       memset(dst, c, len);
> +       stac();
> +       __inline_memset(dst, c, len);
> +       clac();
>  }

Why not do stac/clac in a single place inside __text_poke()?


* Re: [PATCH v3 00/12] Enable Linear Address Space Separation support
  2023-07-31 22:36 ` Edgecombe, Rick P
@ 2023-08-01 19:50   ` Sohil Mehta
  2023-08-01 20:38     ` Edgecombe, Rick P
  2023-08-04 23:52   ` Edgecombe, Rick P
  1 sibling, 1 reply; 21+ messages in thread
From: Sohil Mehta @ 2023-08-01 19:50 UTC (permalink / raw)
  To: Edgecombe, Rick P, Lutomirski, Andy,
	alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
	x86@kernel.org, Shankar, Ravi V, linux-kernel@vger.kernel.org,
	Luck, Tony

On 7/31/2023 3:36 PM, Edgecombe, Rick P wrote:
> CET introduces this unusual instruction called WRUSS. It allows you to
> make user memory accesses while executing in the kernel. Because of
> this special property, the CET shadow stack patches don't toggle
> stac/clac while executing this instruction. So I think LASS will need
> it to behave more like a normal userspace access from the kernel.
> Shadow stack is not upstream yet, so just something to keep in mind for
> the future.
> 

This is a good point. We should definitely test this out to confirm.

But, isn't WRUSS already defined as a user-mode access? So, in theory, a
user-mode access to a user address space *should* not be blocked by LASS
(even with CPL=0).

Are you suggesting that we might need to do something special for WRUSS
with LASS enabled?

Sohil


* Re: [PATCH v3 00/12] Enable Linear Address Space Separation support
  2023-08-01 19:50   ` Sohil Mehta
@ 2023-08-01 20:38     ` Edgecombe, Rick P
  0 siblings, 0 replies; 21+ messages in thread
From: Edgecombe, Rick P @ 2023-08-01 20:38 UTC (permalink / raw)
  To: Lutomirski, Andy, alexander.shishkin@linux.intel.com,
	x86@kernel.org, dave.hansen@linux.intel.com,
	linux-kernel@vger.kernel.org, Shankar, Ravi V, Luck, Tony,
	Mehta, Sohil

On Tue, 2023-08-01 at 12:50 -0700, Sohil Mehta wrote:
> On 7/31/2023 3:36 PM, Edgecombe, Rick P wrote:
> > CET introduces this unusual instruction called WRUSS. It allows you to
> > make user memory accesses while executing in the kernel. Because of
> > this special property, the CET shadow stack patches don't toggle
> > stac/clac while executing this instruction. So I think LASS will need
> > it to behave more like a normal userspace access from the kernel.
> > Shadow stack is not upstream yet, so just something to keep in mind
> > for the future.
> > 
> 
> This is a good point. We should definitely test this out to confirm.
> 
> But, isn't WRUSS already defined as a user-mode access? So, in theory, a
> user-mode access to a user address space *should* not be blocked by LASS
> (even with CPL=0).
> 
> Are you suggesting that we might need to do something special for WRUSS
> with LASS enabled?

I was, but reading the docs, I think you are right. It looks like it
will be treated like a user access as far as LASS is concerned. Thanks.


* Re: [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives
  2023-07-31 22:41   ` Edgecombe, Rick P
@ 2023-08-01 21:10     ` Sohil Mehta
  2023-08-01 21:50       ` Edgecombe, Rick P
  0 siblings, 1 reply; 21+ messages in thread
From: Sohil Mehta @ 2023-08-01 21:10 UTC (permalink / raw)
  To: Edgecombe, Rick P, Lutomirski, Andy,
	alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
	x86@kernel.org, Shankar, Ravi V, linux-kernel@vger.kernel.org,
	Luck, Tony

> Why not do stac/clac in a single place inside __text_poke()?

It would mostly look something like this:
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 0fbf8a631306..02ef08e2575d 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1781,7 +1781,9 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>         prev = use_temporary_mm(poking_mm);
> 
>         kasan_disable_current();
> +       stac();
>         func((u8 *)poking_addr + offset_in_page(addr), src, len);
> +       clac();
>         kasan_enable_current();
> 
>         /*

Since __text_poke() uses a dynamic function to call into
text_poke_memcpy() and text_poke_memset(), objtool would still complain:

> arch/x86/kernel/alternative.o: warning: objtool: __text_poke+0x259: call to {dynamic}() with UACCESS enabled

We could change __text_poke() to not use the dynamic func, but it might
be a bit heavy-handed to save a couple of lines of stac/clac calls. The
current trade-off seems reasonable to me.

Did you have something different in mind?

Sohil




* Re: [PATCH v3 03/12] x86/alternatives: Disable LASS when patching kernel alternatives
  2023-08-01 21:10     ` Sohil Mehta
@ 2023-08-01 21:50       ` Edgecombe, Rick P
  0 siblings, 0 replies; 21+ messages in thread
From: Edgecombe, Rick P @ 2023-08-01 21:50 UTC (permalink / raw)
  To: Lutomirski, Andy, alexander.shishkin@linux.intel.com,
	x86@kernel.org, dave.hansen@linux.intel.com,
	linux-kernel@vger.kernel.org, Shankar, Ravi V, Luck, Tony,
	Mehta, Sohil

On Tue, 2023-08-01 at 14:10 -0700, Sohil Mehta wrote:
> > Why not do stac/clac in a single place inside __text_poke()?
> 
> It would mostly look something like this:
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 0fbf8a631306..02ef08e2575d 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -1781,7 +1781,9 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
> >          prev = use_temporary_mm(poking_mm);
> > 
> >          kasan_disable_current();
> > +       stac();
> >          func((u8 *)poking_addr + offset_in_page(addr), src, len);
> > +       clac();
> >          kasan_enable_current();
> > 
> >          /*
> 
> Since __text_poke() uses a dynamic function to call into
> text_poke_memcpy() and text_poke_memset(), objtool would still complain:
> 
> > arch/x86/kernel/alternative.o: warning: objtool: __text_poke+0x259: call to {dynamic}() with UACCESS enabled
> 
> We could change __text_poke() to not use the dynamic func, but it might
> be a bit heavy-handed to save a couple of lines of stac/clac calls. The
> current trade-off seems reasonable to me.
> 
> Did you have something different in mind?

I wondered if it might be something like that. Yes, seems like an ok
tradeoff.


* Re: [PATCH v3 00/12] Enable Linear Address Space Separation support
  2023-07-31 22:36 ` Edgecombe, Rick P
  2023-08-01 19:50   ` Sohil Mehta
@ 2023-08-04 23:52   ` Edgecombe, Rick P
  1 sibling, 0 replies; 21+ messages in thread
From: Edgecombe, Rick P @ 2023-08-04 23:52 UTC (permalink / raw)
  To: Lutomirski, Andy, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, x86@kernel.org, Shankar, Ravi V,
	linux-kernel@vger.kernel.org, Luck, Tony, Mehta, Sohil

On Mon, 2023-07-31 at 15:36 -0700, Rick Edgecombe wrote:
> On Fri, 2023-06-09 at 21:36 +0300, Alexander Shishkin wrote:
> 
> 
> What do NULL pointer de-references look like with LASS enabled? They
> will be a #GP instead of a #PF, right? Currently the kernel prints out
> several types of helpful messages:
>  - "BUG: kernel NULL pointer dereference, address: %lx"
>  - "BUG: unable to handle page fault for address: %px"
>  - "unable to execute userspace code (SMEP?) (uid: %d)"
>  - etc
> 
> These will go away I guess, and you will get a more opaque "general
> protection fault" message?
> 
> Assuming that is all right, I don't know if it might be worth tweaking
> that #GP message, so people aren't confused when debugging. Something
> that explains to turn off LASS to get more debugging info.

Maybe get_kernel_gp_address() could be enhanced to give hints for some
of those cases like it does for non-canonical addresses?


Separately, I think there is a tiny userspace-visible change with this.
If userspace tries to access the kernel half of the canonical address
space they will get a segfault. It seems previously the signal would
have REG_TRAPNO as 14 (X86_TRAP_PF) in this case, but with LASS it will
be 13 (X86_TRAP_GP).

I did a quick search and couldn't find any applications that seemed to
be relying on this behavior (not surprising). Some are looking for
REG_TRAPNO as 14, but none appeared to be relying on accesses to kernel
memory so I guess this should be ok. Still, it is probably appropriate
to call out the change and CC linux-api.

It makes me wonder if it should match between LASS and non-LASS going
forward though. Like maybe always do X86_TRAP_GP for user->kernel
accesses instead of having it vary by whether LASS is used? Since there
isn't enough information to do REG_TRAPNO X86_TRAP_PF when LASS is
used.
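
A minimal userspace sketch of what observing that difference looks like
(hypothetical test program, assuming glibc's x86-64 ucontext layout):

  #define _GNU_SOURCE
  #include <signal.h>
  #include <stdio.h>
  #include <ucontext.h>
  #include <unistd.h>

  /* Without LASS the kernel-half access faults during the page walk and
   * trapno reads 14 (X86_TRAP_PF); with LASS it reads 13 (X86_TRAP_GP). */
  static void handler(int sig, siginfo_t *si, void *ctx)
  {
          ucontext_t *uc = ctx;

          printf("fault at %p, trapno=%lld\n", si->si_addr,
                 (long long)uc->uc_mcontext.gregs[REG_TRAPNO]);
          _exit(0);
  }

  int main(void)
  {
          struct sigaction sa = { .sa_sigaction = handler, .sa_flags = SA_SIGINFO };

          sigaction(SIGSEGV, &sa, NULL);
          (void)*(volatile char *)0xffff800000000000UL;   /* kernel half */
          return 1;
  }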

