* [PATCH 0/4] arm64: Support the TSO memory model
@ 2024-04-11  0:51 ` Hector Martin
  0 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11  0:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland
  Cc: Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Hector Martin

x86 CPUs implement a stricter memory model than ARM64 (TSO). For this
reason, x86 emulation on baseline ARM64 systems requires very expensive
memory model emulation. Having hardware that supports this natively is
therefore very attractive. Such hardware, in fact, exists. This series
adds support for userspace to identify when TSO is available and
toggle it on, if supported.

Some ARM64 CPUs intrinsically implement the TSO memory model, while
others expose it as an IMPDEF control. Apple Silicon SoCs are in the
latter category. Using TSO for x86 emulation on chips that support it
has been shown to provide a massive performance boost [1].
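
To illustrate the cost being avoided (this sketch is not part of the
series, and the helper names are made up): an emulator that cannot rely
on hardware TSO has to give every guest access acquire/release semantics
(LDAR/STLR or explicit barriers), whereas with a hardware TSO mode
enabled, plain loads and stores already preserve x86 ordering. Roughly:

#include <stdint.h>
#include <stdbool.h>

/* Set after PR_SET_MEM_MODEL(PR_SET_MEM_MODEL_TSO) succeeds. */
static bool hw_tso_enabled;

/* Emulated x86 64-bit guest load. */
static inline uint64_t emu_guest_load64(const uint64_t *p)
{
	if (hw_tso_enabled)
		return *(const volatile uint64_t *)p;	  /* plain LDR */
	return __atomic_load_n(p, __ATOMIC_ACQUIRE);	  /* load-acquire */
}

/* Emulated x86 64-bit guest store. */
static inline void emu_guest_store64(uint64_t *p, uint64_t v)
{
	if (hw_tso_enabled)
		*(volatile uint64_t *)p = v;		  /* plain STR */
	else
		__atomic_store_n(p, v, __ATOMIC_RELEASE); /* store-release */
}

In practice emulators like FEX generate this in their JIT rather than in
C helpers, but the ordering requirement is the same.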

Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
is initially not implemented for any architectures.

Patch 2 implements it for CPUs which are known, to the best of my
knowledge, to always implement the TSO memory model unconditionally.
This uses the cpufeature mechanism to only enable this if *all* cores in
the system meet the requirements.

Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
register across context switches. This register contains IMPDEF flags
related to CPU execution, and on Apple CPUs this is where the runtime
TSO toggle bit is implemented. Other CPUs could conceivably benefit from
this scaffolding if they also use ACTLR_EL1 for things that could
ostensibly be runtime controlled and context-switched. For this to work,
ACTLR_EL1 must have a uniform layout across all cores in the system.

Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
feature is detected (on all CPUs, which also implies the uniform
ACTLR_EL1 layout).

This series has been brewing in the downstream Asahi Linux tree [2] for a
while now, and ships to thousands of users. A subset have been using it
with FEX-Emu, which already supports this feature. This rebase on
v6.9-rc1 is only build-tested (all intermediate commits with and without
the config enabled, on ARM64) but I'll update the downstream branch soon
with this version and get it pushed out to users/testers.

The Apple support works on bare metal and *should* work exactly the same
way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
though I haven't personally verified this. KVM support for this is left
for a future patchset.

(Apologies for the large Cc: list; I want to make sure nobody who got
Cced on Zayd's alternate take is left out of this one.) 

[1] https://fex-emu.com/FEX-2306/
[2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
[3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/

To: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
To: Marc Zyngier <maz@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Zayd Qumsieh <zayd_qumsieh@apple.com>
Cc: Justin Lu <ih_justin@apple.com>
Cc: Ryan Houdek <Houdek.Ryan@fex-emu.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Miguel Luis <miguel.luis@oracle.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Dawei Li <dawei.li@shingroup.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florent Revest <revest@chromium.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Andy Chiu <andy.chiu@sifive.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Zev Weiss <zev@bewilderbeest.net>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Asahi Linux <asahi@lists.linux.dev>

Signed-off-by: Hector Martin <marcan@marcan.st>
---
Hector Martin (4):
      prctl: Introduce PR_{SET,GET}_MEM_MODEL
      arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
      arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
      arm64: Implement Apple IMPDEF TSO memory model control

 arch/arm64/Kconfig                        | 14 ++++++
 arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
 arch/arm64/include/asm/cpufeature.h       | 10 +++++
 arch/arm64/include/asm/processor.h        |  3 ++
 arch/arm64/kernel/Makefile                |  3 +-
 arch/arm64/kernel/cpufeature.c            | 11 ++---
 arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
 arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c                 |  8 ++++
 arch/arm64/tools/cpucaps                  |  2 +
 include/linux/memory_ordering_model.h     | 11 +++++
 include/uapi/linux/prctl.h                |  5 +++
 kernel/sys.c                              | 21 +++++++++
 13 files changed, 229 insertions(+), 6 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240411-tso-e86fdceb94b8

Best regards,
-- 
Hector Martin <marcan@marcan.st>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/4] prctl: Introduce PR_{SET,GET}_MEM_MODEL
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-11  0:51   ` Hector Martin
  -1 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11  0:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland
  Cc: Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Hector Martin

On some architectures, it is possible to query and/or change the CPU
memory model. This allows userspace to switch to a stricter memory model
for performance reasons, such as when emulating code for another
architecture where that model is the default.

Introduce two prctls to allow userspace to query and set the memory
model for a thread. Two models are initially defined:

- PR_SET_MEM_MODEL_DEFAULT requests the default memory model for the
  architecture.
- PR_SET_MEM_MODEL_TSO requests the x86 TSO memory model.

PR_SET_MEM_MODEL is allowed to set a stricter memory model than
requested if available, in which case it will return successfully. If
the requested memory model cannot be fulfilled, it will return an error.
The memory model that was actually set can be queried by a subsequent
call to PR_GET_MEM_MODEL.

Examples:
- On a CPU without support for a memory model at least as strong as
  TSO, PR_SET_MEM_MODEL(PR_SET_MEM_MODEL_TSO) fails.
- On a CPU with runtime-configurable TSO support, PR_SET_MEM_MODEL can
  toggle the memory model between DEFAULT and TSO at will.
- On a CPU where the only memory model is at least as strict as TSO,
  PR_GET_MEM_MODEL will return PR_SET_MEM_MODEL_DEFAULT, and
  PR_SET_MEM_MODEL(PR_SET_MEM_MODEL_TSO) will return success but leave
  the memory model at PR_SET_MEM_MODEL_DEFAULT. This implies that the
  default is in fact at least as strict as TSO.
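
A minimal userspace sketch of the intended calling convention follows
(illustrative only, not part of this patch; the fallback defines simply
mirror the uapi values added below, in case the system headers predate
them):

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEM_MODEL
#define PR_GET_MEM_MODEL		0x6d4d444c
#define PR_SET_MEM_MODEL		0x4d4d444c
#define PR_SET_MEM_MODEL_DEFAULT	0
#define PR_SET_MEM_MODEL_TSO		1
#endif

int main(void)
{
	/* Ask for TSO; a stricter model may be silently substituted. */
	if (prctl(PR_SET_MEM_MODEL, PR_SET_MEM_MODEL_TSO, 0, 0, 0) == 0)
		printf("running with memory model %d\n",
		       (int)prctl(PR_GET_MEM_MODEL, 0, 0, 0, 0));
	else
		printf("TSO not available, keeping the default model\n");
	return 0;
}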

Signed-off-by: Hector Martin <marcan@marcan.st>
---
 include/linux/memory_ordering_model.h | 11 +++++++++++
 include/uapi/linux/prctl.h            |  5 +++++
 kernel/sys.c                          | 21 +++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/include/linux/memory_ordering_model.h b/include/linux/memory_ordering_model.h
new file mode 100644
index 000000000000..267a12ca6630
--- /dev/null
+++ b/include/linux/memory_ordering_model.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MEMORY_ORDERING_MODEL_H
+#define __ASM_MEMORY_ORDERING_MODEL_H
+
+/* Arch hooks to implement the PR_{GET,SET}_MEM_MODEL prctls */
+
+struct task_struct;
+int arch_prctl_mem_model_get(struct task_struct *t);
+int arch_prctl_mem_model_set(struct task_struct *t, unsigned long val);
+
+#endif
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 370ed14b1ae0..961216093f11 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -306,4 +306,9 @@ struct prctl_mm_map {
 # define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK	0xc
 # define PR_RISCV_V_VSTATE_CTRL_MASK		0x1f
 
+#define PR_GET_MEM_MODEL	0x6d4d444c
+#define PR_SET_MEM_MODEL	0x4d4d444c
+# define PR_SET_MEM_MODEL_DEFAULT	0
+# define PR_SET_MEM_MODEL_TSO		1
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index f8e543f1e38a..6af659a9f826 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -45,6 +45,7 @@
 #include <linux/version.h>
 #include <linux/ctype.h>
 #include <linux/syscall_user_dispatch.h>
+#include <linux/memory_ordering_model.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
@@ -2442,6 +2443,16 @@ static int prctl_get_auxv(void __user *addr, unsigned long len)
 	return sizeof(mm->saved_auxv);
 }
 
+int __weak arch_prctl_mem_model_get(struct task_struct *t)
+{
+	return -EINVAL;
+}
+
+int __weak arch_prctl_mem_model_set(struct task_struct *t, unsigned long val)
+{
+	return -EINVAL;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2757,6 +2768,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_RISCV_V_GET_CONTROL:
 		error = RISCV_V_GET_CONTROL();
 		break;
+	case PR_GET_MEM_MODEL:
+		if (arg2 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_prctl_mem_model_get(me);
+		break;
+	case PR_SET_MEM_MODEL:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_prctl_mem_model_set(me, arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;

-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/4] arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-11  0:51   ` Hector Martin
  -1 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11  0:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland
  Cc: Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Hector Martin

Some ARM64 implementations are known to always use the TSO memory model.
Add trivial support for the PR_{GET,SET}_MEM_MODEL prctl, which allows
userspace to learn this fact.

Known TSO implementations:
- Nvidia Denver
- Nvidia Carmel
- Fujitsu A64FX
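
For illustration (not part of this patch): userspace can tell such an
always-TSO CPU apart from a switchable one, because the set call
succeeds while the reported model stays DEFAULT. The helper name below
is hypothetical and the fallback defines mirror the uapi values from
patch 1:

#include <sys/prctl.h>

#ifndef PR_SET_MEM_MODEL
#define PR_GET_MEM_MODEL		0x6d4d444c
#define PR_SET_MEM_MODEL		0x4d4d444c
#define PR_SET_MEM_MODEL_DEFAULT	0
#define PR_SET_MEM_MODEL_TSO		1
#endif

/* Nonzero if the default memory model is already at least as strict as TSO. */
static int tso_is_the_default(void)
{
	if (prctl(PR_SET_MEM_MODEL, PR_SET_MEM_MODEL_TSO, 0, 0, 0) != 0)
		return 0;	/* no TSO support at all */
	return prctl(PR_GET_MEM_MODEL, 0, 0, 0, 0) == PR_SET_MEM_MODEL_DEFAULT;
}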

Signed-off-by: Hector Martin <marcan@marcan.st>
---
 arch/arm64/Kconfig                    |  9 +++++++++
 arch/arm64/include/asm/cpufeature.h   |  4 ++++
 arch/arm64/kernel/Makefile            |  3 ++-
 arch/arm64/kernel/cpufeature.c        | 11 +++++-----
 arch/arm64/kernel/cpufeature_impdef.c | 38 +++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c           | 24 ++++++++++++++++++++++
 arch/arm64/tools/cpucaps              |  1 +
 7 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..f8e66fe44ff4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2162,6 +2162,15 @@ config ARM64_DEBUG_PRIORITY_MASKING
 	  If unsure, say N
 endif # ARM64_PSEUDO_NMI
 
+config ARM64_MEMORY_MODEL_CONTROL
+	bool "Runtime memory model control"
+	help
+	  Some ARM64 CPUs support runtime switching of the CPU memory
+	  model, which can be useful to emulate other CPU architectures
+	  which have different memory models. Say Y to enable support
+	  for the PR_SET_MEM_MODEL/PR_GET_MEM_MODEL prctl() calls on
+	  CPUs with this feature.
+
 config RELOCATABLE
 	bool "Build a relocatable kernel image" if EXPERT
 	select ARCH_HAS_RELR
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b904a757bd3..fb215b0e7529 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -1032,6 +1032,10 @@ static inline bool cpu_has_lpa2(void)
 #endif
 }
 
+void __init init_cpucap_indirect_list_impdef(void);
+void __init init_cpucap_indirect_list_from_array(const struct arm64_cpu_capabilities *caps);
+bool cpufeature_matches(u64 reg, const struct arm64_cpu_capabilities *entry);
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 763824963ed1..5eaaee7b8358 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -33,7 +33,8 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
-			   syscall.o proton-pack.o idle.o patching.o pi/
+			   syscall.o proton-pack.o idle.o patching.o pi/	\
+			   cpufeature_impdef.o
 
 obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
 					   sys_compat.o
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 56583677c1f2..e39ab93ad683 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1028,7 +1028,7 @@ static void init_cpu_ftr_reg(u32 sys_reg, u64 new)
 extern const struct arm64_cpu_capabilities arm64_errata[];
 static const struct arm64_cpu_capabilities arm64_features[];
 
-static void __init
+void __init
 init_cpucap_indirect_list_from_array(const struct arm64_cpu_capabilities *caps)
 {
 	for (; caps->matches; caps++) {
@@ -1540,8 +1540,8 @@ has_always(const struct arm64_cpu_capabilities *entry, int scope)
 	return true;
 }
 
-static bool
-feature_matches(u64 reg, const struct arm64_cpu_capabilities *entry)
+bool
+cpufeature_matches(u64 reg, const struct arm64_cpu_capabilities *entry)
 {
 	int val, min, max;
 	u64 tmp;
@@ -1594,14 +1594,14 @@ has_user_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 	if (!mask)
 		return false;
 
-	return feature_matches(val, entry);
+	return cpufeature_matches(val, entry);
 }
 
 static bool
 has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	u64 val = read_scoped_sysreg(entry, scope);
-	return feature_matches(val, entry);
+	return cpufeature_matches(val, entry);
 }
 
 const struct cpumask *system_32bit_el0_cpumask(void)
@@ -3486,6 +3486,7 @@ void __init setup_boot_cpu_features(void)
 	 * handle the boot CPU.
 	 */
 	init_cpucap_indirect_list();
+	init_cpucap_indirect_list_impdef();
 
 	/*
 	 * Detect broken pseudo-NMI. Must be called _before_ the call to
diff --git a/arch/arm64/kernel/cpufeature_impdef.c b/arch/arm64/kernel/cpufeature_impdef.c
new file mode 100644
index 000000000000..bb04a8e3d79d
--- /dev/null
+++ b/arch/arm64/kernel/cpufeature_impdef.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Contains implementation-defined CPU feature definitions.
+ */
+
+#include <asm/cpufeature.h>
+
+#ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
+static bool has_tso_fixed(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	/* List of CPUs that always use the TSO memory model */
+	static const struct midr_range fixed_tso_list[] = {
+		MIDR_ALL_VERSIONS(MIDR_NVIDIA_DENVER),
+		MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
+		MIDR_ALL_VERSIONS(MIDR_FUJITSU_A64FX),
+		{ /* sentinel */ }
+	};
+
+	return is_midr_in_range_list(read_cpuid_id(), fixed_tso_list);
+}
+#endif
+
+static const struct arm64_cpu_capabilities arm64_impdef_features[] = {
+#ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
+	{
+		.desc = "TSO memory model (Fixed)",
+		.capability = ARM64_HAS_TSO_FIXED,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_tso_fixed,
+	},
+#endif
+	{},
+};
+
+void __init init_cpucap_indirect_list_impdef(void)
+{
+	init_cpucap_indirect_list_from_array(arm64_impdef_features);
+}
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4ae31b7af6c3..7920056bad3e 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -41,6 +41,7 @@
 #include <linux/thread_info.h>
 #include <linux/prctl.h>
 #include <linux/stacktrace.h>
+#include <linux/memory_ordering_model.h>
 
 #include <asm/alternative.h>
 #include <asm/compat.h>
@@ -513,6 +514,25 @@ void update_sctlr_el1(u64 sctlr)
 	isb();
 }
 
+#ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
+int arch_prctl_mem_model_get(struct task_struct *t)
+{
+	return PR_SET_MEM_MODEL_DEFAULT;
+}
+
+int arch_prctl_mem_model_set(struct task_struct *t, unsigned long val)
+{
+	if (alternative_has_cap_unlikely(ARM64_HAS_TSO_FIXED) &&
+	    val == PR_SET_MEM_MODEL_TSO)
+		return 0;
+
+	if (val == PR_SET_MEM_MODEL_DEFAULT)
+		return 0;
+
+	return -EINVAL;
+}
+#endif
+
 /*
  * Thread switching.
  */
@@ -651,6 +671,10 @@ void arch_setup_new_exec(void)
 		arch_prctl_spec_ctrl_set(current, PR_SPEC_STORE_BYPASS,
 					 PR_SPEC_ENABLE);
 	}
+
+#ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
+	arch_prctl_mem_model_set(current, PR_SET_MEM_MODEL_DEFAULT);
+#endif
 }
 
 #ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 62b2838a231a..daa6b9495402 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -52,6 +52,7 @@ HAS_STAGE2_FWB
 HAS_TCR2
 HAS_TIDCP1
 HAS_TLB_RANGE
+HAS_TSO_FIXED
 HAS_VA52
 HAS_VIRT_HOST_EXTN
 HAS_WFXT

-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/4] arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-11  0:51   ` Hector Martin
  -1 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11  0:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland
  Cc: Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Hector Martin

Some CPUs expose IMPDEF features in ACTLR_EL1 that can be meaningfully
controlled per-thread (like TSO control on Apple cores). Add the basic
scaffolding to save/restore this register as part of context switching.

This mechanism is disabled by default both by config symbol and via a
runtime check, which ensures it is never triggered unless the system is
known to need it for some feature (which also implies that the layout of
ACTLR_EL1 is uniform between all CPU core types).

Signed-off-by: Hector Martin <marcan@marcan.st>
---
 arch/arm64/Kconfig                  |  3 +++
 arch/arm64/include/asm/cpufeature.h |  5 +++++
 arch/arm64/include/asm/processor.h  |  3 +++
 arch/arm64/kernel/process.c         | 25 +++++++++++++++++++++++++
 arch/arm64/kernel/setup.c           |  8 ++++++++
 5 files changed, 44 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f8e66fe44ff4..9b3593b34cce 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -408,6 +408,9 @@ config KASAN_SHADOW_OFFSET
 config UNWIND_TABLES
 	bool
 
+config ARM64_ACTLR_STATE
+	bool
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Kernel Features"
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index fb215b0e7529..46ab37f8f4d8 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -909,6 +909,11 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
 	return 8;
 }
 
+static __always_inline bool system_has_actlr_state(void)
+{
+	return false;
+}
+
 s64 arm64_ftr_safe_value(const struct arm64_ftr_bits *ftrp, s64 new, s64 cur);
 struct arm64_ftr_reg *get_arm64_ftr_reg(u32 sys_id);
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f77371232d8c..d43c5791a35e 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -184,6 +184,9 @@ struct thread_struct {
 	u64			sctlr_user;
 	u64			svcr;
 	u64			tpidr2_el0;
+#ifdef CONFIG_ARM64_ACTLR_STATE
+	u64			actlr;
+#endif
 };
 
 static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7920056bad3e..117f80e16aac 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -372,6 +372,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		if (system_supports_tpidr2())
 			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
 
+#ifdef CONFIG_ARM64_ACTLR_STATE
+		if (system_has_actlr_state())
+			p->thread.actlr = read_sysreg(actlr_el1);
+#endif
+
 		if (stack_start) {
 			if (is_compat_thread(task_thread_info(p)))
 				childregs->compat_sp = stack_start;
@@ -533,6 +538,25 @@ int arch_prctl_mem_model_set(struct task_struct *t, unsigned long val)
 }
 #endif
 
+#ifdef CONFIG_ARM64_ACTLR_STATE
+/*
+ * IMPDEF control register ACTLR_EL1 handling. Some CPUs use this to
+ * expose features that can be controlled by userspace.
+ */
+static void actlr_thread_switch(struct task_struct *next)
+{
+	if (!system_has_actlr_state())
+		return;
+
+	current->thread.actlr = read_sysreg(actlr_el1);
+	write_sysreg(next->thread.actlr, actlr_el1);
+}
+#else
+static inline void actlr_thread_switch(struct task_struct *next)
+{
+}
+#endif
+
 /*
  * Thread switching.
  */
@@ -550,6 +574,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(next);
 	ptrauth_thread_switch_user(next);
+	actlr_thread_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 65a052bf741f..35342f633a85 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -359,6 +359,14 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 	 */
 	init_task.thread_info.ttbr0 = phys_to_ttbr(__pa_symbol(reserved_pg_dir));
 #endif
+#ifdef CONFIG_ARM64_ACTLR_STATE
+	/* Store the boot CPU ACTLR_EL1 value as the default. This will only
+	 * be actually restored during context switching iff the platform is
+	 * known to use ACTLR_EL1 for exposable features and its layout is
+	 * known to be the same on all CPUs.
+	 */
+	init_task.thread.actlr = read_sysreg(actlr_el1);
+#endif
 
 	if (boot_args[1] || boot_args[2] || boot_args[3]) {
 		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"

-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 4/4] arm64: Implement Apple IMPDEF TSO memory model control
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-11  0:51   ` Hector Martin
  -1 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11  0:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland
  Cc: Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Hector Martin

Apple CPUs may implement the TSO memory model as an optional
configurable mode. This allows x86 emulators to simplify their
load/store handling, greatly increasing performance.

Expose this via the prctl PR_SET_MEM_MODEL_TSO mechanism. We use the
Apple IMPDEF AIDR_EL1 register to check for the availability of TSO
mode, and enable this codepath on all CPUs with an Apple implementer.

This relies on the ACTLR_EL1 thread state scaffolding introduced
earlier.

Signed-off-by: Hector Martin <marcan@marcan.st>
---
 arch/arm64/Kconfig                        |  2 ++
 arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++++++++++
 arch/arm64/include/asm/cpufeature.h       |  3 ++-
 arch/arm64/kernel/cpufeature_impdef.c     | 23 +++++++++++++++++++++++
 arch/arm64/kernel/process.c               | 22 ++++++++++++++++++++++
 arch/arm64/tools/cpucaps                  |  1 +
 6 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9b3593b34cce..2f3eedd955c9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2167,6 +2167,8 @@ endif # ARM64_PSEUDO_NMI
 
 config ARM64_MEMORY_MODEL_CONTROL
 	bool "Runtime memory model control"
+	default ARCH_APPLE
+	select ARM64_ACTLR_STATE
 	help
 	  Some ARM64 CPUs support runtime switching of the CPU memory
 	  model, which can be useful to emulate other CPU architectures
diff --git a/arch/arm64/include/asm/apple_cpufeature.h b/arch/arm64/include/asm/apple_cpufeature.h
new file mode 100644
index 000000000000..4370d91ffa3e
--- /dev/null
+++ b/arch/arm64/include/asm/apple_cpufeature.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef __ASM_APPLE_CPUFEATURES_H
+#define __ASM_APPLE_CPUFEATURES_H
+
+#include <linux/bits.h>
+#include <asm/sysreg.h>
+
+#define AIDR_APPLE_TSO_SHIFT	9
+#define AIDR_APPLE_TSO		BIT(9)
+
+#define ACTLR_APPLE_TSO_SHIFT	1
+#define ACTLR_APPLE_TSO		BIT(1)
+
+#endif
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 46ab37f8f4d8..a191000d88c2 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -911,7 +911,8 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
 
 static __always_inline bool system_has_actlr_state(void)
 {
-	return false;
+	return IS_ENABLED(CONFIG_ARM64_ACTLR_STATE) &&
+		alternative_has_cap_unlikely(ARM64_HAS_TSO_APPLE);
 }
 
 s64 arm64_ftr_safe_value(const struct arm64_ftr_bits *ftrp, s64 new, s64 cur);
diff --git a/arch/arm64/kernel/cpufeature_impdef.c b/arch/arm64/kernel/cpufeature_impdef.c
index bb04a8e3d79d..9325d1eb12f4 100644
--- a/arch/arm64/kernel/cpufeature_impdef.c
+++ b/arch/arm64/kernel/cpufeature_impdef.c
@@ -4,8 +4,21 @@
  */
 
 #include <asm/cpufeature.h>
+#include <asm/apple_cpufeature.h>
 
 #ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
+static bool has_apple_feature(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	u64 val;
+	WARN_ON(scope != SCOPE_SYSTEM);
+
+	if (read_cpuid_implementor() != ARM_CPU_IMP_APPLE)
+		return false;
+
+	val = read_sysreg(aidr_el1);
+	return cpufeature_matches(val, entry);
+}
+
 static bool has_tso_fixed(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	/* List of CPUs that always use the TSO memory model */
@@ -22,6 +35,16 @@ static bool has_tso_fixed(const struct arm64_cpu_capabilities *entry, int scope)
 
 static const struct arm64_cpu_capabilities arm64_impdef_features[] = {
 #ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
+	{
+		.desc = "TSO memory model (Apple)",
+		.capability = ARM64_HAS_TSO_APPLE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_apple_feature,
+		.field_pos = AIDR_APPLE_TSO_SHIFT,
+		.field_width = 1,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = 1,
+	},
 	{
 		.desc = "TSO memory model (Fixed)",
 		.capability = ARM64_HAS_TSO_FIXED,
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 117f80e16aac..34a19ecfb630 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -44,6 +44,7 @@
 #include <linux/memory_ordering_model.h>
 
 #include <asm/alternative.h>
+#include <asm/apple_cpufeature.h>
 #include <asm/compat.h>
 #include <asm/cpufeature.h>
 #include <asm/cacheflush.h>
@@ -522,6 +523,10 @@ void update_sctlr_el1(u64 sctlr)
 #ifdef CONFIG_ARM64_MEMORY_MODEL_CONTROL
 int arch_prctl_mem_model_get(struct task_struct *t)
 {
+	if (alternative_has_cap_unlikely(ARM64_HAS_TSO_APPLE) &&
+		t->thread.actlr & ACTLR_APPLE_TSO)
+		return PR_SET_MEM_MODEL_TSO;
+
 	return PR_SET_MEM_MODEL_DEFAULT;
 }
 
@@ -531,6 +536,23 @@ int arch_prctl_mem_model_set(struct task_struct *t, unsigned long val)
 	    val == PR_SET_MEM_MODEL_TSO)
 		return 0;
 
+	if (alternative_has_cap_unlikely(ARM64_HAS_TSO_APPLE)) {
+		WARN_ON(!system_has_actlr_state());
+
+		switch (val) {
+		case PR_SET_MEM_MODEL_TSO:
+			t->thread.actlr |= ACTLR_APPLE_TSO;
+			break;
+		case PR_SET_MEM_MODEL_DEFAULT:
+			t->thread.actlr &= ~ACTLR_APPLE_TSO;
+			break;
+		default:
+			return -EINVAL;
+		}
+		write_sysreg(t->thread.actlr, actlr_el1);
+		return 0;
+	}
+
 	if (val == PR_SET_MEM_MODEL_DEFAULT)
 		return 0;
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index daa6b9495402..62f9ca9ce44b 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -52,6 +52,7 @@ HAS_STAGE2_FWB
 HAS_TCR2
 HAS_TIDCP1
 HAS_TLB_RANGE
+HAS_TSO_APPLE
 HAS_TSO_FIXED
 HAS_VA52
 HAS_VIRT_HOST_EXTN

-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-11  1:37   ` Neal Gompa
  -1 siblings, 0 replies; 40+ messages in thread
From: Neal Gompa @ 2024-04-11  1:37 UTC (permalink / raw)
  To: Hector Martin
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland,
	Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux

On Wed, Apr 10, 2024 at 8:51 PM Hector Martin <marcan@marcan.st> wrote:
>
> x86 CPUs implement a stricter memory modern than ARM64 (TSO). For this
> reason, x86 emulation on baseline ARM64 systems requires very expensive
> memory model emulation. Having hardware that supports this natively is
> therefore very attractive. Such hardware, in fact, exists. This series
> adds support for userspace to identify when TSO is available and
> toggle it on, if supported.
>
> Some ARM64 CPUs intrinsically implement the TSO memory model, while
> others expose is as an IMPDEF control. Apple Silicon SoCs are in the
> latter category. Using TSO for x86 emulation on chips that support it
> has been shown to provide a massive performance boost [1].
>
> Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
> is initially not implemented for any architectures.
>
> Patch 2 implements it for CPUs which are known, to the best of my
> knowledge, to always implement the TSO memory model unconditionally.
> This uses the cpufeature mechanism to only enable this if *all* cores in
> the system meet the requirements.
>
> Patch 3 adds the scaffolding necesasry to save/restore the ACTLR_EL1
> register across context switches. This register contains IMPDEF flags
> related to CPU execution, and on Apple CPUs this is where the runtime
> TSO toggle bit is implemented. Other CPUs could conceivably benefit from
> this scaffolding if they also use ACTLR_EL1 for things that could
> ostensibly be runtime controlled and context-switched. For this to work,
> ACTLR_EL1 must have a uniform layout across all cores in the system.
>
> Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
> hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
> feature is detected (on all CPUs, which also implies the uniform
> ACTLR_EL1 layout).
>
> This series has been brewing in the downstream Asahi Linux tree for a
> while now, and ships to thousands of users. A subset have been using it
> with FEX-Emu, which already supports this feature. This rebase on
> v6.9-rc1 is only build-tested (all intermediate commits with and without
> the config enabled, on ARM64) but I'll update the downstream branch soon
> with this version and get it pushed out to users/testers.
>
> The Apple support works on bare metal and *should* work exactly the same
> way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
> though I haven't personally verified this. KVM support for this is left
> for a future patchset.
>
> (Apologies for the large Cc: list; I want to make sure nobody who got
> Cced on Zayd's alternate take is left out of this one.)
>
> [1] https://fex-emu.com/FEX-2306/
> [2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
> [3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/
>
> To: Catalin Marinas <catalin.marinas@arm.com>
> To: Will Deacon <will@kernel.org>
> To: Marc Zyngier <maz@kernel.org>
> To: Mark Rutland <mark.rutland@arm.com>
> Cc: Zayd Qumsieh <zayd_qumsieh@apple.com>
> Cc: Justin Lu <ih_justin@apple.com>
> Cc: Ryan Houdek <Houdek.Ryan@fex-emu.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Mateusz Guzik <mjguzik@gmail.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: Miguel Luis <miguel.luis@oracle.com>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Christoph Paasch <cpaasch@apple.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Joel Granados <j.granados@samsung.com>
> Cc: Dawei Li <dawei.li@shingroup.cn>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Florent Revest <revest@chromium.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Stefan Roesch <shr@devkernel.io>
> Cc: Andy Chiu <andy.chiu@sifive.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Zev Weiss <zev@bewilderbeest.net>
> Cc: Ondrej Mosnacek <omosnace@redhat.com>
> Cc: Miguel Ojeda <ojeda@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Asahi Linux <asahi@lists.linux.dev>
>
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
> Hector Martin (4):
>       prctl: Introduce PR_{SET,GET}_MEM_MODEL
>       arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
>       arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
>       arm64: Implement Apple IMPDEF TSO memory model control
>
>  arch/arm64/Kconfig                        | 14 ++++++
>  arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
>  arch/arm64/include/asm/cpufeature.h       | 10 +++++
>  arch/arm64/include/asm/processor.h        |  3 ++
>  arch/arm64/kernel/Makefile                |  3 +-
>  arch/arm64/kernel/cpufeature.c            | 11 ++---
>  arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
>  arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                 |  8 ++++
>  arch/arm64/tools/cpucaps                  |  2 +
>  include/linux/memory_ordering_model.h     | 11 +++++
>  include/uapi/linux/prctl.h                |  5 +++
>  kernel/sys.c                              | 21 +++++++++
>  13 files changed, 229 insertions(+), 6 deletions(-)
> ---
> base-commit: 4cece764965020c22cff7665b18a012006359095
> change-id: 20240411-tso-e86fdceb94b8
>

The series looks good to me.

Reviewed-by: Neal Gompa <neal@gompa.dev>



-- 
真実はいつも一つ!/ Always, there's only one truth!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-11 13:28   ` Will Deacon
  -1 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2024-04-11 13:28 UTC (permalink / raw)
  To: Hector Martin
  Cc: Catalin Marinas, Marc Zyngier, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

Hi Hector,

On Thu, Apr 11, 2024 at 09:51:19AM +0900, Hector Martin wrote:
> x86 CPUs implement a stricter memory modern than ARM64 (TSO). For this
> reason, x86 emulation on baseline ARM64 systems requires very expensive
> memory model emulation. Having hardware that supports this natively is
> therefore very attractive. Such hardware, in fact, exists. This series
> adds support for userspace to identify when TSO is available and
> toggle it on, if supported.

I'm probably going to make myself hugely unpopular here, but I have a
strong objection to this patch series as it stands. I firmly believe
that providing a prctl() to query and toggle the memory model to/from
TSO is going to lead to subtle fragmentation of arm64 Linux userspace.

It's not difficult to envisage this TSO switch being abused for native
arm64 applications:

  * A program no longer crashes when TSO is enabled, so the developer
    just toggles TSO to meet a deadline.

  * Some legacy x86 sources are being ported to arm64 but concurrency
    is hard so the developer just enables TSO to (mostly) avoid thinking
    about it.

  * Some binaries in a distribution exhibit instability which goes away
    in TSO mode, so a taskset-like program is used to run them with TSO
    enabled.

In all these cases, we end up with native arm64 applications that will
either fail to load or will crash in subtle ways on CPUs without the TSO
feature. Assuming that the application cannot be fixed, a better
approach would be to recompile using stronger instructions (e.g.
LDAR/STLR) so that at least the resulting binary is portable. Now, it's
true that some existing CPUs are TSO by design (this is a perfectly
valid implementation of the arm64 memory model), but I think there's a
big difference between quietly providing more ordering guarantees than
software may be relying on and providing a mechanism to discover,
request and ultimately rely upon the stronger behaviour.
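
As an illustration of that portable alternative (not part of this series):
with C11 atomics, acquire/release accesses are what compilers lower to
LDAR/STLR on arm64, while plain or relaxed accesses remain ordinary
LDR/STR. A minimal sketch:

#include <stdatomic.h>

static _Atomic int ready;
static int payload;

/* Producer: the release store (STLR on arm64) guarantees the payload
 * write is visible before 'ready' is observed as 1, on any arm64 CPU,
 * TSO or not. */
void publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: the acquire load (LDAR on arm64) orders the read of
 * 'payload' after observing 'ready' == 1. */
int consume(void)
{
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	return payload;
}
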

An alternative option is to go down the SPARC RMO route and just enable
TSO statically (although presumably in the firmware) for Apple silicon.
I'm assuming that has a performance impact for native code?

Will

P.S. I briefly pondered the idea of the kernel toggling the bit in the
ELF loader when e.g. it sees an x86 machine type but I suspect that
doesn't really help with existing emulators and you'd still need a way
to tell the emulator whether or not it was enabled.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11 13:28   ` Will Deacon
@ 2024-04-11 14:19     ` Hector Martin
  -1 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11 14:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Marc Zyngier, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

On 2024/04/11 22:28, Will Deacon wrote:
> Hi Hector,
> 
> On Thu, Apr 11, 2024 at 09:51:19AM +0900, Hector Martin wrote:
>> x86 CPUs implement a stricter memory modern than ARM64 (TSO). For this
>> reason, x86 emulation on baseline ARM64 systems requires very expensive
>> memory model emulation. Having hardware that supports this natively is
>> therefore very attractive. Such hardware, in fact, exists. This series
>> adds support for userspace to identify when TSO is available and
>> toggle it on, if supported.
> 
> I'm probably going to make myself hugely unpopular here, but I have a
> strong objection to this patch series as it stands. I firmly believe
> that providing a prctl() to query and toggle the memory model to/from
> TSO is going to lead to subtle fragmentation of arm64 Linux userspace.

I honestly doubt this should be a significant concern right now, given
that only a subset of implementations actually support this. Yes,
developers can do stupid stuff, but we already have gone through this
kind of story with other situations (e.g. 16K and 64K page support on
ARM64 breaking 4K assumptions) and things have been fixed over time.

In particular, I highly suspect Asahi Linux and Apple Silicon have done
a lot more good for the ARM64 ecosystem by getting developers to fix
their page size mess than they will do bad by somehow encouraging TSO
abuse. We've even found new memory model issues thanks to the
architecture's deep out-of-order character (remember that mess with
Linux atomics? :-)). So far, in the year+ we've had this patchset
downstream, not a single developer has proposed abusing it for something
that isn't an x86 emulator.

There's a pragmatic argument here: since we need this, and it absolutely
will continue to ship downstream if rejected, it doesn't make much
difference to the fragmentation risk, does it? The vast majority of
Linux-on-Mac users are likely to continue running downstream kernels for
the foreseeable future anyway to get newer features and hardware support
faster than they can be upstreamed. So not allowing this upstream
doesn't really change the landscape vis-a-vis being able to abuse this
or not, it just makes our life harder by forcing us to carry more
patches forever.

> It's not difficult to envisage this TSO switch being abused for native
> arm64 applications:
> 
>   * A program no longer crashes when TSO is enabled, so the developer
>     just toggles TSO to meet a deadline.
> 
>   * Some legacy x86 sources are being ported to arm64 but concurrency
>     is hard so the developer just enables TSO to (mostly) avoid thinking
>     about it.

Both of these rely on the developer *knowing* what TSO is and why it
fixes this. I posit that a developer who knows what that is also likely
to know why this is a stupid hack and they shouldn't be doing this and
that it won't work on all machines.

> 
>   * Some binaries in a distribution exhibit instability which goes away
>     in TSO mode, so a taskset-like program is used to run them with TSO
>     enabled.

Since the flag is cleared on execve, this third one isn't generally
possible as far as I know.

> In all these cases, we end up with native arm64 applications that will
> either fail to load or will crash in subtle ways on CPUs without the TSO
> feature. Assuming that the application cannot be fixed, a better
> approach would be to recompile using stronger instructions (e.g.
> LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> true that some existing CPUs are TSO by design (this is a perfectly
> valid implementation of the arm64 memory model), but I think there's a
> big difference between quietly providing more ordering guarantees than
> software may be relying on and providing a mechanism to discover,
> request and ultimately rely upon the stronger behaviour.

The problem is "just" using stronger instructions is much more
expensive, as emulators have demonstrated. If TSO didn't serve a
practical purpose I wouldn't be submitting this, but it does. This is
basically non-negotiable for x86 emulation; if this is rejected
upstream, it will forever live as a downstream patch used by the entire
gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
explicitly targeting, given our efforts with microVMs for 4K page size
support and the upcoming Vulkan drivers).

That said, I have a pragmatic proposal here. The "fixed TSO" part of the
implementation should be harmless, since those CPUs would correctly run
poorly-written applications anyway so the API is moot. That leaves Apple
Silicon. Our native kernels are and likely always will be 16K page size,
due to a bunch of pain around 16K-only IOMMUs (4K kernels do boot
natively but with very broken functionality including no GPU
acceleration) plus performance differences that favor 16K. How about we
gate the TSO functionality to only be supported on 4K kernel builds?
This would make them only work in 4K VMs on Asahi Linux. We are very
explicitly discouraging people from trying to use the microVMs to work
around page size problems (which they can already do, another
fragmentation problem, anyway); any application which requires the 4K VM
to run that isn't an emulator is already clearly broken and advertising
that fact openly. So, adding TSO to this should be only a marginal risk
of further fragmentation, and it wouldn't allow apps to "sneakily" "just
work" on Apple machines by abusing TSO.

> 
> An alternative option is to go down the SPARC RMO route and just enable
> TSO statically (although presumably in the firmware) for Apple silicon.
> I'm assuming that has a performance impact for native code?

Correct. We already have this as a bootloader option, but it is not
desirable. Plus, userspace code still needs a way to *discover* that TSO
is enabled for correctness, so it can automatically decide whether to
use stronger or weaker instructions.
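
As a sketch of what that discovery could look like with the interface from
this series (assuming PR_GET_MEM_MODEL reports the model via the prctl()
return value; the two codegen hooks are hypothetical emulator functions):

#include <sys/prctl.h>
#include <linux/prctl.h>	/* PR_{SET,GET}_MEM_MODEL names from this series */

/* Hypothetical emulator hooks, for illustration only. */
void choose_plain_codegen(void);
void choose_acqrel_codegen(void);

void select_codegen(void)
{
	if (prctl(PR_GET_MEM_MODEL, 0, 0, 0, 0) == PR_SET_MEM_MODEL_TSO)
		choose_plain_codegen();		/* hardware gives x86-like ordering */
	else
		choose_acqrel_codegen();	/* emit LDAR/STLR or explicit barriers */
}
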

> 
> Will
> 
> P.S. I briefly pondered the idea of the kernel toggling the bit in the
> ELF loader when e.g. it sees an x86 machine type but I suspect that
> doesn't really help with existing emulators and you'd still need a way
> to tell the emulator whether or not it was enabled.
> 

- Hector

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11 14:19     ` Hector Martin
@ 2024-04-11 18:43       ` Hector Martin
  -1 siblings, 0 replies; 40+ messages in thread
From: Hector Martin @ 2024-04-11 18:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Marc Zyngier, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux



On 2024/04/11 23:19, Hector Martin wrote:
>>
>> An alternative option is to go down the SPARC RMO route and just enable
>> TSO statically (although presumably in the firmware) for Apple silicon.
>> I'm assuming that has a performance impact for native code?
> 
> Correct. We already have this as a bootloader option, but it is not
> desirable. Plus, userspace code still needs a way to *discover* that TSO
> is enabled for correctness, so it can automatically decide whether to
> use stronger or weaker instructions.

To add some numbers to this (I was just made aware of this paper):

https://www.sra.uni-hannover.de/Publications/2023/tosting-arcs23/wrenger_23_arcs.pdf

Using TSO globally has, on average, a 9% performance hit, so that is
clearly off the table as a general solution.

Meanwhile, more detailed microbenchmarks often show TSO as having better
performance than outright using acquire/release instructions without
TSO. Therefore, just giving up on TSO and using acq/rel semantics for
emulators is also not an acceptable solution.

Additionally, the general load/store instructions on ARM have more
flexible addressing modes than the synchronizing ones, and since general
x86 emulation requires *all* loads and stores to be like this in a
non-TSO model (without much more complex/expensive program analysis to
determine where this can be elided), the perf impact is definitely worse
for emulation (e.g. stack accesses are affected) than for a
microbenchmark where only the "target" test instructions are being modified.
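
A small illustration of the addressing-mode point (not from the series; the
instruction sequences in the comments are typical compiler output for arm64
and may vary):

#include <stdint.h>

/* With hardware TSO the emulator can emit a plain load, which can use a
 * scaled register-offset address directly, e.g. ldr x0, [x1, x2, lsl #3]. */
uint64_t guest_load_tso(const uint64_t *base, uint64_t index)
{
	return base[index];
}

/* Without TSO the load needs acquire semantics; LDAR only takes a bare
 * base register, so the address is computed separately, e.g.
 * add x8, x1, x2, lsl #3 ; ldar x0, [x8]. */
uint64_t guest_load_acquire(const uint64_t *base, uint64_t index)
{
	return __atomic_load_n(&base[index], __ATOMIC_ACQUIRE);
}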

- Hector

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11  0:51 ` Hector Martin
@ 2024-04-16  2:11   ` Zayd Qumsieh
  -1 siblings, 0 replies; 40+ messages in thread
From: Zayd Qumsieh @ 2024-04-16  2:11 UTC (permalink / raw)
  To: marcan
  Cc: catalin.marinas, will, maz, mark.rutland, zayd_qumsieh, ih_justin,
	Houdek.Ryan, broonie, ardb, mjguzik, anshuman.khandual,
	oliver.upton, miguel.luis, joey.gouly, cpaasch, keescook,
	samitolvanen, bhe, j.granados, dawei.li, akpm, revest, david, shr,
	andy.chiu, josh, oleg, deller, zev, omosnace, ojeda,
	linux-arm-kernel, linux-kernel, asahi

The patch looks great! :) I have one minor suggestion, though:

>static __always_inline bool system_has_actlr_state(void)
>{
>	return IS_ENABLED(CONFIG_ARM64_ACTLR_STATE) &&
>		alternative_has_cap_unlikely(ARM64_HAS_TSO_APPLE);
>}

ACTLR_EL1.TSO is not exposed for writing on Virtual Machines on all
versions of macOS. However, AIDR_EL1 may still advertise TSO, whether
or not ACTLR_EL1.TSO is writable. Could you modify the patch such that
we check the writability of ACTLR_EL1.TSO in system_has_actlr_state
(or once on startup, and cache it, since reading from AIDR_EL1 causes
a trap to Hypervisor.fwk)?
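
One possible shape for that probe, purely as a sketch on top of the defines
added by this series (it assumes a write that is not honoured reads back
with the bit clear rather than faulting, which would need to be confirmed
against the Hypervisor.fwk behaviour described above):

#include <linux/init.h>
#include <linux/types.h>
#include <asm/apple_cpufeature.h>
#include <asm/barrier.h>
#include <asm/sysreg.h>

/* Probe once at boot whether ACTLR_EL1.TSO actually sticks, and cache the
 * answer so the prctl() path does not advertise TSO on VMs where AIDR_EL1
 * reports it but the control bit is not writable. */
static bool actlr_tso_writable;

static void __init probe_actlr_tso_writable(void)
{
	u64 actlr = read_sysreg(actlr_el1);

	write_sysreg(actlr | ACTLR_APPLE_TSO, actlr_el1);
	isb();
	actlr_tso_writable = !!(read_sysreg(actlr_el1) & ACTLR_APPLE_TSO);

	write_sysreg(actlr, actlr_el1);		/* restore the boot value */
	isb();
}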

Thanks,
Zayd

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11 18:43       ` Hector Martin
@ 2024-04-16  2:22         ` Zayd Qumsieh
  -1 siblings, 0 replies; 40+ messages in thread
From: Zayd Qumsieh @ 2024-04-16  2:22 UTC (permalink / raw)
  To: Hector Martin, Will Deacon
  Cc: Catalin Marinas, Marc Zyngier, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

>I'm probably going to make myself hugely unpopular here, but I have a
>strong objection to this patch series as it stands. I firmly believe
>that providing a prctl() to query and toggle the memory model to/from
>TSO is going to lead to subtle fragmentation of arm64 Linux userspace.

It's definitely not our intent to fragment the ecosystem.
The goal of this memory ordering is to simplify emulation layers that benefit from this.
If you have suggestions to reduce the risk of it being misused outside of emulators, we'd be happy to look into it.

Thanks,
Zayd

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11 14:19     ` Hector Martin
@ 2024-04-19 16:58       ` Will Deacon
  -1 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2024-04-19 16:58 UTC (permalink / raw)
  To: Hector Martin
  Cc: Catalin Marinas, Marc Zyngier, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> On 2024/04/11 22:28, Will Deacon wrote:
> >   * Some binaries in a distribution exhibit instability which goes away
> >     in TSO mode, so a taskset-like program is used to run them with TSO
> >     enabled.
> 
> Since the flag is cleared on execve, this third one isn't generally
> possible as far as I know.

Ah ok, I'd missed that. Thanks.

> > In all these cases, we end up with native arm64 applications that will
> > either fail to load or will crash in subtle ways on CPUs without the TSO
> > feature. Assuming that the application cannot be fixed, a better
> > approach would be to recompile using stronger instructions (e.g.
> > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> > true that some existing CPUs are TSO by design (this is a perfectly
> > valid implementation of the arm64 memory model), but I think there's a
> > big difference between quietly providing more ordering guarantees than
> > software may be relying on and providing a mechanism to discover,
> > request and ultimately rely upon the stronger behaviour.
> 
> The problem is "just" using stronger instructions is much more
> expensive, as emulators have demonstrated. If TSO didn't serve a
> practical purpose I wouldn't be submitting this, but it does. This is
> basically non-negotiable for x86 emulation; if this is rejected
> upstream, it will forever live as a downstream patch used by the entire
> gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> explicitly targeting, given our efforts with microVMs for 4K page size
> support and the upcoming Vulkan drivers).

These microVMs sound quite interesting. What exactly are they? Are you
running them under KVM?

Ignoring the mechanism for the time being, would it solve your problem
if you were able to run specific microVMs in TSO mode, or do you *really*
need the VM to have finer-grained control than that? If the whole VM is
running in TSO mode, then my concerns largely disappear, as that's
indistinguishable from running on a hardware implementation that happens
to be TSO.

> That said, I have a pragmatic proposal here. The "fixed TSO" part of the
> implementation should be harmless, since those CPUs would correctly run
> poorly-written applications anyway so the API is moot. That leaves Apple
> Silicon. Our native kernels are and likely always will be 16K page size,
> due to a bunch of pain around 16K-only IOMMUs (4K kernels do boot
> natively but with very broken functionality including no GPU
> acceleration) plus performance differences that favor 16K. How about we
> gate the TSO functionality to only be supported on 4K kernel builds?
> This would make them only work in 4K VMs on Asahi Linux. We are very
> explicitly discouraging people from trying to use the microVMs to work
> around page size problems (which they can already do, another
> fragmentation problem, anyway); any application which requires the 4K VM
> to run that isn't an emulator is already clearly broken and advertising
> that fact openly. So, adding TSO to this should be only a marginal risk
> of further fragmentation, and it wouldn't allow apps to "sneakily" "just
> work" on Apple machines by abusing TSO.

I appreciate that you're trying to be constructive here, but I don't think
we should tie this to the page size. It's an artificial limitation and I
don't think it really addresses the underlying concerns that I have.

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-16  2:22         ` Zayd Qumsieh
@ 2024-04-19 16:58           ` Will Deacon
  -1 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2024-04-19 16:58 UTC (permalink / raw)
  To: Zayd Qumsieh
  Cc: Hector Martin, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

On Mon, Apr 15, 2024 at 07:22:41PM -0700, Zayd Qumsieh wrote:
> >I'm probably going to make myself hugely unpopular here, but I have a
> >strong objection to this patch series as it stands. I firmly believe
> >that providing a prctl() to query and toggle the memory model to/from
> >TSO is going to lead to subtle fragmentation of arm64 Linux userspace.
> 
> It's definitely not our intent to fragment the ecosystem.
> The goal of this memory ordering is to simplify emulation layers that benefit from this.
> If you have suggestions to reduce the risk of it being misused outside of emulators, we'd be happy to look into it.

Once you have exposed this toggle via prctl(), it doesn't really matter
what your intentions were. It will get used outside of emulation layers
and we'll be stuck supporting it.

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-19 16:58           ` Will Deacon
@ 2024-04-19 18:05             ` Catalin Marinas
  -1 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2024-04-19 18:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Zayd Qumsieh, Hector Martin, Marc Zyngier, Mark Rutland,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

On Fri, Apr 19, 2024 at 05:58:26PM +0100, Will Deacon wrote:
> On Mon, Apr 15, 2024 at 07:22:41PM -0700, Zayd Qumsieh wrote:
> > >I'm probably going to make myself hugely unpopular here, but I have a
> > >strong objection to this patch series as it stands. I firmly believe
> > >that providing a prctl() to query and toggle the memory model to/from
> > >TSO is going to lead to subtle fragmentation of arm64 Linux userspace.
> > 
> > It's definitely not our intent to fragment the ecosystem. The goal
> > of this memory ordering is to simplify emulation layers that benefit
> > from this. If you have suggestions to reduce the risk of it being
> > misused outside of emulators, we'd be happy to look into it.
> 
> Once you have exposed this toggle via prctl(), it doesn't really matter
> what your intentions were. It will get used outside of emulation layers
> and we'll be stuck supporting it.

Just FTR, I fully agree with Will. I'm strongly against this kind of ABI
for a non-architected, implementation defined feature. I can't even tell
exactly what TSO means on the Apple hardware. Is it close to the x86
TSO? Is there a formal memory model for it? Are future Apple (or other
Arm vendor) implementations going to follow exactly the same model to be
able to call it some form of "Apple standard" that deserves an ABI?

So, sorry, I'm going to NAK these approaches proposing imp def features
as generic opt-in mechanisms (the microVMs thing sounds doable though,
to my limited understanding; I guess that would mean running the
emulator in a VM).

-- 
Catalin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-19 16:58       ` Will Deacon
@ 2024-04-20 11:37         ` Marc Zyngier
  -1 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2024-04-20 11:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Hector Martin, Catalin Marinas, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

On Fri, 19 Apr 2024 17:58:09 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> > On 2024/04/11 22:28, Will Deacon wrote:
> > >   * Some binaries in a distribution exhibit instability which goes away
> > >     in TSO mode, so a taskset-like program is used to run them with TSO
> > >     enabled.
> > 
> > Since the flag is cleared on execve, this third one isn't generally
> > possible as far as I know.
> 
> Ah ok, I'd missed that. Thanks.
> 
> > > In all these cases, we end up with native arm64 applications that will
> > > either fail to load or will crash in subtle ways on CPUs without the TSO
> > > feature. Assuming that the application cannot be fixed, a better
> > > approach would be to recompile using stronger instructions (e.g.
> > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> > > true that some existing CPUs are TSO by design (this is a perfectly
> > > valid implementation of the arm64 memory model), but I think there's a
> > > big difference between quietly providing more ordering guarantees than
> > > software may be relying on and providing a mechanism to discover,
> > > request and ultimately rely upon the stronger behaviour.
> > 
> > The problem is "just" using stronger instructions is much more
> > expensive, as emulators have demonstrated. If TSO didn't serve a
> > practical purpose I wouldn't be submitting this, but it does. This is
> > basically non-negotiable for x86 emulation; if this is rejected
> > upstream, it will forever live as a downstream patch used by the entire
> > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> > explicitly targeting, given our efforts with microVMs for 4K page size
> > support and the upcoming Vulkan drivers).
> 
> These microVMs sound quite interesting. What exactly are they? Are you
> running them under KVM?
> 
> Ignoring the mechanism for the time being, would it solve your problem
> if you were able to run specific microVMs in TSO mode, or do you *really*
> need the VM to have finer-grained control than that? If the whole VM is
> running in TSO mode, then my concerns largely disappear, as that's
> indistinguishable from running on a hardware implementation that happens
> to be TSO.

Since KVM has been mentioned a few times, I'll give my take on this.

Since day 1, it was a conscious decision for KVM/arm64 to emulate the
architecture, and only that -- this is complicated enough. Meaning
that no implementation-defined features should be explicitly exposed
to the guest. So I have no plan to expose any such feature for
userspace to configure TSO or anything else of the sort.

However, that doesn't preclude VMs from running in TSO mode if the HW
is configured as such at boot time. From what I have understood, this
is a per translation regime setting (EL1 and EL2 have separate knobs).

So it should be possible to set ACTLR_EL1.TSO=1 from firmware (using
the non-architected ACTLR_EL12 accessor), and let things work without
touching anything else (KVM doesn't context switch this register and
traps accesses to it). This would keep KVM out of the loop, the host
side would be unaffected, and only VMs would pay the overhead of TSO.
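
A minimal sketch of what such a firmware hook could look like, purely
for illustration -- the s3_5_c1_c0_1 encoding for ACTLR_EL12 and the
bit-1 position for the TSO control are assumptions about the Apple
implementation, not anything architected:

/* Runs in EL2 boot firmware, before any kernel or VMM is entered. */
#define ACTLR_EL12	"s3_5_c1_c0_1"	/* assumed IMPDEF alias */
#define ACTLR_TSO	(1UL << 1)	/* assumed Apple TSO enable bit */

static void enable_tso_for_el1_guests(void)
{
	unsigned long actlr;

	/* Set TSO for the EL1 translation regime only; EL2 (the VHE
	 * host) keeps the default relaxed memory model. */
	asm volatile("mrs %0, " ACTLR_EL12 : "=r"(actlr));
	actlr |= ACTLR_TSO;
	asm volatile("msr " ACTLR_EL12 ", %0" : : "r"(actlr));
	asm volatile("isb");
}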

I appreciate that this is not the ideal situation, and very much an
all-or-nothing approach. But that's what we can reasonably manage from
an upstream perspective given the variability of the arm64 ecosystem.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-19 16:58       ` Will Deacon
@ 2024-04-20 12:13         ` Eric Curtin
  -1 siblings, 0 replies; 40+ messages in thread
From: Eric Curtin @ 2024-04-20 12:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: Hector Martin, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Sergio Lopez Pascual

On Fri, 19 Apr 2024 at 18:08, Will Deacon <will@kernel.org> wrote:
>
> On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> > On 2024/04/11 22:28, Will Deacon wrote:
> > >   * Some binaries in a distribution exhibit instability which goes away
> > >     in TSO mode, so a taskset-like program is used to run them with TSO
> > >     enabled.
> >
> > Since the flag is cleared on execve, this third one isn't generally
> > possible as far as I know.
>
> Ah ok, I'd missed that. Thanks.
>
> > > In all these cases, we end up with native arm64 applications that will
> > > either fail to load or will crash in subtle ways on CPUs without the TSO
> > > feature. Assuming that the application cannot be fixed, a better
> > > approach would be to recompile using stronger instructions (e.g.
> > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> > > true that some existing CPUs are TSO by design (this is a perfectly
> > > valid implementation of the arm64 memory model), but I think there's a
> > > big difference between quietly providing more ordering guarantees than
> > > software may be relying on and providing a mechanism to discover,
> > > request and ultimately rely upon the stronger behaviour.
> >
> > The problem is "just" using stronger instructions is much more
> > expensive, as emulators have demonstrated. If TSO didn't serve a
> > practical purpose I wouldn't be submitting this, but it does. This is
> > basically non-negotiable for x86 emulation; if this is rejected
> > upstream, it will forever live as a downstream patch used by the entire
> > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> > explicitly targeting, given our efforts with microVMs for 4K page size
> > support and the upcoming Vulkan drivers).
>
> These microVMs sound quite interesting. What exactly are they? Are you
> running them under KVM?

It's the magic of libkrun. This is one of the git repos in the libkrun
family; it has a wide array of use cases, which I personally won't do
much justice explaining here. This is just one repo/tool/use case:

https://github.com/containers/krunvm

https://sinrega.org/running-microvms-on-m1/

CC'ing @Sergio Lopez Pascual the lead of krun in general.

Is mise le meas/Regards,

Eric Curtin

>
> Ignoring the mechanism for the time being, would it solve your problem
> if you were able to run specific microVMs in TSO mode, or do you *really*
> need the VM to have finer-grained control than that? If the whole VM is
> running in TSO mode, then my concerns largely disappear, as that's
> indistinguishable from running on a hardware implementation that happens
> to be TSO.
>
> > That said, I have a pragmatic proposal here. The "fixed TSO" part of the
> > implementation should be harmless, since those CPUs would correctly run
> > poorly-written applications anyway so the API is moot. That leaves Apple
> > Silicon. Our native kernels are and likely always will be 16K page size,
> > due to a bunch of pain around 16K-only IOMMUs (4K kernels do boot
> > natively but with very broken functionality including no GPU
> > acceleration) plus performance differences that favor 16K. How about we
> > gate the TSO functionality to only be supported on 4K kernel builds?
> > This would make them only work in 4K VMs on Asahi Linux. We are very
> > explicitly discouraging people from trying to use the microVMs to work
> > around page size problems (which they can already do, another
> > fragmentation problem, anyway); any application which requires the 4K VM
> > to run that isn't an emulator is already clearly broken and advertising
> > that fact openly. So, adding TSO to this should be only a marginal risk
> > of further fragmentation, and it wouldn't allow apps to "sneakily" "just
> > work" on Apple machines by abusing TSO.
>
> I appreciate that you're trying to be constructive here, but I don't think
> we should tie this to the page size. It's an artificial limitation and I
> don't think it really addresses the underlying concerns that I have.
>
> Will
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-20 12:13         ` Eric Curtin
@ 2024-04-20 12:15           ` Eric Curtin
  -1 siblings, 0 replies; 40+ messages in thread
From: Eric Curtin @ 2024-04-20 12:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Hector Martin, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Sergio Lopez Pascual

On Sat, 20 Apr 2024 at 13:13, Eric Curtin <ecurtin@redhat.com> wrote:
>
> On Fri, 19 Apr 2024 at 18:08, Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> > > On 2024/04/11 22:28, Will Deacon wrote:
> > > >   * Some binaries in a distribution exhibit instability which goes away
> > > >     in TSO mode, so a taskset-like program is used to run them with TSO
> > > >     enabled.
> > >
> > > Since the flag is cleared on execve, this third one isn't generally
> > > possible as far as I know.
> >
> > Ah ok, I'd missed that. Thanks.
> >
> > > > In all these cases, we end up with native arm64 applications that will
> > > > either fail to load or will crash in subtle ways on CPUs without the TSO
> > > > feature. Assuming that the application cannot be fixed, a better
> > > > approach would be to recompile using stronger instructions (e.g.
> > > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> > > > true that some existing CPUs are TSO by design (this is a perfectly
> > > > valid implementation of the arm64 memory model), but I think there's a
> > > > big difference between quietly providing more ordering guarantees than
> > > > software may be relying on and providing a mechanism to discover,
> > > > request and ultimately rely upon the stronger behaviour.
> > >
> > > The problem is "just" using stronger instructions is much more
> > > expensive, as emulators have demonstrated. If TSO didn't serve a
> > > practical purpose I wouldn't be submitting this, but it does. This is
> > > basically non-negotiable for x86 emulation; if this is rejected
> > > upstream, it will forever live as a downstream patch used by the entire
> > > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> > > explicitly targeting, given our efforts with microVMs for 4K page size
> > > support and the upcoming Vulkan drivers).
> >
> > These microVMs sound quite interesting. What exactly are they? Are you
> > running them under KVM?
>
> It's the magic of libkrun. This is one of the git repos in the libkrun
> family; it has a wide array of use cases, which I personally won't do
> much justice explaining here. This is just one repo/tool/use case:
>
> https://github.com/containers/krunvm
>
> https://sinrega.org/running-microvms-on-m1/

Sorry for the double post; I meant to share this one for the Asahi
emulator use case. Sergio's blogs are great in general:

https://sinrega.org/2023-10-06-using-microvms-for-gaming-on-fedora-asahi/

Is mise le meas/Regards,

Eric Curtin

>
> CC'ing @Sergio Lopez Pascual the lead of krun in general.
>
> Is mise le meas/Regards,
>
> Eric Curtin
>
> >
> > Ignoring the mechanism for the time being, would it solve your problem
> > if you were able to run specific microVMs in TSO mode, or do you *really*
> > need the VM to have finer-grained control than that? If the whole VM is
> > running in TSO mode, then my concerns largely disappear, as that's
> > indistinguishable from running on a hardware implementation that happens
> > to be TSO.
> >
> > > That said, I have a pragmatic proposal here. The "fixed TSO" part of the
> > > implementation should be harmless, since those CPUs would correctly run
> > > poorly-written applications anyway so the API is moot. That leaves Apple
> > > Silicon. Our native kernels are and likely always will be 16K page size,
> > > due to a bunch of pain around 16K-only IOMMUs (4K kernels do boot
> > > natively but with very broken functionality including no GPU
> > > acceleration) plus performance differences that favor 16K. How about we
> > > gate the TSO functionality to only be supported on 4K kernel builds?
> > > This would make them only work in 4K VMs on Asahi Linux. We are very
> > > explicitly discouraging people from trying to use the microVMs to work
> > > around page size problems (which they can already do, another
> > > fragmentation problem, anyway); any application which requires the 4K VM
> > > to run that isn't an emulator is already clearly broken and advertising
> > > that fact openly. So, adding TSO to this should be only a marginal risk
> > > of further fragmentation, and it wouldn't allow apps to "sneakily" "just
> > > work" on Apple machines by abusing TSO.
> >
> > I appreciate that you're trying to be constructive here, but I don't think
> > we should tie this to the page size. It's an artificial limitation and I
> > don't think it really addresses the underlying concerns that I have.
> >
> > Will
> >


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-20 11:37         ` Marc Zyngier
@ 2024-05-02  0:10           ` Zayd Qumsieh
  -1 siblings, 0 replies; 40+ messages in thread
From: Zayd Qumsieh @ 2024-05-02  0:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Mark Rutland, Zayd Qumsieh, Justin Lu,
	Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

> On Fri, 19 Apr 2024 17:58:09 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> > > On 2024/04/11 22:28, Will Deacon wrote:
> > > >   * Some binaries in a distribution exhibit instability which goes away
> > > >     in TSO mode, so a taskset-like program is used to run them with TSO
> > > >     enabled.
> > > 
> > > Since the flag is cleared on execve, this third one isn't generally
> > > possible as far as I know.
> > 
> > Ah ok, I'd missed that. Thanks.
> > 
> > > > In all these cases, we end up with native arm64 applications that will
> > > > either fail to load or will crash in subtle ways on CPUs without the TSO
> > > > feature. Assuming that the application cannot be fixed, a better
> > > > approach would be to recompile using stronger instructions (e.g.
> > > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> > > > true that some existing CPUs are TSO by design (this is a perfectly
> > > > valid implementation of the arm64 memory model), but I think there's a
> > > > big difference between quietly providing more ordering guarantees than
> > > > software may be relying on and providing a mechanism to discover,
> > > > request and ultimately rely upon the stronger behaviour.
> > > 
> > > The problem is "just" using stronger instructions is much more
> > > expensive, as emulators have demonstrated. If TSO didn't serve a
> > > practical purpose I wouldn't be submitting this, but it does. This is
> > > basically non-negotiable for x86 emulation; if this is rejected
> > > upstream, it will forever live as a downstream patch used by the entire
> > > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> > > explicitly targeting, given our efforts with microVMs for 4K page size
> > > support and the upcoming Vulkan drivers).
> > 
> > These microVMs sound quite interesting. What exactly are they? Are you
> > running them under KVM?
> > 
> > Ignoring the mechanism for the time being, would it solve your problem
> > if you were able to run specific microVMs in TSO mode, or do you *really*
> > need the VM to have finer-grained control than that? If the whole VM is
> > running in TSO mode, then my concerns largely disappear, as that's
> > indistinguishable from running on a hardware implementation that happens
> > to be TSO.
>
> Since KVM has been mentioned a few times, I'll give my take on this.
>
> Since day 1, it was a conscious decision for KVM/arm64 to emulate the
> architecture, and only that -- this is complicated enough. Meaning
> that no implementation-defined features should be explicitly exposed
> to the guest. So I have no plan to expose any such feature for
> userspace to configure TSO or anything else of the sort.

Agreed. We do not intend for TSO mode to be used extensively for EL1; the
intention is for TSO mode to be reserved for userspace applications that
request it.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-04-11 13:28   ` Will Deacon
@ 2024-05-02  0:16     ` Zayd Qumsieh
  -1 siblings, 0 replies; 40+ messages in thread
From: Zayd Qumsieh @ 2024-05-02  0:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Marc Zyngier, Mark Rutland, Zayd Qumsieh,
	Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

On Thu, 11 Apr 2024 14:28:54 +0100,
Will Deacon <will@kernel.org> wrote:
> P.S. I briefly pondered the idea of the kernel toggling the bit in the
> ELF loader when e.g. it sees an x86 machine type but I suspect that
> doesn't really help with existing emulators and you'd still need a way
> to tell the emulator whether or not it was enabled.

This seems promising to me. What do people think of adding an opt-in argument,
option, or similar to binfmt that allows users to mark certain file formats as
"must run under TSO"? And then, the kernel would set the TSO bit when invoking
the interpreter for those file formats. If an emulator decides to create a
non-CPU-emulation thread, then it can use a prctl to disable TSO and switch to
the default ARM memory model. Note that this prctl wouldn't be allowed to
enable TSO - it would only disable it. This way, it is much harder to
produce a faulty application that relies on TSO, since enabling TSO is
only done via a binfmt handler that the user must explicitly opt into.

It is true that existing emulators wouldn't be able to benefit from this, but
that's the case no matter the activation mechanism. We can, however, expose a
prctl to get the memory model, so emulators can detect if TSO was enabled for
their threads.

To summarize, I propose two prctls (similar to the ones in the current revision
of the patch series). One to switch from the TSO memory model to the default
ARM one (this is a one-way street). And another to query the current memory
model.
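
Roughly, from an emulator's point of view the flow would look like the
sketch below. The PR_* names follow this patch series, but the numeric
values are placeholders, the return-value convention for the GET side
is assumed, and the one-way disable is only what is being proposed
here -- none of this exists in a mainline kernel today:

#include <stdio.h>
#include <sys/prctl.h>

/* Placeholder values; the real ones would come from the uapi headers. */
#ifndef PR_SET_MEM_MODEL
# define PR_SET_MEM_MODEL		0x1001
# define PR_GET_MEM_MODEL		0x1002
# define PR_SET_MEM_MODEL_DEFAULT	0
# define PR_SET_MEM_MODEL_TSO		1
#endif

/* Called from a helper thread that never executes guest x86 code. */
static void drop_tso_if_inherited(void)
{
	/* TSO would have been inherited from the binfmt handler. */
	if (prctl(PR_GET_MEM_MODEL, 0, 0, 0, 0) == PR_SET_MEM_MODEL_TSO) {
		/* One-way switch back to the default arm64 model. */
		if (prctl(PR_SET_MEM_MODEL, PR_SET_MEM_MODEL_DEFAULT,
			  0, 0, 0) < 0)
			perror("PR_SET_MEM_MODEL");
	}
}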

Thanks,
Zayd

P.S. I forgot to CC you in my most recent email to Marc Zyngier just now. 
Sorry, I'm quite new to using mailing lists.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] arm64: Support the TSO memory model
  2024-05-02  0:10           ` Zayd Qumsieh
@ 2024-05-02 13:25             ` Marc Zyngier
  0 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2024-05-02 13:25 UTC (permalink / raw)
  To: Zayd Qumsieh
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Justin Lu,
	Ryan Houdek, Mark Brown, Ard Biesheuvel, Mateusz Guzik,
	Anshuman Khandual, Oliver Upton, Miguel Luis, Joey Gouly,
	Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
	Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
	David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
	Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek,
	Miguel Ojeda, linux-arm-kernel, linux-kernel, Asahi Linux

[adding Will back to the thread]

On Thu, 02 May 2024 01:10:35 +0100,
Zayd Qumsieh <zayd_qumsieh@apple.com> wrote:
> 
> > On Fri, 19 Apr 2024 17:58:09 +0100,
> > Will Deacon <will@kernel.org> wrote:
> > > 
> > > On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> > > > On 2024/04/11 22:28, Will Deacon wrote:
> > > > >   * Some binaries in a distribution exhibit instability which goes away
> > > > >     in TSO mode, so a taskset-like program is used to run them with TSO
> > > > >     enabled.
> > > > 
> > > > Since the flag is cleared on execve, this third one isn't generally
> > > > possible as far as I know.
> > > 
> > > Ah ok, I'd missed that. Thanks.
> > > 
> > > > > In all these cases, we end up with native arm64 applications that will
> > > > > either fail to load or will crash in subtle ways on CPUs without the TSO
> > > > > feature. Assuming that the application cannot be fixed, a better
> > > > > approach would be to recompile using stronger instructions (e.g.
> > > > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> > > > > true that some existing CPUs are TSO by design (this is a perfectly
> > > > > valid implementation of the arm64 memory model), but I think there's a
> > > > > big difference between quietly providing more ordering guarantees than
> > > > > software may be relying on and providing a mechanism to discover,
> > > > > request and ultimately rely upon the stronger behaviour.
> > > > 
> > > > The problem is "just" using stronger instructions is much more
> > > > expensive, as emulators have demonstrated. If TSO didn't serve a
> > > > practical purpose I wouldn't be submitting this, but it does. This is
> > > > basically non-negotiable for x86 emulation; if this is rejected
> > > > upstream, it will forever live as a downstream patch used by the entire
> > > > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> > > > explicitly targeting, given our efforts with microVMs for 4K page size
> > > > support and the upcoming Vulkan drivers).
> > > 
> > > These microVMs sound quite interesting. What exactly are they? Are you
> > > running them under KVM?
> > > 
> > > Ignoring the mechanism for the time being, would it solve your problem
> > > if you were able to run specific microVMs in TSO mode, or do you *really*
> > > need the VM to have finer-grained control than that? If the whole VM is
> > > running in TSO mode, then my concerns largely disappear, as that's
> > > indistinguishable from running on a hardware implementation that happens
> > > to be TSO.
> >
> > Since KVM has been mentioned a few times, I'll give my take on this.
> >
> > Since day 1, it was a conscious decision for KVM/arm64 to emulate the
> > architecture, and only that -- this is complicated enough. Meaning
> > that no implementation-defined features should be explicitly exposed
> > to the guest. So I have no plan to expose any such feature for
> > userspace to configure TSO or anything else of the sort.
> 
> Agreed. We do not intend for TSO mode to be used extensively for EL1, the
> intention is for TSO mode to be reserved for userspace applications that
> request it.

But that's the same thing for a hypervisor.

For usersoace in a VM to make use of any feature, it must be exposed
to the VM as a whole by the host VMM (QEMU, kvmtool, whatever). Which
means having a new userspace ABI, specific to KVM, exposing a feature
for which there is no spec whatsoever. Even worse, you cannot discover
whether the instruction you must use to context switch the ACTLR_EL1
register is implemented. Isn't that great?

And I'm not even talking about the joys of migrating such a VM,
because we have no clue what this bit means on other implementations.
For all we know it causes another CPU to catch fire (or go PDP-endian,
which is basically the same).

Which is why my proposal is for this bit to be set statically for
*all* VMs, and leave the kernel (and KVM) out of the picture
altogether. At least that is something we can reason about (although
someone would need to start thinking of how this particular TSO
implementation composes with the relaxed memory ordering used outside
of the VM and show that they actually lead to correct results for
something such as virtio, for example).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2024-05-02 13:25 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-11  0:51 [PATCH 0/4] arm64: Support the TSO memory model Hector Martin
2024-04-11  0:51 ` Hector Martin
2024-04-11  0:51 ` [PATCH 1/4] prctl: Introduce PR_{SET,GET}_MEM_MODEL Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  0:51 ` [PATCH 2/4] arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  0:51 ` [PATCH 3/4] arm64: Introduce scaffolding to add ACTLR_EL1 to thread state Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  0:51 ` [PATCH 4/4] arm64: Implement Apple IMPDEF TSO memory model control Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  1:37 ` [PATCH 0/4] arm64: Support the TSO memory model Neal Gompa
2024-04-11  1:37   ` Neal Gompa
2024-04-11 13:28 ` Will Deacon
2024-04-11 13:28   ` Will Deacon
2024-04-11 14:19   ` Hector Martin
2024-04-11 14:19     ` Hector Martin
2024-04-11 18:43     ` Hector Martin
2024-04-11 18:43       ` Hector Martin
2024-04-16  2:22       ` Zayd Qumsieh
2024-04-16  2:22         ` Zayd Qumsieh
2024-04-19 16:58         ` Will Deacon
2024-04-19 16:58           ` Will Deacon
2024-04-19 18:05           ` Catalin Marinas
2024-04-19 18:05             ` Catalin Marinas
2024-04-19 16:58     ` Will Deacon
2024-04-19 16:58       ` Will Deacon
2024-04-20 11:37       ` Marc Zyngier
2024-04-20 11:37         ` Marc Zyngier
2024-05-02  0:10         ` Zayd Qumsieh
2024-05-02  0:10           ` Zayd Qumsieh
2024-05-02 13:25           ` Marc Zyngier
2024-05-02 13:25             ` Marc Zyngier
2024-04-20 12:13       ` Eric Curtin
2024-04-20 12:13         ` Eric Curtin
2024-04-20 12:15         ` Eric Curtin
2024-04-20 12:15           ` Eric Curtin
2024-05-02  0:16   ` Zayd Qumsieh
2024-05-02  0:16     ` Zayd Qumsieh
2024-04-16  2:11 ` Zayd Qumsieh
2024-04-16  2:11   ` Zayd Qumsieh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.