From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.marcansoft.com (marcansoft.com [212.63.210.85]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2ADBB2CA4 for ; Thu, 11 Apr 2024 00:51:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=212.63.210.85 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712796696; cv=none; b=u/Dx76mb04o4RsMPbRdwS5wWWYqkDCZ75Cy+Yawjf67OSzc0IhPEgiGzHc3hccxxtfix9dQc12CFdGGFBpWl+EBeWQ0u6Z2w5q7MFtqFr1HFkLWiYS4asXL1ZFGXu7rW/SlZGpsz6TJRS0aYHbXvQ2kjjIj/dt6EJf6qDIJq3dY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712796696; c=relaxed/simple; bh=nxya/b7E+A6BVmNU6QddSmRkpeeUmd91NwPZg8zTcWA=; h=From:Subject:Date:Message-Id:MIME-Version:Content-Type:To:Cc; b=EAbmjKvCZqJmuExht2idWdeRGgLJgT+g//+KkwEkApSw5GM/uGAZfmCQenHyvkvxcDvttz3S5PRbVw5VrNOX9VU4qZWCDafGHsvu7moq6Qmdr3gb0EEQVzsfLEGoVF52Xpm8pYGSVJ2x2ISQV/Qju3ZcabEU+CNe0nLCTK561qE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=marcan.st; spf=pass smtp.mailfrom=marcan.st; dkim=pass (2048-bit key) header.d=marcan.st header.i=@marcan.st header.b=ylASkj0g; arc=none smtp.client-ip=212.63.210.85 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=marcan.st Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=marcan.st Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=marcan.st header.i=@marcan.st header.b="ylASkj0g" Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: sendonly@marcansoft.com) by mail.marcansoft.com (Postfix) with ESMTPSA id 
05645425BB; Thu, 11 Apr 2024 00:51:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=marcan.st; s=default; t=1712796690; bh=nxya/b7E+A6BVmNU6QddSmRkpeeUmd91NwPZg8zTcWA=; h=From:Subject:Date:To:Cc; b=ylASkj0gP/QMhmRNIvwKuEdCIVPxgosdu2BqIWIWtVcA+4sxwIfsCYxeYUd8Oig5A lrgYWXm3bTQWGn+DRkaoCV/0PUOcKJujcyC1l+JpuIF9dnY5DKFXCZP8Mwe29QeDpd vNwZ5LmEox/uyZgWoWP2ioV4Lvx9KnrJM8FNuWGjA+oItbpDBFqFdDWVfkEMNFM61j lN38i36EcHO6GN9i3eAY2BjNe0G936qUWj09l25T7TrGXKtGn+7un5IToz0G3vfUHY 64syV6TV2sOkgUf4TNcr/Vu+MZ82LWbf3WJAZd5CavNjnETt9+MLLwnuZNSBssTPmd u3TiC55+C7SWQ== From: Hector Martin Subject: [PATCH 0/4] arm64: Support the TSO memory model Date: Thu, 11 Apr 2024 09:51:19 +0900 Message-Id: <20240411-tso-v1-0-754f11abfbff@marcan.st> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-B4-Tracking: v=1; b=H4sIAAc0F2YC/6tWKk4tykwtVrJSqFYqSi3LLM7MzwNyDHUUlJIzE vPSU3UzU4B8JSMDIxMDE0ND3ZLifN1UC7O0lOTUJEuTJAsloMqCotS0zAqwKdGxtbUA7VNoR1U AAAA= To: Catalin Marinas , Will Deacon , Marc Zyngier , Mark Rutland Cc: Zayd Qumsieh , Justin Lu , Ryan Houdek , Mark Brown , Ard Biesheuvel , Mateusz Guzik , Anshuman Khandual , Oliver Upton , Miguel Luis , Joey Gouly , Christoph Paasch , Kees Cook , Sami Tolvanen , Baoquan He , Joel Granados , Dawei Li , Andrew Morton , Florent Revest , David Hildenbrand , Stefan Roesch , Andy Chiu , Josh Triplett , Oleg Nesterov , Helge Deller , Zev Weiss , Ondrej Mosnacek , Miguel Ojeda , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Asahi Linux , Hector Martin X-Mailer: b4 0.13.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5393; i=marcan@marcan.st; h=from:subject:message-id; bh=nxya/b7E+A6BVmNU6QddSmRkpeeUmd91NwPZg8zTcWA=; b=owGbwMvMwCUm+yP4NEe/cRLjabUkhjRxE26+LT/3PJ/+89teKROlYJ/0zYWnTohs2sS+jP/Zy im7K/9LdJSyMIhxMciKKbI0nug91e05/Zy6asp0mDmsTCBDGLg4BWAib18zMlzcNZ+/2EZF/cKx 
zGbzopnLArb8mlc9szl9bi2H2jb/Fy4M/zNOivxMeZv+2d5YV+jmu167wwYZQlFiB9odJRrC1Rb E8AAA
X-Developer-Key: i=marcan@marcan.st; a=openpgp; fpr=FC18F00317968B7BE86201CBE22A629A4C515DD5

x86 CPUs implement a stricter memory model (TSO) than ARM64. For this
reason, x86 emulation on baseline ARM64 systems requires very expensive
memory model emulation. Having hardware that supports this natively is
therefore very attractive. Such hardware, in fact, exists. This series
adds support for userspace to identify when TSO is available and
toggle it on, if supported.

Some ARM64 CPUs intrinsically implement the TSO memory model, while
others expose it as an IMPDEF control. Apple Silicon SoCs are in the
latter category. Using TSO for x86 emulation on chips that support it
has been shown to provide a massive performance boost [1].

Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
is initially not implemented for any architecture.

Patch 2 implements it for CPUs which are known, to the best of my
knowledge, to always implement the TSO memory model unconditionally.
This uses the cpufeature mechanism to only enable this if *all* cores
in the system meet the requirements.

Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
register across context switches. This register contains IMPDEF flags
related to CPU execution, and on Apple CPUs this is where the runtime
TSO toggle bit is implemented. Other CPUs could conceivably benefit
from this scaffolding if they also use ACTLR_EL1 for things that could
ostensibly be runtime controlled and context-switched. For this to
work, ACTLR_EL1 must have a uniform layout across all cores in the
system.

Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
feature is detected (on all CPUs, which also implies the uniform
ACTLR_EL1 layout).

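For illustration only, here is a minimal userspace sketch of how an
emulator might probe for and enable TSO through the control described
above. The numeric values of the constants, the helper names
(mem_model_get, mem_model_set, try_enable_tso), and the exact return
and error conventions are assumptions made for this sketch and are not
stated in this cover letter; the definitions that patch 1 adds to
include/uapi/linux/prctl.h are authoritative.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/*
 * ASSUMPTION: illustrative values only. The cover letter does not list
 * them; use whatever patch 1 defines in include/uapi/linux/prctl.h.
 */
#ifndef PR_GET_MEM_MODEL
#define PR_GET_MEM_MODEL		0x6d4d444c
#endif
#ifndef PR_SET_MEM_MODEL
#define PR_SET_MEM_MODEL		0x4d4d444c
#endif
#ifndef PR_SET_MEM_MODEL_DEFAULT
#define PR_SET_MEM_MODEL_DEFAULT	0
#endif
#ifndef PR_SET_MEM_MODEL_TSO
#define PR_SET_MEM_MODEL_TSO		1
#endif

/*
 * Hypothetical helper: query the memory model of the calling thread.
 * Assumes the kernel reports the model as the prctl() return value.
 * Returns the model (>= 0) or a negative errno.
 */
static int mem_model_get(void)
{
	int ret = prctl(PR_GET_MEM_MODEL, 0UL, 0UL, 0UL, 0UL);

	return ret < 0 ? -errno : ret;
}

/*
 * Hypothetical helper: request a memory model for the calling thread.
 * Returns 0 on success or a negative errno (for example when the
 * kernel or the hardware does not support the request).
 */
static int mem_model_set(unsigned long model)
{
	assert(model == PR_SET_MEM_MODEL_DEFAULT ||
	       model == PR_SET_MEM_MODEL_TSO);

	if (prctl(PR_SET_MEM_MODEL, model, 0UL, 0UL, 0UL) < 0)
		return -errno;
	return 0;
}

/*
 * Returns 1 if hardware TSO is now active, 0 if the caller should
 * fall back to emulating x86 memory ordering in software.
 */
static int try_enable_tso(void)
{
	return mem_model_set(PR_SET_MEM_MODEL_TSO) == 0;
}
```

An emulator would presumably call try_enable_tso() once per thread
before running translated code, and keep emitting explicit barriers
whenever it returns 0. On a kernel without this series the call is
expected to fail, so the software fallback stays in use.
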
This series has been brewing in the downstream Asahi Linux tree [2] for
a while now, and ships to thousands of users. A subset of them have
been using it with FEX-Emu, which already supports this feature. This
rebase on v6.9-rc1 is only build-tested (all intermediate commits, with
and without the config enabled, on ARM64), but I'll update the
downstream branch soon with this version and get it pushed out to
users/testers.

The Apple support works on bare metal and *should* work exactly the
same way on macOS VMs (as alluded to by Zayd in his independent
submission [3]), though I haven't personally verified this. KVM support
for this is left for a future patchset.

(Apologies for the large Cc: list; I want to make sure nobody who got
Cced on Zayd's alternate take is left out of this one.)

[1] https://fex-emu.com/FEX-2306/
[2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
[3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/

To: Catalin Marinas
To: Will Deacon
To: Marc Zyngier
To: Mark Rutland
Cc: Zayd Qumsieh
Cc: Justin Lu
Cc: Ryan Houdek
Cc: Mark Brown
Cc: Ard Biesheuvel
Cc: Mateusz Guzik
Cc: Anshuman Khandual
Cc: Oliver Upton
Cc: Miguel Luis
Cc: Joey Gouly
Cc: Christoph Paasch
Cc: Kees Cook
Cc: Sami Tolvanen
Cc: Baoquan He
Cc: Joel Granados
Cc: Dawei Li
Cc: Andrew Morton
Cc: Florent Revest
Cc: David Hildenbrand
Cc: Stefan Roesch
Cc: Andy Chiu
Cc: Josh Triplett
Cc: Oleg Nesterov
Cc: Helge Deller
Cc: Zev Weiss
Cc: Ondrej Mosnacek
Cc: Miguel Ojeda
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Asahi Linux
Signed-off-by: Hector Martin
---
Hector Martin (4):
      prctl: Introduce PR_{SET,GET}_MEM_MODEL
      arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
      arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
      arm64: Implement Apple IMPDEF TSO memory model control

 arch/arm64/Kconfig                        | 14 ++++++
 arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
 arch/arm64/include/asm/cpufeature.h       | 10 +++++
 arch/arm64/include/asm/processor.h        |  3 ++
 arch/arm64/kernel/Makefile                |  3 +-
 arch/arm64/kernel/cpufeature.c            | 11 ++---
 arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
 arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c                 |  8 ++++
 arch/arm64/tools/cpucaps                  |  2 +
 include/linux/memory_ordering_model.h     | 11 +++++
 include/uapi/linux/prctl.h                |  5 +++
 kernel/sys.c                              | 21 +++++++++
 13 files changed, 229 insertions(+), 6 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240411-tso-e86fdceb94b8

Best regards,
-- 
Hector Martin