* [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code
@ 2024-09-25 15:01 Ard Biesheuvel
From: Ard Biesheuvel <ardb@kernel.org>
The x86_64 port has a number of historical quirks that result in a
reliance on toolchain features that are either poorly specified or
basically implementation details of the toolchain:
- the 'kernel' C model implemented by the compiler is intended for
position dependent code residing in the 'negative' 2 GiB of the
virtual address space, but is used to create a position independent
executable (for virtual KASLR);
- the 'kernel' C model has other properties that are not written down
anywhere, and may therefore deviate between compilers and versions,
which now include the Rust compilers too (e.g., the use of %gs rather
than %fs for per-CPU references);
- the relocation format used to perform the PIE relocation at boot is
complicated and non-standard, as it deals with 3 types of
displacements, including 32-bit negative displacements for
RIP-relative per-CPU references that are not subject to relocation
fixups (as they are placed in a separate, disjoint address space);
- the relocation table is generated from static relocation metadata
taken from the ELF objects that are fed into the linker, and these
describe the input, not the output - relaxations or other linker
tweaks may result in a mismatch between the two, and GNU ld and LLD
display different behavior here;
- this disjoint per-CPU address space requires elaborate hacks in the
linker script and the startup code;
- some of the startup code executes from a 1:1 mapping of memory, where
RIP-relative references are mandatory, whereas RIP-relative per-CPU
variable references can only work correctly from the kernel virtual
mapping, as they need to wrap around from the negative 2 GiB space
into the 0x0 based per-CPU region (see the worked example below).
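To make this concrete, with illustrative addresses: take a per-CPU
variable at absolute address 0x15000 and a RIP value of
0xffffffff81001000 in the kernel virtual mapping. The RIP-relative
reference uses the 32-bit displacement

  0x15000 - 0xffffffff81001000 == 0x7f014000 (truncated to 32 bits)

and adding this displacement to RIP wraps around the top of the 64-bit
address space to produce 0x15000, as intended. Executed from a 1:1
mapping, with RIP at 0x1001000, the same displacement resolves to
0x80015000 instead, which is simply wrong.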
The reason for this odd situation wrt per-CPU variable addressing is
the fact that we rely on the user-space TLS arrangement for per-task
stack cookies, which was implemented using a fixed offset of 40 bytes
from %GS. If we bump the minimum GCC version to 8.1, we can switch to
symbol-based stack cookie references, allowing the same arrangement to
be adopted as on other architectures, i.e., where the CPU register
carries the per-CPU offset, and UP or boot-time per-CPU references
point into the per-CPU load area directly (using an offset of 0x0).
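To sketch what this means for codegen (an illustration, not code from
this series - the asm is abbreviated, and 'fixed_percpu_data' is the
guard symbol chosen later in the series):

  #include <string.h>

  /*
   * Any function with an on-stack buffer receives a canary check. With
   * -mcmodel=kernel today, GCC hardcodes the guard load as
   *
   *     movq %gs:40, %rax
   *
   * whereas with -mstack-protector-guard-symbol=fixed_percpu_data
   * (supported since GCC 8.1), it becomes a relocatable symbol
   * reference, e.g.
   *
   *     movq %gs:fixed_percpu_data(%rip), %rax
   *
   * so the guard no longer dictates the layout of the per-CPU area.
   */
  int stackprotector_demo(const char *s)
  {
          char buf[64];

          strncpy(buf, s, sizeof(buf) - 1);
          buf[sizeof(buf) - 1] = '\0';
          return buf[0];
  }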
With that out of the way, we can untangle this whole thing, and replace
the bespoke tooling and relocation formats with ordinary,
linker-generated relocation tables, using the RELR format, which reduces
the memory footprint of the relocation table by 20x. The compilers can
efficiently generate position independent code these days, without
unnecessary indirections via the Global Offset Table (GOT), except for a
handful of special cases (see the KVM patch for an example where a
GOT-based indirection is the best choice for pushing the absolute
address of a symbol onto the stack in a position independent manner when
there are no free GPRs).
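To illustrate why RELR is so much denser than a RELA table (a minimal
sketch of a decoder, not the code added by the 'Implement support for
ELF RELA/RELR relocations' patch): even entries name an address that
takes a relocation, and odd entries are bitmaps covering the following
63 words, so runs of adjacent 64-bit pointers cost one bit each instead
of 24 bytes each:

  #include <stdint.h>
  #include <stddef.h>

  /*
   * Add the load offset 'delta' to every location the RELR table names.
   * Assumes entries hold absolute addresses, as they would in vmlinux;
   * in a shared object they are offsets from the load base instead.
   */
  static void apply_relr(const uint64_t *relr, size_t count, uint64_t delta)
  {
          uint64_t *where = NULL;

          for (size_t i = 0; i < count; i++) {
                  uint64_t entry = relr[i];

                  if (!(entry & 1)) {
                          /* address entry: relocate it, bitmaps follow */
                          where = (uint64_t *)entry;
                          *where++ += delta;
                  } else {
                          /* bitmap entry: bit N set => relocate where[N-1] */
                          for (int bit = 1; bit < 64; bit++)
                                  if (entry & (1ULL << bit))
                                          where[bit - 1] += delta;
                          where += 63;
                  }
          }
  }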
It also brings us much closer to the ordinary PIE relocation model used
for most of user space, which is therefore much better supported, and
less likely to create problems as the range of compilers and linkers
that we need to support keeps growing.
Tested with GCC 8 - 14 and Clang 15 - 17, using EFI and bare metal boot
via a variety of entry points (decompressor, EFI stub, XenPV, PVH).
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Keith Packard <keithp@keithp.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-efi@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-sparse@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Cc: rust-for-linux@vger.kernel.org
Cc: llvm@lists.linux.dev
Ard Biesheuvel (28):
x86/pvh: Call C code via the kernel virtual mapping
Documentation: Bump minimum GCC version to 8.1
x86/tools: Use mmap() to simplify relocs host tool
x86/boot: Permit GOTPCREL relocations for x86_64 builds
x86: Define the stack protector guard symbol explicitly
x86/percpu: Get rid of absolute per-CPU variable placement
scripts/kallsyms: Avoid 0x0 as the relative base
scripts/kallsyms: Remove support for absolute per-CPU variables
x86/tools: Remove special relocation handling for per-CPU variables
x86/xen: Avoid relocatable quantities in Xen ELF notes
x86/pvh: Avoid absolute symbol references in .head.text
x86/pm-trace: Use RIP-relative accesses for .tracedata
x86/kvm: Use RIP-relative addressing
x86/rethook: Use RIP-relative reference for return address
x86/sync_core: Use RIP-relative addressing
x86/entry_64: Use RIP-relative addressing
x86/hibernate: Prefer RIP-relative accesses
x86/boot/64: Determine VA/PA offset before entering C code
x86/boot/64: Avoid intentional absolute symbol references in
.head.text
x64/acpi: Use PIC-compatible references in wakeup_64.S
x86/head: Use PIC-compatible symbol references in startup code
asm-generic: Treat PIC .data.rel.ro sections as .rodata
tools/objtool: Mark generated sections as writable
tools/objtool: Treat indirect ftrace calls as direct calls
x86: Use PIE codegen for the core kernel
x86/boot: Implement support for ELF RELA/RELR relocations
x86/kernel: Switch to PIE linking for the core kernel
x86/tools: Drop x86_64 support from 'relocs' tool
Documentation/admin-guide/README.rst | 2 +-
Documentation/arch/x86/zero-page.rst | 3 +-
Documentation/process/changes.rst | 2 +-
arch/x86/Kconfig | 3 +-
arch/x86/Makefile | 22 +-
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 16 +-
arch/x86/entry/calling.h | 9 +-
arch/x86/entry/entry_64.S | 12 +-
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/include/asm/desc.h | 1 -
arch/x86/include/asm/init.h | 2 +-
arch/x86/include/asm/percpu.h | 22 -
arch/x86/include/asm/pm-trace.h | 4 +-
arch/x86/include/asm/processor.h | 14 +-
arch/x86/include/asm/setup.h | 3 +-
arch/x86/include/asm/stackprotector.h | 4 -
arch/x86/include/asm/sync_core.h | 3 +-
arch/x86/include/uapi/asm/bootparam.h | 2 +-
arch/x86/kernel/acpi/wakeup_64.S | 11 +-
arch/x86/kernel/head64.c | 76 +++-
arch/x86/kernel/head_64.S | 40 +-
arch/x86/kernel/irq_64.c | 1 -
arch/x86/kernel/kvm.c | 8 +-
arch/x86/kernel/relocate_kernel_64.S | 6 +-
arch/x86/kernel/rethook.c | 3 +-
arch/x86/kernel/setup_percpu.c | 9 +-
arch/x86/kernel/vmlinux.lds.S | 75 ++--
arch/x86/platform/pvh/head.S | 57 ++-
arch/x86/power/hibernate_asm_64.S | 4 +-
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/tools/Makefile | 2 +-
arch/x86/tools/relocs.c | 425 +++-----------------
arch/x86/tools/relocs.h | 11 +-
arch/x86/tools/relocs_64.c | 18 -
arch/x86/tools/relocs_common.c | 11 +-
arch/x86/xen/xen-head.S | 16 +-
drivers/base/power/trace.c | 6 +-
drivers/firmware/efi/libstub/x86-stub.c | 2 +
include/asm-generic/vmlinux.lds.h | 10 +-
include/linux/compiler.h | 2 +-
init/Kconfig | 5 -
kernel/kallsyms.c | 12 +-
scripts/kallsyms.c | 53 +--
scripts/link-vmlinux.sh | 4 -
tools/objtool/check.c | 43 +-
tools/objtool/elf.c | 2 +-
tools/objtool/include/objtool/special.h | 2 +-
tools/perf/util/annotate.c | 4 +-
50 files changed, 380 insertions(+), 667 deletions(-)
delete mode 100644 arch/x86/tools/relocs_64.c
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 01/28] x86/pvh: Call C code via the kernel virtual mapping
From: Ard Biesheuvel <ardb@kernel.org>
Calling C code via a different mapping than it was linked at is
problematic, because the compiler assumes that RIP-relative and absolute
symbol references are interchangeable. GCC in particular may use
RIP-relative per-CPU variable references even when not using -fpic.
So call xen_prepare_pvh() via its kernel virtual mapping on x86_64, so
that those RIP-relative references produce the correct values. This
matches the pre-existing behavior for i386, which also invokes
xen_prepare_pvh() via the kernel virtual mapping before invoking
startup_32 with paging disabled again.
Fixes: 7243b93345f7 ("xen/pvh: Bootstrap PVH guest")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/pvh/head.S | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index f7235ef87bc3..a308b79a887c 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -101,7 +101,11 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
xor %edx, %edx
wrmsr
- call xen_prepare_pvh
+ /* Call xen_prepare_pvh() via the kernel virtual mapping */
+ leaq xen_prepare_pvh(%rip), %rax
+ addq $__START_KERNEL_map, %rax
+ ANNOTATE_RETPOLINE_SAFE
+ call *%rax
/* startup_64 expects boot_params in %rsi. */
mov $_pa(pvh_bootparams), %rsi
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1
From: Ard Biesheuvel <ardb@kernel.org>
Bump the minimum GCC version to 8.1 to gain unconditional support for
referring to the per-task stack cookie using a symbol rather than
relying on the fixed offset of 40 bytes from %GS, which requires
elaborate hacks to support.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
Documentation/admin-guide/README.rst | 2 +-
Documentation/process/changes.rst | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index f2bebff6a733..3dda41923ed6 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -259,7 +259,7 @@ Configuring the kernel
Compiling the kernel
--------------------
- - Make sure you have at least gcc 5.1 available.
+ - Make sure you have at least gcc 8.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
- Do a ``make`` to create a compressed kernel image. It is also possible to do
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 00f1ed7c59c3..59b7d3d8a577 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -29,7 +29,7 @@ you probably needn't concern yourself with pcmciautils.
====================== =============== ========================================
Program Minimal version Command to check the version
====================== =============== ========================================
-GNU C 5.1 gcc --version
+GNU C 8.1 gcc --version
Clang/LLVM (optional) 13.0.1 clang --version
Rust (optional) 1.78.0 rustc --version
bindgen (optional) 0.65.1 bindgen --version
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 03/28] x86/tools: Use mmap() to simplify relocs host tool
From: Ard Biesheuvel <ardb@kernel.org>
Instead of relying on fseek() and fread() to traverse the vmlinux file
when processing the ELF relocations, mmap() the whole thing and use
memcpy() or direct references where appropriate:
- the executable and section headers are byte swabbed before use if the
host is big endian, so there, the copy is retained;
- the strtab and extended symtab are not byte swabbed so there, the
copies are replaced with direct references into the mmap()'ed region.
This substantially simplifies the code, and makes it much easier to
refer to other file contents directly. This will be used by a subsequent
patch to handle GOTPCREL relocations.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/tools/relocs.c | 145 ++++++++------------
arch/x86/tools/relocs.h | 2 +
2 files changed, 62 insertions(+), 85 deletions(-)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index c101bed61940..35a73e4aa74d 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -37,15 +37,17 @@ static struct relocs relocs64;
#endif
struct section {
- Elf_Shdr shdr;
- struct section *link;
- Elf_Sym *symtab;
- Elf32_Word *xsymtab;
- Elf_Rel *reltab;
- char *strtab;
+ Elf_Shdr shdr;
+ struct section *link;
+ Elf_Sym *symtab;
+ const Elf32_Word *xsymtab;
+ Elf_Rel *reltab;
+ const char *strtab;
};
static struct section *secs;
+static const void *elf_image;
+
static const char * const sym_regex_kernel[S_NSYMTYPES] = {
/*
* Following symbols have been audited. There values are constant and do
@@ -291,7 +293,7 @@ static Elf_Sym *sym_lookup(const char *symname)
for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
long nsyms;
- char *strtab;
+ const char *strtab;
Elf_Sym *symtab;
Elf_Sym *sym;
@@ -354,7 +356,7 @@ static uint64_t elf64_to_cpu(uint64_t val)
static int sym_index(Elf_Sym *sym)
{
Elf_Sym *symtab = secs[shsymtabndx].symtab;
- Elf32_Word *xsymtab = secs[shxsymtabndx].xsymtab;
+ const Elf32_Word *xsymtab = secs[shxsymtabndx].xsymtab;
unsigned long offset;
int index;
@@ -368,10 +370,9 @@ static int sym_index(Elf_Sym *sym)
return elf32_to_cpu(xsymtab[index]);
}
-static void read_ehdr(FILE *fp)
+static void read_ehdr(void)
{
- if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1)
- die("Cannot read ELF header: %s\n", strerror(errno));
+ memcpy(&ehdr, elf_image, sizeof(ehdr));
if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
die("No ELF magic\n");
if (ehdr.e_ident[EI_CLASS] != ELF_CLASS)
@@ -414,60 +415,48 @@ static void read_ehdr(FILE *fp)
if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
- Elf_Shdr shdr;
-
- if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0)
- die("Seek to %" FMT " failed: %s\n", ehdr.e_shoff, strerror(errno));
-
- if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
- die("Cannot read initial ELF section header: %s\n", strerror(errno));
+ const Elf_Shdr *shdr = elf_image + ehdr.e_shoff;
if (shnum == SHN_UNDEF)
- shnum = elf_xword_to_cpu(shdr.sh_size);
+ shnum = elf_xword_to_cpu(shdr->sh_size);
if (shstrndx == SHN_XINDEX)
- shstrndx = elf_word_to_cpu(shdr.sh_link);
+ shstrndx = elf_word_to_cpu(shdr->sh_link);
}
if (shstrndx >= shnum)
die("String table index out of bounds\n");
}
-static void read_shdrs(FILE *fp)
+static void read_shdrs(void)
{
+ const Elf_Shdr *shdr = elf_image + ehdr.e_shoff;
int i;
- Elf_Shdr shdr;
secs = calloc(shnum, sizeof(struct section));
if (!secs)
die("Unable to allocate %ld section headers\n", shnum);
- if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0)
- die("Seek to %" FMT " failed: %s\n", ehdr.e_shoff, strerror(errno));
-
- for (i = 0; i < shnum; i++) {
+ for (i = 0; i < shnum; i++, shdr++) {
struct section *sec = &secs[i];
- if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
- die("Cannot read ELF section headers %d/%ld: %s\n", i, shnum, strerror(errno));
-
- sec->shdr.sh_name = elf_word_to_cpu(shdr.sh_name);
- sec->shdr.sh_type = elf_word_to_cpu(shdr.sh_type);
- sec->shdr.sh_flags = elf_xword_to_cpu(shdr.sh_flags);
- sec->shdr.sh_addr = elf_addr_to_cpu(shdr.sh_addr);
- sec->shdr.sh_offset = elf_off_to_cpu(shdr.sh_offset);
- sec->shdr.sh_size = elf_xword_to_cpu(shdr.sh_size);
- sec->shdr.sh_link = elf_word_to_cpu(shdr.sh_link);
- sec->shdr.sh_info = elf_word_to_cpu(shdr.sh_info);
- sec->shdr.sh_addralign = elf_xword_to_cpu(shdr.sh_addralign);
- sec->shdr.sh_entsize = elf_xword_to_cpu(shdr.sh_entsize);
+ sec->shdr.sh_name = elf_word_to_cpu(shdr->sh_name);
+ sec->shdr.sh_type = elf_word_to_cpu(shdr->sh_type);
+ sec->shdr.sh_flags = elf_xword_to_cpu(shdr->sh_flags);
+ sec->shdr.sh_addr = elf_addr_to_cpu(shdr->sh_addr);
+ sec->shdr.sh_offset = elf_off_to_cpu(shdr->sh_offset);
+ sec->shdr.sh_size = elf_xword_to_cpu(shdr->sh_size);
+ sec->shdr.sh_link = elf_word_to_cpu(shdr->sh_link);
+ sec->shdr.sh_info = elf_word_to_cpu(shdr->sh_info);
+ sec->shdr.sh_addralign = elf_xword_to_cpu(shdr->sh_addralign);
+ sec->shdr.sh_entsize = elf_xword_to_cpu(shdr->sh_entsize);
if (sec->shdr.sh_link < shnum)
sec->link = &secs[sec->shdr.sh_link];
}
}
-static void read_strtabs(FILE *fp)
+static void read_strtabs(void)
{
int i;
@@ -476,20 +465,11 @@ static void read_strtabs(FILE *fp)
if (sec->shdr.sh_type != SHT_STRTAB)
continue;
-
- sec->strtab = malloc(sec->shdr.sh_size);
- if (!sec->strtab)
- die("malloc of %" FMT " bytes for strtab failed\n", sec->shdr.sh_size);
-
- if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0)
- die("Seek to %" FMT " failed: %s\n", sec->shdr.sh_offset, strerror(errno));
-
- if (fread(sec->strtab, 1, sec->shdr.sh_size, fp) != sec->shdr.sh_size)
- die("Cannot read symbol table: %s\n", strerror(errno));
+ sec->strtab = elf_image + sec->shdr.sh_offset;
}
}
-static void read_symtabs(FILE *fp)
+static void read_symtabs(void)
{
int i, j;
@@ -499,16 +479,7 @@ static void read_symtabs(FILE *fp)
switch (sec->shdr.sh_type) {
case SHT_SYMTAB_SHNDX:
- sec->xsymtab = malloc(sec->shdr.sh_size);
- if (!sec->xsymtab)
- die("malloc of %" FMT " bytes for xsymtab failed\n", sec->shdr.sh_size);
-
- if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0)
- die("Seek to %" FMT " failed: %s\n", sec->shdr.sh_offset, strerror(errno));
-
- if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp) != sec->shdr.sh_size)
- die("Cannot read extended symbol table: %s\n", strerror(errno));
-
+ sec->xsymtab = elf_image + sec->shdr.sh_offset;
shxsymtabndx = i;
continue;
@@ -519,11 +490,7 @@ static void read_symtabs(FILE *fp)
if (!sec->symtab)
die("malloc of %" FMT " bytes for symtab failed\n", sec->shdr.sh_size);
- if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0)
- die("Seek to %" FMT " failed: %s\n", sec->shdr.sh_offset, strerror(errno));
-
- if (fread(sec->symtab, 1, sec->shdr.sh_size, fp) != sec->shdr.sh_size)
- die("Cannot read symbol table: %s\n", strerror(errno));
+ memcpy(sec->symtab, elf_image + sec->shdr.sh_offset, sec->shdr.sh_size);
for (j = 0; j < num_syms; j++) {
Elf_Sym *sym = &sec->symtab[j];
@@ -543,12 +510,13 @@ static void read_symtabs(FILE *fp)
}
-static void read_relocs(FILE *fp)
+static void read_relocs(void)
{
int i, j;
for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
+ const Elf_Rel *reltab = elf_image + sec->shdr.sh_offset;
if (sec->shdr.sh_type != SHT_REL_TYPE)
continue;
@@ -557,19 +525,12 @@ static void read_relocs(FILE *fp)
if (!sec->reltab)
die("malloc of %" FMT " bytes for relocs failed\n", sec->shdr.sh_size);
- if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0)
- die("Seek to %" FMT " failed: %s\n", sec->shdr.sh_offset, strerror(errno));
-
- if (fread(sec->reltab, 1, sec->shdr.sh_size, fp) != sec->shdr.sh_size)
- die("Cannot read symbol table: %s\n", strerror(errno));
-
for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Rel); j++) {
Elf_Rel *rel = &sec->reltab[j];
-
- rel->r_offset = elf_addr_to_cpu(rel->r_offset);
- rel->r_info = elf_xword_to_cpu(rel->r_info);
+ rel->r_offset = elf_addr_to_cpu(reltab[j].r_offset);
+ rel->r_info = elf_xword_to_cpu(reltab[j].r_info);
#if (SHT_REL_TYPE == SHT_RELA)
- rel->r_addend = elf_xword_to_cpu(rel->r_addend);
+ rel->r_addend = elf_xword_to_cpu(reltab[j].r_addend);
#endif
}
}
@@ -591,7 +552,7 @@ static void print_absolute_symbols(void)
for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
- char *sym_strtab;
+ const char *sym_strtab;
int j;
if (sec->shdr.sh_type != SHT_SYMTAB)
@@ -633,7 +594,7 @@ static void print_absolute_relocs(void)
for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
struct section *sec_applies, *sec_symtab;
- char *sym_strtab;
+ const char *sym_strtab;
Elf_Sym *sh_symtab;
int j;
@@ -725,7 +686,7 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
/* Walk through the relocations */
for (i = 0; i < shnum; i++) {
- char *sym_strtab;
+ const char *sym_strtab;
Elf_Sym *sh_symtab;
struct section *sec_applies, *sec_symtab;
int j;
@@ -1177,12 +1138,24 @@ void process(FILE *fp, int use_real_mode, int as_text,
int show_absolute_syms, int show_absolute_relocs,
int show_reloc_info)
{
+ int fd = fileno(fp);
+ struct stat sb;
+ void *p;
+
+ if (fstat(fd, &sb))
+ die("fstat() failed\n");
+
+ elf_image = p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ if (p == MAP_FAILED)
+ die("mmap() failed\n");
+
regex_init(use_real_mode);
- read_ehdr(fp);
- read_shdrs(fp);
- read_strtabs(fp);
- read_symtabs(fp);
- read_relocs(fp);
+
+ read_ehdr();
+ read_shdrs();
+ read_strtabs();
+ read_symtabs();
+ read_relocs();
if (ELF_BITS == 64)
percpu_init();
@@ -1203,4 +1176,6 @@ void process(FILE *fp, int use_real_mode, int as_text,
}
emit_relocs(as_text, use_real_mode);
+
+ munmap(p, sb.st_size);
}
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 4c49c82446eb..7a509604ff92 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -16,6 +16,8 @@
#include <endian.h>
#include <regex.h>
#include <tools/le_byteshift.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
__attribute__((__format__(printf, 1, 2)))
void die(char *fmt, ...) __attribute__((noreturn));
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 04/28] x86/boot: Permit GOTPCREL relocations for x86_64 builds
From: Ard Biesheuvel <ardb@kernel.org>
Some of the early x86_64 startup code is written in C, and executes in
the early 1:1 mapping of the kernel, which is not the address it was
linked at; this requires special care when accessing global variables.
This is currently dealt with on an ad-hoc basis, primarily in head64.c,
using explicit pointer fixups, but it would be better to rely on the
compiler for this, by using -fPIE to generate code that can run at any
address and that uses RIP-relative accesses to refer to global
variables.
While it is possible to avoid most GOT-based symbol references that the
compiler typically emits when running in -fPIE mode, by using 'hidden'
visibility, there are cases where the compiler will always rely on the
GOT, for instance, for weak external references (which may remain
unsatisfied at link time).
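As an illustration (a hypothetical example, not code from this series),
a weak external reference of this kind could look like:

  /* 'optional_hook' is a made-up symbol name, for illustration only */
  extern void optional_hook(void) __attribute__((__weak__));

  void call_hook_if_present(void)
  {
          /*
           * The test below needs the real link-time address, which may
           * be 0 if the weak reference remains unsatisfied, so under
           * -fPIE the compiler loads it from a GOT slot, e.g.
           *
           *     movq optional_hook@GOTPCREL(%rip), %rax
           *
           * rather than emitting a direct RIP-relative leaq.
           */
          if (optional_hook)
                  optional_hook();
  }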
This means the build may produce a small number of GOT entries
nonetheless. So update the reloc processing host tool to add support for
this, and place the GOT in the .rodata section rather than discard it.
Note that multiple GOT-based references to the same symbol will share a
single GOT entry, and so naively emitting a relocation for the GOT entry
each time a reference to it is encountered could result in duplicates.
Work around this by relying on the fact that the relocation lists are
sorted, and deduplicate 64-bit relocations as they are emitted by
comparing each entry with the previous one.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/Makefile | 4 +++
arch/x86/kernel/vmlinux.lds.S | 5 +++
arch/x86/tools/relocs.c | 33 ++++++++++++++++++--
include/asm-generic/vmlinux.lds.h | 7 +++++
4 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 801fd85c3ef6..6b3fe6e2aadd 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -192,6 +192,10 @@ else
KBUILD_CFLAGS += -mcmodel=kernel
KBUILD_RUSTFLAGS += -Cno-redzone=y
KBUILD_RUSTFLAGS += -Ccode-model=kernel
+
+ # Don't emit relaxable GOTPCREL relocations
+ KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
+ KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no
endif
#
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 6e73403e874f..7f060d873f75 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -20,6 +20,9 @@
#define RUNTIME_DISCARD_EXIT
#define EMITS_PT_NOTE
#define RO_EXCEPTION_TABLE_ALIGN 16
+#ifdef CONFIG_X86_64
+#define GOT_IN_RODATA
+#endif
#include <asm-generic/vmlinux.lds.h>
#include <asm/asm-offsets.h>
@@ -464,10 +467,12 @@ SECTIONS
* Sections that should stay zero sized, which is safer to
* explicitly check instead of blindly discarding.
*/
+#ifdef CONFIG_X86_32
.got : {
*(.got) *(.igot.*)
}
ASSERT(SIZEOF(.got) == 0, "Unexpected GOT entries detected!")
+#endif
.plt : {
*(.plt) *(.plt.*) *(.iplt)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 35a73e4aa74d..880f0f2e465e 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -223,6 +223,8 @@ static const char *rel_type(unsigned type)
REL_TYPE(R_X86_64_JUMP_SLOT),
REL_TYPE(R_X86_64_RELATIVE),
REL_TYPE(R_X86_64_GOTPCREL),
+ REL_TYPE(R_X86_64_GOTPCRELX),
+ REL_TYPE(R_X86_64_REX_GOTPCRELX),
REL_TYPE(R_X86_64_32),
REL_TYPE(R_X86_64_32S),
REL_TYPE(R_X86_64_16),
@@ -843,6 +845,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
case R_X86_64_32:
case R_X86_64_32S:
case R_X86_64_64:
+ case R_X86_64_GOTPCREL:
/*
* References to the percpu area don't need to be adjusted.
*/
@@ -861,6 +864,31 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
break;
}
+ if (r_type == R_X86_64_GOTPCREL) {
+ Elf_Shdr *s = &secs[sec->shdr.sh_info].shdr;
+ unsigned file_off = offset - s->sh_addr + s->sh_offset;
+
+ /*
+ * GOTPCREL relocations refer to instructions that load
+ * a 64-bit address via a 32-bit relative reference to
+ * the GOT. In this case, it is the GOT entry that
+ * needs to be fixed up, not the immediate offset in
+ * the opcode. Note that the linker will have applied an
+ * addend of -4 to compensate for the delta between the
+ * relocation offset and the value of RIP when the
+ * instruction executes, and this needs to be backed out
+ * again. (Addends other than -4 are permitted in
+ * principle, but make no sense in practice so they are
+ * not supported.)
+ */
+ if (rel->r_addend != -4) {
+ die("invalid addend (%ld) for %s relocation: %s\n",
+ rel->r_addend, rel_type(r_type), symname);
+ break;
+ }
+ offset += 4 + (int32_t)get_unaligned_le32(elf_image + file_off);
+ }
+
/*
* Relocation offsets for 64 bit kernels are output
* as 32 bits and sign extended back to 64 bits when
@@ -870,7 +898,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
if ((int32_t)offset != (int64_t)offset)
die("Relocation offset doesn't fit in 32 bits\n");
- if (r_type == R_X86_64_64)
+ if (r_type == R_X86_64_64 || r_type == R_X86_64_GOTPCREL)
add_reloc(&relocs64, offset);
else
add_reloc(&relocs32, offset);
@@ -1085,7 +1113,8 @@ static void emit_relocs(int as_text, int use_real_mode)
/* Now print each relocation */
for (i = 0; i < relocs64.count; i++)
- write_reloc(relocs64.offset[i], stdout);
+ if (!i || relocs64.offset[i] != relocs64.offset[i - 1])
+ write_reloc(relocs64.offset[i], stdout);
/* Print a stop */
write_reloc(0, stdout);
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 19ec49a9179b..cc14d780c70d 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -443,6 +443,12 @@
#endif
#endif
+#ifdef GOT_IN_RODATA
+#define GOT_RODATA *(.got .igot*)
+#else
+#define GOT_RODATA
+#endif
+
/*
* Read only Data
*/
@@ -454,6 +460,7 @@
SCHED_DATA \
RO_AFTER_INIT_DATA /* Read only after init */ \
. = ALIGN(8); \
+ GOT_RODATA \
BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs) \
*(__tracepoints_strings)/* Tracepoints: strings */ \
} \
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
From: Ard Biesheuvel <ardb@kernel.org>
Specify the guard symbol for the stack cookie explicitly, rather than
positioning it exactly 40 bytes into the per-CPU area. Doing so removes
the need for the per-CPU region to be absolute rather than relative to
the placement of the per-CPU template region in the kernel image, and
this allows the special handling for absolute per-CPU symbols to be
removed entirely.
This is a worthwhile cleanup in itself, but it is also a prerequisite
for PIE codegen and PIE linking, which can replace our bespoke and
rather clunky runtime relocation handling.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/Makefile | 4 ++++
arch/x86/include/asm/init.h | 2 +-
arch/x86/include/asm/processor.h | 11 +++--------
arch/x86/include/asm/stackprotector.h | 4 ----
tools/perf/util/annotate.c | 4 ++--
5 files changed, 10 insertions(+), 15 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 6b3fe6e2aadd..b78b7623a4a9 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -193,6 +193,10 @@ else
KBUILD_RUSTFLAGS += -Cno-redzone=y
KBUILD_RUSTFLAGS += -Ccode-model=kernel
+ ifeq ($(CONFIG_STACKPROTECTOR),y)
+ KBUILD_CFLAGS += -mstack-protector-guard-symbol=fixed_percpu_data
+ endif
+
# Don't emit relaxable GOTPCREL relocations
KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 14d72727d7ee..3ed0e8ec973f 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -2,7 +2,7 @@
#ifndef _ASM_X86_INIT_H
#define _ASM_X86_INIT_H
-#define __head __section(".head.text")
+#define __head __section(".head.text") __no_stack_protector
struct x86_mapping_info {
void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4a686f0e5dbf..56bc36116814 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -402,14 +402,9 @@ struct irq_stack {
#ifdef CONFIG_X86_64
struct fixed_percpu_data {
/*
- * GCC hardcodes the stack canary as %gs:40. Since the
- * irq_stack is the object at %gs:0, we reserve the bottom
- * 48 bytes of the irq stack for the canary.
- *
- * Once we are willing to require -mstack-protector-guard-symbol=
- * support for x86_64 stackprotector, we can get rid of this.
+ * Since the irq_stack is the object at %gs:0, the bottom 8 bytes of
+ * the irq stack are reserved for the canary.
*/
- char gs_base[40];
unsigned long stack_canary;
};
@@ -418,7 +413,7 @@ DECLARE_INIT_PER_CPU(fixed_percpu_data);
static inline unsigned long cpu_kernelmode_gs_base(int cpu)
{
- return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu);
+ return (unsigned long)&per_cpu(fixed_percpu_data, cpu);
}
extern asmlinkage void entry_SYSCALL32_ignore(void);
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 00473a650f51..d1dcd22a0a4c 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -51,10 +51,6 @@ static __always_inline void boot_init_stack_canary(void)
{
unsigned long canary = get_random_canary();
-#ifdef CONFIG_X86_64
- BUILD_BUG_ON(offsetof(struct fixed_percpu_data, stack_canary) != 40);
-#endif
-
current->stack_canary = canary;
#ifdef CONFIG_X86_64
this_cpu_write(fixed_percpu_data.stack_canary, canary);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 37ce43c4eb8f..7ecfedf5edb9 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2485,10 +2485,10 @@ static bool is_stack_operation(struct arch *arch, struct disasm_line *dl)
static bool is_stack_canary(struct arch *arch, struct annotated_op_loc *loc)
{
- /* On x86_64, %gs:40 is used for stack canary */
+ /* On x86_64, %gs:0 is used for stack canary */
if (arch__is(arch, "x86")) {
if (loc->segment == INSN_SEG_X86_GS && loc->imm &&
- loc->offset == 40)
+ loc->offset == 0)
return true;
}
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 06/28] x86/percpu: Get rid of absolute per-CPU variable placement
From: Ard Biesheuvel <ardb@kernel.org>
For historical reasons, per-CPU symbols on x86_64 are emitted in an
address space that is disjoint from the ordinary kernel VA space,
starting at address 0x0. This splits a per-CPU symbol reference into a
base plus offset, where the base is programmed into the GS segment
register.
This deviates from the usual approach adopted by other SMP
architectures, where the base is a reference to the variable in the
kernel image's per-CPU template area, and the offset is a per-CPU value
accounting for the displacement of that particular CPU's per-CPU region
with respect to the template area. This gives per-CPU variable
references a range that is identical to ordinary references, and
requires no special handling for the startup code, as the offset will
simply be 0x0 up until the point where per-CPU variables are initialized
properly.
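As a rough sketch of that conventional model (illustrative code; the
macro name is made up, but __per_cpu_offset is the real kernel array), a
given CPU's copy of a variable lives at the template address plus that
CPU's displacement:

  /* generic base-plus-offset per-CPU addressing, for illustration */
  extern unsigned long __per_cpu_offset[];  /* one displacement per CPU */

  #define my_per_cpu_ptr(var, cpu) \
          ((typeof(var) *)((unsigned long)&(var) + __per_cpu_offset[cpu]))

On x86_64, this series makes the %gs base hold the calling CPU's own
displacement, so this_cpu accesses become a %gs segment override on an
ordinary RIP-relative reference to the template symbol.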
The x86_64 approach was needed to accommodate per-task stack protector
cookies, which used to live at a fixed offset of GS+40, requiring GS to
be treated as a base register. This is no longer the case, though, and
so GS can be repurposed as a true per-CPU offset, adopting the same
strategy as other architectures.
This also removes the need for linker tricks to emit the per-CPU ELF
segment at a different virtual address. It also means RIP-relative
per-CPU variables no longer need to be relocated in the opposite
direction when KASLR is applied, which was necessary because the 0x0
based per-CPU region remains in place even when the kernel is moved
around.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/include/asm/desc.h | 1 -
arch/x86/include/asm/percpu.h | 22 --------------
arch/x86/include/asm/processor.h | 5 ++--
arch/x86/kernel/head64.c | 2 +-
arch/x86/kernel/head_64.S | 12 ++------
arch/x86/kernel/irq_64.c | 1 -
arch/x86/kernel/setup_percpu.c | 9 +-----
arch/x86/kernel/vmlinux.lds.S | 30 --------------------
arch/x86/platform/pvh/head.S | 6 ++--
arch/x86/tools/relocs.c | 8 +-----
arch/x86/xen/xen-head.S | 10 ++-----
init/Kconfig | 1 -
12 files changed, 13 insertions(+), 94 deletions(-)
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 62dc9f59ea76..ec95fe44fa3a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -46,7 +46,6 @@ struct gdt_page {
} __attribute__((aligned(PAGE_SIZE)));
DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
-DECLARE_INIT_PER_CPU(gdt_page);
/* Provide the original GDT */
static inline struct desc_struct *get_cpu_gdt_rw(unsigned int cpu)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index c55a79d5feae..1ded1207528d 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -20,12 +20,6 @@
#define PER_CPU_VAR(var) __percpu(var)__percpu_rel
-#ifdef CONFIG_X86_64_SMP
-# define INIT_PER_CPU_VAR(var) init_per_cpu__##var
-#else
-# define INIT_PER_CPU_VAR(var) var
-#endif
-
#else /* !__ASSEMBLY__: */
#include <linux/build_bug.h>
@@ -97,22 +91,6 @@
#define __percpu_arg(x) __percpu_prefix "%" #x
#define __force_percpu_arg(x) __force_percpu_prefix "%" #x
-/*
- * Initialized pointers to per-CPU variables needed for the boot
- * processor need to use these macros to get the proper address
- * offset from __per_cpu_load on SMP.
- *
- * There also must be an entry in vmlinux_64.lds.S
- */
-#define DECLARE_INIT_PER_CPU(var) \
- extern typeof(var) init_per_cpu_var(var)
-
-#ifdef CONFIG_X86_64_SMP
-# define init_per_cpu_var(var) init_per_cpu__##var
-#else
-# define init_per_cpu_var(var) var
-#endif
-
/*
* For arch-specific code, we can use direct single-insn ops (they
* don't give an lvalue though).
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 56bc36116814..d7219e149f24 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -409,11 +409,12 @@ struct fixed_percpu_data {
};
DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible;
-DECLARE_INIT_PER_CPU(fixed_percpu_data);
static inline unsigned long cpu_kernelmode_gs_base(int cpu)
{
- return (unsigned long)&per_cpu(fixed_percpu_data, cpu);
+ extern unsigned long __per_cpu_offset[];
+
+ return IS_ENABLED(CONFIG_SMP) ? __per_cpu_offset[cpu] : 0;
}
extern asmlinkage void entry_SYSCALL32_ignore(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 4b9d4557fc94..d4398261ad81 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -559,7 +559,7 @@ void early_setup_idt(void)
*/
void __head startup_64_setup_gdt_idt(void)
{
- struct desc_struct *gdt = (void *)(__force unsigned long)init_per_cpu_var(gdt_page.gdt);
+ struct desc_struct *gdt = (void *)(__force unsigned long)gdt_page.gdt;
void *handler = NULL;
struct desc_ptr startup_gdt_descr = {
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 330922b328bf..ab6ccee81493 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -68,11 +68,10 @@ SYM_CODE_START_NOALIGN(startup_64)
/* Set up the stack for verify_cpu() */
leaq __top_init_kernel_stack(%rip), %rsp
- /* Setup GSBASE to allow stack canary access for C code */
+ /* Clear %gs so early per-CPU references target the per-CPU load area */
movl $MSR_GS_BASE, %ecx
- leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
- movl %edx, %eax
- shrq $32, %rdx
+ xorl %eax, %eax
+ cdq
wrmsr
call startup_64_setup_gdt_idt
@@ -361,15 +360,10 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
/* Set up %gs.
*
- * The base of %gs always points to fixed_percpu_data. If the
- * stack protector canary is enabled, it is located at %gs:40.
* Note that, on SMP, the boot cpu uses init data section until
* the per cpu areas are set up.
*/
movl $MSR_GS_BASE,%ecx
-#ifndef CONFIG_SMP
- leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
-#endif
movl %edx, %eax
shrq $32, %rdx
wrmsr
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index ade0043ce56e..56bdeecd8ee0 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -27,7 +27,6 @@
#include <asm/apic.h>
DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
-DECLARE_INIT_PER_CPU(irq_stack_backing_store);
#ifdef CONFIG_VMAP_STACK
/*
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index b30d6e180df7..57482420ff42 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -23,17 +23,10 @@
#include <asm/cpumask.h>
#include <asm/cpu.h>
-#ifdef CONFIG_X86_64
-#define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
-#else
-#define BOOT_PERCPU_OFFSET 0
-#endif
-
-DEFINE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off) = BOOT_PERCPU_OFFSET;
+DEFINE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off) = 0;
EXPORT_PER_CPU_SYMBOL(this_cpu_off);
unsigned long __per_cpu_offset[NR_CPUS] __ro_after_init = {
- [0 ... NR_CPUS-1] = BOOT_PERCPU_OFFSET,
};
EXPORT_SYMBOL(__per_cpu_offset);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 7f060d873f75..00f82db7b3e1 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -103,9 +103,6 @@ PHDRS {
text PT_LOAD FLAGS(5); /* R_E */
data PT_LOAD FLAGS(6); /* RW_ */
#ifdef CONFIG_X86_64
-#ifdef CONFIG_SMP
- percpu PT_LOAD FLAGS(6); /* RW_ */
-#endif
init PT_LOAD FLAGS(7); /* RWE */
#endif
note PT_NOTE FLAGS(0); /* ___ */
@@ -225,17 +222,6 @@ SECTIONS
__init_begin = .; /* paired with __init_end */
}
-#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
- /*
- * percpu offsets are zero-based on SMP. PERCPU_VADDR() changes the
- * output PHDR, so the next output section - .init.text - should
- * start another segment - init.
- */
- PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
- ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
- "per-CPU data too large - increase CONFIG_PHYSICAL_START")
-#endif
-
INIT_TEXT_SECTION(PAGE_SIZE)
#ifdef CONFIG_X86_64
:init
@@ -356,9 +342,7 @@ SECTIONS
EXIT_DATA
}
-#if !defined(CONFIG_X86_64) || !defined(CONFIG_SMP)
PERCPU_SECTION(INTERNODE_CACHE_BYTES)
-#endif
RUNTIME_CONST(shift, d_hash_shift)
RUNTIME_CONST(ptr, dentry_hashtable)
@@ -497,20 +481,6 @@ SECTIONS
"kernel image bigger than KERNEL_IMAGE_SIZE");
#ifdef CONFIG_X86_64
-/*
- * Per-cpu symbols which need to be offset from __per_cpu_load
- * for the boot processor.
- */
-#define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load
-INIT_PER_CPU(gdt_page);
-INIT_PER_CPU(fixed_percpu_data);
-INIT_PER_CPU(irq_stack_backing_store);
-
-#ifdef CONFIG_SMP
-. = ASSERT((fixed_percpu_data == 0),
- "fixed_percpu_data is not at start of per-cpu area");
-#endif
-
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
. = ASSERT((retbleed_return_thunk & 0x3f) == 0, "retbleed_return_thunk not cacheline-aligned");
#endif
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index a308b79a887c..11245ecdc08d 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -95,9 +95,9 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
/* 64-bit entry point. */
.code64
1:
- /* Set base address in stack canary descriptor. */
+ /* Clear %gs so early per-CPU references target the per-CPU load area */
mov $MSR_GS_BASE,%ecx
- mov $_pa(canary), %eax
+ xor %eax, %eax
xor %edx, %edx
wrmsr
@@ -161,8 +161,6 @@ SYM_DATA_START_LOCAL(gdt_start)
SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
.balign 16
-SYM_DATA_LOCAL(canary, .fill 48, 1, 0)
-
SYM_DATA_START_LOCAL(early_stack)
.fill BOOT_STACK_SIZE, 1, 0
SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 880f0f2e465e..10add45b99f1 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -88,7 +88,6 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"(jiffies|jiffies_64)|"
#if ELF_BITS == 64
"__per_cpu_load|"
- "init_per_cpu__.*|"
"__end_rodata_hpage_align|"
#endif
"__vvar_page|"
@@ -785,10 +784,6 @@ static void percpu_init(void)
* The GNU linker incorrectly associates:
* __init_begin
* __per_cpu_load
- *
- * The "gold" linker incorrectly associates:
- * init_per_cpu__fixed_percpu_data
- * init_per_cpu__gdt_page
*/
static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
{
@@ -796,8 +791,7 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
return (shndx == per_cpu_shndx) &&
strcmp(symname, "__init_begin") &&
- strcmp(symname, "__per_cpu_load") &&
- strncmp(symname, "init_per_cpu_", 13);
+ strcmp(symname, "__per_cpu_load");
}
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 758bcd47b72d..faadac7c29e6 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -51,15 +51,9 @@ SYM_CODE_START(startup_xen)
leaq __top_init_kernel_stack(%rip), %rsp
- /* Set up %gs.
- *
- * The base of %gs always points to fixed_percpu_data. If the
- * stack protector canary is enabled, it is located at %gs:40.
- * Note that, on SMP, the boot cpu uses init data section until
- * the per cpu areas are set up.
- */
+ /* Clear %gs so early per-CPU references target the per-CPU load area */
movl $MSR_GS_BASE,%ecx
- movq $INIT_PER_CPU_VAR(fixed_percpu_data),%rax
+ xorl %eax, %eax
cdq
wrmsr
diff --git a/init/Kconfig b/init/Kconfig
index b05467014041..be8a9a786d3c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1838,7 +1838,6 @@ config KALLSYMS_ALL
config KALLSYMS_ABSOLUTE_PERCPU
bool
depends on KALLSYMS
- default X86_64 && SMP
# end of the "standard kernel features (expert users)" menu
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 07/28] scripts/kallsyms: Avoid 0x0 as the relative base
From: Ard Biesheuvel <ardb@kernel.org>
In some cases, LLVM's lld linker may emit the following symbol into the
symbol table:
0000000000000000 ? _GLOBAL_OFFSET_TABLE_
and its presence throws off the relative base logic in kallsyms. Since
0x0 is never a valid relative base, just ignore it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
scripts/kallsyms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 03852da3d249..09757d300a05 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -747,7 +747,7 @@ static void record_relative_base(void)
unsigned int i;
for (i = 0; i < table_cnt; i++)
- if (!symbol_absolute(table[i])) {
+ if (table[i]->addr && !symbol_absolute(table[i])) {
/*
* The table is sorted by address.
* Take the first non-absolute symbol value.
--
2.46.0.792.g87dc391469-goog
* [RFC PATCH 08/28] scripts/kallsyms: Remove support for absolute per-CPU variables
From: Ard Biesheuvel <ardb@kernel.org>
SMP on x86_64 no longer needs absolute per-CPU variables, so this
support can be dropped from kallsyms as well, as no other architectures
rely on this functionality.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
init/Kconfig | 4 --
kernel/kallsyms.c | 12 +----
scripts/kallsyms.c | 51 +++-----------------
scripts/link-vmlinux.sh | 4 --
4 files changed, 9 insertions(+), 62 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index be8a9a786d3c..f6eeba81282d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1835,10 +1835,6 @@ config KALLSYMS_ALL
Say N unless you really need all symbols, or kernel live patching.
-config KALLSYMS_ABSOLUTE_PERCPU
- bool
- depends on KALLSYMS
-
# end of the "standard kernel features (expert users)" menu
config ARCH_HAS_MEMBARRIER_CALLBACKS
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index a9a0ca605d4a..4198f30aac3c 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -148,16 +148,8 @@ static unsigned int get_symbol_offset(unsigned long pos)
unsigned long kallsyms_sym_address(int idx)
{
- /* values are unsigned offsets if --absolute-percpu is not in effect */
- if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU))
- return kallsyms_relative_base + (u32)kallsyms_offsets[idx];
-
- /* ...otherwise, positive offsets are absolute values */
- if (kallsyms_offsets[idx] >= 0)
- return kallsyms_offsets[idx];
-
- /* ...and negative offsets are relative to kallsyms_relative_base - 1 */
- return kallsyms_relative_base - 1 - kallsyms_offsets[idx];
+ /* values are unsigned offsets */
+ return kallsyms_relative_base + (u32)kallsyms_offsets[idx];
}
static unsigned int get_symbol_seq(int index)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 09757d300a05..9c34b9397872 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -5,7 +5,7 @@
* This software may be used and distributed according to the terms
* of the GNU General Public License, incorporated herein by reference.
*
- * Usage: kallsyms [--all-symbols] [--absolute-percpu] in.map > out.S
+ * Usage: kallsyms [--all-symbols] in.map > out.S
*
* Table compression uses all the unused char codes on the symbols and
* maps these to the most used substrings (tokens). For instance, it might
@@ -37,7 +37,6 @@ struct sym_entry {
unsigned long long addr;
unsigned int len;
unsigned int seq;
- bool percpu_absolute;
unsigned char sym[];
};
@@ -62,7 +61,6 @@ static struct addr_range percpu_range = {
static struct sym_entry **table;
static unsigned int table_size, table_cnt;
static int all_symbols;
-static int absolute_percpu;
static int token_profit[0x10000];
@@ -73,7 +71,7 @@ static unsigned char best_table_len[256];
static void usage(void)
{
- fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] in.map > out.S\n");
+ fprintf(stderr, "Usage: kallsyms [--all-symbols] in.map > out.S\n");
exit(1);
}
@@ -175,7 +173,6 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
sym->len = len;
sym->sym[0] = type;
strcpy(sym_name(sym), name);
- sym->percpu_absolute = false;
return sym;
}
@@ -319,11 +316,6 @@ static int expand_symbol(const unsigned char *data, int len, char *result)
return total;
}
-static bool symbol_absolute(const struct sym_entry *s)
-{
- return s->percpu_absolute;
-}
-
static int compare_names(const void *a, const void *b)
{
int ret;
@@ -457,20 +449,10 @@ static void write_src(void)
long long offset;
bool overflow;
- if (!absolute_percpu) {
- offset = table[i]->addr - relative_base;
- overflow = offset < 0 || offset > UINT_MAX;
- } else if (symbol_absolute(table[i])) {
- offset = table[i]->addr;
- overflow = offset < 0 || offset > INT_MAX;
- } else {
- offset = relative_base - table[i]->addr - 1;
- overflow = offset < INT_MIN || offset >= 0;
- }
+ offset = table[i]->addr - relative_base;
+ overflow = (offset < 0 || offset > UINT_MAX);
if (overflow) {
- fprintf(stderr, "kallsyms failure: "
- "%s symbol value %#llx out of range in relative mode\n",
- symbol_absolute(table[i]) ? "absolute" : "relative",
+ fprintf(stderr, "kallsyms failure: symbol value %#llx out of range\n",
table[i]->addr);
exit(EXIT_FAILURE);
}
@@ -725,32 +707,16 @@ static void sort_symbols(void)
qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
}
-static void make_percpus_absolute(void)
-{
- unsigned int i;
-
- for (i = 0; i < table_cnt; i++)
- if (symbol_in_range(table[i], &percpu_range, 1)) {
- /*
- * Keep the 'A' override for percpu symbols to
- * ensure consistent behavior compared to older
- * versions of this tool.
- */
- table[i]->sym[0] = 'A';
- table[i]->percpu_absolute = true;
- }
-}
-
/* find the minimum non-absolute symbol address */
static void record_relative_base(void)
{
unsigned int i;
for (i = 0; i < table_cnt; i++)
- if (table[i]->addr && !symbol_absolute(table[i])) {
+ if (table[i]->addr) {
/*
* The table is sorted by address.
- * Take the first non-absolute symbol value.
+ * Take the first non-zero symbol value.
*/
relative_base = table[i]->addr;
return;
@@ -762,7 +728,6 @@ int main(int argc, char **argv)
while (1) {
static const struct option long_options[] = {
{"all-symbols", no_argument, &all_symbols, 1},
- {"absolute-percpu", no_argument, &absolute_percpu, 1},
{},
};
@@ -779,8 +744,6 @@ int main(int argc, char **argv)
read_map(argv[optind]);
shrink_table();
- if (absolute_percpu)
- make_percpus_absolute();
sort_symbols();
record_relative_base();
optimize_token_table();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index a9b3f34a78d2..df5f3fbb46f3 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -140,10 +140,6 @@ kallsyms()
kallsymopt="${kallsymopt} --all-symbols"
fi
- if is_enabled CONFIG_KALLSYMS_ABSOLUTE_PERCPU; then
- kallsymopt="${kallsymopt} --absolute-percpu"
- fi
-
info KSYMS "${2}.S"
scripts/kallsyms ${kallsymopt} "${1}" > "${2}.S"
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 09/28] x86/tools: Remove special relocation handling for per-CPU variables
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (7 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 08/28] scripts/kallsyms: Remove support for absolute per-CPU variables Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 10/28] x86/xen: Avoid relocatable quantities in Xen ELF notes Ard Biesheuvel
` (18 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Due to the placement of per-CPU variables in a special, 0x0-based
disjoint memory segment in the ELF binary, the KASLR relocation tool
needed to perform special processing for references to such variables,
as they were not affected by the KASLR displacement.
This meant that absolute references could be ignored, while
RIP-relative references had to be compensated for the KASLR
displacement by applying the same offset, but negated.
None of this is necessary any longer, so remove this handling from the
relocation host tool.
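For reference, a rough C sketch of the two passes, mirroring the code
that is removed from misc.c below ('delta' is the KASLR displacement):
  #include <stdint.h>
  static void apply_64bit_reloc(uint64_t *site, long delta)
  {
          *site += delta; /* the target moves along with the kernel */
  }
  static void apply_inverse_32bit_reloc(int32_t *site, long delta)
  {
          /* RIP-relative reference to a 0x0-based per-CPU symbol: the
           * target does not move, so undo the displacement instead */
          *site -= delta;
  }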
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/misc.c | 14 +--
arch/x86/tools/relocs.c | 130 +-------------------
2 files changed, 2 insertions(+), 142 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 04a35b2c26e9..89f01375cdb7 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -235,7 +235,7 @@ static void handle_relocations(void *output, unsigned long output_len,
/*
* Process relocations: 32 bit relocations first then 64 bit after.
- * Three sets of binary relocations are added to the end of the kernel
+ * Two sets of binary relocations are added to the end of the kernel
* before compression. Each relocation table entry is the kernel
* address of the location which needs to be updated stored as a
* 32-bit value which is sign extended to 64 bits.
@@ -245,8 +245,6 @@ static void handle_relocations(void *output, unsigned long output_len,
* kernel bits...
* 0 - zero terminator for 64 bit relocations
* 64 bit relocation repeated
- * 0 - zero terminator for inverse 32 bit relocations
- * 32 bit inverse relocation repeated
* 0 - zero terminator for 32 bit relocations
* 32 bit relocation repeated
*
@@ -267,16 +265,6 @@ static void handle_relocations(void *output, unsigned long output_len,
long extended = *reloc;
extended += map;
- ptr = (unsigned long)extended;
- if (ptr < min_addr || ptr > max_addr)
- error("inverse 32-bit relocation outside of kernel!\n");
-
- *(int32_t *)ptr -= delta;
- }
- for (reloc--; *reloc; reloc--) {
- long extended = *reloc;
- extended += map;
-
ptr = (unsigned long)extended;
if (ptr < min_addr || ptr > max_addr)
error("64-bit relocation outside of kernel!\n");
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 10add45b99f1..942c029a5067 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -29,7 +29,6 @@ static struct relocs relocs16;
static struct relocs relocs32;
#if ELF_BITS == 64
-static struct relocs relocs32neg;
static struct relocs relocs64;
# define FMT PRIu64
#else
@@ -287,34 +286,6 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
return name;
}
-static Elf_Sym *sym_lookup(const char *symname)
-{
- int i;
-
- for (i = 0; i < shnum; i++) {
- struct section *sec = &secs[i];
- long nsyms;
- const char *strtab;
- Elf_Sym *symtab;
- Elf_Sym *sym;
-
- if (sec->shdr.sh_type != SHT_SYMTAB)
- continue;
-
- nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
- symtab = sec->symtab;
- strtab = sec->link->strtab;
-
- for (sym = symtab; --nsyms >= 0; sym++) {
- if (!sym->st_name)
- continue;
- if (strcmp(symname, strtab + sym->st_name) == 0)
- return sym;
- }
- }
- return 0;
-}
-
#if BYTE_ORDER == LITTLE_ENDIAN
# define le16_to_cpu(val) (val)
# define le32_to_cpu(val) (val)
@@ -722,79 +693,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
}
}
-/*
- * The .data..percpu section is a special case for x86_64 SMP kernels.
- * It is used to initialize the actual per_cpu areas and to provide
- * definitions for the per_cpu variables that correspond to their offsets
- * within the percpu area. Since the values of all of the symbols need
- * to be offsets from the start of the per_cpu area the virtual address
- * (sh_addr) of .data..percpu is 0 in SMP kernels.
- *
- * This means that:
- *
- * Relocations that reference symbols in the per_cpu area do not
- * need further relocation (since the value is an offset relative
- * to the start of the per_cpu area that does not change).
- *
- * Relocations that apply to the per_cpu area need to have their
- * offset adjusted by by the value of __per_cpu_load to make them
- * point to the correct place in the loaded image (because the
- * virtual address of .data..percpu is 0).
- *
- * For non SMP kernels .data..percpu is linked as part of the normal
- * kernel data and does not require special treatment.
- *
- */
-static int per_cpu_shndx = -1;
-static Elf_Addr per_cpu_load_addr;
-
-static void percpu_init(void)
-{
- int i;
-
- for (i = 0; i < shnum; i++) {
- ElfW(Sym) *sym;
-
- if (strcmp(sec_name(i), ".data..percpu"))
- continue;
-
- if (secs[i].shdr.sh_addr != 0) /* non SMP kernel */
- return;
-
- sym = sym_lookup("__per_cpu_load");
- if (!sym)
- die("can't find __per_cpu_load\n");
-
- per_cpu_shndx = i;
- per_cpu_load_addr = sym->st_value;
-
- return;
- }
-}
-
#if ELF_BITS == 64
-/*
- * Check to see if a symbol lies in the .data..percpu section.
- *
- * The linker incorrectly associates some symbols with the
- * .data..percpu section so we also need to check the symbol
- * name to make sure that we classify the symbol correctly.
- *
- * The GNU linker incorrectly associates:
- * __init_begin
- * __per_cpu_load
- */
-static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
-{
- int shndx = sym_index(sym);
-
- return (shndx == per_cpu_shndx) &&
- strcmp(symname, "__init_begin") &&
- strcmp(symname, "__per_cpu_load");
-}
-
-
static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
const char *symname)
{
@@ -805,12 +705,6 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
if (sym->st_shndx == SHN_UNDEF)
return 0;
- /*
- * Adjust the offset if this reloc applies to the percpu section.
- */
- if (sec->shdr.sh_info == per_cpu_shndx)
- offset += per_cpu_load_addr;
-
switch (r_type) {
case R_X86_64_NONE:
/* NONE can be ignored. */
@@ -819,33 +713,22 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
case R_X86_64_PC32:
case R_X86_64_PLT32:
/*
- * PC relative relocations don't need to be adjusted unless
- * referencing a percpu symbol.
+ * PC relative relocations don't need to be adjusted.
*
* NB: R_X86_64_PLT32 can be treated as R_X86_64_PC32.
*/
- if (is_percpu_sym(sym, symname))
- add_reloc(&relocs32neg, offset);
break;
case R_X86_64_PC64:
/*
* Only used by jump labels
*/
- if (is_percpu_sym(sym, symname))
- die("Invalid R_X86_64_PC64 relocation against per-CPU symbol %s\n", symname);
break;
case R_X86_64_32:
case R_X86_64_32S:
case R_X86_64_64:
case R_X86_64_GOTPCREL:
- /*
- * References to the percpu area don't need to be adjusted.
- */
- if (is_percpu_sym(sym, symname))
- break;
-
if (shn_abs) {
/*
* Whitelisted absolute symbols do not require
@@ -1076,7 +959,6 @@ static void emit_relocs(int as_text, int use_real_mode)
/* Order the relocations for more efficient processing */
sort_relocs(&relocs32);
#if ELF_BITS == 64
- sort_relocs(&relocs32neg);
sort_relocs(&relocs64);
#else
sort_relocs(&relocs16);
@@ -1109,13 +991,6 @@ static void emit_relocs(int as_text, int use_real_mode)
for (i = 0; i < relocs64.count; i++)
if (!i || relocs64.offset[i] != relocs64.offset[i - 1])
write_reloc(relocs64.offset[i], stdout);
-
- /* Print a stop */
- write_reloc(0, stdout);
-
- /* Now print each inverse 32-bit relocation */
- for (i = 0; i < relocs32neg.count; i++)
- write_reloc(relocs32neg.offset[i], stdout);
#endif
/* Print a stop */
@@ -1180,9 +1055,6 @@ void process(FILE *fp, int use_real_mode, int as_text,
read_symtabs();
read_relocs();
- if (ELF_BITS == 64)
- percpu_init();
-
if (show_absolute_syms) {
print_absolute_symbols();
return;
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 10/28] x86/xen: Avoid relocatable quantities in Xen ELF notes
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (8 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 09/28] x86/tools: Remove special relocation handling for " Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text Ard Biesheuvel
` (17 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Xen puts virtual and physical addresses into ELF notes that are treated
by the linker as relocatable by default. Doing so is not only
pointless, given that the ELF notes are only intended for consumption
by Xen before the kernel boots; it is also a KASLR leak, given that the
kernel's ELF notes are exposed via the world-readable
/sys/kernel/notes.
So emit these constants in a way that prevents the linker from marking
them as relocatable. This involves place-relative relocations (which
subtract their own virtual address from the symbol value) and
linker-provided absolute symbols that add the address of the place to
the desired value.
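Worked out for XEN_ELFNOTE_ENTRY, with '.' denoting the address of the
note payload (i.e., xen_elfnote_entry):
  emitted value = xen_elfnote_entry_offset - .
                = (xen_elfnote_entry + startup_xen) - xen_elfnote_entry
                = startup_xen
The subtraction of '.' makes the expression place-relative, so the
linker resolves it at build time and emits no relocation for it.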
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/vmlinux.lds.S | 13 +++++++++++++
arch/x86/platform/pvh/head.S | 6 +++---
arch/x86/tools/relocs.c | 1 +
arch/x86/xen/xen-head.S | 6 ++++--
4 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 00f82db7b3e1..52b8db931d0f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -111,6 +111,19 @@ PHDRS {
SECTIONS
{
. = __START_KERNEL;
+
+#ifdef CONFIG_XEN_PV
+xen_elfnote_entry_offset =
+ ABSOLUTE(xen_elfnote_entry) + ABSOLUTE(startup_xen);
+xen_elfnote_hypercall_page_offset =
+ ABSOLUTE(xen_elfnote_hypercall_page) + ABSOLUTE(hypercall_page);
+#endif
+
+#ifdef CONFIG_PVH
+xen_elfnote_phys32_entry_offset =
+ ABSOLUTE(xen_elfnote_phys32_entry) + ABSOLUTE(pvh_start_xen - LOAD_OFFSET);
+#endif
+
#ifdef CONFIG_X86_32
phys_startup_32 = ABSOLUTE(startup_32 - LOAD_OFFSET);
#else
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index 11245ecdc08d..adbf57e83e4e 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -50,7 +50,7 @@
#define PVH_CS_SEL (PVH_GDT_ENTRY_CS * 8)
#define PVH_DS_SEL (PVH_GDT_ENTRY_DS * 8)
-SYM_CODE_START_LOCAL(pvh_start_xen)
+SYM_CODE_START(pvh_start_xen)
UNWIND_HINT_END_OF_STACK
cld
@@ -165,5 +165,5 @@ SYM_DATA_START_LOCAL(early_stack)
.fill BOOT_STACK_SIZE, 1, 0
SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
- ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
- _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
+ ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .global xen_elfnote_phys32_entry;
+ xen_elfnote_phys32_entry: _ASM_PTR xen_elfnote_phys32_entry_offset - .)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 942c029a5067..22c2d3f07a57 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -57,6 +57,7 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
[S_ABS] =
"^(xen_irq_disable_direct_reloc$|"
"xen_save_fl_direct_reloc$|"
+ "xen_elfnote_.+_offset$|"
"VDSO|"
"__kcfi_typeid_|"
"__crc_)",
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index faadac7c29e6..4d246a48a85f 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -88,7 +88,8 @@ SYM_CODE_END(xen_cpu_bringup_again)
ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, _ASM_PTR __START_KERNEL_map)
/* Map the p2m table to a 512GB-aligned user address. */
ELFNOTE(Xen, XEN_ELFNOTE_INIT_P2M, .quad (PUD_SIZE * PTRS_PER_PUD))
- ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, _ASM_PTR startup_xen)
+ ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .globl xen_elfnote_entry;
+ xen_elfnote_entry: _ASM_PTR xen_elfnote_entry_offset - .)
ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .ascii "!writable_page_tables")
ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz "yes")
ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
@@ -109,7 +110,8 @@ SYM_CODE_END(xen_cpu_bringup_again)
#else
# define FEATURES_DOM0 0
#endif
- ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
+ ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .globl xen_elfnote_hypercall_page;
+ xen_elfnote_hypercall_page: _ASM_PTR xen_elfnote_hypercall_page_offset - .)
ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES,
.long FEATURES_PV | FEATURES_PVH | FEATURES_DOM0)
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (9 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 10/28] x86/xen: Avoid relocatable quantities in Xen ELF notes Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 21:10 ` Jason Andryuk
2024-09-25 15:01 ` [RFC PATCH 12/28] x86/pm-trace: Use RIP-relative accesses for .tracedata Ard Biesheuvel
` (16 subsequent siblings)
27 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
The .head.text section contains code that may execute from a different
address than it was linked at. This is fragile, given that the x86 ABI
can refer to global symbols via absolute or relative references, and the
toolchain assumes that these are interchangeable, which they are not in
this particular case.
In the case of the PVH code, there are some additional complications:
- the absolute references are in 32-bit code, where they are emitted
as R_X86_64_32 relocations, which are not permitted in PIE code;
- the code in question is not actually relocatable: it can only run
correctly from the physical load address specified in the ELF note.
So rewrite the code to rely only on relative symbol references: these
are always 32 bits wide, even in 64-bit code, and are resolved by the
linker at build time.
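Worked out for the entry sequence added below, where '0b' is a local
label at the start of the computation:
  %ebp = (xen_elfnote_phys32_entry_offset - 0b)
       - (xen_elfnote_phys32_entry - 0b)
       = xen_elfnote_phys32_entry_offset - xen_elfnote_phys32_entry
       = pvh_start_xen - LOAD_OFFSET    (per the linker script)
i.e., %ebp ends up holding the link-time physical address of
pvh_start_xen, which by construction equals the runtime one, without
any 32-bit absolute relocations being needed.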
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/pvh/head.S | 39 ++++++++++++++------
1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index adbf57e83e4e..e6cb7da40e09 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -54,7 +54,20 @@ SYM_CODE_START(pvh_start_xen)
UNWIND_HINT_END_OF_STACK
cld
- lgdt (_pa(gdt))
+ /*
+ * This is position dependent code that can only execute correctly from
+ * the physical address that the kernel was linked to run at. Use the
+ * symbols emitted for the ELF note to construct the build time physical
+ * address of pvh_start_xen(), without relying on absolute 32-bit ELF
+ * relocations, as these are not supported by the linker when running in
+ * -pie mode, and should be avoided in .head.text in general.
+ */
+0: mov $xen_elfnote_phys32_entry_offset - 0b, %ebp
+ sub $xen_elfnote_phys32_entry - 0b, %ebp
+
+ lea (gdt - pvh_start_xen)(%ebp), %eax
+ add %eax, 2(%eax)
+ lgdt (%eax)
mov $PVH_DS_SEL,%eax
mov %eax,%ds
@@ -62,14 +75,14 @@ SYM_CODE_START(pvh_start_xen)
mov %eax,%ss
/* Stash hvm_start_info. */
- mov $_pa(pvh_start_info), %edi
+ lea (pvh_start_info - pvh_start_xen)(%ebp), %edi
mov %ebx, %esi
- mov _pa(pvh_start_info_sz), %ecx
+ mov (pvh_start_info_sz - pvh_start_xen)(%ebp), %ecx
shr $2,%ecx
rep
movsl
- mov $_pa(early_stack_end), %esp
+ lea (early_stack_end - pvh_start_xen)(%ebp), %esp
/* Enable PAE mode. */
mov %cr4, %eax
@@ -84,17 +97,21 @@ SYM_CODE_START(pvh_start_xen)
wrmsr
/* Enable pre-constructed page tables. */
- mov $_pa(init_top_pgt), %eax
+ lea (init_top_pgt - pvh_start_xen)(%ebp), %eax
mov %eax, %cr3
mov $(X86_CR0_PG | X86_CR0_PE), %eax
mov %eax, %cr0
/* Jump to 64-bit mode. */
- ljmp $PVH_CS_SEL, $_pa(1f)
+ lea (1f - pvh_start_xen)(%ebp), %eax
+ push $PVH_CS_SEL
+ push %eax
+ lret
/* 64-bit entry point. */
.code64
1:
+ UNWIND_HINT_END_OF_STACK
/* Clear %gs so early per-CPU references target the per-CPU load area */
mov $MSR_GS_BASE,%ecx
xor %eax, %eax
@@ -108,10 +125,8 @@ SYM_CODE_START(pvh_start_xen)
call *%rax
/* startup_64 expects boot_params in %rsi. */
- mov $_pa(pvh_bootparams), %rsi
- mov $_pa(startup_64), %rax
- ANNOTATE_RETPOLINE_SAFE
- jmp *%rax
+ lea pvh_bootparams(%rip), %rsi
+ jmp startup_64
#else /* CONFIG_X86_64 */
@@ -146,8 +161,8 @@ SYM_CODE_END(pvh_start_xen)
.section ".init.data","aw"
.balign 8
SYM_DATA_START_LOCAL(gdt)
- .word gdt_end - gdt_start
- .long _pa(gdt_start)
+ .word gdt_end - gdt_start - 1
+ .long gdt_start - gdt
.word 0
SYM_DATA_END(gdt)
SYM_DATA_START_LOCAL(gdt_start)
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 12/28] x86/pm-trace: Use RIP-relative accesses for .tracedata
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (10 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 13/28] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
` (15 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Use RIP-relative accesses and 32-bit offsets for .tracedata, to avoid
the need for relocation fixups at boot time.
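The 32-bit offsets are turned back into pointers using the existing
offset_to_ptr() helper from <linux/compiler.h>, which is roughly:
  static inline void *offset_to_ptr(const int *off)
  {
          /* a self-relative offset: add it to its own address */
          return (void *)((unsigned long)off + *off);
  }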
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/include/asm/pm-trace.h | 4 ++--
drivers/base/power/trace.c | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index bfa32aa428e5..123faf978473 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -8,10 +8,10 @@
do { \
if (pm_trace_enabled) { \
const void *tracedata; \
- asm volatile(_ASM_MOV " $1f,%0\n" \
+ asm volatile("lea " _ASM_RIP(1f) ", %0\n" \
".section .tracedata,\"a\"\n" \
"1:\t.word %c1\n\t" \
- _ASM_PTR " %c2\n" \
+ ".long %c2 - .\n" \
".previous" \
:"=r" (tracedata) \
: "i" (__LINE__), "i" (__FILE__)); \
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
index cd6e559648b2..686a0276ccfc 100644
--- a/drivers/base/power/trace.c
+++ b/drivers/base/power/trace.c
@@ -167,7 +167,7 @@ EXPORT_SYMBOL(set_trace_device);
void generate_pm_trace(const void *tracedata, unsigned int user)
{
unsigned short lineno = *(unsigned short *)tracedata;
- const char *file = *(const char **)(tracedata + 2);
+ const char *file = offset_to_ptr((int *)(tracedata + 2));
unsigned int user_hash_value, file_hash_value;
if (!x86_platform.legacy.rtc)
@@ -187,9 +187,9 @@ static int show_file_hash(unsigned int value)
match = 0;
for (tracedata = __tracedata_start ; tracedata < __tracedata_end ;
- tracedata += 2 + sizeof(unsigned long)) {
+ tracedata += 2 + sizeof(int)) {
unsigned short lineno = *(unsigned short *)tracedata;
- const char *file = *(const char **)(tracedata + 2);
+ const char *file = offset_to_ptr((int *)(tracedata + 2));
unsigned int hash = hash_string(lineno, file, FILEHASH);
if (hash != value)
continue;
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 13/28] x86/kvm: Use RIP-relative addressing
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (11 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 12/28] x86/pm-trace: Use RIP-relative accesses for .tracedata Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address Ard Biesheuvel
` (14 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Avoid absolute references in code, which require fixing up at boot
time, and replace them with RIP-relative ones. In this particular case,
they cannot be avoided entirely due to register pressure, so one
absolute reference is retained; the resulting reference via the GOT is
still compatible with running the linker in PIE mode.
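To illustrate: the address of steal_time is read from a
linker-populated GOT slot using only a RIP-relative access,
  addq steal_time@GOTPCREL(%rip), %rax  /* %rax += &steal_time */
so the code itself carries no 32-bit absolute (R_X86_64_32S) relocation
against steal_time; the only absolute value lives in the GOT slot,
which a PIE link knows how to handle.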
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/kvm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 263f8aed4e2c..8eac209a31aa 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -800,9 +800,11 @@ extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
* Hand-optimize version for x86-64 to avoid 8 64-bit register saving and
* restoring to/from the stack.
*/
-#define PV_VCPU_PREEMPTED_ASM \
- "movq __per_cpu_offset(,%rdi,8), %rax\n\t" \
- "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
+#define PV_VCPU_PREEMPTED_ASM \
+ "leaq __per_cpu_offset(%rip), %rax \n\t" \
+ "movq (%rax,%rdi,8), %rax \n\t" \
+ "addq steal_time@GOTPCREL(%rip), %rax \n\t" \
+ "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "(%rax) \n\t" \
"setne %al\n\t"
DEFINE_ASM_FUNC(__raw_callee_save___kvm_vcpu_is_preempted,
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (12 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 13/28] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 16:39 ` Linus Torvalds
2024-09-25 15:01 ` [RFC PATCH 15/28] x86/sync_core: Use RIP-relative addressing Ard Biesheuvel
` (13 subsequent siblings)
27 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Instead of pushing an immediate absolute address, which is incompatible
with PIE codegen or linking, use a LEA instruction to take the address
into a register.
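For illustration, the PIE-compatible sequence (as used below):
  leaq  arch_rethook_trampoline(%rip), %rdi  /* RIP-relative address */
  pushq %rdi
whereas 'pushq $arch_rethook_trampoline' takes a sign-extended 32-bit
absolute immediate, i.e., an R_X86_64_32S relocation that a PIE link
cannot represent.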
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/rethook.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
index 8a1c0111ae79..3b3c17ba3cd5 100644
--- a/arch/x86/kernel/rethook.c
+++ b/arch/x86/kernel/rethook.c
@@ -27,7 +27,8 @@ asm(
#ifdef CONFIG_X86_64
ANNOTATE_NOENDBR /* This is only jumped from ret instruction */
/* Push a fake return address to tell the unwinder it's a rethook. */
- " pushq $arch_rethook_trampoline\n"
+ " leaq arch_rethook_trampoline(%rip), %rdi\n"
+ " pushq %rdi\n"
UNWIND_HINT_FUNC
" pushq $" __stringify(__KERNEL_DS) "\n"
/* Save the 'sp - 16', this will be fixed later. */
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 15/28] x86/sync_core: Use RIP-relative addressing
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (13 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 16/28] x86/entry_64: " Ard Biesheuvel
` (12 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Use RIP-relative accesses and avoid fixups at runtime.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/include/asm/sync_core.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index ab7382f92aff..cfd2f3bca83b 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -31,7 +31,8 @@ static inline void iret_to_self(void)
"pushfq\n\t"
"mov %%cs, %0\n\t"
"pushq %q0\n\t"
- "pushq $1f\n\t"
+ "leaq 1f(%%rip), %q0\n\t"
+ "pushq %q0\n\t"
"iretq\n\t"
"1:"
: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 16/28] x86/entry_64: Use RIP-relative addressing
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (14 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 15/28] x86/sync_core: Use RIP-relative addressing Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 17/28] x86/hibernate: Prefer RIP-relative accesses Ard Biesheuvel
` (11 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Fix up a couple of occurrences in the x86_64 entry code where we take
the absolute address of a symbol even though RIP-relative addressing
would work just as well. This avoids relocation fixups at boot for
these quantities.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/entry/calling.h | 9 +++++----
arch/x86/entry/entry_64.S | 12 +++++++-----
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index ea81770629ee..099da5aaf929 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -375,8 +375,8 @@ For 32-bit we have the following conventions - kernel is built with
.endm
.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+ GET_PERCPU_BASE \scratch_reg \save_reg
rdgsbase \save_reg
- GET_PERCPU_BASE \scratch_reg
wrgsbase \scratch_reg
.endm
@@ -412,15 +412,16 @@ For 32-bit we have the following conventions - kernel is built with
* Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
* while running KVM's run loop.
*/
-.macro GET_PERCPU_BASE reg:req
+.macro GET_PERCPU_BASE reg:req scratch:req
LOAD_CPU_AND_NODE_SEG_LIMIT \reg
andq $VDSO_CPUNODE_MASK, \reg
- movq __per_cpu_offset(, \reg, 8), \reg
+ leaq __per_cpu_offset(%rip), \scratch
+ movq (\scratch, \reg, 8), \reg
.endm
#else
-.macro GET_PERCPU_BASE reg:req
+.macro GET_PERCPU_BASE reg:req scratch:req
movq pcpu_unit_offsets(%rip), \reg
.endm
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1b5be07f8669..6509e12b6329 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1038,7 +1038,8 @@ SYM_CODE_START(error_entry)
movl %ecx, %eax /* zero extend */
cmpq %rax, RIP+8(%rsp)
je .Lbstep_iret
- cmpq $.Lgs_change, RIP+8(%rsp)
+ leaq .Lgs_change(%rip), %rcx
+ cmpq %rcx, RIP+8(%rsp)
jne .Lerror_entry_done_lfence
/*
@@ -1250,10 +1251,10 @@ SYM_CODE_START(asm_exc_nmi)
* the outer NMI.
*/
- movq $repeat_nmi, %rdx
+ leaq repeat_nmi(%rip), %rdx
cmpq 8(%rsp), %rdx
ja 1f
- movq $end_repeat_nmi, %rdx
+ leaq end_repeat_nmi(%rip), %rdx
cmpq 8(%rsp), %rdx
ja nested_nmi_out
1:
@@ -1307,7 +1308,8 @@ nested_nmi:
pushq %rdx
pushfq
pushq $__KERNEL_CS
- pushq $repeat_nmi
+ leaq repeat_nmi(%rip), %rdx
+ pushq %rdx
/* Put stack back */
addq $(6*8), %rsp
@@ -1346,7 +1348,7 @@ first_nmi:
addq $8, (%rsp) /* Fix up RSP */
pushfq /* RFLAGS */
pushq $__KERNEL_CS /* CS */
- pushq $1f /* RIP */
+ pushq 1f@GOTPCREL(%rip) /* RIP */
iretq /* continues at repeat_nmi below */
UNWIND_HINT_IRET_REGS
1:
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 17/28] x86/hibernate: Prefer RIP-relative accesses
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (15 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 16/28] x86/entry_64: " Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 18/28] x86/boot/64: Determine VA/PA offset before entering C code Ard Biesheuvel
` (10 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Replace some absolute symbol references with RIP-relative ones, so we
don't need to fix them up at boot.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/power/hibernate_asm_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index 0a0539e1cc81..1d96a119d29d 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -39,7 +39,7 @@ SYM_FUNC_START(restore_registers)
movq %rax, %cr4; # turn PGE back on
/* We don't restore %rax, it must be 0 anyway */
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq pt_regs_sp(%rax), %rsp
movq pt_regs_bp(%rax), %rbp
movq pt_regs_si(%rax), %rsi
@@ -70,7 +70,7 @@ SYM_FUNC_START(restore_registers)
SYM_FUNC_END(restore_registers)
SYM_FUNC_START(swsusp_arch_suspend)
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq %rsp, pt_regs_sp(%rax)
movq %rbp, pt_regs_bp(%rax)
movq %rsi, pt_regs_si(%rax)
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 18/28] x86/boot/64: Determine VA/PA offset before entering C code
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (16 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 17/28] x86/hibernate: Prefer RIP-relative accesses Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 19/28] x86/boot/64: Avoid intentional absolute symbol references in .head.text Ard Biesheuvel
` (9 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Implicit absolute symbol references (e.g., taking the address of a
global variable) must be avoided in the C code that runs from the early
1:1 mapping of the kernel, given that this is a practice that violates
assumptions on the part of the toolchain. I.e., RIP-relative and
absolute references are expected to produce the same values, and so the
compiler is free to choose either. However, the code currently assumes
that RIP-relative references are never emitted here.
So an explicit virtual-to-physical offset needs to be used to derive
the kernel virtual addresses of _text and _end, instead of simply
taking their addresses and assuming that the compiler will not choose
to emit RIP-relative references in this particular case.
Currently, phys_base is already used to perform such calculations, but
it is derived from the kernel virtual address of _text, which is taken
using an implicit absolute symbol reference. So instead, derive this
VA-to-PA offset in asm code, using the kernel VA of common_startup_64
(which we already keep in a global variable for other reasons), and pass
it to the C startup code.
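In shorthand, with p2v_offset = PA(common_startup_64) -
VA(common_startup_64), a constant that is the same for every symbol in
the image:
  load_delta = PA(_text) - (VA(_text) - __START_KERNEL_map)
             = __START_KERNEL_map + (PA(_text) - VA(_text))
             = __START_KERNEL_map + p2v_offset
so load_delta can be computed without taking the address of _text at
all.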
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/include/asm/setup.h | 2 +-
arch/x86/kernel/head64.c | 8 +++++---
arch/x86/kernel/head_64.S | 9 ++++++++-
3 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 0667b2a88614..85f4fde3515c 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -49,7 +49,7 @@ extern unsigned long saved_video_mode;
extern void reserve_standard_io_resources(void);
extern void i386_reserve_resources(void);
-extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
+extern unsigned long __startup_64(unsigned long p2v_offset, struct boot_params *bp);
extern void startup_64_setup_gdt_idt(void);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index d4398261ad81..de33ac34773c 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -138,12 +138,14 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
* boot-time crashes. To work around this problem, every global pointer must
- * be accessed using RIP_REL_REF().
+ * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined
+ * by subtracting p2v_offset from the RIP-relative address.
*/
-unsigned long __head __startup_64(unsigned long physaddr,
+unsigned long __head __startup_64(unsigned long p2v_offset,
struct boot_params *bp)
{
pmd_t (*early_pgts)[PTRS_PER_PMD] = RIP_REL_REF(early_dynamic_pgts);
+ unsigned long physaddr = (unsigned long)&RIP_REL_REF(_text);
unsigned long pgtable_flags;
unsigned long load_delta;
pgdval_t *pgd;
@@ -163,7 +165,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Compute the delta between the address I am compiled to run at
* and the address I am actually running at.
*/
- load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+ load_delta = __START_KERNEL_map + p2v_offset;
RIP_REL_REF(phys_base) = load_delta;
/* Is the address not 2M aligned? */
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ab6ccee81493..db71cf64204b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -99,13 +99,20 @@ SYM_CODE_START_NOALIGN(startup_64)
/* Sanitize CPU configuration */
call verify_cpu
+ /*
+ * Use the 1:1 physical and kernel virtual addresses of
+ * common_startup_64 to determine the physical-to-virtual offset, and
+ * pass it as the first argument to __startup_64().
+ */
+ leaq common_startup_64(%rip), %rdi
+ subq 0f(%rip), %rdi
+
/*
* Perform pagetable fixups. Additionally, if SME is active, encrypt
* the kernel and retrieve the modifier (SME encryption mask if SME
* is active) to be added to the initial pgdir entry that will be
* programmed into CR3.
*/
- leaq _text(%rip), %rdi
movq %r15, %rsi
call __startup_64
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 19/28] x86/boot/64: Avoid intentional absolute symbol references in .head.text
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (17 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 18/28] x86/boot/64: Determine VA/PA offset before entering C code Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 20/28] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
` (8 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
The code in .head.text executes from a 1:1 mapping and cannot generally
refer to global variables using their kernel virtual addresses. However,
there are some occurrences of such references that are valid: the kernel
virtual addresses of _text and _end are needed to populate the page
tables correctly, and some other section markers are used in a similar
way.
To avoid the need for making exceptions to the rule that .head.text must
not contain any absolute symbol references, derive these addresses from
the RIP-relative 1:1 mapped physical addresses, which can be safely
determined using RIP_REL_REF().
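In shorthand, with p2v_offset = PA - VA being constant across the
image, the hunk below recovers the virtual addresses as:
  va_text = PA(_text) - p2v_offset;
  va_end  = PA(_end)  - p2v_offset;
where PA(_text) and PA(_end) are taken via RIP_REL_REF() from the 1:1
mapping, so no absolute symbol references remain in .head.text.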
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/head64.c | 30 ++++++++++++--------
1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de33ac34773c..49e8ba1c0d34 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -91,9 +91,11 @@ static inline bool check_la57_support(void)
return true;
}
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
+ pmdval_t *pmd,
+ unsigned long p2v_offset)
{
- unsigned long vaddr, vaddr_end;
+ unsigned long paddr, paddr_end;
int i;
/* Encrypt the kernel and related (if SME is active) */
@@ -106,10 +108,10 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* attribute.
*/
if (sme_get_me_mask()) {
- vaddr = (unsigned long)__start_bss_decrypted;
- vaddr_end = (unsigned long)__end_bss_decrypted;
+ paddr = (unsigned long)&RIP_REL_REF(__start_bss_decrypted);
+ paddr_end = (unsigned long)&RIP_REL_REF(__end_bss_decrypted);
- for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ for (; paddr < paddr_end; paddr += PMD_SIZE) {
/*
* On SNP, transition the page to shared in the RMP table so that
* it is consistent with the page table attribute change.
@@ -118,11 +120,11 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* mapping (kernel .text). PVALIDATE, by way of
* early_snp_set_memory_shared(), requires a valid virtual
* address but the kernel is currently running off of the identity
- * mapping so use __pa() to get a *currently* valid virtual address.
+ * mapping so use the PA to get a *currently* valid virtual address.
*/
- early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+ early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD);
- i = pmd_index(vaddr);
+ i = pmd_index(paddr - p2v_offset);
pmd[i] -= sme_get_me_mask();
}
}
@@ -146,6 +148,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
{
pmd_t (*early_pgts)[PTRS_PER_PMD] = RIP_REL_REF(early_dynamic_pgts);
unsigned long physaddr = (unsigned long)&RIP_REL_REF(_text);
+ unsigned long va_text, va_end;
unsigned long pgtable_flags;
unsigned long load_delta;
pgdval_t *pgd;
@@ -172,6 +175,9 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
if (load_delta & ~PMD_MASK)
for (;;);
+ va_text = physaddr - p2v_offset;
+ va_end = (unsigned long)&RIP_REL_REF(_end) - p2v_offset;
+
/* Include the SME encryption mask in the fixup value */
load_delta += sme_get_me_mask();
@@ -232,7 +238,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
pmd_entry += sme_get_me_mask();
pmd_entry += physaddr;
- for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
+ for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) {
int idx = i + (physaddr >> PMD_SHIFT);
pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
@@ -257,11 +263,11 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
pmd = &RIP_REL_REF(level2_kernel_pgt)->pmd;
/* invalidate pages before the kernel image */
- for (i = 0; i < pmd_index((unsigned long)_text); i++)
+ for (i = 0; i < pmd_index(va_text); i++)
pmd[i] &= ~_PAGE_PRESENT;
/* fixup pages that are part of the kernel image */
- for (; i <= pmd_index((unsigned long)_end); i++)
+ for (; i <= pmd_index(va_end); i++)
if (pmd[i] & _PAGE_PRESENT)
pmd[i] += load_delta;
@@ -269,7 +275,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
for (; i < PTRS_PER_PMD; i++)
pmd[i] &= ~_PAGE_PRESENT;
- return sme_postprocess_startup(bp, pmd);
+ return sme_postprocess_startup(bp, pmd, p2v_offset);
}
/* Wipe all early page tables except for the kernel symbol map */
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 20/28] x64/acpi: Use PIC-compatible references in wakeup_64.S
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (18 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 19/28] x86/boot/64: Avoid intentional absolute symbol references in .head.text Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 21/28] x86/head: Use PIC-compatible symbol references in startup code Ard Biesheuvel
` (7 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Use ordinary RIP-relative references to make the code compatible with
running the linker in PIE mode.
Note that wakeup_long64() runs in the kernel's ordinary virtual
mapping, so there is no need to record the address of .Lresume_point in
a global variable. Also fix the comment while at it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 94ff83f3d3fe..af2f2ed57658 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
.code64
/*
- * Hooray, we are in Long 64-bit mode (but still running in low memory)
+ * Hooray, we are in Long 64-bit mode
*/
SYM_FUNC_START(wakeup_long64)
movq saved_magic(%rip), %rax
@@ -40,7 +40,7 @@ SYM_FUNC_START(wakeup_long64)
movq saved_rsi(%rip), %rsi
movq saved_rbp(%rip), %rbp
- movq saved_rip(%rip), %rax
+ leaq .Lresume_point(%rip), %rax
ANNOTATE_RETPOLINE_SAFE
jmp *%rax
SYM_FUNC_END(wakeup_long64)
@@ -51,7 +51,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
xorl %eax, %eax
call save_processor_state
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq %rsp, pt_regs_sp(%rax)
movq %rbp, pt_regs_bp(%rax)
movq %rsi, pt_regs_si(%rax)
@@ -70,8 +70,6 @@ SYM_FUNC_START(do_suspend_lowlevel)
pushfq
popq pt_regs_flags(%rax)
- movq $.Lresume_point, saved_rip(%rip)
-
movq %rsp, saved_rsp(%rip)
movq %rbp, saved_rbp(%rip)
movq %rbx, saved_rbx(%rip)
@@ -88,7 +86,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
.align 4
.Lresume_point:
/* We don't restore %rax, it must be 0 anyway */
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq saved_context_cr4(%rax), %rbx
movq %rbx, %cr4
movq saved_context_cr3(%rax), %rbx
@@ -137,7 +135,6 @@ saved_rsi: .quad 0
saved_rdi: .quad 0
saved_rbx: .quad 0
-saved_rip: .quad 0
saved_rsp: .quad 0
SYM_DATA(saved_magic, .quad 0)
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 21/28] x86/head: Use PIC-compatible symbol references in startup code
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (19 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 20/28] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 22/28] asm-generic: Treat PIC .data.rel.ro sections as .rodata Ard Biesheuvel
` (6 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Use RIP-relative symbol references to make them compatible with running
the linker in PIE mode.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/head_64.S | 14 +++++++++-----
arch/x86/kernel/relocate_kernel_64.S | 6 ++++--
2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index db71cf64204b..cc2fec3de4b7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -182,8 +182,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
xorl %r15d, %r15d
/* Derive the runtime physical address of init_top_pgt[] */
- movq phys_base(%rip), %rax
- addq $(init_top_pgt - __START_KERNEL_map), %rax
+ leaq init_top_pgt(%rip), %rax
+ subq $__START_KERNEL_map, %rax
+ addq phys_base(%rip), %rax
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -314,7 +315,8 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
.Lsetup_cpu:
/* Get the per cpu offset for the given CPU# which is in ECX */
- movq __per_cpu_offset(,%rcx,8), %rdx
+ leaq __per_cpu_offset(%rip), %rdx
+ movq (%rdx,%rcx,8), %rdx
#else
xorl %edx, %edx /* zero-extended to clear all of RDX */
#endif /* CONFIG_SMP */
@@ -325,7 +327,8 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
*
* RDX contains the per-cpu offset
*/
- movq pcpu_hot + X86_current_task(%rdx), %rax
+ leaq pcpu_hot + X86_current_task(%rip), %rax
+ movq (%rax,%rdx), %rax
movq TASK_threadsp(%rax), %rsp
/*
@@ -346,7 +349,8 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
*/
subq $16, %rsp
movw $(GDT_SIZE-1), (%rsp)
- leaq gdt_page(%rdx), %rax
+ leaq gdt_page(%rip), %rax
+ addq %rdx, %rax
movq %rax, 2(%rsp)
lgdt (%rsp)
addq $16, %rsp
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index e9e88c342f75..cbfd0227ea3e 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -106,6 +106,9 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
/* setup a new stack at the end of the physical control page */
lea PAGE_SIZE(%r8), %rsp
+ /* take the virtual address of virtual_mapped() before jumping */
+ leaq virtual_mapped(%rip), %r14
+
/* jump to identity mapped page */
addq $(identity_mapped - relocate_kernel), %r8
pushq %r8
@@ -225,8 +228,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
movq %rax, %cr3
lea PAGE_SIZE(%r8), %rsp
call swap_pages
- movq $virtual_mapped, %rax
- pushq %rax
+ pushq %r14
ANNOTATE_UNRET_SAFE
ret
int3
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 22/28] asm-generic: Treat PIC .data.rel.ro sections as .rodata
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (20 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 21/28] x86/head: Use PIC-compatible symbol references in startup code Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 23/28] tools/objtool: Mark generated sections as writable Ard Biesheuvel
` (5 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
When running the compiler in PIC/PIE mode, it will emit data objects
that are 'const' in the context of the program into the .data.rel.ro
section if they contain absolute addresses of statically allocated
global objects. This helps the dynamic loader distinguish between
objects that are truly const from objects that will need to be fixed up
by the loader before starting the program.
This is not a concern for the kernel, but it does mean those
.data.rel.ro input sections need to be handled. So treat them as
.rodata.
It also means some explicit uses of .rodata for global structures
containing absolute addresses need to be changed to .data.rel.ro to
prevent the linker from warning about incompatible section flags.
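To make the distinction concrete, consider this sketch (not taken from
the tree) of what the compiler does under -fpie:
  static const int days_per_week = 7;              /* .rodata */
  /* Still 'const' in the program, but it embeds the absolute address
   * of days_per_week, which a dynamic loader would have to write once
   * at startup - hence .data.rel.ro rather than .rodata. */
  static const int *const week_table[] = { &days_per_week };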
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
include/asm-generic/vmlinux.lds.h | 2 +-
include/linux/compiler.h | 2 +-
scripts/kallsyms.c | 2 +-
tools/objtool/check.c | 11 ++++++-----
tools/objtool/include/objtool/special.h | 2 +-
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index cc14d780c70d..2b079f73820f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -456,7 +456,7 @@
. = ALIGN((align)); \
.rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \
__start_rodata = .; \
- *(.rodata) *(.rodata.*) \
+ *(.rodata .rodata.* .data.rel.ro*) \
SCHED_DATA \
RO_AFTER_INIT_DATA /* Read only after init */ \
. = ALIGN(8); \
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ec55bcce4146..f7c48b7c0a6b 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -133,7 +133,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
/* Annotate a C jump table to allow objtool to follow the code flow */
-#define __annotate_jump_table __section(".rodata..c_jump_table")
+#define __annotate_jump_table __section(".data.rel.ro.c_jump_table")
#else /* !CONFIG_OBJTOOL */
#define annotate_reachable()
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 9c34b9397872..1700e97400aa 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -357,7 +357,7 @@ static void write_src(void)
printf("#define ALGN .balign 4\n");
printf("#endif\n");
- printf("\t.section .rodata, \"a\"\n");
+ printf("\t.section .data.rel.ro, \"a\"\n");
output_label("kallsyms_num_syms");
printf("\t.long\t%u\n", table_cnt);
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 01237d167223..04725bd83232 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2575,15 +2575,16 @@ static void mark_rodata(struct objtool_file *file)
* Search for the following rodata sections, each of which can
* potentially contain jump tables:
*
- * - .rodata: can contain GCC switch tables
- * - .rodata.<func>: same, if -fdata-sections is being used
- * - .rodata..c_jump_table: contains C annotated jump tables
+ * - .rodata .data.rel.ro : can contain GCC switch tables
+ * - .rodata.<func> .data.rel.ro.<func> : same, if -fdata-sections is being used
+ * - .data.rel.ro.c_jump_table : contains C annotated jump tables
*
* .rodata.str1.* sections are ignored; they don't contain jump tables.
*/
for_each_sec(file, sec) {
- if (!strncmp(sec->name, ".rodata", 7) &&
- !strstr(sec->name, ".str1.")) {
+ if ((!strncmp(sec->name, ".rodata", 7) &&
+ !strstr(sec->name, ".str1.")) ||
+ !strncmp(sec->name, ".data.rel.ro", 12)) {
sec->rodata = true;
found = true;
}
diff --git a/tools/objtool/include/objtool/special.h b/tools/objtool/include/objtool/special.h
index 86d4af9c5aa9..89ee12b1a138 100644
--- a/tools/objtool/include/objtool/special.h
+++ b/tools/objtool/include/objtool/special.h
@@ -10,7 +10,7 @@
#include <objtool/check.h>
#include <objtool/elf.h>
-#define C_JUMP_TABLE_SECTION ".rodata..c_jump_table"
+#define C_JUMP_TABLE_SECTION ".data.rel.ro.c_jump_table"
struct special_alt {
struct list_head list;
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 23/28] tools/objtool: Mark generated sections as writable
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (21 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 22/28] asm-generic: Treat PIC .data.rel.ro sections as .rodata Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
` (4 subsequent siblings)
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
objtool generates ELF sections such as __mcount_loc, which carry
absolute symbol references that need to be fixed up at boot time, based
on the actual virtual placement of the kernel binary.
This involves writing to the section at boot time. In some cases
(e.g., when using --pie and -z text), the lld linker is more pedantic
about this, and complains about absolute relocations operating on
read-only sections.
None of this actually matters for vmlinux, which manages its own mapping
permissions, and so we can just set the SHF_WRITE flag on those sections
to make the linker happy.
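For reference, the generated tables boil down to arrays of absolute
code addresses, along the lines of the following sketch (the section
name is real, the contents are illustrative):
  extern void some_traced_function(void);
  /* Each entry carries an R_X86_64_64 relocation: a 64-bit absolute
   * address that the boot code adds the KASLR displacement to, which
   * is why the section must be writable as far as the linker knows. */
  static unsigned long mcount_loc_sketch[]
          __attribute__((__section__("__mcount_loc"), __used__)) = {
          (unsigned long)some_traced_function,
  };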
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
tools/objtool/elf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 3d27983dc908..26a39b010c92 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -1142,7 +1142,7 @@ struct section *elf_create_section(struct elf *elf, const char *name,
sec->sh.sh_entsize = entsize;
sec->sh.sh_type = SHT_PROGBITS;
sec->sh.sh_addralign = 1;
- sec->sh.sh_flags = SHF_ALLOC;
+ sec->sh.sh_flags = SHF_ALLOC | SHF_WRITE;
/* Add section name to .shstrtab (or .strtab for Clang) */
shstrtab = find_section_by_name(elf, ".shstrtab");
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (22 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 23/28] tools/objtool: Mark generated sections as writable Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-10-01 7:18 ` Josh Poimboeuf
2024-09-25 15:01 ` [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel Ard Biesheuvel
` (3 subsequent siblings)
27 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
In some cases, the compiler may rely on indirect calls using GOT slots
as memory operands to emit function calls. This leaves it up to the
linker to relax the call to a direct call if possible, i.e., if the
destination address is known at link time and in range, which may not be
the case when building shared libraries for user space.
On x86, this may happen when building in PIC mode with ftrace enabled,
and given that vmlinux is a fully linked binary, this relaxation is
always possible, and therefore mandatory per the x86_64 psABI.
This means that the indirect calls to __fentry__ that are observable in
vmlinux.o will have been converted to direct calls in vmlinux, and can
be treated as such by objtool.
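At the byte level, both forms occupy six bytes, which is what makes
in-place relaxation possible. A sketch (encodings per the x86_64
psABI; the call+nop spelling is the one selected by the
-z call-nop=suffix-nop linker option used later in this series):
  /* ff 15 <rel32>    call *__fentry__@GOTPCREL(%rip)  - 6 bytes
   * e8 <rel32> 90    call __fentry__; nop             - 6 bytes */
  static _Bool is_got_indirect_call(const unsigned char *insn)
  {
          /* opcode ff /2 with a RIP-relative ModRM byte of 0x15 */
          return insn[0] == 0xff && insn[1] == 0x15;
  }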
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
tools/objtool/check.c | 32 ++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 04725bd83232..94a56099e22d 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1696,11 +1696,39 @@ static int add_call_destinations(struct objtool_file *file)
struct reloc *reloc;
for_each_insn(file, insn) {
- if (insn->type != INSN_CALL)
+ if (insn->type != INSN_CALL &&
+ insn->type != INSN_CALL_DYNAMIC)
continue;
reloc = insn_reloc(file, insn);
- if (!reloc) {
+ if (insn->type == INSN_CALL_DYNAMIC) {
+ if (!reloc)
+ continue;
+
+ /*
+ * GCC 13 and older on x86 will always emit the call to
+ * __fentry__ using a relaxable GOT-based symbol
+ * reference when operating in PIC mode, i.e.,
+ *
+ * call *0x0(%rip)
+ * R_X86_64_GOTPCRELX __fentry__-0x4
+ *
+ * where it is left up to the linker to relax this into
+ *
+ * call __fentry__
+ * nop
+ *
+ * if __fentry__ turns out to be DSO local, which is
+ * always the case for vmlinux. Given that this
+ * relaxation is mandatory per the x86_64 psABI, these
+ * calls can simply be treated as direct calls.
+ */
+ if (arch_ftrace_match(reloc->sym->name)) {
+ insn->type = INSN_CALL;
+ add_call_dest(file, insn, reloc->sym, false);
+ }
+
+ } else if (!reloc) {
dest_off = arch_jump_destination(insn);
dest = find_call_destination(insn->sec, dest_off);
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (23 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-10-01 21:13 ` H. Peter Anvin
2024-09-25 15:01 ` [RFC PATCH 26/28] x86/boot: Implement support for ELF RELA/RELR relocations Ard Biesheuvel
` (2 subsequent siblings)
27 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
As an intermediate step towards enabling PIE linking for the 64-bit x86
kernel, enable PIE codegen for all objects that are linked into the
kernel proper.
This substantially reduces the number of relocations that need to be
processed when booting a relocatable KASLR kernel.
Before (size in bytes of the reloc table):
797372 arch/x86/boot/compressed/vmlinux.relocs
After:
400252 arch/x86/boot/compressed/vmlinux.relocs
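Most of the delta comes from address-taking code sequences; a sketch of
the difference (the symbol name is made up):
  extern int some_global;
  int *addr_of_global(void)
  {
          /* -mcmodel=kernel: movq $some_global, %rax
           *   -> R_X86_64_32S, i.e., one boot-time fixup per site
           * -fpie with hidden visibility / direct extern access:
           *   leaq some_global(%rip), %rax
           *   -> resolved at link time, no runtime fixup at all */
          return &some_global;
  }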
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/Makefile | 11 ++++++++++-
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 1 +
6 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b78b7623a4a9..83d20f402535 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -193,13 +193,22 @@ else
KBUILD_RUSTFLAGS += -Cno-redzone=y
KBUILD_RUSTFLAGS += -Ccode-model=kernel
+ PIE_CFLAGS-y := -fpie -mcmodel=small \
+ -include $(srctree)/include/linux/hidden.h
+
+ PIE_CFLAGS-$(CONFIG_CC_IS_GCC) += $(call cc-option,-mdirect-extern-access)
+ PIE_CFLAGS-$(CONFIG_CC_IS_CLANG) += -fdirect-access-external-data
+
ifeq ($(CONFIG_STACKPROTECTOR),y)
KBUILD_CFLAGS += -mstack-protector-guard-symbol=fixed_percpu_data
+
+ # the 'small' C model defaults to %fs
+ PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs
endif
# Don't emit relaxable GOTPCREL relocations
KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
- KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no
+ KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y)
endif
#
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 9cc0ff6e9067..4d3ba35cb619 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -57,6 +57,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
KBUILD_CFLAGS += $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
+KBUILD_CFLAGS_KERNEL :=
$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f2051644de94..c362d36b5b69 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -73,7 +73,7 @@ LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__start_rodata\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDdGRSTtVW] \(_text\|__start_rodata\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
quiet_cmd_voffset = VOFFSET $@
cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index c9216ac4fb1e..7af9fecf9abb 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -141,6 +141,7 @@ endif
endif
$(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
+$(obj)/vdso32.so.dbg: KBUILD_CFLAGS_KERNEL :=
$(obj)/vdso32.so.dbg: $(obj)/vdso32/vdso32.lds $(vobjs32) FORCE
$(call if_changed,vdso_and_check)
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index a0fb39abc5c8..70bf0a26da91 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -67,3 +67,4 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
-I$(srctree)/arch/x86/boot
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS_KERNEL :=
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 2b079f73820f..3a084ac77109 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -349,6 +349,7 @@
*(DATA_MAIN) \
*(.data..decrypted) \
*(.ref.data) \
+ *(.data.rel*) \
*(.data..shared_aligned) /* percpu related */ \
*(.data.unlikely) \
__start_once = .; \
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 26/28] x86/boot: Implement support for ELF RELA/RELR relocations
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (24 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 28/28] x86/tools: Drop x86_64 support from 'relocs' tool Ard Biesheuvel
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Add support for standard dynamic ELF relocations to perform the virtual
relocation of the core kernel at boot. The RELR format results in a 10x
reduction in memory footprint of the relocation data, and can be
generated by the linker directly. This removes the need for
a) a host tool 'relocs' and a bespoke, clunky relocation table format
where the table is simply concatenated to the vmlinux payload when
building the decompressor;
b) dependence on the --emit-relocs linker switch, which dumps static,
intermediate build time relocations into the ELF binary, to be
subsequently used as runtime relocations.
The latter is especially problematic, as linkers may apply relaxations
that result in the code going out of sync with the static relocation
that annotated it in the input. This requires additional work on the
part of the linker to update the static relocation, which is not even
possible in all cases. Therefore, it is much better to consume a
runtime, dynamic relocation format in the way it was intended.
This will require switching to linking vmlinux in PIE mode - this is
implemented in a subsequent patch.
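For illustration, here is a rough user-space model of the decoder this
patch adds to head64.c, run on a made-up stream: three relocations at
0x1000, 0x1008 and 0x1018 cost two 8-byte RELR words rather than three
24-byte RELA entries.
  #include <stdint.h>
  #include <stdio.h>
  int main(void)
  {
          /* even entry: an address; odd entry: a bitmap covering the
           * next 63 machine words after the current position */
          const uint64_t relr[] = { 0x1000, 0xb /* 0b1011 */ };
          uint64_t place = 0;
          for (unsigned int i = 0; i < 2; i++) {
                  if ((relr[i] & 1) == 0) {
                          place = relr[i];
                          printf("relocate 0x%jx\n", (uintmax_t)place);
                          place += 8;
                          continue;
                  }
                  for (uint64_t p = place, r = relr[i] >> 1; r; p += 8, r >>= 1)
                          if (r & 1)
                                  printf("relocate 0x%jx\n", (uintmax_t)p);
                  place += 63 * 8;
          }
          return 0;   /* prints 0x1000, 0x1008, 0x1018 */
  }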
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
Documentation/arch/x86/zero-page.rst | 3 +-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/setup.h | 1 +
arch/x86/include/uapi/asm/bootparam.h | 2 +-
arch/x86/kernel/head64.c | 36 ++++++++++++++++++++
arch/x86/kernel/head_64.S | 5 +++
arch/x86/kernel/vmlinux.lds.S | 24 +++++++++----
7 files changed, 64 insertions(+), 8 deletions(-)
diff --git a/Documentation/arch/x86/zero-page.rst b/Documentation/arch/x86/zero-page.rst
index 45aa9cceb4f1..fd18b77113e2 100644
--- a/Documentation/arch/x86/zero-page.rst
+++ b/Documentation/arch/x86/zero-page.rst
@@ -3,7 +3,7 @@
=========
Zero Page
=========
-The additional fields in struct boot_params as a part of 32-bit boot
+The additional fields in struct boot_params as a part of 32/64-bit boot
protocol of kernel. These should be filled by bootloader or 16-bit
real-mode setup code of the kernel. References/settings to it mainly
are in::
@@ -20,6 +20,7 @@ Offset/Size Proto Name Meaning
060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information
(struct ist_info)
070/008 ALL acpi_rsdp_addr Physical address of ACPI RSDP table
+078/008 64-bit kaslr_va_shift Virtual kASLR displacement of the core kernel
080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!!
090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!!
0A0/010 ALL sys_desc_table System description table (struct sys_desc_table),
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2852fcd82cbd..54cb1f14218b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
depends on 64BIT
# Options that are inherently 64-bit kernel only:
select ARCH_HAS_GIGANTIC_PAGE
+ select ARCH_HAS_RELR
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 85f4fde3515c..a4d7dd81f773 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -51,6 +51,7 @@ extern void reserve_standard_io_resources(void);
extern void i386_reserve_resources(void);
extern unsigned long __startup_64(unsigned long p2v_offset, struct boot_params *bp);
extern void startup_64_setup_gdt_idt(void);
+extern void startup_64_apply_relocations(struct boot_params *bp);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 9b82eebd7add..3389b1be234c 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -120,7 +120,7 @@ struct boot_params {
__u64 tboot_addr; /* 0x058 */
struct ist_info ist_info; /* 0x060 */
__u64 acpi_rsdp_addr; /* 0x070 */
- __u8 _pad3[8]; /* 0x078 */
+ __u64 kaslr_va_shift; /* 0x078 */
__u8 hd0_info[16]; /* obsolete! */ /* 0x080 */
__u8 hd1_info[16]; /* obsolete! */ /* 0x090 */
struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 49e8ba1c0d34..6609e1012f2f 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -20,6 +20,7 @@
#include <linux/io.h>
#include <linux/memblock.h>
#include <linux/cc_platform.h>
+#include <linux/elf.h>
#include <linux/pgtable.h>
#include <asm/asm.h>
@@ -588,3 +589,38 @@ void __head startup_64_setup_gdt_idt(void)
startup_64_load_idt(handler);
}
+
+#ifdef CONFIG_RELOCATABLE
+void __head startup_64_apply_relocations(struct boot_params *bp)
+{
+ extern const Elf64_Rela __rela_start[], __rela_end[];
+ extern const u64 __relr_start[], __relr_end[];
+ u64 va_offset = (u64)RIP_REL_REF(_text) - __START_KERNEL;
+ u64 va_shift = bp->kaslr_va_shift;
+ u64 *place = NULL;
+
+ if (!va_shift)
+ return;
+
+ for (const Elf64_Rela *r = __rela_start; r < __rela_end; r++) {
+ if (ELF64_R_TYPE(r->r_info) != R_X86_64_RELATIVE)
+ continue;
+
+ place = (u64 *)(r->r_offset + va_offset);
+ *place += va_shift;
+ }
+
+ for (const u64 *rel = __relr_start; rel < __relr_end; rel++) {
+ if ((*rel & 1) == 0) {
+ place = (u64 *)(*rel + va_offset);
+ *place++ += va_shift;
+ continue;
+ }
+
+ for (u64 *p = place, r = *rel >> 1; r; p++, r >>= 1)
+ if (r & 1)
+ *p += va_shift;
+ place += 63;
+ }
+}
+#endif
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index cc2fec3de4b7..88cdc5a0c7a3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -74,6 +74,11 @@ SYM_CODE_START_NOALIGN(startup_64)
cdq
wrmsr
+#ifdef CONFIG_RELOCATABLE
+ movq %r15, %rdi
+ call startup_64_apply_relocations
+#endif
+
call startup_64_setup_gdt_idt
/* Now switch to __KERNEL_CS so IRET works reliably */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 52b8db931d0f..f7e832c2ac61 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -240,6 +240,18 @@ xen_elfnote_phys32_entry_offset =
:init
#endif
+ .init.rela : {
+ __rela_start = .;
+ *(.rela.*) *(.rela_*)
+ __rela_end = .;
+ }
+
+ .init.relr : {
+ __relr_start = .;
+ *(.relr.*)
+ __relr_end = .;
+ }
+
/*
* Section for code used exclusively before alternatives are run. All
* references to such code must be patched out by alternatives, normally
@@ -469,12 +481,6 @@ xen_elfnote_phys32_entry_offset =
*(.got) *(.igot.*)
}
ASSERT(SIZEOF(.got) == 0, "Unexpected GOT entries detected!")
-#endif
-
- .plt : {
- *(.plt) *(.plt.*) *(.iplt)
- }
- ASSERT(SIZEOF(.plt) == 0, "Unexpected run-time procedure linkages detected!")
.rel.dyn : {
*(.rel.*) *(.rel_*)
@@ -485,6 +491,12 @@ xen_elfnote_phys32_entry_offset =
*(.rela.*) *(.rela_*)
}
ASSERT(SIZEOF(.rela.dyn) == 0, "Unexpected run-time relocations (.rela) detected!")
+#endif
+
+ .plt : {
+ *(.plt) *(.plt.*) *(.iplt)
+ }
+ ASSERT(SIZEOF(.plt) == 0, "Unexpected run-time procedure linkages detected!")
}
/*
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (25 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 26/28] x86/boot: Implement support for ELF RELA/RELR relocations Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
2024-09-25 18:54 ` Uros Bizjak
2024-09-25 20:24 ` Vegard Nossum
2024-09-25 15:01 ` [RFC PATCH 28/28] x86/tools: Drop x86_64 support from 'relocs' tool Ard Biesheuvel
27 siblings, 2 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
Build the kernel as a Position Independent Executable (PIE). This
results in more efficient relocation processing for the virtual
displacement of the kernel (for KASLR). More importantly, it instructs
the linker to generate what is actually needed (a program that can be
moved around in memory before execution), which is better than having to
rely on the linker to create a position-dependent binary that happens to
tolerate being moved around after poking it in exactly the right manner.
Note that this means that all codegen should be compatible with PIE,
including Rust objects, so this needs to switch to the small code model
with the PIE relocation model as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/Kconfig | 2 +-
arch/x86/Makefile | 11 +++++++----
arch/x86/boot/compressed/misc.c | 2 ++
arch/x86/kernel/vmlinux.lds.S | 5 +++++
drivers/firmware/efi/libstub/x86-stub.c | 2 ++
5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 54cb1f14218b..dbb4d284b0e1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE
# Relocation on x86 needs some additional build support
config X86_NEED_RELOCS
def_bool y
- depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
+ depends on X86_32 && RELOCATABLE
config PHYSICAL_ALIGN
hex "Alignment value to which kernel should be aligned"
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 83d20f402535..c1dcff444bc8 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -206,9 +206,8 @@ else
PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs
endif
- # Don't emit relaxable GOTPCREL relocations
- KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
- KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y)
+ KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y)
+ KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie
endif
#
@@ -264,12 +263,16 @@ else
LDFLAGS_vmlinux :=
endif
+ifdef CONFIG_X86_64
+ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs
+ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop
+LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y)
+
#
# The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to
# the linker to force 2MB page size regardless of the default page size used
# by the linker.
#
-ifdef CONFIG_X86_64
LDFLAGS_vmlinux += -z max-page-size=0x200000
endif
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 89f01375cdb7..79e3ffe16f61 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
error("Destination virtual address changed when not relocatable");
#endif
+ boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
+
debug_putstr("\nDecompressing Linux... ");
if (init_unaccepted_memory()) {
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index f7e832c2ac61..d172e6e8eaaf 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset =
DISCARDS
+ /DISCARD/ : {
+ *(.dynsym .gnu.hash .hash .dynamic .dynstr)
+ *(.interp .dynbss .eh_frame .sframe)
+ }
+
/*
* Make sure that the .got.plt is either completely empty or it
* contains only the lazy dispatch entries.
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f8e465da344d..5c03954924fe 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
if (status != EFI_SUCCESS)
return status;
+ boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
+
entry = decompress_kernel((void *)addr, virt_addr, error);
if (entry == ULONG_MAX) {
efi_free(alloc_size, addr);
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [RFC PATCH 28/28] x86/tools: Drop x86_64 support from 'relocs' tool
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
` (26 preceding siblings ...)
2024-09-25 15:01 ` [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel Ard Biesheuvel
@ 2024-09-25 15:01 ` Ard Biesheuvel
27 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
From: Ard Biesheuvel <ardb@kernel.org>
The relocs tool is no longer used on vmlinux, which is the only 64-bit
ELF executable that it used to operate on in the 64-bit build. (It is
still used for parts of the decompressor.)
So drop the 64-bit handling - it is dead code now.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/tools/Makefile | 2 +-
arch/x86/tools/relocs.c | 178 +-------------------
arch/x86/tools/relocs.h | 9 +-
arch/x86/tools/relocs_64.c | 18 --
arch/x86/tools/relocs_common.c | 11 +-
5 files changed, 9 insertions(+), 209 deletions(-)
diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
index 7278e2545c35..f7d12a9dccfc 100644
--- a/arch/x86/tools/Makefile
+++ b/arch/x86/tools/Makefile
@@ -40,7 +40,7 @@ $(obj)/insn_sanity.o: $(srctree)/tools/arch/x86/lib/insn.c $(srctree)/tools/arch
HOST_EXTRACFLAGS += -I$(srctree)/tools/include
hostprogs += relocs
-relocs-objs := relocs_32.o relocs_64.o relocs_common.o
+relocs-objs := relocs_32.o relocs_common.o
PHONY += relocs
relocs: $(obj)/relocs
@:
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 22c2d3f07a57..ff5578e63ff8 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -28,12 +28,7 @@ struct relocs {
static struct relocs relocs16;
static struct relocs relocs32;
-#if ELF_BITS == 64
-static struct relocs relocs64;
-# define FMT PRIu64
-#else
# define FMT PRIu32
-#endif
struct section {
Elf_Shdr shdr;
@@ -86,10 +81,6 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"__end_rodata_aligned|"
"__initramfs_start|"
"(jiffies|jiffies_64)|"
-#if ELF_BITS == 64
- "__per_cpu_load|"
- "__end_rodata_hpage_align|"
-#endif
"__vvar_page|"
"_end)$"
};
@@ -210,27 +201,6 @@ static const char *rel_type(unsigned type)
{
static const char *type_name[] = {
#define REL_TYPE(X) [X] = #X
-#if ELF_BITS == 64
- REL_TYPE(R_X86_64_NONE),
- REL_TYPE(R_X86_64_64),
- REL_TYPE(R_X86_64_PC64),
- REL_TYPE(R_X86_64_PC32),
- REL_TYPE(R_X86_64_GOT32),
- REL_TYPE(R_X86_64_PLT32),
- REL_TYPE(R_X86_64_COPY),
- REL_TYPE(R_X86_64_GLOB_DAT),
- REL_TYPE(R_X86_64_JUMP_SLOT),
- REL_TYPE(R_X86_64_RELATIVE),
- REL_TYPE(R_X86_64_GOTPCREL),
- REL_TYPE(R_X86_64_GOTPCRELX),
- REL_TYPE(R_X86_64_REX_GOTPCRELX),
- REL_TYPE(R_X86_64_32),
- REL_TYPE(R_X86_64_32S),
- REL_TYPE(R_X86_64_16),
- REL_TYPE(R_X86_64_PC16),
- REL_TYPE(R_X86_64_8),
- REL_TYPE(R_X86_64_PC8),
-#else
REL_TYPE(R_386_NONE),
REL_TYPE(R_386_32),
REL_TYPE(R_386_PC32),
@@ -246,7 +216,6 @@ static const char *rel_type(unsigned type)
REL_TYPE(R_386_PC8),
REL_TYPE(R_386_16),
REL_TYPE(R_386_PC16),
-#endif
#undef REL_TYPE
};
const char *name = "unknown type rel type name";
@@ -312,19 +281,9 @@ static uint32_t elf32_to_cpu(uint32_t val)
#define elf_half_to_cpu(x) elf16_to_cpu(x)
#define elf_word_to_cpu(x) elf32_to_cpu(x)
-#if ELF_BITS == 64
-static uint64_t elf64_to_cpu(uint64_t val)
-{
- return le64_to_cpu(val);
-}
-# define elf_addr_to_cpu(x) elf64_to_cpu(x)
-# define elf_off_to_cpu(x) elf64_to_cpu(x)
-# define elf_xword_to_cpu(x) elf64_to_cpu(x)
-#else
# define elf_addr_to_cpu(x) elf32_to_cpu(x)
# define elf_off_to_cpu(x) elf32_to_cpu(x)
# define elf_xword_to_cpu(x) elf32_to_cpu(x)
-#endif
static int sym_index(Elf_Sym *sym)
{
@@ -515,10 +474,7 @@ static void print_absolute_symbols(void)
int i;
const char *format;
- if (ELF_BITS == 64)
- format = "%5d %016"PRIx64" %5"PRId64" %10s %10s %12s %s\n";
- else
- format = "%5d %08"PRIx32" %5"PRId32" %10s %10s %12s %s\n";
+ format = "%5d %08"PRIx32" %5"PRId32" %10s %10s %12s %s\n";
printf("Absolute symbols\n");
printf(" Num: Value Size Type Bind Visibility Name\n");
@@ -559,10 +515,7 @@ static void print_absolute_relocs(void)
int i, printed = 0;
const char *format;
- if (ELF_BITS == 64)
- format = "%016"PRIx64" %016"PRIx64" %10s %016"PRIx64" %s\n";
- else
- format = "%08"PRIx32" %08"PRIx32" %10s %08"PRIx32" %s\n";
+ format = "%08"PRIx32" %08"PRIx32" %10s %08"PRIx32" %s\n";
for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
@@ -694,104 +647,6 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
}
}
-#if ELF_BITS == 64
-
-static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
- const char *symname)
-{
- unsigned r_type = ELF64_R_TYPE(rel->r_info);
- ElfW(Addr) offset = rel->r_offset;
- int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
-
- if (sym->st_shndx == SHN_UNDEF)
- return 0;
-
- switch (r_type) {
- case R_X86_64_NONE:
- /* NONE can be ignored. */
- break;
-
- case R_X86_64_PC32:
- case R_X86_64_PLT32:
- /*
- * PC relative relocations don't need to be adjusted.
- *
- * NB: R_X86_64_PLT32 can be treated as R_X86_64_PC32.
- */
- break;
-
- case R_X86_64_PC64:
- /*
- * Only used by jump labels
- */
- break;
-
- case R_X86_64_32:
- case R_X86_64_32S:
- case R_X86_64_64:
- case R_X86_64_GOTPCREL:
- if (shn_abs) {
- /*
- * Whitelisted absolute symbols do not require
- * relocation.
- */
- if (is_reloc(S_ABS, symname))
- break;
-
- die("Invalid absolute %s relocation: %s\n", rel_type(r_type), symname);
- break;
- }
-
- if (r_type == R_X86_64_GOTPCREL) {
- Elf_Shdr *s = &secs[sec->shdr.sh_info].shdr;
- unsigned file_off = offset - s->sh_addr + s->sh_offset;
-
- /*
- * GOTPCREL relocations refer to instructions that load
- * a 64-bit address via a 32-bit relative reference to
- * the GOT. In this case, it is the GOT entry that
- * needs to be fixed up, not the immediate offset in
- * the opcode. Note that the linker will have applied an
- * addend of -4 to compensate for the delta between the
- * relocation offset and the value of RIP when the
- * instruction executes, and this needs to be backed out
- * again. (Addends other than -4 are permitted in
- * principle, but make no sense in practice so they are
- * not supported.)
- */
- if (rel->r_addend != -4) {
- die("invalid addend (%ld) for %s relocation: %s\n",
- rel->r_addend, rel_type(r_type), symname);
- break;
- }
- offset += 4 + (int32_t)get_unaligned_le32(elf_image + file_off);
- }
-
- /*
- * Relocation offsets for 64 bit kernels are output
- * as 32 bits and sign extended back to 64 bits when
- * the relocations are processed.
- * Make sure that the offset will fit.
- */
- if ((int32_t)offset != (int64_t)offset)
- die("Relocation offset doesn't fit in 32 bits\n");
-
- if (r_type == R_X86_64_64 || r_type == R_X86_64_GOTPCREL)
- add_reloc(&relocs64, offset);
- else
- add_reloc(&relocs32, offset);
- break;
-
- default:
- die("Unsupported relocation type: %s (%d)\n", rel_type(r_type), r_type);
- break;
- }
-
- return 0;
-}
-
-#else
-
static int do_reloc32(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
const char *symname)
{
@@ -902,8 +757,6 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, Elf_Sym *sym, const
return 0;
}
-#endif
-
static int cmp_relocs(const void *va, const void *vb)
{
const uint32_t *a, *b;
@@ -939,17 +792,10 @@ static void emit_relocs(int as_text, int use_real_mode)
int (*write_reloc)(uint32_t, FILE *) = write32;
int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym, const char *symname);
-#if ELF_BITS == 64
- if (!use_real_mode)
- do_reloc = do_reloc64;
- else
- die("--realmode not valid for a 64-bit ELF file");
-#else
if (!use_real_mode)
do_reloc = do_reloc32;
else
do_reloc = do_reloc_real;
-#endif
/* Collect up the relocations */
walk_relocs(do_reloc);
@@ -959,11 +805,7 @@ static void emit_relocs(int as_text, int use_real_mode)
/* Order the relocations for more efficient processing */
sort_relocs(&relocs32);
-#if ELF_BITS == 64
- sort_relocs(&relocs64);
-#else
sort_relocs(&relocs16);
-#endif
/* Print the relocations */
if (as_text) {
@@ -984,16 +826,6 @@ static void emit_relocs(int as_text, int use_real_mode)
for (i = 0; i < relocs32.count; i++)
write_reloc(relocs32.offset[i], stdout);
} else {
-#if ELF_BITS == 64
- /* Print a stop */
- write_reloc(0, stdout);
-
- /* Now print each relocation */
- for (i = 0; i < relocs64.count; i++)
- if (!i || relocs64.offset[i] != relocs64.offset[i - 1])
- write_reloc(relocs64.offset[i], stdout);
-#endif
-
/* Print a stop */
write_reloc(0, stdout);
@@ -1027,12 +859,6 @@ static void print_reloc_info(void)
walk_relocs(do_reloc_info);
}
-#if ELF_BITS == 64
-# define process process_64
-#else
-# define process process_32
-#endif
-
void process(FILE *fp, int use_real_mode, int as_text,
int show_absolute_syms, int show_absolute_relocs,
int show_reloc_info)
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 7a509604ff92..ef9eec96bd62 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -32,10 +32,7 @@ enum symtype {
S_NSYMTYPES
};
-void process_32(FILE *fp, int use_real_mode, int as_text,
- int show_absolute_syms, int show_absolute_relocs,
- int show_reloc_info);
-void process_64(FILE *fp, int use_real_mode, int as_text,
- int show_absolute_syms, int show_absolute_relocs,
- int show_reloc_info);
+void process(FILE *fp, int use_real_mode, int as_text,
+ int show_absolute_syms, int show_absolute_relocs,
+ int show_reloc_info);
#endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_64.c b/arch/x86/tools/relocs_64.c
deleted file mode 100644
index 9029cb619cb1..000000000000
--- a/arch/x86/tools/relocs_64.c
+++ /dev/null
@@ -1,18 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include "relocs.h"
-
-#define ELF_BITS 64
-
-#define ELF_MACHINE EM_X86_64
-#define ELF_MACHINE_NAME "x86_64"
-#define SHT_REL_TYPE SHT_RELA
-#define Elf_Rel Elf64_Rela
-
-#define ELF_CLASS ELFCLASS64
-#define ELF_R_SYM(val) ELF64_R_SYM(val)
-#define ELF_R_TYPE(val) ELF64_R_TYPE(val)
-#define ELF_ST_TYPE(o) ELF64_ST_TYPE(o)
-#define ELF_ST_BIND(o) ELF64_ST_BIND(o)
-#define ELF_ST_VISIBILITY(o) ELF64_ST_VISIBILITY(o)
-
-#include "relocs.c"
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index 6634352a20bc..167985ecd544 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -72,14 +72,9 @@ int main(int argc, char **argv)
die("Cannot read %s: %s", fname, strerror(errno));
}
rewind(fp);
- if (e_ident[EI_CLASS] == ELFCLASS64)
- process_64(fp, use_real_mode, as_text,
- show_absolute_syms, show_absolute_relocs,
- show_reloc_info);
- else
- process_32(fp, use_real_mode, as_text,
- show_absolute_syms, show_absolute_relocs,
- show_reloc_info);
+ process(fp, use_real_mode, as_text,
+ show_absolute_syms, show_absolute_relocs,
+ show_reloc_info);
fclose(fp);
return 0;
}
--
2.46.0.792.g87dc391469-goog
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-25 15:01 ` [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly Ard Biesheuvel
@ 2024-09-25 15:53 ` Ian Rogers
2024-09-25 17:43 ` Ard Biesheuvel
2024-09-25 18:32 ` Uros Bizjak
2024-10-04 10:01 ` Uros Bizjak
2 siblings, 1 reply; 73+ messages in thread
From: Ian Rogers @ 2024-09-25 15:53 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Adrian Hunter,
Kan Liang, linux-doc, linux-pm, kvm, xen-devel, linux-efi,
linux-arch, linux-sparse, linux-kbuild, linux-perf-users,
rust-for-linux, llvm
On Wed, Sep 25, 2024 at 8:02 AM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Specify the guard symbol for the stack cookie explicitly, rather than
> positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> the need for the per-CPU region to be absolute rather than relative to
> the placement of the per-CPU template region in the kernel image, and
> this allows the special handling for absolute per-CPU symbols to be
> removed entirely.
>
> This is a worthwhile cleanup in itself, but it is also a prerequisite
> for PIE codegen and PIE linking, which can replace our bespoke and
> rather clunky runtime relocation handling.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/Makefile | 4 ++++
> arch/x86/include/asm/init.h | 2 +-
> arch/x86/include/asm/processor.h | 11 +++--------
> arch/x86/include/asm/stackprotector.h | 4 ----
> tools/perf/util/annotate.c | 4 ++--
> 5 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 6b3fe6e2aadd..b78b7623a4a9 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -193,6 +193,10 @@ else
> KBUILD_RUSTFLAGS += -Cno-redzone=y
> KBUILD_RUSTFLAGS += -Ccode-model=kernel
>
> + ifeq ($(CONFIG_STACKPROTECTOR),y)
> + KBUILD_CFLAGS += -mstack-protector-guard-symbol=fixed_percpu_data
> + endif
> +
> # Don't emit relaxable GOTPCREL relocations
> KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
> KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 14d72727d7ee..3ed0e8ec973f 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -2,7 +2,7 @@
> #ifndef _ASM_X86_INIT_H
> #define _ASM_X86_INIT_H
>
> -#define __head __section(".head.text")
> +#define __head __section(".head.text") __no_stack_protector
>
> struct x86_mapping_info {
> void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 4a686f0e5dbf..56bc36116814 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -402,14 +402,9 @@ struct irq_stack {
> #ifdef CONFIG_X86_64
> struct fixed_percpu_data {
> /*
> - * GCC hardcodes the stack canary as %gs:40. Since the
> - * irq_stack is the object at %gs:0, we reserve the bottom
> - * 48 bytes of the irq stack for the canary.
> - *
> - * Once we are willing to require -mstack-protector-guard-symbol=
> - * support for x86_64 stackprotector, we can get rid of this.
> + * Since the irq_stack is the object at %gs:0, the bottom 8 bytes of
> + * the irq stack are reserved for the canary.
> */
> - char gs_base[40];
> unsigned long stack_canary;
> };
>
> @@ -418,7 +413,7 @@ DECLARE_INIT_PER_CPU(fixed_percpu_data);
>
> static inline unsigned long cpu_kernelmode_gs_base(int cpu)
> {
> - return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu);
> + return (unsigned long)&per_cpu(fixed_percpu_data, cpu);
> }
>
> extern asmlinkage void entry_SYSCALL32_ignore(void);
> diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> index 00473a650f51..d1dcd22a0a4c 100644
> --- a/arch/x86/include/asm/stackprotector.h
> +++ b/arch/x86/include/asm/stackprotector.h
> @@ -51,10 +51,6 @@ static __always_inline void boot_init_stack_canary(void)
> {
> unsigned long canary = get_random_canary();
>
> -#ifdef CONFIG_X86_64
> - BUILD_BUG_ON(offsetof(struct fixed_percpu_data, stack_canary) != 40);
> -#endif
> -
> current->stack_canary = canary;
> #ifdef CONFIG_X86_64
> this_cpu_write(fixed_percpu_data.stack_canary, canary);
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 37ce43c4eb8f..7ecfedf5edb9 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2485,10 +2485,10 @@ static bool is_stack_operation(struct arch *arch, struct disasm_line *dl)
>
> static bool is_stack_canary(struct arch *arch, struct annotated_op_loc *loc)
> {
> - /* On x86_64, %gs:40 is used for stack canary */
> + /* On x86_64, %gs:0 is used for stack canary */
> if (arch__is(arch, "x86")) {
> if (loc->segment == INSN_SEG_X86_GS && loc->imm &&
> - loc->offset == 40)
> + loc->offset == 0)
As a new perf tool can run on old kernels, we may need to have this be
something like:
(loc->offset == 40 /* pre v6.xx kernels */ ||
 loc->offset == 0 /* v6.xx and later */)
We could make this dependent on the kernel by processing the os_release string:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h#n55
but that could well be more trouble than it is worth.
Thanks,
Ian
> return true;
> }
>
> --
> 2.46.0.792.g87dc391469-goog
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1
2024-09-25 15:01 ` [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1 Ard Biesheuvel
@ 2024-09-25 15:58 ` Arnd Bergmann
2024-12-19 11:53 ` Mark Rutland
2024-09-26 21:35 ` Miguel Ojeda
2024-09-27 16:22 ` Mark Rutland
2 siblings, 1 reply; 73+ messages in thread
From: Arnd Bergmann @ 2024-09-25 15:58 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Masahiro Yamada, Kees Cook, Nathan Chancellor,
Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, Linux-Arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024, at 15:01, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Bump the minimum GCC version to 8.1 to gain unconditional support for
> referring to the per-task stack cookie using a symbol rather than
> relying on the fixed offset of 40 bytes from %GS, which requires
> elaborate hacks to support.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> Documentation/admin-guide/README.rst | 2 +-
> Documentation/process/changes.rst | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
Acked-by: Arnd Bergmann <arnd@arndb.de>
As we discussed during plumbers, I think this is reasonable,
both the gcc-8.1 version and the timing after the 6.12-LTS
kernel.
We obviously need to go through all the other version checks
to see what else can be cleaned up. I would suggest we also
raise the binutils version to 2.30+, which is what RHEL8
shipped alongside gcc-8. I have not found other distros that
use older binutils in combination with gcc-8 or higher;
Debian 10 uses binutils-2.31.
I don't think we want to combine the additional cleanup with
your series, but if we can agree on the version, we can do that
in parallel.
FWIW, here are links to the last few times we discussed this,
and there are already a few other things that would
benefit from more modern compilers:
https://lore.kernel.org/lkml/dca5b082-90d1-40ab-954f-8b3b6f51138c@app.fastmail.com/
https://lore.kernel.org/lkml/CAFULd4biN8FPRtU54Q0QywfBFvvWV-s1M3kWF9YOmozyAX9+ZQ@mail.gmail.com/
https://lore.kernel.org/lkml/CAK8P3a1Vt17Yry_gTQ0dwr7_tEoFhuec+mQzzKzFvZGD5Hrnow@mail.gmail.com/
Arnd
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address
2024-09-25 15:01 ` [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address Ard Biesheuvel
@ 2024-09-25 16:39 ` Linus Torvalds
2024-09-25 16:45 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2024-09-25 16:39 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, 25 Sept 2024 at 08:16, Ard Biesheuvel <ardb+git@google.com> wrote:
>
> Instead of pushing an immediate absolute address, which is incompatible
> with PIE codegen or linking, use a LEA instruction to take the address
> into a register.
I don't think you can do this - it corrupts %rdi.
Yes, the code uses %rdi later, but that's inside the SAVE_REGS_STRING
/ RESTORE_REGS_STRING area.
And we do have special calling conventions that aren't the regular
ones, so %rdi might actually be used elsewhere. For example,
__get_user_X and __put_user_X all have magical calling conventions:
they don't actually use %rdi, but part of the calling convention is
that the unused registers aren't modified.
Of course, I'm not actually sure you can probe those and trigger this
issue, but it all makes me think it's broken.
And it's entirely possible that I'm wrong for some reason, but this
just _looks_ very very wrong to me.
I think you can do this with a "pushq mem" instead, and put the
relocation into the memory location.
Linus
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address
2024-09-25 16:39 ` Linus Torvalds
@ 2024-09-25 16:45 ` Ard Biesheuvel
0 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 16:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, 25 Sept 2024 at 18:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 25 Sept 2024 at 08:16, Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > Instead of pushing an immediate absolute address, which is incompatible
> > with PIE codegen or linking, use a LEA instruction to take the address
> > into a register.
>
> I don't think you can do this - it corrupts %rdi.
>
> Yes, the code uses %rdi later, but that's inside the SAVE_REGS_STRING
> / RESTORE_REGS_STRING area.
>
Oops, I missed that.
> And we do have special calling conventions that aren't the regular
> ones, so %rdi might actually be used elsewhere. For example,
> __get_user_X and __put_user_X all have magical calling conventions:
> they don't actually use %rdi, but part of the calling convention is
> that the unused registers aren't modified.
>
> Of course, I'm not actually sure you can probe those and trigger this
> issue, but it all makes me think it's broken.
>
> And it's entirely possible that I'm wrong for some reason, but this
> just _looks_ very very wrong to me.
>
> I think you can do this with a "pushq mem" instead, and put the
> relocation into the memory location.
>
I'll change this into
pushq arch_rethook_trampoline@GOTPCREL(%rip)
which I had originally. I was trying to avoid the load from memory,
but that obviously only works if the register is not live.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-25 15:53 ` Ian Rogers
@ 2024-09-25 17:43 ` Ard Biesheuvel
2024-09-25 17:48 ` Ian Rogers
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 17:43 UTC (permalink / raw)
To: Ian Rogers
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Adrian Hunter,
Kan Liang, linux-doc, linux-pm, kvm, xen-devel, linux-efi,
linux-arch, linux-sparse, linux-kbuild, linux-perf-users,
rust-for-linux, llvm
On Wed, 25 Sept 2024 at 17:54, Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Sep 25, 2024 at 8:02 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Specify the guard symbol for the stack cookie explicitly, rather than
> > positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> > the need for the per-CPU region to be absolute rather than relative to
> > the placement of the per-CPU template region in the kernel image, and
> > this allows the special handling for absolute per-CPU symbols to be
> > removed entirely.
> >
> > This is a worthwhile cleanup in itself, but it is also a prerequisite
> > for PIE codegen and PIE linking, which can replace our bespoke and
> > rather clunky runtime relocation handling.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/x86/Makefile | 4 ++++
> > arch/x86/include/asm/init.h | 2 +-
> > arch/x86/include/asm/processor.h | 11 +++--------
> > arch/x86/include/asm/stackprotector.h | 4 ----
> > tools/perf/util/annotate.c | 4 ++--
> > 5 files changed, 10 insertions(+), 15 deletions(-)
> >
...
> > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > index 37ce43c4eb8f..7ecfedf5edb9 100644
> > --- a/tools/perf/util/annotate.c
> > +++ b/tools/perf/util/annotate.c
> > @@ -2485,10 +2485,10 @@ static bool is_stack_operation(struct arch *arch, struct disasm_line *dl)
> >
> > static bool is_stack_canary(struct arch *arch, struct annotated_op_loc *loc)
> > {
> > - /* On x86_64, %gs:40 is used for stack canary */
> > + /* On x86_64, %gs:0 is used for stack canary */
> > if (arch__is(arch, "x86")) {
> > if (loc->segment == INSN_SEG_X86_GS && loc->imm &&
> > - loc->offset == 40)
> > + loc->offset == 0)
>
> As a new perf tool can run on old kernels we may need to have this be
> something like:
> (loc->offset == 40 /* pre v6.xx kernels */ || loc->offset == 0 /*
> v6.xx and later */ )
>
> We could make this dependent on the kernel by processing the os_release string:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h#n55
> but that could well be more trouble than it is worth.
>
Yeah. I also wonder what the purpose of this feature is. At the end of
this series, the stack cookie will no longer be at a fixed offset from
%GS anyway, and so perf will not be able to identify it in the same
manner. So it is probably better to just leave the existing check in
place, as the %gs:0 case will not exist in the field (assuming that
the series lands all at once).
Any idea why this deviates from other architectures? Is x86_64 the
only arch that needs to identify stack canary accesses in perf? We
could rename the symbol to something identifiable, and do it across
all architectures, if this really serves a need (and assuming that
perf has insight into the symbol table).
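For reference, the combined check Ian suggests would look something
like this (a sketch only - the version comments are placeholders):

    static bool is_stack_canary(struct arch *arch, struct annotated_op_loc *loc)
    {
            /* On x86_64, the canary was %gs:40 before this series, %gs:0 after */
            if (arch__is(arch, "x86")) {
                    if (loc->segment == INSN_SEG_X86_GS && loc->imm &&
                        (loc->offset == 40 || loc->offset == 0))
                            return true;
            }

            return false;
    }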
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-25 17:43 ` Ard Biesheuvel
@ 2024-09-25 17:48 ` Ian Rogers
0 siblings, 0 replies; 73+ messages in thread
From: Ian Rogers @ 2024-09-25 17:48 UTC (permalink / raw)
To: Ard Biesheuvel, Namhyung Kim
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Kan Liang,
linux-doc, linux-pm, kvm, xen-devel, linux-efi, linux-arch,
linux-sparse, linux-kbuild, linux-perf-users, rust-for-linux,
llvm
On Wed, Sep 25, 2024 at 10:43 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 25 Sept 2024 at 17:54, Ian Rogers <irogers@google.com> wrote:
> >
> > On Wed, Sep 25, 2024 at 8:02 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Specify the guard symbol for the stack cookie explicitly, rather than
> > > positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> > > the need for the per-CPU region to be absolute rather than relative to
> > > the placement of the per-CPU template region in the kernel image, and
> > > this allows the special handling for absolute per-CPU symbols to be
> > > removed entirely.
> > >
> > > This is a worthwhile cleanup in itself, but it is also a prerequisite
> > > for PIE codegen and PIE linking, which can replace our bespoke and
> > > rather clunky runtime relocation handling.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > > arch/x86/Makefile | 4 ++++
> > > arch/x86/include/asm/init.h | 2 +-
> > > arch/x86/include/asm/processor.h | 11 +++--------
> > > arch/x86/include/asm/stackprotector.h | 4 ----
> > > tools/perf/util/annotate.c | 4 ++--
> > > 5 files changed, 10 insertions(+), 15 deletions(-)
> > >
> ...
> > > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > > index 37ce43c4eb8f..7ecfedf5edb9 100644
> > > --- a/tools/perf/util/annotate.c
> > > +++ b/tools/perf/util/annotate.c
> > > @@ -2485,10 +2485,10 @@ static bool is_stack_operation(struct arch *arch, struct disasm_line *dl)
> > >
> > > static bool is_stack_canary(struct arch *arch, struct annotated_op_loc *loc)
> > > {
> > > - /* On x86_64, %gs:40 is used for stack canary */
> > > + /* On x86_64, %gs:0 is used for stack canary */
> > > if (arch__is(arch, "x86")) {
> > > if (loc->segment == INSN_SEG_X86_GS && loc->imm &&
> > > - loc->offset == 40)
> > > + loc->offset == 0)
> >
> > As a new perf tool can run on old kernels we may need to have this be
> > something like:
> > (loc->offset == 40 /* pre v6.xx kernels */ || loc->offset == 0 /* v6.xx and later */)
> >
> > We could make this dependent on the kernel by processing the os_release string:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h#n55
> > but that could well be more trouble than it is worth.
> >
>
> Yeah. I also wonder what the purpose of this feature is. At the end of
> this series, the stack cookie will no longer be at a fixed offset from
> %GS anyway, and so perf will not be able to identify it in the same
> manner. So it is probably better to just leave the existing check in
> place, as the %gs:0 case will not exist in the field (assuming that
> the series lands all at once).
>
> Any idea why this deviates from other architectures? Is x86_64 the
> only arch that needs to identify stack canary accesses in perf? We
> could rename the symbol to something identifiable, and do it across
> all architectures, if this really serves a need (and assuming that
> perf has insight into the symbol table).
This is relatively new work coming from Namhyung for data type
profiling, and I believe it is pretty much just x86 at the moment -
although the ever awesome IBM made contributions for PowerPC. Data
type profiling tries to classify memory accesses, which is why it
cares about the stack canary instruction; the particular encoding
shouldn't matter.
Thanks,
Ian
* Re: [RFC PATCH 06/28] x86/percpu: Get rid of absolute per-CPU variable placement
2024-09-25 15:01 ` [RFC PATCH 06/28] x86/percpu: Get rid of absolute per-CPU variable placement Ard Biesheuvel
@ 2024-09-25 17:56 ` Christoph Lameter (Ampere)
0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-09-25 17:56 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Mathieu Desnoyers, Paolo Bonzini, Vitaly Kuznetsov,
Juergen Gross, Boris Ostrovsky, Greg Kroah-Hartman, Arnd Bergmann,
Masahiro Yamada, Kees Cook, Nathan Chancellor, Keith Packard,
Justin Stitt, Josh Poimboeuf, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, Kan Liang,
linux-doc, linux-pm, kvm, xen-devel, linux-efi, linux-arch,
linux-sparse, linux-kbuild, linux-perf-users, rust-for-linux,
llvm
On Wed, 25 Sep 2024, Ard Biesheuvel wrote:
> The x86_64 approach was needed to accommodate per-task stack protector
> cookies, which used to live at a fixed offset of GS+40, requiring GS to
> be treated as a base register. This is no longer the case, though, and
> so GS can be repurposed as a true per-CPU offset, adopting the same
> strategy as other architectures.
>
> This also removes the need for linker tricks to emit the per-CPU ELF
> segment at a different virtual address. It also means RIP-relative
> per-CPU variables no longer need to be relocated in the opposite
> direction when KASLR is applied, which was necessary because the 0x0
> based per-CPU region remains in place even when the kernel is moved
> around.
Looks like a good cleanup. Hope it does not break anything that relies on
structures %GS points to.
Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-25 15:01 ` [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly Ard Biesheuvel
2024-09-25 15:53 ` Ian Rogers
@ 2024-09-25 18:32 ` Uros Bizjak
2024-09-28 13:41 ` Brian Gerst
2024-10-04 10:01 ` Uros Bizjak
2 siblings, 1 reply; 73+ messages in thread
From: Uros Bizjak @ 2024-09-25 18:32 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm, Brian Gerst
On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Specify the guard symbol for the stack cookie explicitly, rather than
> positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> the need for the per-CPU region to be absolute rather than relative to
> the placement of the per-CPU template region in the kernel image, and
> this allows the special handling for absolute per-CPU symbols to be
> removed entirely.
>
> This is a worthwhile cleanup in itself, but it is also a prerequisite
> for PIE codegen and PIE linking, which can replace our bespoke and
> rather clunky runtime relocation handling.
I would like to point out a series that converted the stack protector
guard symbol to a normal percpu variable [1], so there was no need to
assume anything about the location of the guard symbol.
[1] "[PATCH v4 00/16] x86-64: Stack protector and percpu improvements"
https://lore.kernel.org/lkml/20240322165233.71698-1-brgerst@gmail.com/
Uros.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/Makefile | 4 ++++
> arch/x86/include/asm/init.h | 2 +-
> arch/x86/include/asm/processor.h | 11 +++--------
> arch/x86/include/asm/stackprotector.h | 4 ----
> tools/perf/util/annotate.c | 4 ++--
> 5 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 6b3fe6e2aadd..b78b7623a4a9 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -193,6 +193,10 @@ else
> KBUILD_RUSTFLAGS += -Cno-redzone=y
> KBUILD_RUSTFLAGS += -Ccode-model=kernel
>
> + ifeq ($(CONFIG_STACKPROTECTOR),y)
> + KBUILD_CFLAGS += -mstack-protector-guard-symbol=fixed_percpu_data
> + endif
> +
> # Don't emit relaxable GOTPCREL relocations
> KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
> KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 14d72727d7ee..3ed0e8ec973f 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -2,7 +2,7 @@
> #ifndef _ASM_X86_INIT_H
> #define _ASM_X86_INIT_H
>
> -#define __head __section(".head.text")
> +#define __head __section(".head.text") __no_stack_protector
>
> struct x86_mapping_info {
> void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 4a686f0e5dbf..56bc36116814 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -402,14 +402,9 @@ struct irq_stack {
> #ifdef CONFIG_X86_64
> struct fixed_percpu_data {
> /*
> - * GCC hardcodes the stack canary as %gs:40. Since the
> - * irq_stack is the object at %gs:0, we reserve the bottom
> - * 48 bytes of the irq stack for the canary.
> - *
> - * Once we are willing to require -mstack-protector-guard-symbol=
> - * support for x86_64 stackprotector, we can get rid of this.
> + * Since the irq_stack is the object at %gs:0, the bottom 8 bytes of
> + * the irq stack are reserved for the canary.
> */
> - char gs_base[40];
> unsigned long stack_canary;
> };
>
> @@ -418,7 +413,7 @@ DECLARE_INIT_PER_CPU(fixed_percpu_data);
>
> static inline unsigned long cpu_kernelmode_gs_base(int cpu)
> {
> - return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu);
> + return (unsigned long)&per_cpu(fixed_percpu_data, cpu);
> }
>
> extern asmlinkage void entry_SYSCALL32_ignore(void);
> diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> index 00473a650f51..d1dcd22a0a4c 100644
> --- a/arch/x86/include/asm/stackprotector.h
> +++ b/arch/x86/include/asm/stackprotector.h
> @@ -51,10 +51,6 @@ static __always_inline void boot_init_stack_canary(void)
> {
> unsigned long canary = get_random_canary();
>
> -#ifdef CONFIG_X86_64
> - BUILD_BUG_ON(offsetof(struct fixed_percpu_data, stack_canary) != 40);
> -#endif
> -
> current->stack_canary = canary;
> #ifdef CONFIG_X86_64
> this_cpu_write(fixed_percpu_data.stack_canary, canary);
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 37ce43c4eb8f..7ecfedf5edb9 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2485,10 +2485,10 @@ static bool is_stack_operation(struct arch *arch, struct disasm_line *dl)
>
> static bool is_stack_canary(struct arch *arch, struct annotated_op_loc *loc)
> {
> - /* On x86_64, %gs:40 is used for stack canary */
> + /* On x86_64, %gs:0 is used for stack canary */
> if (arch__is(arch, "x86")) {
> if (loc->segment == INSN_SEG_X86_GS && loc->imm &&
> - loc->offset == 40)
> + loc->offset == 0)
> return true;
> }
>
> --
> 2.46.0.792.g87dc391469-goog
>
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 15:01 ` [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel Ard Biesheuvel
@ 2024-09-25 18:54 ` Uros Bizjak
2024-09-25 19:14 ` Ard Biesheuvel
2024-09-25 20:24 ` Vegard Nossum
1 sibling, 1 reply; 73+ messages in thread
From: Uros Bizjak @ 2024-09-25 18:54 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm, Hou Wenlong
On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Build the kernel as a Position Independent Executable (PIE). This
> results in more efficient relocation processing for the virtual
> displacement of the kernel (for KASLR). More importantly, it instructs
> the linker to generate what is actually needed (a program that can be
> moved around in memory before execution), which is better than having to
> rely on the linker to create a position dependent binary that happens to
> tolerate being moved around after poking it in exactly the right manner.
>
> Note that this means that all codegen should be compatible with PIE,
> including Rust objects, so this needs to switch to the small code model
> with the PIE relocation model as well.
I think that related to this work is the patch series [1] that
introduces the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. There are some more places
that need to be adapted for PIE. The patch series also introduces
objtool functionality to add validation for x86 PIE.
[1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible"
https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
Uros.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/Kconfig | 2 +-
> arch/x86/Makefile | 11 +++++++----
> arch/x86/boot/compressed/misc.c | 2 ++
> arch/x86/kernel/vmlinux.lds.S | 5 +++++
> drivers/firmware/efi/libstub/x86-stub.c | 2 ++
> 5 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 54cb1f14218b..dbb4d284b0e1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE
> # Relocation on x86 needs some additional build support
> config X86_NEED_RELOCS
> def_bool y
> - depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
> + depends on X86_32 && RELOCATABLE
>
> config PHYSICAL_ALIGN
> hex "Alignment value to which kernel should be aligned"
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 83d20f402535..c1dcff444bc8 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -206,9 +206,8 @@ else
> PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs
> endif
>
> - # Don't emit relaxable GOTPCREL relocations
> - KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
> - KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y)
> + KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y)
> + KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie
> endif
>
> #
> @@ -264,12 +263,16 @@ else
> LDFLAGS_vmlinux :=
> endif
>
> +ifdef CONFIG_X86_64
> +ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs
> +ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop
> +LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y)
> +
> #
> # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to
> # the linker to force 2MB page size regardless of the default page size used
> # by the linker.
> #
> -ifdef CONFIG_X86_64
> LDFLAGS_vmlinux += -z max-page-size=0x200000
> endif
>
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 89f01375cdb7..79e3ffe16f61 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
> error("Destination virtual address changed when not relocatable");
> #endif
>
> + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
> +
> debug_putstr("\nDecompressing Linux... ");
>
> if (init_unaccepted_memory()) {
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index f7e832c2ac61..d172e6e8eaaf 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset =
>
> DISCARDS
>
> + /DISCARD/ : {
> + *(.dynsym .gnu.hash .hash .dynamic .dynstr)
> + *(.interp .dynbss .eh_frame .sframe)
> + }
> +
> /*
> * Make sure that the .got.plt is either completely empty or it
> * contains only the lazy dispatch entries.
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index f8e465da344d..5c03954924fe 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
> if (status != EFI_SUCCESS)
> return status;
>
> + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
> +
> entry = decompress_kernel((void *)addr, virt_addr, error);
> if (entry == ULONG_MAX) {
> efi_free(alloc_size, addr);
> --
> 2.46.0.792.g87dc391469-goog
>
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 18:54 ` Uros Bizjak
@ 2024-09-25 19:14 ` Ard Biesheuvel
2024-09-25 19:39 ` Uros Bizjak
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 19:14 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm, Hou Wenlong
On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Build the kernel as a Position Independent Executable (PIE). This
> > results in more efficient relocation processing for the virtual
> > displacement of the kernel (for KASLR). More importantly, it instructs
> > the linker to generate what is actually needed (a program that can be
> > moved around in memory before execution), which is better than having to
> > rely on the linker to create a position dependent binary that happens to
> > tolerate being moved around after poking it in exactly the right manner.
> >
> > Note that this means that all codegen should be compatible with PIE,
> > including Rust objects, so this needs to switch to the small code model
> > with the PIE relocation model as well.
>
> I think that related to this work is the patch series [1] that
> introduces the changes necessary to build the kernel as a Position
> Independent Executable (PIE) on x86_64. There are some more places
> that need to be adapted for PIE. The patch series also introduces
> objtool functionality to add validation for x86 PIE.
>
> [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible"
> https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
>
Hi Uros,
I am aware of that discussion, as I took part in it as well.
I don't think any of those changes are actually needed now - did you
notice anything in particular that is missing?
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 19:14 ` Ard Biesheuvel
@ 2024-09-25 19:39 ` Uros Bizjak
2024-09-25 20:01 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Uros Bizjak @ 2024-09-25 19:39 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm, Hou Wenlong
On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Build the kernel as a Position Independent Executable (PIE). This
> > > results in more efficient relocation processing for the virtual
> > > displacement of the kernel (for KASLR). More importantly, it instructs
> > > the linker to generate what is actually needed (a program that can be
> > > moved around in memory before execution), which is better than having to
> > > rely on the linker to create a position dependent binary that happens to
> > > tolerate being moved around after poking it in exactly the right manner.
> > >
> > > Note that this means that all codegen should be compatible with PIE,
> > > including Rust objects, so this needs to switch to the small code model
> > > with the PIE relocation model as well.
> >
> > I think that related to this work is the patch series [1] that
> > introduces the changes necessary to build the kernel as a Position
> > Independent Executable (PIE) on x86_64. There are some more places
> > that need to be adapted for PIE. The patch series also introduces
> > objtool functionality to add validation for x86 PIE.
> >
> > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible"
> > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
> >
>
> Hi Uros,
>
> I am aware of that discussion, as I took part in it as well.
>
> I don't think any of those changes are actually needed now - did you
> notice anything in particular that is missing?
Some time ago I went through the kernel sources and proposed several
patches that changed all trivial occurrences of non-RIP addresses to
RIP ones. The work was partially based on the mentioned patch series,
and, as I remember, I left some of them out [e.g. 1] because they
required a temporary variable. There was also a discussion about
ftrace [2], where no solution was found.
Looking through your series, I didn't find some of the non-RIP -> RIP
changes proposed by the original series (especially the ftrace part),
and noticed that there is no objtool validator proposed to ensure that
all generated code is indeed PIE compatible.
Speaking of non-RIP -> RIP changes that require a temporary - would it
be beneficial to make a macro that would use the RIP form only when
CONFIG_X86_PIE is defined? That would avoid the code size increase
when PIE is not needed.
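Something like the following, perhaps (a rough sketch - the macro name
is made up, and this only covers the case where a destination register
is available):

    /* hypothetical helper: absolute form unless PIE needs RIP-relative */
    #ifdef CONFIG_X86_PIE
    # define LOAD_SYM_ADDR(sym, reg)        leaq sym(%rip), reg
    #else
    # define LOAD_SYM_ADDR(sym, reg)        movq $sym, reg
    #endif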
[1] https://lore.kernel.org/lkml/a0b69f3fac1834c05f960b916cc6eb0004cdffbf.1682673543.git.houwenlong.hwl@antgroup.com/
[2] https://lore.kernel.org/lkml/20230428094454.0f2f5049@gandalf.local.home/
[3] https://lore.kernel.org/lkml/226af8c63c5bfa361763dd041a997ee84fe926cf.1682673543.git.houwenlong.hwl@antgroup.com/
Thanks and best regards,
Uros.
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 19:39 ` Uros Bizjak
@ 2024-09-25 20:01 ` Ard Biesheuvel
2024-09-25 20:22 ` Uros Bizjak
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 20:01 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm, Hou Wenlong
On Wed, 25 Sept 2024 at 21:39, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> > > >
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > >
> > > > Build the kernel as a Position Independent Executable (PIE). This
> > > > results in more efficient relocation processing for the virtual
> > > > displacement of the kernel (for KASLR). More importantly, it instructs
> > > > the linker to generate what is actually needed (a program that can be
> > > > moved around in memory before execution), which is better than having to
> > > > rely on the linker to create a position dependent binary that happens to
> > > > tolerate being moved around after poking it in exactly the right manner.
> > > >
> > > > Note that this means that all codegen should be compatible with PIE,
> > > > including Rust objects, so this needs to switch to the small code model
> > > > with the PIE relocation model as well.
> > >
> > > I think that related to this work is the patch series [1] that
> > > introduces the changes necessary to build the kernel as a Position
> > > Independent Executable (PIE) on x86_64. There are some more places
> > > that need to be adapted for PIE. The patch series also introduces
> > > objtool functionality to add validation for x86 PIE.
> > >
> > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible"
> > > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
> > >
> >
> > Hi Uros,
> >
> > I am aware of that discussion, as I took part in it as well.
> >
> > I don't think any of those changes are actually needed now - did you
> > notice anything in particular that is missing?
>
> Some time ago I went through the kernel sources and proposed several
> patches that changed all trivial occurrences of non-RIP addresses to
> RIP ones. The work was partially based on the mentioned patch series,
> and I remember, I left some of them out [e.g. 1], because they
> required a temporary variable.
I have a similar patch in my series, but the DEBUG_ENTRY code just uses
pushq 1f@GOTPCREL(%rip)
so no temporaries are needed.
> Also, there was discussion about ftrace
> [2], where no solution was found.
>
When linking with -z call-nop=suffix-nop, the __fentry__ call via the
GOT will be relaxed by the linker into a 5 byte call followed by a 1
byte NOP, so I don't think we need to do anything special here. It
might mean we currently lose -mnop-mcount until we find a solution for
that in the compiler. In case you remember, I contributed and you
merged a GCC patch that makes the __fentry__ emission logic honour
-fdirect-access-external-data which should help here. This landed in
GCC 14.
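In other words, roughly (a sketch of the relaxation, not the exact
bytes the toolchain emits):

    call    *__fentry__@GOTPCREL(%rip)      /* 6 bytes, indirect via the GOT */

becomes, once the linker relaxes it under -z call-nop=suffix-nop:

    call    __fentry__                      /* 5 bytes, direct */
    nop                                     /* 1 byte of padding */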
> Looking through your series, I didn't find some of the non-RIP -> RIP
> changes proposed by the original series (especially the ftrace part),
> and noticed that there is no objtool validator proposed to ensure that
> all generated code is indeed PIE compatible.
>
What would be the point of that? The linker will complain and throw an
error if the code cannot be converted into a PIE executable, so I
don't think we need objtool's help for that.
> Speaking of non-RIP -> RIP changes that require a temporary - would it
> be beneficial to make a macro that would use the RIP form only when
> CONFIG_X86_PIE is defined? That would avoid the code size increase
> when PIE is not needed.
>
This series does not make the PIE support configurable. Do you think
the code size increase is a concern if all GOT-based symbol references
are elided, e.g., via -fdirect-access-external-data?
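For reference, the per-reference difference is roughly (a sketch):

    movq    foo@GOTPCREL(%rip), %rax    /* via the GOT: load the address... */
    movl    (%rax), %eax                /* ...then dereference it */

versus, with -fdirect-access-external-data:

    movl    foo(%rip), %eax             /* direct RIP-relative access */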
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 20:01 ` Ard Biesheuvel
@ 2024-09-25 20:22 ` Uros Bizjak
0 siblings, 0 replies; 73+ messages in thread
From: Uros Bizjak @ 2024-09-25 20:22 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm, Hou Wenlong
On Wed, Sep 25, 2024 at 10:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 25 Sept 2024 at 21:39, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> > > > >
> > > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > >
> > > > > Build the kernel as a Position Independent Executable (PIE). This
> > > > > results in more efficient relocation processing for the virtual
> > > > > displacement of the kernel (for KASLR). More importantly, it instructs
> > > > > the linker to generate what is actually needed (a program that can be
> > > > > moved around in memory before execution), which is better than having to
> > > > > rely on the linker to create a position dependent binary that happens to
> > > > > tolerate being moved around after poking it in exactly the right manner.
> > > > >
> > > > > Note that this means that all codegen should be compatible with PIE,
> > > > > including Rust objects, so this needs to switch to the small code model
> > > > > with the PIE relocation model as well.
> > > >
> > > > I think that related to this work is the patch series [1] that
> > > > introduces the changes necessary to build the kernel as a Position
> > > > Independent Executable (PIE) on x86_64. There are some more places
> > > > that need to be adapted for PIE. The patch series also introduces
> > > > objtool functionality to add validation for x86 PIE.
> > > >
> > > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible"
> > > > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
> > > >
> > >
> > > Hi Uros,
> > >
> > > I am aware of that discussion, as I took part in it as well.
> > >
> > > I don't think any of those changes are actually needed now - did you
> > > notice anything in particular that is missing?
> >
> > Some time ago I went through the kernel sources and proposed several
> > patches that changed all trivial occurrences of non-RIP addresses to
> > RIP ones. The work was partially based on the mentioned patch series,
> > and, as I remember, I left some of them out [e.g. 1] because they
> > required a temporary variable.
>
> I have a similar patch in my series, but the DEBUG_ENTRY code just uses
>
> pushq 1f@GOTPCREL(%rip)
>
> so no temporaries are needed.
>
> > Also, there was discussion about ftrace
> > [2], where no solution was found.
> >
>
> When linking with -z call-nop=suffix-nop, the __fentry__ call via the
> GOT will be relaxed by the linker into a 5 byte call followed by a 1
> byte NOP, so I don't think we need to do anything special here. It
> might mean we currently lose -mnop-mcount until we find a solution for
> that in the compiler. In case you remember, I contributed and you
> merged a GCC patch that makes the __fentry__ emission logic honour
> -fdirect-access-external-data which should help here. This landed in
> GCC 14.
>
> > Looking through your series, I didn't find some of the non-RIP -> RIP
> > changes proposed by the original series (especially the ftrace part),
> > and noticed that there is no objtool validator proposed to ensure that
> > all generated code is indeed PIE compatible.
> >
>
> What would be the point of that? The linker will complain and throw an
> error if the code cannot be converted into a PIE executable, so I
> don't think we need objtool's help for that.
Indeed.
> > Speaking of non-RIP -> RIP changes that require a temporary - would it
> > be beneficial to make a macro that would use the RIP form only when
> > CONFIG_X86_PIE is defined? That would avoid the code size increase
> > when PIE is not needed.
> >
>
> This series does not make the PIE support configurable. Do you think
> the code size increase is a concern if all GOT based symbol references
> are elided, e.g, via -fdirect-access-external-data?
I was looking at the code size measurements of the original patch
series (perhaps those numbers are not relevant to your series), and I
think a 2.2% - 2.4% code size increase can be problematic. Can you
perhaps provide new code size measurements with your patches applied?
Thanks and BR,
Uros.
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 15:01 ` [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel Ard Biesheuvel
2024-09-25 18:54 ` Uros Bizjak
@ 2024-09-25 20:24 ` Vegard Nossum
2024-09-26 13:38 ` Ard Biesheuvel
1 sibling, 1 reply; 73+ messages in thread
From: Vegard Nossum @ 2024-09-25 20:24 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On 25/09/2024 17:01, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Build the kernel as a Position Independent Executable (PIE). This
> results in more efficient relocation processing for the virtual
> displacement of the kernel (for KASLR). More importantly, it instructs
> the linker to generate what is actually needed (a program that can be
> moved around in memory before execution), which is better than having to
> rely on the linker to create a position dependent binary that happens to
> tolerate being moved around after poking it in exactly the right manner.
>
> Note that this means that all codegen should be compatible with PIE,
> including Rust objects, so this needs to switch to the small code model
> with the PIE relocation model as well.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/Kconfig | 2 +-
> arch/x86/Makefile | 11 +++++++----
> arch/x86/boot/compressed/misc.c | 2 ++
> arch/x86/kernel/vmlinux.lds.S | 5 +++++
> drivers/firmware/efi/libstub/x86-stub.c | 2 ++
> 5 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 54cb1f14218b..dbb4d284b0e1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE
> # Relocation on x86 needs some additional build support
> config X86_NEED_RELOCS
> def_bool y
> - depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
> + depends on X86_32 && RELOCATABLE
>
> config PHYSICAL_ALIGN
> hex "Alignment value to which kernel should be aligned"
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 83d20f402535..c1dcff444bc8 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -206,9 +206,8 @@ else
> PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs
> endif
>
> - # Don't emit relaxable GOTPCREL relocations
> - KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
> - KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y)
> + KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y)
> + KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie
> endif
>
> #
> @@ -264,12 +263,16 @@ else
> LDFLAGS_vmlinux :=
> endif
>
> +ifdef CONFIG_X86_64
> +ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs
> +ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop
> +LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y)
> +
> #
> # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to
> # the linker to force 2MB page size regardless of the default page size used
> # by the linker.
> #
> -ifdef CONFIG_X86_64
> LDFLAGS_vmlinux += -z max-page-size=0x200000
> endif
>
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 89f01375cdb7..79e3ffe16f61 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
> error("Destination virtual address changed when not relocatable");
> #endif
>
> + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
> +
> debug_putstr("\nDecompressing Linux... ");
>
> if (init_unaccepted_memory()) {
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index f7e832c2ac61..d172e6e8eaaf 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset =
>
> DISCARDS
>
> + /DISCARD/ : {
> + *(.dynsym .gnu.hash .hash .dynamic .dynstr)
> + *(.interp .dynbss .eh_frame .sframe)
> + }
> +
> /*
> * Make sure that the .got.plt is either completely empty or it
> * contains only the lazy dispatch entries.
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index f8e465da344d..5c03954924fe 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
> if (status != EFI_SUCCESS)
> return status;
>
> + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
> +
> entry = decompress_kernel((void *)addr, virt_addr, error);
> if (entry == ULONG_MAX) {
> efi_free(alloc_size, addr);
This patch causes a build failure here (on 64-bit):
LD .tmp_vmlinux2
NM .tmp_vmlinux2.syms
KSYMS .tmp_vmlinux2.kallsyms.S
AS .tmp_vmlinux2.kallsyms.o
LD vmlinux
BTFIDS vmlinux
WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free
FAILED elf_update(WRITE): invalid section entry size
make[5]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 255
make[5]: *** Deleting file 'vmlinux'
make[4]: *** [Makefile:1153: vmlinux] Error 2
make[3]: *** [debian/rules:74: build-arch] Error 2
dpkg-buildpackage: error: make -f debian/rules binary subprocess returned exit status 2
make[2]: *** [scripts/Makefile.package:121: bindeb-pkg] Error 2
make[1]: *** [/home/opc/linux-mainline-worktree2/Makefile:1544: bindeb-pkg] Error 2
make: *** [Makefile:224: __sub-make] Error 2
The parent commit builds fine. With V=1:
+ ldflags='-m elf_x86_64 -z noexecstack --pie -z text -z
call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1
--orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds
-Map=vmlinux.map'
+ ld -m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop
-z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn
--script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map -o vmlinux
--whole-archive vmlinux.a .vmlinux.export.o init/version-timestamp.o
--no-whole-archive --start-group --end-group .tmp_vmlinux2.kallsyms.o
.tmp_vmlinux1.btf.o
+ is_enabled CONFIG_DEBUG_INFO_BTF
+ grep -q '^CONFIG_DEBUG_INFO_BTF=y' include/config/auto.conf
+ info BTFIDS vmlinux
+ printf ' %-7s %s\n' BTFIDS vmlinux
BTFIDS vmlinux
+ ./tools/bpf/resolve_btfids/resolve_btfids vmlinux
WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free
FAILED elf_update(WRITE): invalid section entry size
I can send the full config off-list if necessary, but looks like it
might be enough to set CONFIG_DEBUG_INFO_BTF=y.
Vegard
* Re: [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text
2024-09-25 15:01 ` [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text Ard Biesheuvel
@ 2024-09-25 21:10 ` Jason Andryuk
2024-09-25 21:50 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Jason Andryuk @ 2024-09-25 21:10 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
Hi Ard,
On 2024-09-25 11:01, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The .head.text section contains code that may execute from a different
> address than it was linked at. This is fragile, given that the x86 ABI
> can refer to global symbols via absolute or relative references, and the
> toolchain assumes that these are interchangeable, which they are not in
> this particular case.
>
> In the case of the PVH code, there are some additional complications:
> - the absolute references are in 32-bit code, which get emitted with
> R_X86_64_32 relocations, and these are not permitted in PIE code;
> - the code in question is not actually relocatable: it can only run
> correctly from the physical load address specified in the ELF note.
>
> So rewrite the code to only rely on relative symbol references: these
> are always 32-bits wide, even in 64-bit code, and are resolved by the
> linker at build time.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Juergen queued up my patches to make the PVH entry point position
independent (5 commits):
https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/log/?h=linux-next
My commit that corresponds to this patch of yours is:
https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=linux-next&id=1db29f99edb056d8445876292f53a63459142309
(There are more changes to handle adjusting the page tables.)
Regards,
Jason
* Re: [RFC PATCH 01/28] x86/pvh: Call C code via the kernel virtual mapping
2024-09-25 15:01 ` [RFC PATCH 01/28] x86/pvh: Call C code via the kernel virtual mapping Ard Biesheuvel
@ 2024-09-25 21:12 ` Jason Andryuk
0 siblings, 0 replies; 73+ messages in thread
From: Jason Andryuk @ 2024-09-25 21:12 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: Ard Biesheuvel, x86, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On 2024-09-25 11:01, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Calling C code via a different mapping than it was linked at is
> problematic, because the compiler assumes that RIP-relative and absolute
> symbol references are interchangeable. GCC in particular may use
> RIP-relative per-CPU variable references even when not using -fpic.
>
> So call xen_prepare_pvh() via its kernel virtual mapping on x86_64, so
> that those RIP-relative references produce the correct values. This
> matches the pre-existing behavior for i386, which also invokes
> xen_prepare_pvh() via the kernel virtual mapping before invoking
> startup_32 with paging disabled again.
>
> Fixes: 7243b93345f7 ("xen/pvh: Bootstrap PVH guest")
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
I found that before this change xen_prepare_pvh() would call through
some pv_ops function pointers into the kernel virtual mapping.
Regards,
Jason
* Re: [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text
2024-09-25 21:10 ` Jason Andryuk
@ 2024-09-25 21:50 ` Ard Biesheuvel
2024-09-25 22:40 ` Jason Andryuk
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-25 21:50 UTC (permalink / raw)
To: Jason Andryuk
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, 25 Sept 2024 at 23:11, Jason Andryuk <jason.andryuk@amd.com> wrote:
>
> Hi Ard,
>
> On 2024-09-25 11:01, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The .head.text section contains code that may execute from a different
> > address than it was linked at. This is fragile, given that the x86 ABI
> > can refer to global symbols via absolute or relative references, and the
> > toolchain assumes that these are interchangeable, which they are not in
> > this particular case.
> >
> > In the case of the PVH code, there are some additional complications:
> > - the absolute references are in 32-bit code, which get emitted with
> > R_X86_64_32 relocations, and these are not permitted in PIE code;
> > - the code in question is not actually relocatable: it can only run
> > correctly from the physical load address specified in the ELF note.
> >
> > So rewrite the code to only rely on relative symbol references: these
> > are always 32-bits wide, even in 64-bit code, and are resolved by the
> > linker at build time.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Juergen queued up my patches to make the PVH entry point position
> independent (5 commits):
> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/log/?h=linux-next
>
> My commit that corresponds to this patch of yours is:
> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=linux-next&id=1db29f99edb056d8445876292f53a63459142309
>
> (There are more changes to handle adjusting the page tables.)
>
Thanks for the heads-up. Those changes look quite similar, so I guess
I should just rebase my stuff onto the xen tree.
The only thing that I would like to keep from my version is
+ lea (gdt - pvh_start_xen)(%ebp), %eax
+ add %eax, 2(%eax)
+ lgdt (%eax)
and
- .word gdt_end - gdt_start
- .long _pa(gdt_start)
+ .word gdt_end - gdt_start - 1
+ .long gdt_start - gdt
The first line is a bugfix, btw (the GDTR limit field must hold the
size of the GDT minus one), so perhaps I should send that out
separately. But my series relies on all 32-bit absolute symbol
references being removed, since the linker rejects those when running
in PIE mode, and so the second line is needed to get rid of the _pa()
there.
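Putting the two pieces together, the intent is roughly this (a sketch,
assuming %ebp holds the runtime address of pvh_start_xen as in the
snippet above):

    lea     (gdt - pvh_start_xen)(%ebp), %eax  /* runtime address of gdt */
    add     %eax, 2(%eax)   /* turn the stored offset into an absolute base */
    lgdt    (%eax)

gdt:
    .word   gdt_end - gdt_start - 1     /* GDTR limit is the size minus one */
    .long   gdt_start - gdt             /* link-time offset, fixed up above */
gdt_start:
    ...
gdt_end: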
* Re: [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text
2024-09-25 21:50 ` Ard Biesheuvel
@ 2024-09-25 22:40 ` Jason Andryuk
0 siblings, 0 replies; 73+ messages in thread
From: Jason Andryuk @ 2024-09-25 22:40 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On 2024-09-25 17:50, Ard Biesheuvel wrote:
> On Wed, 25 Sept 2024 at 23:11, Jason Andryuk <jason.andryuk@amd.com> wrote:
>>
>> Hi Ard,
>>
>> On 2024-09-25 11:01, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>
>>> The .head.text section contains code that may execute from a different
>>> address than it was linked at. This is fragile, given that the x86 ABI
>>> can refer to global symbols via absolute or relative references, and the
>>> toolchain assumes that these are interchangeable, which they are not in
>>> this particular case.
>>>
>>> In the case of the PVH code, there are some additional complications:
>>> - the absolute references are in 32-bit code, which get emitted with
>>> R_X86_64_32 relocations, and these are not permitted in PIE code;
>>> - the code in question is not actually relocatable: it can only run
>>> correctly from the physical load address specified in the ELF note.
>>>
>>> So rewrite the code to only rely on relative symbol references: these
>>> are always 32-bits wide, even in 64-bit code, and are resolved by the
>>> linker at build time.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>
>> Juergen queued up my patches to make the PVH entry point position
>> independent (5 commits):
>> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/log/?h=linux-next
>>
>> My commit that corresponds to this patch of yours is:
>> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=linux-next&id=1db29f99edb056d8445876292f53a63459142309
>>
>> (There are more changes to handle adjusting the page tables.)
>>
>
> Thanks for the heads-up. Those changes look quite similar, so I guess
> I should just rebase my stuff onto the xen tree.
>
> The only thing that I would like to keep from my version is
>
> + lea (gdt - pvh_start_xen)(%ebp), %eax
If you rebase on top of the xen tree, using rva() would match the rest
of the code:
lea rva(gdt)(%ebp), %eax
> + add %eax, 2(%eax)
> + lgdt (%eax)
>
> and
>
> - .word gdt_end - gdt_start
> - .long _pa(gdt_start)
> + .word gdt_end - gdt_start - 1
> + .long gdt_start - gdt
>
> The first line is a bugfix, btw (the GDTR limit field must hold the
> size of the GDT minus one), so perhaps I should send that out
> separately. But my series relies on all 32-bit absolute symbol
> references being removed, since the linker rejects those when running
> in PIE mode, and so the second line is needed to get rid of the _pa()
> there.
Sounds good.
Regards,
Jason
* Re: [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel
2024-09-25 20:24 ` Vegard Nossum
@ 2024-09-26 13:38 ` Ard Biesheuvel
0 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-09-26 13:38 UTC (permalink / raw)
To: Vegard Nossum
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, 25 Sept 2024 at 22:25, Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
>
> On 25/09/2024 17:01, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Build the kernel as a Position Independent Executable (PIE). This
> > results in more efficient relocation processing for the virtual
> > displacement of the kernel (for KASLR). More importantly, it instructs
> > the linker to generate what is actually needed (a program that can be
> > moved around in memory before execution), which is better than having to
> > rely on the linker to create a position dependent binary that happens to
> > tolerate being moved around after poking it in exactly the right manner.
> >
> > Note that this means that all codegen should be compatible with PIE,
> > including Rust objects, so this needs to switch to the small code model
> > with the PIE relocation model as well.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/x86/Kconfig | 2 +-
> > arch/x86/Makefile | 11 +++++++----
> > arch/x86/boot/compressed/misc.c | 2 ++
> > arch/x86/kernel/vmlinux.lds.S | 5 +++++
> > drivers/firmware/efi/libstub/x86-stub.c | 2 ++
> > 5 files changed, 17 insertions(+), 5 deletions(-)
> >
...
>
> This patch causes a build failure here (on 64-bit):
>
> LD .tmp_vmlinux2
> NM .tmp_vmlinux2.syms
> KSYMS .tmp_vmlinux2.kallsyms.S
> AS .tmp_vmlinux2.kallsyms.o
> LD vmlinux
> BTFIDS vmlinux
> WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free
> FAILED elf_update(WRITE): invalid section entry size
> make[5]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 255
> make[5]: *** Deleting file 'vmlinux'
> make[4]: *** [Makefile:1153: vmlinux] Error 2
> make[3]: *** [debian/rules:74: build-arch] Error 2
> dpkg-buildpackage: error: make -f debian/rules binary subprocess
> returned exit status 2
> make[2]: *** [scripts/Makefile.package:121: bindeb-pkg] Error 2
> make[1]: *** [/home/opc/linux-mainline-worktree2/Makefile:1544:
> bindeb-pkg] Error 2
> make: *** [Makefile:224: __sub-make] Error 2
>
> The parent commit builds fine. With V=1:
>
> + ldflags='-m elf_x86_64 -z noexecstack --pie -z text -z
> call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1
> --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds
> -Map=vmlinux.map'
> + ld -m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop
> -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn
> --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map -o vmlinux
> --whole-archive vmlinux.a .vmlinux.export.o init/version-timestamp.o
> --no-whole-archive --start-group --end-group .tmp_vmlinux2.kallsyms.o
> .tmp_vmlinux1.btf.o
> + is_enabled CONFIG_DEBUG_INFO_BTF
> + grep -q '^CONFIG_DEBUG_INFO_BTF=y' include/config/auto.conf
> + info BTFIDS vmlinux
> + printf ' %-7s %s\n' BTFIDS vmlinux
> BTFIDS vmlinux
> + ./tools/bpf/resolve_btfids/resolve_btfids vmlinux
> WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free
> FAILED elf_update(WRITE): invalid section entry size
>
> I can send the full config off-list if necessary, but looks like it
> might be enough to set CONFIG_DEBUG_INFO_BTF=y.
>
Thanks for the report. Turns out that adding the GOT to .rodata bumps
the section's sh_entsize to 8, and libelf complains if the section
size is not a multiple of the entry size.
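In other words, the constraint that trips here is roughly (a sketch,
not actual libelf code):

    /* what libelf effectively enforces per section during elf_update() */
    if (shdr.sh_entsize != 0 && shdr.sh_size % shdr.sh_entsize != 0)
            return error("invalid section entry size");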
I'll include a fix in the next revision.
* Re: [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1
2024-09-25 15:01 ` [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1 Ard Biesheuvel
2024-09-25 15:58 ` Arnd Bergmann
@ 2024-09-26 21:35 ` Miguel Ojeda
2024-09-27 16:22 ` Mark Rutland
2 siblings, 0 replies; 73+ messages in thread
From: Miguel Ojeda @ 2024-09-26 21:35 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024 at 5:10 PM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> Documentation/admin-guide/README.rst | 2 +-
> Documentation/process/changes.rst | 2 +-
This should update scripts/min-tool-version.sh too. With that:
Acked-by: Miguel Ojeda <ojeda@kernel.org>
As Arnd says, the cleanups can be done afterwards.
Cheers,
Miguel
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1
2024-09-25 15:01 ` [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1 Ard Biesheuvel
2024-09-25 15:58 ` Arnd Bergmann
2024-09-26 21:35 ` Miguel Ojeda
@ 2024-09-27 16:22 ` Mark Rutland
2 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2024-09-27 16:22 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024 at 05:01:02PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Bump the minimum GCC version to 8.1 to gain unconditional support for
> referring to the per-task stack cookie using a symbol rather than
> relying on the fixed offset of 40 bytes from %GS, which requires
> elaborate hacks to support.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> Documentation/admin-guide/README.rst | 2 +-
> Documentation/process/changes.rst | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
I'd like this for arm64 and others too (for unconditional support for
-fpatchable-function-entry), so FWIW:
Acked-by: Mark Rutland <mark.rutland@arm.com>
I think you'll want to update scripts/min-tool-version.sh too; judging
by the diff in the cover letter that's not handled elsewhere in the
series.
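Presumably something like this one-liner (a sketch; I'm assuming the
current value in that script is still 5.1.0):

    --- a/scripts/min-tool-version.sh
    +++ b/scripts/min-tool-version.sh
    @@ (hunk context omitted)
     gcc)
    -	echo 5.1.0
    +	echo 8.1.0
     	;;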
Mark.
>
> diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
> index f2bebff6a733..3dda41923ed6 100644
> --- a/Documentation/admin-guide/README.rst
> +++ b/Documentation/admin-guide/README.rst
> @@ -259,7 +259,7 @@ Configuring the kernel
> Compiling the kernel
> --------------------
>
> - - Make sure you have at least gcc 5.1 available.
> + - Make sure you have at least gcc 8.1 available.
> For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
>
> - Do a ``make`` to create a compressed kernel image. It is also possible to do
> diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
> index 00f1ed7c59c3..59b7d3d8a577 100644
> --- a/Documentation/process/changes.rst
> +++ b/Documentation/process/changes.rst
> @@ -29,7 +29,7 @@ you probably needn't concern yourself with pcmciautils.
> ====================== =============== ========================================
> Program Minimal version Command to check the version
> ====================== =============== ========================================
> -GNU C 5.1 gcc --version
> +GNU C 8.1 gcc --version
> Clang/LLVM (optional) 13.0.1 clang --version
> Rust (optional) 1.78.0 rustc --version
> bindgen (optional) 0.65.1 bindgen --version
> --
> 2.46.0.792.g87dc391469-goog
>
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-25 18:32 ` Uros Bizjak
@ 2024-09-28 13:41 ` Brian Gerst
2024-10-04 13:15 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Brian Gerst @ 2024-09-28 13:41 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ard Biesheuvel, linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024 at 2:33 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Specify the guard symbol for the stack cookie explicitly, rather than
> > positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> > the need for the per-CPU region to be absolute rather than relative to
> > the placement of the per-CPU template region in the kernel image, and
> > this allows the special handling for absolute per-CPU symbols to be
> > removed entirely.
> >
> > This is a worthwhile cleanup in itself, but it is also a prerequisite
> > for PIE codegen and PIE linking, which can replace our bespoke and
> > rather clunky runtime relocation handling.
>
> I would like to point out a series that converted the stack protector
> guard symbol to a normal percpu variable [1], so there was no need to
> assume anything about the location of the guard symbol.
>
> [1] "[PATCH v4 00/16] x86-64: Stack protector and percpu improvements"
> https://lore.kernel.org/lkml/20240322165233.71698-1-brgerst@gmail.com/
>
> Uros.
I plan on resubmitting that series sometime after the 6.12 merge
window closes. As I recall from the last version, it was decided to
wait until after the next LTS release to raise the minimum GCC version
to 8.1 and avoid the need to be compatible with the old stack
protector layout.
Brian Gerst
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 04/28] x86/boot: Permit GOTPCREL relocations for x86_64 builds
2024-09-25 15:01 ` [RFC PATCH 04/28] x86/boot: Permit GOTPCREL relocations for x86_64 builds Ard Biesheuvel
@ 2024-10-01 5:33 ` Josh Poimboeuf
2024-10-01 6:56 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Josh Poimboeuf @ 2024-10-01 5:33 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024 at 05:01:04PM +0200, Ard Biesheuvel wrote:
> + if (r_type == R_X86_64_GOTPCREL) {
> + Elf_Shdr *s = &secs[sec->shdr.sh_info].shdr;
> + unsigned file_off = offset - s->sh_addr + s->sh_offset;
> +
> + /*
> + * GOTPCREL relocations refer to instructions that load
> + * a 64-bit address via a 32-bit relative reference to
> + * the GOT. In this case, it is the GOT entry that
> + * needs to be fixed up, not the immediate offset in
> + * the opcode. Note that the linker will have applied an
> + * addend of -4 to compensate for the delta between the
> + * relocation offset and the value of RIP when the
> + * instruction executes, and this needs to be backed out
> + * again. (Addends other than -4 are permitted in
> + * principle, but make no sense in practice so they are
> + * not supported.)
> + */
> + if (rel->r_addend != -4) {
> + die("invalid addend (%ld) for %s relocation: %s\n",
> + rel->r_addend, rel_type(r_type), symname);
> + break;
> + }
For x86 PC-relative addressing, the addend is <reloc offset> -
<subsequent insn offset>. So a PC-relative addend can be something
other than -4 when the relocation applies to the middle of an
instruction, e.g.:
5b381: 66 81 3d 00 00 00 00 01 06 cmpw $0x601,0x0(%rip) # 5b38a <generic_validate_add_page+0x4a> 5b384: R_X86_64_PC32 boot_cpu_data-0x6
5f283: 81 3d 00 00 00 00 ff ff ff 00 cmpl $0xffffff,0x0(%rip) # 5f28d <x86_acpi_suspend_lowlevel+0x9d> 5f285: R_X86_64_PC32 smpboot_control-0x8
72f67: c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # 72f6e <sched_itmt_update_handler+0x6e> 72f69: R_X86_64_PC32 x86_topology_update-0x5
Presumably that could also happen with R_X86_64_GOTPCREL?
--
Josh
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 04/28] x86/boot: Permit GOTPCREL relocations for x86_64 builds
2024-10-01 5:33 ` Josh Poimboeuf
@ 2024-10-01 6:56 ` Ard Biesheuvel
0 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-10-01 6:56 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Tue, 1 Oct 2024 at 07:33, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Sep 25, 2024 at 05:01:04PM +0200, Ard Biesheuvel wrote:
> > + if (r_type == R_X86_64_GOTPCREL) {
> > + Elf_Shdr *s = &secs[sec->shdr.sh_info].shdr;
> > + unsigned file_off = offset - s->sh_addr + s->sh_offset;
> > +
> > + /*
> > + * GOTPCREL relocations refer to instructions that load
> > + * a 64-bit address via a 32-bit relative reference to
> > + * the GOT. In this case, it is the GOT entry that
> > + * needs to be fixed up, not the immediate offset in
> > + * the opcode. Note that the linker will have applied an
> > + * addend of -4 to compensate for the delta between the
> > + * relocation offset and the value of RIP when the
> > + * instruction executes, and this needs to be backed out
> > + * again. (Addends other than -4 are permitted in
> > + * principle, but make no sense in practice so they are
> > + * not supported.)
> > + */
> > + if (rel->r_addend != -4) {
> > + die("invalid addend (%ld) for %s relocation: %s\n",
> > + rel->r_addend, rel_type(r_type), symname);
> > + break;
> > + }
>
> For x86 PC-relative addressing, the addend is <reloc offset> -
> <subsequent insn offset>. So a PC-relative addend can be something
> other than -4 when the relocation applies to the middle of an
> instruction, e.g.:
>
> 5b381: 66 81 3d 00 00 00 00 01 06 cmpw $0x601,0x0(%rip) # 5b38a <generic_validate_add_page+0x4a> 5b384: R_X86_64_PC32 boot_cpu_data-0x6
>
> 5f283: 81 3d 00 00 00 00 ff ff ff 00 cmpl $0xffffff,0x0(%rip) # 5f28d <x86_acpi_suspend_lowlevel+0x9d> 5f285: R_X86_64_PC32 smpboot_control-0x8
>
> 72f67: c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # 72f6e <sched_itmt_update_handler+0x6e> 72f69: R_X86_64_PC32 x86_topology_update-0x5
>
> Presumably that could also happen with R_X86_64_GOTPCREL?
>
In theory, yes.
But for the class of GOTPCREL relaxable instructions listed in the
psABI, the addend is always -4, and these are the only ones we might
expect from the compiler when using -fpic with 'hidden' visibility
and/or -mdirect-extern-access. Note that the memory operand
foo@GOTPCREL(%rip) produces the *address* of foo, and so it is always
the source operand, appearing at the end of the encoding.
Alternatively, we might simply subtract the addend from 'offset'
before applying the displacement from the opcode.
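I.e., something like this in the relocs tool, reusing the variable
names from the hunk quoted above (a sketch, not tested):

    /*
     * Instead of rejecting addends other than -4, point 'offset' at
     * the value RIP has when the instruction executes - which is what
     * the stored displacement is relative to. For the usual -4 addend
     * this is simply offset + 4.
     */
    offset -= rel->r_addend;
    file_off = offset - s->sh_addr + s->sh_offset;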
Note that this code gets removed again in the last patch, after
switching to PIE linking.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls
2024-09-25 15:01 ` [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
@ 2024-10-01 7:18 ` Josh Poimboeuf
2024-10-01 7:39 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Josh Poimboeuf @ 2024-10-01 7:18 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024 at 05:01:24PM +0200, Ard Biesheuvel wrote:
> + if (insn->type == INSN_CALL_DYNAMIC) {
> + if (!reloc)
> + continue;
> +
> + /*
> + * GCC 13 and older on x86 will always emit the call to
> + * __fentry__ using a relaxable GOT-based symbol
> + * reference when operating in PIC mode, i.e.,
> + *
> + * call *0x0(%rip)
> + * R_X86_64_GOTPCRELX __fentry__-0x4
> + *
> + * where it is left up to the linker to relax this into
> + *
> + * call __fentry__
> + * nop
> + *
> + * if __fentry__ turns out to be DSO local, which is
> + * always the case for vmlinux. Given that this
> + * relaxation is mandatory per the x86_64 psABI, these
> + * calls can simply be treated as direct calls.
> + */
> + if (arch_ftrace_match(reloc->sym->name)) {
> + insn->type = INSN_CALL;
> + add_call_dest(file, insn, reloc->sym, false);
> + }
Can the compiler also do this for non-fentry direct calls? If so would
it make sense to generalize this by converting all
INSN_CALL_DYNAMIC+reloc to INSN_CALL?
And maybe something similar for add_jump_destinations().
--
Josh
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls
2024-10-01 7:18 ` Josh Poimboeuf
@ 2024-10-01 7:39 ` Ard Biesheuvel
0 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2024-10-01 7:39 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Tue, 1 Oct 2024 at 09:18, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Sep 25, 2024 at 05:01:24PM +0200, Ard Biesheuvel wrote:
> > + if (insn->type == INSN_CALL_DYNAMIC) {
> > + if (!reloc)
> > + continue;
> > +
> > + /*
> > + * GCC 13 and older on x86 will always emit the call to
> > + * __fentry__ using a relaxable GOT-based symbol
> > + * reference when operating in PIC mode, i.e.,
> > + *
> > + * call *0x0(%rip)
> > + * R_X86_64_GOTPCRELX __fentry__-0x4
> > + *
> > + * where it is left up to the linker to relax this into
> > + *
> > + * call __fentry__
> > + * nop
> > + *
> > + * if __fentry__ turns out to be DSO local, which is
> > + * always the case for vmlinux. Given that this
> > + * relaxation is mandatory per the x86_64 psABI, these
> > + * calls can simply be treated as direct calls.
> > + */
> > + if (arch_ftrace_match(reloc->sym->name)) {
> > + insn->type = INSN_CALL;
> > + add_call_dest(file, insn, reloc->sym, false);
> > + }
>
> Can the compiler also do this for non-fentry direct calls?
No, it is essentially an oversight in GCC that this happens at all,
and I fixed it [0] for GCC 14, i.e., to honour -mdirect-extern-access
when emitting these calls.
But even without that, it is peculiar at the very least that the
compiler would emit GOT-based indirect calls at all.
Instead of
call *__fentry__@GOTPCREL(%rip)
it should simply emit
call __fentry__@PLT
and leave it up to the linker to resolve this directly or
lazily/eagerly via a PLT jump (assuming -fno-plt is not being used)
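At the byte level, the mandatory relaxation for that pattern is a
same-length rewrite (displacements zeroed here; a sketch, not actual
vmlinux output):

    ff 15 00 00 00 00      call   *__fentry__@GOTPCRELX(%rip)

which the linker, once __fentry__ resolves locally, turns into

    67 e8 00 00 00 00      addr32 call __fentry__

i.e., a direct call padded with an address-size prefix so the 6-byte
slot keeps its length - which is what makes it safe for objtool to
treat the site as a direct call.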
> If so would
> it make sense to generalize this by converting all
> INSN_CALL_DYNAMIC+reloc to INSN_CALL?
>
> And maybe something similar for add_jump_destinations().
>
I suppose that the pattern INSN_CALL_DYNAMIC+reloc is unambiguous, and
can therefore always be treated as INSN_CALL. But I don't anticipate
any other occurrences here, and if they do exist, they indicate some
other weirdness in the compiler, so perhaps it is better not to add
general support for these.
[0] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bde21de1205c0456f6df68c950fb7ee631fcfa93
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-09-25 15:01 ` [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel Ard Biesheuvel
@ 2024-10-01 21:13 ` H. Peter Anvin
2024-10-02 15:25 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: H. Peter Anvin @ 2024-10-01 21:13 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: Ard Biesheuvel, x86, Andy Lutomirski, Peter Zijlstra, Uros Bizjak,
Dennis Zhou, Tejun Heo, Christoph Lameter, Mathieu Desnoyers,
Paolo Bonzini, Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On 9/25/24 08:01, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> As an intermediate step towards enabling PIE linking for the 64-bit x86
> kernel, enable PIE codegen for all objects that are linked into the
> kernel proper.
>
> This substantially reduces the number of relocations that need to be
> processed when booting a relocatable KASLR kernel.
>
This really seems like going completely backwards to me.
You are imposing a more restrictive code model on the kernel, optimizing
for boot time in a way that will exert a permanent cost on the running
kernel.
There is a *huge* difference between the kernel and user space here:
KERNEL MEMORY IS PERMANENTLY ALLOCATED, AND IS NEVER SHARED.
Dirtying user pages requires them to be unshared and dirty, which is
undesirable. Kernel pages are *always* unshared and dirty.
> It also brings us much closer to the ordinary PIE relocation model used
> for most of user space, which is therefore much better supported and
> less likely to create problems as we increase the range of compilers and
> linkers that need to be supported.
We have been resisting *for ages* making the kernel worse to accommodate
broken compilers. We don't "need" to support more compilers -- we need
the compilers to support us. We have working compilers; any new compiler
that wants to play should be expected to work correctly.
-hpa
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-01 21:13 ` H. Peter Anvin
@ 2024-10-02 15:25 ` Ard Biesheuvel
2024-10-02 20:01 ` Linus Torvalds
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-10-02 15:25 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ard Biesheuvel, linux-kernel, x86, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
Hi Peter,
Thanks for taking a look.
On Tue, 1 Oct 2024 at 23:13, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 9/25/24 08:01, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > As an intermediate step towards enabling PIE linking for the 64-bit x86
> > kernel, enable PIE codegen for all objects that are linked into the
> > kernel proper.
> >
> > This substantially reduces the number of relocations that need to be
> > processed when booting a relocatable KASLR kernel.
> >
>
> This really seems like going completely backwards to me.
>
> You are imposing a more restrictive code model on the kernel, optimizing
> for boot time in a way that will exert a permanent cost on the running
> kernel.
>
Fair point about the boot time. This is not the only concern, though,
and arguably the least important one.
As I responded to Andi before, it is also about using a code model and
relocation model that matches the reality of how the code is executed:
- the early C code runs from the 1:1 mapping, and needs special hacks
to accommodate this
- KASLR runs the kernel from a different virtual address than the one
we told the linker about
> There is a *huge* difference between the kernel and user space here:
>
> KERNEL MEMORY IS PERMANENTLY ALLOCATED, AND IS NEVER SHARED.
>
No need to shout.
> Dirtying user pages requires them to be unshared and dirty, which is
> undesirable. Kernel pages are *always* unshared and dirty.
>
I guess you are referring to the use of a GOT? That is a valid
concern, but it does not apply here. With hidden visibility and
compiler command line options like -mdirect-extern-access, all emitted
symbol references are direct. Disallowing text relocations could be
trivially enabled with this series if desired, and actually helps
avoid the tricky bugs we keep fixing in the early startup code that
executes from the 1:1 mapping (the C code in .head.text)
So it mostly comes down to minor differences in addressing modes, e.g.,
movq $sym, %reg
actually uses more bytes than
leaq sym(%rip), %reg
whereas
movq sym, %reg
and
movq sym(%rip), %reg
are the same length.
OTOH, indexing a statically allocated global array like
movl array(,%reg1,4), %reg2
will be converted into
leaq array(%rip), %reg2
movl (%reg2,%reg1,4), %reg2
and is therefore less efficient in terms of code footprint. But in
general, the x86_64 ISA and psABI are quite flexible in this regard,
and extrapolating from past experiences with PIC code on i386 is not
really justified here.
As Andi also pointed out, what ultimately matters is the performance,
as well as code size where it impacts performance, through the I-cache
footprint. I'll do some testing before reposting, and maybe not bother
if the impact is negative.
> > It also brings us much closer to the ordinary PIE relocation model used
> > for most of user space, which is therefore much better supported and
> > less likely to create problems as we increase the range of compilers and
> > linkers that need to be supported.
>
> We have been resisting *for ages* making the kernel worse to accommodate
> broken compilers. We don't "need" to support more compilers -- we need
> the compilers to support us. We have working compilers; any new compiler
> that wants to play should be expected to work correctly.
>
We are in a much better place now than we were before in that regard,
which is actually how this effort came about: instead of lying to the
compiler, and maintaining our own pile of scripts and relocation
tools, we can just do what other arches are doing in Linux, and let
the toolchain do it for us.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-02 15:25 ` Ard Biesheuvel
@ 2024-10-02 20:01 ` Linus Torvalds
2024-10-03 11:13 ` Ard Biesheuvel
0 siblings, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2024-10-02 20:01 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: H. Peter Anvin, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, 2 Oct 2024 at 08:31, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> I guess you are referring to the use of a GOT? That is a valid
> concern, but it does not apply here. With hidden visibility and
> compiler command line options like -mdirect-extern-access, all emitted
> symbol references are direct.
I absolutely hate GOT entries. We definitely shouldn't ever do
anything that causes them on x86-64.
I'd much rather just do boot-time relocation, and I don't think the
"we run code at a different location than we told the linker" is an
argument against it.
Please, let's make sure we never have any of the global offset table horror.
Yes, yes, you can't avoid them on other architectures.
That said, doing changes like changing "mov $sym" to "lea sym(%rip)" I
feel are a complete no-brainer and should be done regardless of any
other code generation issues.
Let's not do relocation for no good reason.
Linus
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-02 20:01 ` Linus Torvalds
@ 2024-10-03 11:13 ` Ard Biesheuvel
2024-10-04 21:06 ` H. Peter Anvin
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-10-03 11:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, 2 Oct 2024 at 22:02, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 2 Oct 2024 at 08:31, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > I guess you are referring to the use of a GOT? That is a valid
> > concern, but it does not apply here. With hidden visibility and
> > compiler command line options like -mdirect-extern-access, all emitted
> > symbol references are direct.
>
> I absolutely hate GOT entries. We definitely shouldn't ever do
> anything that causes them on x86-64.
>
> I'd much rather just do boot-time relocation, and I don't think the
> "we run code at a different location than we told the linker" is an
> argument against it.
>
> Please, let's make sure we never have any of the global offset table horror.
>
> Yes, yes, you can't avoid them on other architectures.
>
GCC/binutils never needs them. GCC 13 and older will emit some
horrible indirect fentry hook calls via the GOT, but those are NOPed
out by objtool anyway, so that is easily fixable.
Clang/LLD is slightly trickier, because Clang emits relaxable GOTPCREL
relocations, but LLD doesn't update the relocations emitted via
--emit-relocs, so the relocs tool gets confused. This is one of the
reasons I had for proposing to simply switch to PIE linking, and let
the linker deal with all of that. Note that Clang may emit these even
when not generating PIC/PIE code at all.
So this is the reason I wanted to add support for GOTPCREL relocations
in the relocs tool - it is really quite trivial to do, and makes our
jobs slightly easier when dealing with these compiler quirks. The
alternative would be to teach objtool how to relax 'movq
foo@GOTPCREL(%rip)' into 'leaq foo(%rip)' - these are GOTPCREL
relaxations described in the x86_64 psABI for ELF, which is why
compilers are assuming more and more that emitting these is fine even
without -fpic, given that the linker is supposed to elide them if
possible.
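(That relaxation is a single opcode-byte rewrite, which is why it would
be cheap for objtool to perform as well - a sketch with zeroed
displacements:

    48 8b 05 00 00 00 00   movq foo@GOTPCREL(%rip), %rax

becomes

    48 8d 05 00 00 00 00   leaq foo(%rip), %rax

with 0x8b turning into 0x8d and the displacement retargeted from the
GOT slot to foo itself.)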
Note that there are 1 or 2 cases in the asm code where it is actually
quite useful to refer to the address of foo as 'foo@GOTPCREL(%rip)' in
instructions that take memory operands, but those individual cases are
easily converted to something else if even having a GOT with just 2
entries is a dealbreaker for you.
> That said, doing changes like changing "mov $sym" to "lea sym(%rip)" I
> feel are a complete no-brainer and should be done regardless of any
> other code generation issues.
>
Yes, this is the primary reason I ended up looking into this in the
first place. Earlier this year, we ended up having to introduce
RIP_REL_REF() to emit those RIP-relative references explicitly, in
order to prevent the C code that is called via the early 1:1 mapping
from exploding. The amount of C code called in that manner has been
growing steadily over time with the introduction of 5-level paging and
SEV-SNP and TDX support, which need to play all kinds of tricks before
the normal kernel mappings are created.
Compiling with -fpie and linking with --pie -z text produces an
executable that is guaranteed to have only RIP-relative references in
the .text segment, removing the need for RIP_REL_REF entirely (it
already does nothing when __pic__ is #define'd).
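(For context, RIP_REL_REF() forces the RIP-relative form from C roughly
like this - a sketch, the in-tree definition may differ in detail:

    static __always_inline void *rip_rel_ptr(void *p)
    {
            asm("leaq %c1(%%rip), %0" : "=r"(p) : "i"(p));
            return p;
    }
    #define RIP_REL_REF(var) (*(typeof(&(var)))rip_rel_ptr(&(var)))

so once -fpie makes every reference RIP-relative anyway, the wrapper
has nothing left to do.)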
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-25 15:01 ` [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly Ard Biesheuvel
2024-09-25 15:53 ` Ian Rogers
2024-09-25 18:32 ` Uros Bizjak
@ 2024-10-04 10:01 ` Uros Bizjak
2 siblings, 0 replies; 73+ messages in thread
From: Uros Bizjak @ 2024-10-04 10:01 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Specify the guard symbol for the stack cookie explicitly, rather than
> positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> the need for the per-CPU region to be absolute rather than relative to
> the placement of the per-CPU template region in the kernel image, and
> this allows the special handling for absolute per-CPU symbols to be
> removed entirely.
>
> This is a worthwhile cleanup in itself, but it is also a prerequisite
> for PIE codegen and PIE linking, which can replace our bespoke and
> rather clunky runtime relocation handling.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/Makefile | 4 ++++
> arch/x86/include/asm/init.h | 2 +-
> arch/x86/include/asm/processor.h | 11 +++--------
> arch/x86/include/asm/stackprotector.h | 4 ----
> tools/perf/util/annotate.c | 4 ++--
> 5 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 6b3fe6e2aadd..b78b7623a4a9 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -193,6 +193,10 @@ else
> KBUILD_RUSTFLAGS += -Cno-redzone=y
> KBUILD_RUSTFLAGS += -Ccode-model=kernel
>
> + ifeq ($(CONFIG_STACKPROTECTOR),y)
> + KBUILD_CFLAGS += -mstack-protector-guard-symbol=fixed_percpu_data
Looking at:
> + * Since the irq_stack is the object at %gs:0, the bottom 8 bytes of
> + * the irq stack are reserved for the canary.
Please note that %gs:0 can also be achieved with
-mstack-protector-guard-offset=0
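I.e., assuming the guard register is set to %gs elsewhere (via
-mstack-protector-guard=tls -mstack-protector-guard-reg=gs) and that
fixed_percpu_data resolves to offset 0 from %gs, these two should be
interchangeable:

    -mstack-protector-guard-symbol=fixed_percpu_data
    -mstack-protector-guard-offset=0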
Uros.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-09-28 13:41 ` Brian Gerst
@ 2024-10-04 13:15 ` Ard Biesheuvel
2024-10-08 14:36 ` Brian Gerst
0 siblings, 1 reply; 73+ messages in thread
From: Ard Biesheuvel @ 2024-10-04 13:15 UTC (permalink / raw)
To: Brian Gerst
Cc: Uros Bizjak, Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Sat, 28 Sept 2024 at 15:41, Brian Gerst <brgerst@gmail.com> wrote:
>
> On Wed, Sep 25, 2024 at 2:33 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Specify the guard symbol for the stack cookie explicitly, rather than
> > > positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> > > the need for the per-CPU region to be absolute rather than relative to
> > > the placement of the per-CPU template region in the kernel image, and
> > > this allows the special handling for absolute per-CPU symbols to be
> > > removed entirely.
> > >
> > > This is a worthwhile cleanup in itself, but it is also a prerequisite
> > > for PIE codegen and PIE linking, which can replace our bespoke and
> > > rather clunky runtime relocation handling.
> >
> > I would like to point out a series that converted the stack protector
> > guard symbol to a normal percpu variable [1], so there was no need to
> > assume anything about the location of the guard symbol.
> >
> > [1] "[PATCH v4 00/16] x86-64: Stack protector and percpu improvements"
> > https://lore.kernel.org/lkml/20240322165233.71698-1-brgerst@gmail.com/
> >
> > Uros.
>
> I plan on resubmitting that series sometime after the 6.12 merge
> window closes. As I recall from the last version, it was decided to
> wait until after the next LTS release to raise the minimum GCC version
> to 8.1 and avoid the need to be compatible with the old stack
> protector layout.
>
Hi Brian,
I'd be more than happy to compare notes on that - I wasn't aware of
your intentions here, or I would have reached out before sending this
RFC.
There are two things that you would need to address for Clang support
to work correctly:
- the workaround I cc'ed you on the other day [0],
- a workaround for the module loader so it tolerates the GOTPCRELX
relocations that Clang emits [1]
[0] https://lore.kernel.org/all/20241002092534.3163838-2-ardb+git@google.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?id=a18121aabbdd
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-03 11:13 ` Ard Biesheuvel
@ 2024-10-04 21:06 ` H. Peter Anvin
2024-10-05 8:31 ` Uros Bizjak
0 siblings, 1 reply; 73+ messages in thread
From: H. Peter Anvin @ 2024-10-04 21:06 UTC (permalink / raw)
To: Ard Biesheuvel, Linus Torvalds
Cc: Ard Biesheuvel, linux-kernel, x86, Andy Lutomirski,
Peter Zijlstra, Uros Bizjak, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On 10/3/24 04:13, Ard Biesheuvel wrote:
>
>> That said, doing changes like changing "mov $sym" to "lea sym(%rip)" I
>> feel are a complete no-brainer and should be done regardless of any
>> other code generation issues.
>
> Yes, this is the primary reason I ended up looking into this in the
> first place. Earlier this year, we ended up having to introduce
> RIP_REL_REF() to emit those RIP-relative references explicitly, in
> order to prevent the C code that is called via the early 1:1 mapping
> from exploding. The amount of C code called in that manner has been
> growing steadily over time with the introduction of 5-level paging and
> SEV-SNP and TDX support, which need to play all kinds of tricks before
> the normal kernel mappings are created.
>
Changing movq $sym to leaq sym(%rip), which you said ought to be
smaller (and in reality appears to be the same size, 7 bytes), seems
like a no-brainer and can be treated as a code quality issue -- in
other words, file bug reports against gcc and clang.
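(Both forms for reference - same 7-byte length, different relocation
types; a sketch with the immediate/displacement zeroed:

    48 c7 c0 00 00 00 00   movq $sym, %rax       # R_X86_64_32S
    48 8d 05 00 00 00 00   leaq sym(%rip), %rax  # R_X86_64_PC32

so the switch is size-neutral; the difference is that the PC32 form
stays correct when the kernel runs away from its link-time address.)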
> Compiling with -fpie and linking with --pie -z text produces an
> executable that is guaranteed to have only RIP-relative references in
> the .text segment, removing the need for RIP_REL_REF entirely (it
> already does nothing when __pic__ is #define'd).
But -fpie has a considerable cost; specifically when we have indexed
references, as in that case the base pointer needs to be manifest in a
register, *and* it takes up a register slot in the EA, which may end up
converting one instruction into three.
Now, the "kernel" memory model is defined in the ABI document, but there
is nothing that prevents us from making updates to it if we need to;
e.g. the statement that movq $sym can be used is undesirable, of course.
-hpa
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-04 21:06 ` H. Peter Anvin
@ 2024-10-05 8:31 ` Uros Bizjak
2024-10-05 23:36 ` H. Peter Anvin
0 siblings, 1 reply; 73+ messages in thread
From: Uros Bizjak @ 2024-10-05 8:31 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ard Biesheuvel, Linus Torvalds, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Fri, Oct 4, 2024 at 11:06 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 10/3/24 04:13, Ard Biesheuvel wrote:
> >
> >> That said, doing changes like changing "mov $sym" to "lea sym(%rip)" I
> >> feel are a complete no-brainer and should be done regardless of any
> >> other code generation issues.
> >
> > Yes, this is the primary reason I ended up looking into this in the
> > first place. Earlier this year, we ended up having to introduce
> > RIP_REL_REF() to emit those RIP-relative references explicitly, in
> > order to prevent the C code that is called via the early 1:1 mapping
> > from exploding. The amount of C code called in that manner has been
> > growing steadily over time with the introduction of 5-level paging and
> > SEV-SNP and TDX support, which need to play all kinds of tricks before
> > the normal kernel mappings are created.
> >
>
> movq $sym to leaq sym(%rip) which you said ought to be smaller (and in
> reality appears to be the same size, 7 bytes) seems like a no-brainer
> and can be treated as a code quality issue -- in other words, file bug
> reports against gcc and clang.
It is the kernel assembly source that should be converted to
rip-relative form; gcc (and probably clang) have nothing to do with it.
Uros.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-05 8:31 ` Uros Bizjak
@ 2024-10-05 23:36 ` H. Peter Anvin
2024-10-06 0:00 ` Linus Torvalds
2024-10-06 7:59 ` Uros Bizjak
0 siblings, 2 replies; 73+ messages in thread
From: H. Peter Anvin @ 2024-10-05 23:36 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ard Biesheuvel, Linus Torvalds, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On 10/5/24 01:31, Uros Bizjak wrote:
>>
>> movq $sym to leaq sym(%rip) which you said ought to be smaller (and in
>> reality appears to be the same size, 7 bytes) seems like a no-brainer
>> and can be treated as a code quality issue -- in other words, file bug
>> reports against gcc and clang.
>
> It is the kernel assembly source that should be converted to
> rip-relative form; gcc (and probably clang) have nothing to do with it.
>
Sadly, that is not correct; neither gcc nor clang uses lea:
-hpa
gcc version 14.2.1 20240912 (Red Hat 14.2.1-3) (GCC)
hpa@tazenda:/tmp$ cat foo.c
int foobar;
int *where_is_foobar(void)
{
return &foobar;
}
hpa@tazenda:/tmp$ gcc -mcmodel=kernel -O2 -c -o foo.o foo.c
hpa@tazenda:/tmp$ objdump -dr foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <where_is_foobar>:
0: 48 c7 c0 00 00 00 00 mov $0x0,%rax
3: R_X86_64_32S foobar
7: c3 ret
clang version 18.1.8 (Fedora 18.1.8-1.fc40)
hpa@tazenda:/tmp$ clang -mcmodel=kernel -O2 -c -o foo.o foo.c
hpa@tazenda:/tmp$ objdump -dr foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <where_is_foobar>:
0: 48 c7 c0 00 00 00 00 mov $0x0,%rax
3: R_X86_64_32S foobar
7: c3 ret
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-05 23:36 ` H. Peter Anvin
@ 2024-10-06 0:00 ` Linus Torvalds
2024-10-06 8:06 ` Uros Bizjak
2024-10-06 7:59 ` Uros Bizjak
1 sibling, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2024-10-06 0:00 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Uros Bizjak, Ard Biesheuvel, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Sadly, that is not correct; neither gcc nor clang uses lea:
Looking around, this may be intentional. At least according to Agner,
several cores do better at "mov immediate" compared to "lea".
Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
"mov i,r". Haswell is 1:4).
Of course, Agner's tables are good, but not necessarily always the
whole story. There are other instruction tables on the internet (eg
uops.info) with possibly more info.
And in reality, I would expect it to be a complete non-issue with any
OoO engine and real code, because you are very seldom ALU limited
particularly when there aren't any data dependencies.
But a RIP-relative LEA does seem to put a *bit* more pressure on the
core resources, so the compilers may be right to pick a "mov".
Linus
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-05 23:36 ` H. Peter Anvin
2024-10-06 0:00 ` Linus Torvalds
@ 2024-10-06 7:59 ` Uros Bizjak
2024-10-06 18:00 ` David Laight
1 sibling, 1 reply; 73+ messages in thread
From: Uros Bizjak @ 2024-10-06 7:59 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ard Biesheuvel, Linus Torvalds, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Sun, Oct 6, 2024 at 1:37 AM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 10/5/24 01:31, Uros Bizjak wrote:
> >>
> >> movq $sym to leaq sym(%rip) which you said ought to be smaller (and in
> >> reality appears to be the same size, 7 bytes) seems like a no-brainer
> >> and can be treated as a code quality issue -- in other words, file bug
> >> reports against gcc and clang.
> >
> > It is the kernel assembly source that should be converted to
> > rip-relative form; gcc (and probably clang) have nothing to do with it.
> >
>
> Sadly, that is not correct; neither gcc nor clang uses lea:
>
> -hpa
>
>
> gcc version 14.2.1 20240912 (Red Hat 14.2.1-3) (GCC)
>
> hpa@tazenda:/tmp$ cat foo.c
> int foobar;
>
> int *where_is_foobar(void)
> {
> return &foobar;
> }
>
> hpa@tazenda:/tmp$ gcc -mcmodel=kernel -O2 -c -o foo.o foo.c
Indeed, but my reply was in the context of -fpie, which guarantees RIP
relative access. IOW, the compiler will always generate sym(%rip) with
-fpie, but (obviously) can't change assembly code in the kernel when
the PIE is requested.
Otherwise, MOV $immediate, %reg is faster when PIE is not required,
which is the case with -mcmodel=kernel. IIRC, LEA with %rip had some
performance issues, which may not be the case anymore with newer
processors.
Due to the non-negligible impact of PIE, perhaps some kind of
CONFIG_PIE config definition should be introduced, so the assembly
code would be able to choose the optimal asm sequence when PIE or
non-PIE is requested?
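Something along these lines, say (a sketch, with a hypothetical
CONFIG_X86_PIE symbol):

    /* in some shared asm header */
    #ifdef CONFIG_X86_PIE
    # define LOAD_SYM_ADDR(sym, reg)   leaq sym(%rip), reg
    #else
    # define LOAD_SYM_ADDR(sym, reg)   movq $sym, reg
    #endif

so hand-written assembly could keep the cheaper mov form in non-PIE
builds.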
Uros.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-06 0:00 ` Linus Torvalds
@ 2024-10-06 8:06 ` Uros Bizjak
0 siblings, 0 replies; 73+ messages in thread
From: Uros Bizjak @ 2024-10-06 8:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, Ard Biesheuvel, Ard Biesheuvel, linux-kernel, x86,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Sun, Oct 6, 2024 at 2:00 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Sadly, that is not correct; neither gcc nor clang uses lea:
>
> Looking around, this may be intentional. At least according to Agner,
> several cores do better at "mov immediate" compared to "lea".
>
> Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
> a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
> apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
> "mov i,r". Haswell is 1:4).
Yes, this is the case. I just missed your reply when replying to
Peter's mail with a not so precise answer.
Uros.
^ permalink raw reply [flat|nested] 73+ messages in thread
* RE: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-06 7:59 ` Uros Bizjak
@ 2024-10-06 18:00 ` David Laight
2024-10-06 19:17 ` Uros Bizjak
0 siblings, 1 reply; 73+ messages in thread
From: David Laight @ 2024-10-06 18:00 UTC (permalink / raw)
To: 'Uros Bizjak', H. Peter Anvin
Cc: Ard Biesheuvel, Linus Torvalds, Ard Biesheuvel,
linux-kernel@vger.kernel.org, x86@kernel.org, Andy Lutomirski,
Peter Zijlstra, Dennis Zhou, Tejun Heo, Christoph Lameter,
Mathieu Desnoyers, Paolo Bonzini, Vitaly Kuznetsov, Juergen Gross,
Boris Ostrovsky, Greg Kroah-Hartman, Arnd Bergmann,
Masahiro Yamada, Kees Cook, Nathan Chancellor, Keith Packard,
Justin Stitt, Josh Poimboeuf, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, Kan Liang,
linux-doc@vger.kernel.org, linux-pm@vger.kernel.org,
kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
linux-efi@vger.kernel.org, linux-arch@vger.kernel.org,
linux-sparse@vger.kernel.org, linux-kbuild@vger.kernel.org,
linux-perf-users@vger.kernel.org, rust-for-linux@vger.kernel.org,
llvm@lists.linux.dev
...
> Due to the non-negligible impact of PIE, perhaps some kind of
> CONFIG_PIE config definition should be introduced, so the assembly
> code would be able to choose the optimal asm sequence when PIE or
> non-PIE is requested?
I wouldn't have thought that performance mattered in the asm code
that runs during startup?
While x86-64 code (ignoring data references) is pretty much always
position independent, the same isn't true of all architectures.
Some (at least Nios-II) only have absolute call instructions.
So you can't really move to pic code globally.
You'd also want 'bad' PIC code that contained some fixups that
needed code patching.
(Which you really don't want for a shared library.)
Otherwise you get an extra instruction for non-trivial data
accesses.
Thinking....
Doesn't the code generated for -fpic assume that the dynamic loader
has processed the relocations before it is run?
But the kernel startup code is running before they can have been done?
So even if that C code were 'pic' it could still contain things that
are invalid (probably arrays of pointers?).
So you lose one set of bugs and gain another.
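A concrete instance of what stays broken (a minimal sketch):

    static int a, b;
    static int *ptrs[] = { &a, &b };  /* address constants in .data */

In a PIE those initializers need R_X86_64_RELATIVE fixups at load
time, so code that reads ptrs[] before the kernel has applied its own
relocations sees unadjusted link-time values.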
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-06 18:00 ` David Laight
@ 2024-10-06 19:17 ` Uros Bizjak
2024-10-06 19:38 ` H. Peter Anvin
0 siblings, 1 reply; 73+ messages in thread
From: Uros Bizjak @ 2024-10-06 19:17 UTC (permalink / raw)
To: David Laight
Cc: H. Peter Anvin, Ard Biesheuvel, Linus Torvalds, Ard Biesheuvel,
linux-kernel@vger.kernel.org, x86@kernel.org, Andy Lutomirski,
Peter Zijlstra, Dennis Zhou, Tejun Heo, Christoph Lameter,
Mathieu Desnoyers, Paolo Bonzini, Vitaly Kuznetsov, Juergen Gross,
Boris Ostrovsky, Greg Kroah-Hartman, Arnd Bergmann,
Masahiro Yamada, Kees Cook, Nathan Chancellor, Keith Packard,
Justin Stitt, Josh Poimboeuf, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, Kan Liang,
linux-doc@vger.kernel.org, linux-pm@vger.kernel.org,
kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
linux-efi@vger.kernel.org, linux-arch@vger.kernel.org,
linux-sparse@vger.kernel.org, linux-kbuild@vger.kernel.org,
linux-perf-users@vger.kernel.org, rust-for-linux@vger.kernel.org,
llvm@lists.linux.dev
On Sun, Oct 6, 2024 at 8:01 PM David Laight <David.Laight@aculab.com> wrote:
>
> ...
> > Due to the non-negligible impact of PIE, perhaps some kind of
> > CONFIG_PIE config definition should be introduced, so the assembly
> > code would be able to choose the optimal asm sequence when PIE or
> > non-PIE is requested?
>
> I wouldn't have thought that performance mattered in the asm code
> that runs during startup?
No, not the code that runs only once, where performance impact can be tolerated.
This one:
https://lore.kernel.org/lkml/20240925150059.3955569-44-ardb+git@google.com/
Uros.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
2024-10-06 19:17 ` Uros Bizjak
@ 2024-10-06 19:38 ` H. Peter Anvin
0 siblings, 0 replies; 73+ messages in thread
From: H. Peter Anvin @ 2024-10-06 19:38 UTC (permalink / raw)
To: Uros Bizjak, David Laight
Cc: Ard Biesheuvel, Linus Torvalds, Ard Biesheuvel,
linux-kernel@vger.kernel.org, x86@kernel.org, Andy Lutomirski,
Peter Zijlstra, Dennis Zhou, Tejun Heo, Christoph Lameter,
Mathieu Desnoyers, Paolo Bonzini, Vitaly Kuznetsov, Juergen Gross,
Boris Ostrovsky, Greg Kroah-Hartman, Arnd Bergmann,
Masahiro Yamada, Kees Cook, Nathan Chancellor, Keith Packard,
Justin Stitt, Josh Poimboeuf, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, Kan Liang,
linux-doc@vger.kernel.org, linux-pm@vger.kernel.org,
kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
linux-efi@vger.kernel.org, linux-arch@vger.kernel.org,
linux-sparse@vger.kernel.org, linux-kbuild@vger.kernel.org,
linux-perf-users@vger.kernel.org, rust-for-linux@vger.kernel.org,
llvm@lists.linux.dev
On October 6, 2024 12:17:40 PM PDT, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Sun, Oct 6, 2024 at 8:01 PM David Laight <David.Laight@aculab.com> wrote:
>>
>> ...
>> > Due to the non-negligible impact of PIE, perhaps some kind of
>> > CONFIG_PIE config definition should be introduced, so the assembly
>> > code would be able to choose optimal asm sequence when PIE and non-PIE
>> > is requested?
>>
>> I wouldn't have thought that performance mattered in the asm code
>> that runs during startup?
>
>No, not the code that runs only once, where performance impact can be tolerated.
>
>This one:
>
>https://lore.kernel.org/lkml/20240925150059.3955569-44-ardb+git@google.com/
>
>Uros.
>
Yeah, running the kernel proper as PIE seems like a lose all around. The decompressor, EFI stub, etc., are of course a different matter entirely (and at least the latter can't rely on the small or kernel memory models anyway).
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly
2024-10-04 13:15 ` Ard Biesheuvel
@ 2024-10-08 14:36 ` Brian Gerst
0 siblings, 0 replies; 73+ messages in thread
From: Brian Gerst @ 2024-10-08 14:36 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Uros Bizjak, Ard Biesheuvel, linux-kernel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Dennis Zhou, Tejun Heo,
Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Arnd Bergmann, Masahiro Yamada, Kees Cook,
Nathan Chancellor, Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, linux-arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Fri, Oct 4, 2024 at 9:15 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sat, 28 Sept 2024 at 15:41, Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Wed, Sep 25, 2024 at 2:33 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote:
> > > >
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > >
> > > > Specify the guard symbol for the stack cookie explicitly, rather than
> > > > positioning it exactly 40 bytes into the per-CPU area. Doing so removes
> > > > the need for the per-CPU region to be absolute rather than relative to
> > > > the placement of the per-CPU template region in the kernel image, and
> > > > this allows the special handling for absolute per-CPU symbols to be
> > > > removed entirely.
> > > >
> > > > This is a worthwhile cleanup in itself, but it is also a prerequisite
> > > > for PIE codegen and PIE linking, which can replace our bespoke and
> > > > rather clunky runtime relocation handling.
> > >
> > > I would like to point out a series that converted the stack protector
> > > guard symbol to a normal percpu variable [1], so there was no need to
> > > assume anything about the location of the guard symbol.
> > >
> > > [1] "[PATCH v4 00/16] x86-64: Stack protector and percpu improvements"
> > > https://lore.kernel.org/lkml/20240322165233.71698-1-brgerst@gmail.com/
> > >
> > > Uros.
> >
> > I plan on resubmitting that series sometime after the 6.12 merge
> > window closes. As I recall from the last version, it was decided to
> > wait until after the next LTS release to raise the minimum GCC version
> > to 8.1 and avoid the need to be compatible with the old stack
> > protector layout.
> >
>
> Hi Brian,
>
> I'd be more than happy to compare notes on that - I wasn't aware of
> your intentions here, or I would have reached out before sending this
> RFC.
>
> There are two things that you would need to address for Clang support
> to work correctly:
> - the workaround I cc'ed you on the other day [0],
> - a workaround for the module loader so it tolerates the GOTPCRELX
> relocations that Clang emits [1]
>
>
>
> [0] https://lore.kernel.org/all/20241002092534.3163838-2-ardb+git@google.com/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?id=a18121aabbdd
The first patch should be applied independently as a bug fix, since it
already affects the 32-bit build with clang.
I don't have an environment with an older clang compiler to test the
second patch, but I'll assume it's necessary. I ran into an issue with
the GOTPCRELX relocations before [1], but I thought it was just an
objtool issue and didn't test further to determine whether modules were
actually broken.
Brian Gerst
[1] https://lore.kernel.org/all/20231026160100.195099-6-brgerst@gmail.com/
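For reference on the mechanism this subthread revolves around: since GCC 8.1
(and in Clang) the compiler can be told which segment register and which
symbol hold the stack cookie, instead of assuming the historical fixed
%gs:40 slot. A rough sketch of the two schemes, using the documented flag
spellings; the exact symbol name and Makefile wiring in the series may
differ:

    /*
     * Old scheme (fixed slot, implicit in the 'kernel' code model; also
     * expressible as):
     *   -mstack-protector-guard-reg=gs -mstack-protector-guard-offset=40
     *   prologue loads the cookie with:  movq %gs:40, %rax
     *
     * New scheme (cookie is an ordinary per-CPU variable):
     *   -mstack-protector-guard-reg=gs
     *   -mstack-protector-guard-symbol=__stack_chk_guard
     *   prologue references the symbol:  movq %gs:__stack_chk_guard, %rax
     */
    #include <linux/percpu.h>

    DECLARE_PER_CPU(unsigned long, __stack_chk_guard);

With the symbol-based form, the cookie no longer needs to sit at an absolute
offset of 40 in the per-CPU area, which is what allows the special handling
for absolute per-CPU symbols to be dropped.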
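As for the GOTPCRELX module-loader workaround Ard mentions above: the x86-64
psABI defines a relaxation for R_X86_64_GOTPCRELX / R_X86_64_REX_GOTPCRELX,
whereby "movq sym@GOTPCREL(%rip), %reg" may be rewritten as
"leaq sym(%rip), %reg" when the symbol is directly reachable, avoiding the
GOT slot entirely. A module loader that tolerates these relocations can
apply the same rewrite; a minimal sketch under that assumption (function
name and error handling are illustrative, not the actual patch):

    #include <linux/types.h>
    #include <linux/errno.h>

    /*
     * 'loc' points at the disp32 field the relocation applies to; the
     * insn is REX.W + 8b /r, so the mov opcode byte sits at loc[-2].
     * 'pcrel' is sym + addend - (u64)loc, already verified to fit in s32.
     */
    static int apply_gotpcrelx(u8 *loc, s32 pcrel)
    {
            if (loc[-2] != 0x8b)    /* only the mov -> lea form is handled */
                    return -ENOEXEC;
            loc[-2] = 0x8d;         /* rewrite mov into lea */
            *(s32 *)loc = pcrel;    /* plain RIP-relative displacement */
            return 0;
    }

The ModRM byte is left unchanged, since mov (8b /r) and lea (8d /r) encode
the RIP-relative operand identically.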
* Re: [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1
2024-09-25 15:58 ` Arnd Bergmann
@ 2024-12-19 11:53 ` Mark Rutland
2024-12-19 12:02 ` Arnd Bergmann
0 siblings, 1 reply; 73+ messages in thread
From: Mark Rutland @ 2024-12-19 11:53 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Ard Biesheuvel, linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Masahiro Yamada, Kees Cook, Nathan Chancellor,
Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, Linux-Arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
Hi Arnd,
On Wed, Sep 25, 2024 at 03:58:38PM +0000, Arnd Bergmann wrote:
> On Wed, Sep 25, 2024, at 15:01, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Bump the minimum GCC version to 8.1 to gain unconditional support for
> > referring to the per-task stack cookie using a symbol rather than
> > relying on the fixed offset of 40 bytes from %GS, which requires
> > elaborate hacks to support.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > Documentation/admin-guide/README.rst | 2 +-
> > Documentation/process/changes.rst | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> As we discussed during plumbers, I think this is reasonable,
> both the gcc-8.1 version and the timing after the 6.12-LTS
> kernel.
>
> We obviously need to go through all the other version checks
> to see what else can be cleaned up. I would suggest we also
> raise the binutils version to 2.30+, which is what RHEL8
> shipped alongside gcc-8. I have not found other distros that
> use older binutils in combination with gcc-8 or higher,
> Debian 10 uses binutils-2.31.
> I don't think we want to combine the additional cleanup with
> your series, but if we can agree on the version, we can do that
> in parallel.
Were you planning to send patches to that effect, or did you want
someone else to do that? I think we were largely agreed on making those
changes, but it wasn't clear to me who was actually going to send
patches, and I couldn't spot a subsequent thread on LKML.
Mark.
* Re: [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1
2024-12-19 11:53 ` Mark Rutland
@ 2024-12-19 12:02 ` Arnd Bergmann
0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2024-12-19 12:02 UTC (permalink / raw)
To: Mark Rutland
Cc: Ard Biesheuvel, linux-kernel, Ard Biesheuvel, x86, H. Peter Anvin,
Andy Lutomirski, Peter Zijlstra, Uros Bizjak, Dennis Zhou,
Tejun Heo, Christoph Lameter, Mathieu Desnoyers, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
Greg Kroah-Hartman, Masahiro Yamada, Kees Cook, Nathan Chancellor,
Keith Packard, Justin Stitt, Josh Poimboeuf,
Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, linux-doc, linux-pm, kvm, xen-devel,
linux-efi, Linux-Arch, linux-sparse, linux-kbuild,
linux-perf-users, rust-for-linux, llvm
On Thu, Dec 19, 2024, at 12:53, Mark Rutland wrote:
> On Wed, Sep 25, 2024 at 03:58:38PM +0000, Arnd Bergmann wrote:
>> On Wed, Sep 25, 2024, at 15:01, Ard Biesheuvel wrote:
>> > From: Ard Biesheuvel <ardb@kernel.org>
>>
>> We obviously need to go through all the other version checks
>> to see what else can be cleaned up. I would suggest we also
>> raise the binutils version to 2.30+, which is what RHEL8
>> shipped alongside gcc-8. I have not found other distros that
>> use older binutils in combination with gcc-8 or higher;
>> Debian 10 uses binutils-2.31.
>> I don't think we want to combine the additional cleanup with
>> your series, but if we can agree on the version, we can do that
>> in parallel.
>
> Were you planning to send patches to that effect, or did you want
> someone else to do that? I think we were largely agreed on making those
> changes, but it wasn't clear to me who was actually going to send
> patches, and I couldn't spot a subsequent thread on LKML.
I hadn't planned on doing that, but I could help (after my
vacation). As Ard already posted the patch for gcc, I was
expecting that this one would get merged along with the
other patches in the series.

Ard, what is the status of your series? Is it likely to
make it into 6.14, or should we have a separate patch that
just raises the minimum gcc and binutils versions,
independent of your work?
Arnd
End of thread (newest message: 2024-12-19 12:03 UTC).
Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed):
2024-09-25 15:01 [RFC PATCH 00/28] x86: Rely on toolchain for relocatable code Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 01/28] x86/pvh: Call C code via the kernel virtual mapping Ard Biesheuvel
2024-09-25 21:12 ` Jason Andryuk
2024-09-25 15:01 ` [RFC PATCH 02/28] Documentation: Bump minimum GCC version to 8.1 Ard Biesheuvel
2024-09-25 15:58 ` Arnd Bergmann
2024-12-19 11:53 ` Mark Rutland
2024-12-19 12:02 ` Arnd Bergmann
2024-09-26 21:35 ` Miguel Ojeda
2024-09-27 16:22 ` Mark Rutland
2024-09-25 15:01 ` [RFC PATCH 03/28] x86/tools: Use mmap() to simplify relocs host tool Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 04/28] x86/boot: Permit GOTPCREL relocations for x86_64 builds Ard Biesheuvel
2024-10-01 5:33 ` Josh Poimboeuf
2024-10-01 6:56 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 05/28] x86: Define the stack protector guard symbol explicitly Ard Biesheuvel
2024-09-25 15:53 ` Ian Rogers
2024-09-25 17:43 ` Ard Biesheuvel
2024-09-25 17:48 ` Ian Rogers
2024-09-25 18:32 ` Uros Bizjak
2024-09-28 13:41 ` Brian Gerst
2024-10-04 13:15 ` Ard Biesheuvel
2024-10-08 14:36 ` Brian Gerst
2024-10-04 10:01 ` Uros Bizjak
2024-09-25 15:01 ` [RFC PATCH 06/28] x86/percpu: Get rid of absolute per-CPU variable placement Ard Biesheuvel
2024-09-25 17:56 ` Christoph Lameter (Ampere)
2024-09-25 15:01 ` [RFC PATCH 07/28] scripts/kallsyms: Avoid 0x0 as the relative base Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 08/28] scripts/kallsyms: Remove support for absolute per-CPU variables Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 09/28] x86/tools: Remove special relocation handling for " Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 10/28] x86/xen: Avoid relocatable quantities in Xen ELF notes Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 11/28] x86/pvh: Avoid absolute symbol references in .head.text Ard Biesheuvel
2024-09-25 21:10 ` Jason Andryuk
2024-09-25 21:50 ` Ard Biesheuvel
2024-09-25 22:40 ` Jason Andryuk
2024-09-25 15:01 ` [RFC PATCH 12/28] x86/pm-trace: Use RIP-relative accesses for .tracedata Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 13/28] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 14/28] x86/rethook: Use RIP-relative reference for return address Ard Biesheuvel
2024-09-25 16:39 ` Linus Torvalds
2024-09-25 16:45 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 15/28] x86/sync_core: Use RIP-relative addressing Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 16/28] x86/entry_64: " Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 17/28] x86/hibernate: Prefer RIP-relative accesses Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 18/28] x86/boot/64: Determine VA/PA offset before entering C code Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 19/28] x86/boot/64: Avoid intentional absolute symbol references in .head.text Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 20/28] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 21/28] x86/head: Use PIC-compatible symbol references in startup code Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 22/28] asm-generic: Treat PIC .data.rel.ro sections as .rodata Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 23/28] tools/objtool: Mark generated sections as writable Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 24/28] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
2024-10-01 7:18 ` Josh Poimboeuf
2024-10-01 7:39 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel Ard Biesheuvel
2024-10-01 21:13 ` H. Peter Anvin
2024-10-02 15:25 ` Ard Biesheuvel
2024-10-02 20:01 ` Linus Torvalds
2024-10-03 11:13 ` Ard Biesheuvel
2024-10-04 21:06 ` H. Peter Anvin
2024-10-05 8:31 ` Uros Bizjak
2024-10-05 23:36 ` H. Peter Anvin
2024-10-06 0:00 ` Linus Torvalds
2024-10-06 8:06 ` Uros Bizjak
2024-10-06 7:59 ` Uros Bizjak
2024-10-06 18:00 ` David Laight
2024-10-06 19:17 ` Uros Bizjak
2024-10-06 19:38 ` H. Peter Anvin
2024-09-25 15:01 ` [RFC PATCH 26/28] x86/boot: Implement support for ELF RELA/RELR relocations Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 27/28] x86/kernel: Switch to PIE linking for the core kernel Ard Biesheuvel
2024-09-25 18:54 ` Uros Bizjak
2024-09-25 19:14 ` Ard Biesheuvel
2024-09-25 19:39 ` Uros Bizjak
2024-09-25 20:01 ` Ard Biesheuvel
2024-09-25 20:22 ` Uros Bizjak
2024-09-25 20:24 ` Vegard Nossum
2024-09-26 13:38 ` Ard Biesheuvel
2024-09-25 15:01 ` [RFC PATCH 28/28] x86/tools: Drop x86_64 support from 'relocs' tool Ard Biesheuvel