* [RFC PATCH 0/4 v1] LPC materials: livedump
@ 2023-11-10 15:00 Lukas Hruska
  2023-11-10 15:00 ` [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel Lukas Hruska
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Lukas Hruska @ 2023-11-10 15:00 UTC
  To: linux-debuggers, linux-kernel; +Cc: Michal Koutny, YOSHIDA Masanori

Quick note
----------

This patchset is primarily here as material for a presentation at the
Linux Plumbers Conference. I will appreciate any feedback you can
provide, whether in person at the conference or here. This patchset
continues the development of a long-dormant series by YOSHIDA
Masanori; the last posted version was v3, see [1].


Summary
-------

The Linux kernel currently has a mechanism, based on crashkernel, to
create a dump of the whole memory for further debugging of an observed
issue. Unfortunately, this cannot be done without restarting the host,
which is a problem when a high-availability service runs on a system
that experiences a complex issue which cannot be debugged without a
complete memory dump, and hypervisor-assisted dumps are not an option
on bare-metal setups. For this purpose, a live dump mechanism is being
developed, initially introduced by YOSHIDA Masanori [1] in 2012. That
PoC was already able to create a consistent image of memory, with
support for dumping the data into a reserved raw block device.


Mechanism overview
------------------

Live Dump is based on the copy-on-write technique. Processing is
performed in the following order:
(1) Suspend processing on all CPUs.
(2) Make the pages you want to dump read-only.
(3) Dump the hard-to-handle pages (those that cannot fault).
(4) Resume all CPUs.
(5) On each page fault, dump the faulting page.
(6) Finally, dump the rest of the pages that were never updated.

The page fault handler sends a dump request to a queue serviced by the
"livedump" kthread, which is in charge of dumping to disk. If the queue
ever becomes full, livedump simply fails, since livedump's page fault
handler can never sleep to wait for space.
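
To illustrate, the fault-path enqueue amounts to a non-blocking
get/put pair on two kfifos. A minimal sketch (the identifiers are
illustrative, not the actual ones used in the patches, and the pool is
assumed to be preloaded with page-sized buffers at init time):

	#include <linux/kfifo.h>
	#include <linux/mm.h>

	struct dump_req {
		void *buf;		/* one-page bounce buffer */
		unsigned long pfn;	/* page to be dumped */
	};

	/* statically initialized request queues */
	static DEFINE_KFIFO(pool, struct dump_req, 128);
	static DEFINE_KFIFO(pend, struct dump_req, 128);

	/* called from the page fault handler; must never sleep */
	static int livedump_enqueue(unsigned long pfn, void *src)
	{
		struct dump_req req;

		if (!kfifo_get(&pool, &req))
			return -ENOMEM;	/* pool empty: livedump fails */

		copy_page(req.buf, src);
		req.pfn = pfn;
		kfifo_put(&pend, req);	/* picked up later by the kthread */
		return 0;
	}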


TODO
----
- Large page support
	Currently livedump can dump only 4K pages, and so it splits all
	pages in kernel space in advance. This may cause a big TLB
	overhead.
- Other target storage support
	Currently livedump can dump only to a block device. Practically,
	dumping to a normal file is necessary.
- Other space/area support
	Currently livedump write-protects only the kernel's straight
	mapping area. Pages in the vmap area cannot be dumped
	consistently.
- Other CPU architecture support
	Currently livedump supports only x86-64.
- Testing
	Testbench and measurements to provide guarantees about the
	(non)intrusiveness of the livedump mechanism under certain
	conditions.


Summary of changes since 2012 version
-------------------------------------
- rebase onto v6.2
- fs/proc/vmcore code modified so it can be reused by livedump
- memdump output changed to ELF format
- crash tool modification no longer needed
- all loops over PFNs replaced with pagewalk
- 5-level paging support
- multiple bitmaps to handle page faults and correctly restore the PTEs' state
- API rewritten from ioctls to sysfs


[1] https://lore.kernel.org/r/20121011055356.6719.46214.stgit@t3500.sdl.hitachi.co.jp/

YOSHIDA Masanori (1):
  livedump: Add memory dumping functionality

Lukas Hruska (3):
  crash/vmcore: VMCOREINFO creation from non-kdump kernel
  livedump: Add write protection management
  livedump: Add tools to make livedump creation easier

 arch/x86/Kconfig                   |  29 ++
 arch/x86/include/asm/wrprotect.h   |  39 ++
 arch/x86/mm/Makefile               |   2 +
 arch/x86/mm/fault.c                |   8 +
 arch/x86/mm/wrprotect.c            | 744 +++++++++++++++++++++++++++++
 fs/proc/vmcore.c                   |  57 +--
 include/linux/crash_dump.h         |   2 +
 kernel/Makefile                    |   1 +
 kernel/crash_core.c                |  10 +-
 kernel/crash_dump.c                |  38 ++
 kernel/livedump/Makefile           |   2 +
 kernel/livedump/core.c             | 262 ++++++++++
 kernel/livedump/memdump.c          | 525 ++++++++++++++++++++
 kernel/livedump/memdump.h          |  32 ++
 kernel/livedump/memdump_trace.h    |  30 ++
 tools/livedump/livedump.sh         |  44 ++
 tools/livedump/livedump_extract.sh |  19 +
 17 files changed, 1803 insertions(+), 41 deletions(-)
 create mode 100644 arch/x86/include/asm/wrprotect.h
 create mode 100644 arch/x86/mm/wrprotect.c
 create mode 100644 kernel/livedump/Makefile
 create mode 100644 kernel/livedump/core.c
 create mode 100644 kernel/livedump/memdump.c
 create mode 100644 kernel/livedump/memdump.h
 create mode 100644 kernel/livedump/memdump_trace.h
 create mode 100755 tools/livedump/livedump.sh
 create mode 100755 tools/livedump/livedump_extract.sh



* [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel
  2023-11-10 15:00 [RFC PATCH 0/4 v1] LPC materials: livedump Lukas Hruska
@ 2023-11-10 15:00 ` Lukas Hruska
  2023-11-10 21:56   ` kernel test robot
  2023-11-10 23:26   ` kernel test robot
  2023-11-10 15:00 ` [RFC PATCH 2/4 v1] livedump: Add write protection management Lukas Hruska
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Lukas Hruska @ 2023-11-10 15:00 UTC
  To: linux-debuggers, linux-kernel; +Cc: Michal Koutny, YOSHIDA Masanori

Currently fs/proc/vmcore is the only code which requires generation of
VMCOREINFO and modification of ELF notes. That is why some of the
functions involved expect to be called only once and do not try to
restore their previous state. Because livedump's output is similar to a
vmcore file, it needs to do the same, only from a non-kdump kernel. To
avoid code duplication and reuse the already existing functions, they
must be modified to properly restore their state.

Export vmcore's ELF header creation function so it can be used for
more than kdump ELF file generation.

Correctly restore the original pointer to the VMCOREINFO destination in
crash_update_vmcoreinfo_safecopy so it can be used more than once.

Add a check whether the elfcorehdr_read_* functions are being called
from the original kernel or from a kdump kernel, and pick the correct
memory to read from accordingly.

Signed-off-by: Lukas Hruska <lhruska@suse.cz>
---
 fs/proc/vmcore.c           | 57 ++++++++++++--------------------------
 include/linux/crash_dump.h |  2 ++
 kernel/crash_core.c        | 10 ++++++-
 kernel/crash_dump.c        | 38 +++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 41 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 09a81e4b1273..806420d07d8c 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -191,33 +191,6 @@ int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
 void __weak elfcorehdr_free(unsigned long long addr)
 {}
 
-/*
- * Architectures may override this function to read from ELF header
- */
-ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
-{
-	struct kvec kvec = { .iov_base = buf, .iov_len = count };
-	struct iov_iter iter;
-
-	iov_iter_kvec(&iter, ITER_DEST, &kvec, 1, count);
-
-	return read_from_oldmem(&iter, count, ppos, false);
-}
-
-/*
- * Architectures may override this function to read from notes sections
- */
-ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
-{
-	struct kvec kvec = { .iov_base = buf, .iov_len = count };
-	struct iov_iter iter;
-
-	iov_iter_kvec(&iter, ITER_DEST, &kvec, 1, count);
-
-	return read_from_oldmem(&iter, count, ppos,
-			cc_platform_has(CC_ATTR_MEM_ENCRYPT));
-}
-
 /*
  * Architectures may override this function to map oldmem
  */
@@ -721,7 +694,7 @@ static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
  * program header table pointed to by @ehdr_ptr to real size of ELF
  * note segment.
  */
-static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
+static int update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 {
 	int i, rc=0;
 	Elf64_Phdr *phdr_ptr;
@@ -735,14 +708,17 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 			continue;
 		max_sz = phdr_ptr->p_memsz;
 		offset = phdr_ptr->p_offset;
-		notes_section = kmalloc(max_sz, GFP_KERNEL);
-		if (!notes_section)
-			return -ENOMEM;
-		rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
-		if (rc < 0) {
-			kfree(notes_section);
-			return rc;
-		}
+		if (is_kdump_kernel()) {
+			notes_section = kmalloc(max_sz, GFP_KERNEL);
+			if (!notes_section)
+				return -ENOMEM;
+			rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
+			if (rc < 0) {
+				kfree(notes_section);
+				return rc;
+			}
+		} else
+			notes_section = __va(phdr_ptr->p_paddr);
 		nhdr_ptr = notes_section;
 		while (nhdr_ptr->n_namesz != 0) {
 			sz = sizeof(Elf64_Nhdr) +
@@ -756,7 +732,8 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 			real_sz += sz;
 			nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
 		}
-		kfree(notes_section);
+		if (is_kdump_kernel())
+			kfree(notes_section);
 		phdr_ptr->p_memsz = real_sz;
 		if (real_sz == 0) {
 			pr_warn("Warning: Zero PT_NOTE entries found\n");
@@ -784,7 +761,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
  * and each of PT_NOTE program headers has actual ELF note segment
  * size in its p_memsz member.
  */
-static int __init get_note_number_and_size_elf64(const Elf64_Ehdr *ehdr_ptr,
+static int get_note_number_and_size_elf64(const Elf64_Ehdr *ehdr_ptr,
 						 int *nr_ptnote, u64 *sz_ptnote)
 {
 	int i;
@@ -819,7 +796,7 @@ static int __init get_note_number_and_size_elf64(const Elf64_Ehdr *ehdr_ptr,
  * and each of PT_NOTE program headers has actual ELF note segment
  * size in its p_memsz member.
  */
-static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
+static int copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
 {
 	int i, rc=0;
 	Elf64_Phdr *phdr_ptr;
@@ -842,7 +819,7 @@ static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
 }
 
 /* Merges all the PT_NOTE headers into one. */
-static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
+int merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 					   char **notes_buf, size_t *notes_sz)
 {
 	int i, nr_ptnote=0, rc=0;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..53cc971cf5b6 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -28,6 +28,8 @@ ssize_t copy_oldmem_page(struct iov_iter *i, unsigned long pfn, size_t csize,
 		unsigned long offset);
 ssize_t copy_oldmem_page_encrypted(struct iov_iter *iter, unsigned long pfn,
 				   size_t csize, unsigned long offset);
+int merge_note_headers_elf64(char *elfptr, size_t *elfsz,
+					   char **notes_buf, size_t *notes_sz);
 
 void vmcore_cleanup(void);
 
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 87ef6096823f..e90fd3a79411 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -357,15 +357,23 @@ void crash_update_vmcoreinfo_safecopy(void *ptr)
 
 void crash_save_vmcoreinfo(void)
 {
+	unsigned char *tmp_vmcoreinfo_data;
+
 	if (!vmcoreinfo_note)
 		return;
 
 	/* Use the safe copy to generate vmcoreinfo note if have */
-	if (vmcoreinfo_data_safecopy)
+	if (vmcoreinfo_data_safecopy) {
+		tmp_vmcoreinfo_data = vmcoreinfo_data;
 		vmcoreinfo_data = vmcoreinfo_data_safecopy;
+	}
 
 	vmcoreinfo_append_str("CRASHTIME=%lld\n", ktime_get_real_seconds());
 	update_vmcoreinfo_note();
+
+	/* Restore the original destination so it can be used multiple times */
+	if (vmcoreinfo_data_safecopy)
+		vmcoreinfo_data = tmp_vmcoreinfo_data;
 }
 
 void vmcoreinfo_append_str(const char *fmt, ...)
diff --git a/kernel/crash_dump.c b/kernel/crash_dump.c
index 92da32275af5..0122c1694111 100644
--- a/kernel/crash_dump.c
+++ b/kernel/crash_dump.c
@@ -39,3 +39,41 @@ static int __init setup_elfcorehdr(char *arg)
 	return end > arg ? 0 : -EINVAL;
 }
 early_param("elfcorehdr", setup_elfcorehdr);
+
+/*
+ * Architectures may override this function to read from ELF header
+ */
+ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+	struct kvec kvec = { .iov_base = buf, .iov_len = count };
+	struct iov_iter iter;
+
+	if (!is_kdump_kernel()) {
+		memcpy(buf, ppos, count);
+		return count;
+	}
+
+	iov_iter_kvec(&iter, ITER_DEST, &kvec, 1, count);
+
+	return read_from_oldmem(&iter, count, ppos, false);
+}
+
+/*
+ * Architectures may override this function to read from notes sections
+ */
+ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
+{
+	struct kvec kvec = { .iov_base = buf, .iov_len = count };
+	struct iov_iter iter;
+
+	if (!is_kdump_kernel()) {
+		memcpy(buf, __va(*ppos), count);
+		return count;
+	}
+
+	iov_iter_kvec(&iter, ITER_DEST, &kvec, 1, count);
+
+	return read_from_oldmem(&iter, count, ppos,
+			cc_platform_has(CC_ATTR_MEM_ENCRYPT));
+}
+
-- 
2.39.2



* [RFC PATCH 2/4 v1] livedump: Add write protection management
  2023-11-10 15:00 [RFC PATCH 0/4 v1] LPC materials: livedump Lukas Hruska
  2023-11-10 15:00 ` [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel Lukas Hruska
@ 2023-11-10 15:00 ` Lukas Hruska
  2023-11-11  5:20   ` kernel test robot
  2023-11-10 15:00 ` [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality Lukas Hruska
  2023-11-10 15:00 ` [RFC PATCH 4/4 v1] livedump: Add tools to make livedump creation easier Lukas Hruska
  3 siblings, 1 reply; 10+ messages in thread
From: Lukas Hruska @ 2023-11-10 15:00 UTC
  To: linux-debuggers, linux-kernel; +Cc: Michal Koutny, YOSHIDA Masanori

This patch makes it possible to write-protect pages in kernel space and
to install a handler function that is called every time a page fault
occurs on a protected page. The write protection is performed in the
stop-machine state so that all pages are protected consistently.

Write protection and fault handling proceed in the following order:

(1) Initialization phase
  - Sets up data structures for write protection management.
  - Splits all large pages in kernel space into 4K pages, since
    currently livedump can handle only 4K pages. In the future, this
    step (page splitting) should be eliminated.
(2) Write protection phase
  - Stops the machine.
  - Handles sensitive pages (described below).
  - Sets up write protection.
  - Resumes the machine.
(3) Page fault exception handling
  - Calls the handler function before unprotecting the faulted page.
(4) Sweep phase
  - Calls the handler function against the rest of the pages.
(5) Uninitialization phase
  - Cleans up all data structures for write protection management.

The module thus has the following four phases:
- initialization phase
- write protection phase
- sweep phase
- uninitialization phase

The states of processing are as follows. They can transition only in
this order:
- STATE_UNINIT
- STATE_INITED
- STATE_STARTED (= write protection already set up)
- STATE_SWEPT

However, this order is protected only by a plain integer variable;
strictly speaking, this code is not yet safe against concurrent
operation.

The livedump module has to acquire a consistent memory image of kernel
space. Therefore, write protection is set up while updates of the
memory state are suspended. To do so, livedump currently uses
stop_machine.

Causing a livedump page fault (LPF) during LPF handling results in
nested LPF handling. Since the LPF handler uses spinlocks, this
situation may cause a deadlock. Therefore, any page that can be updated
during LPF handling must not be write-protected. For the same reason,
any page that can be updated during NMI handling must not be
write-protected: an NMI can happen during LPF handling, so an LPF
during NMI handling also results in nested LPF handling. I call such
pages that must not be write-protected "sensitive pages". For the
sensitive pages, the handler function is called during the stop-machine
state and they are not write-protected.

The sensitive pages are the following:

- Kernel/Exception/Interrupt stacks
- Page table structure
- All task_struct
- ".data" section of kernel
- per_cpu areas

Pages that are never updated don't cause a page fault, so the handler
function is not invoked for them. To handle these pages, the livedump
module finally needs to call the handler function against each of
them. I call this phase "sweep"; it is triggered through the sysfs
state attribute.

Currently the pagewalk and the sweep cover direct-mapped addresses
only, but the code can already correctly handle multiple page faults
from different addresses pointing to the same physical address. To
achieve this there are two bitmaps, both indexed directly by PFN: the
first indicates whether wrprotect is tracking any address pointing at
this PFN, and the second indicates whether the content of this PFN has
already been saved.

To support the vmap area, wrprotect must correctly handle
synchronization primitives which are currently defined as static
(vmap_area_lock, free_vmap_area_lock, ...); so either these variables
need to be exported, or a vmap-area walk needs to be implemented which
takes the correct locks before calling the provided function.

Because a PTE's R/W permission might change during any of wrprotect's
phases, the module needs to track these changes and keep the PTE's
_PAGE_SOFTW1 flag up to date; it holds the value to which the current
R/O permission is later restored by calling protect_pte(addr, 0).
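
For reference, a caller is expected to drive the API declared in
wrprotect.h in the following order. A minimal sketch (the handler
bodies are illustrative):

	#include <asm/wrprotect.h>

	static void my_handle_page(unsigned long pfn, unsigned long addr,
				   int for_sweep)
	{
		/* save the page at pfn somewhere outside the dump */
	}

	static void my_sm_init(void)
	{
		/* runs once on the leader CPU inside stop_machine */
	}

	static int dump_once(void)
	{
		int ret;

		ret = wrprotect_init(my_handle_page, my_sm_init);
		if (ret)
			return ret;
		ret = wrprotect_start();	/* stop_machine + write protection */
		if (!ret)
			ret = wrprotect_sweep();	/* handle untouched pages */
		wrprotect_uninit();
		return ret;
	}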

Signed-off-by: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
Signed-off-by: Lukas Hruska <lhruska@suse.cz>
---
 arch/x86/Kconfig                 |  14 +
 arch/x86/include/asm/wrprotect.h |  41 ++
 arch/x86/mm/Makefile             |   2 +
 arch/x86/mm/fault.c              |   8 +
 arch/x86/mm/wrprotect.c          | 754 +++++++++++++++++++++++++++++++
 5 files changed, 819 insertions(+)
 create mode 100644 arch/x86/include/asm/wrprotect.h
 create mode 100644 arch/x86/mm/wrprotect.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..ef3550697be1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2448,6 +2448,20 @@ config CMDLINE_OVERRIDE
 	  This is used to work around broken boot loaders.  This should
 	  be set to 'N' under normal conditions.
 
+config WRPROTECT
+	bool "Write protection on kernel space"
+	depends on X86_64
+	help
+	  Set this option to 'Y' to allow the kernel to write-protect
+	  its own memory space and to handle the page faults caused by
+	  the write protection.
+
+	  This feature causes a small constant overhead on the kernel.
+	  Once this feature is activated, it causes much more overhead
+	  on the kernel.
+
+	  If in doubt, say N.
+
 config MODIFY_LDT_SYSCALL
 	bool "Enable the LDT (local descriptor table)" if EXPERT
 	default y
diff --git a/arch/x86/include/asm/wrprotect.h b/arch/x86/include/asm/wrprotect.h
new file mode 100644
index 000000000000..f0cb2da870c5
--- /dev/null
+++ b/arch/x86/include/asm/wrprotect.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * wrprotect.h - Kernel space write protection support
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Copyright (C) 2023 SUSE
+ * Author: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
+ * Author: Lukas Hruska <lhruska@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _WRPROTECT_H
+#define _WRPROTECT_H
+
+#include <linux/mm.h>		/* PAGE_SIZE */
+
+typedef void (*fn_handle_page_t)(unsigned long pfn, unsigned long addr, int for_sweep);
+typedef void (*fn_sm_init_t)(void);
+
+extern int wrprotect_init(
+		fn_handle_page_t fn_handle_page,
+		fn_sm_init_t fn_sm_init);
+extern void wrprotect_uninit(void);
+extern int wrprotect_start(void);
+extern int wrprotect_sweep(void);
+extern void wrprotect_unselect_pages(
+		unsigned long start,
+		unsigned long len);
+extern int wrprotect_page_fault_handler(unsigned long error_code);
+
+extern int wrprotect_is_on;
+
+#endif /* _WRPROTECT_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index c80febc44cd2..795d0f202ed3 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -58,6 +58,8 @@ obj-$(CONFIG_AMD_NUMA)		+= amdtopology.o
 obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
+obj-$(CONFIG_WRPROTECT)		+= wrprotect.o
+
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..02752f1f78a5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
 #include <asm/irq_stack.h>
+#include <asm/wrprotect.h>		/* wrprotect_is_on, ...		*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -1512,6 +1513,13 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
+#ifdef CONFIG_WRPROTECT
+	/* only react on protection fault with write access */
+	if (unlikely(wrprotect_is_on))
+		if (wrprotect_page_fault_handler(error_code))
+			return;
+#endif /* CONFIG_WRPROTECT */
+
 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
 		do_kern_addr_fault(regs, error_code, address);
diff --git a/arch/x86/mm/wrprotect.c b/arch/x86/mm/wrprotect.c
new file mode 100644
index 000000000000..534f7c133709
--- /dev/null
+++ b/arch/x86/mm/wrprotect.c
@@ -0,0 +1,754 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * wrprotect.c - Kernel space write protection support
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Copyright (C) 2023 SUSE
+ * Author: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
+ * Author: Lukas Hruska <lhruska@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <asm/wrprotect.h>
+#include <linux/mm.h>		/* __get_free_page, etc. */
+#include <linux/bitmap.h>	/* bit operations */
+#include <linux/memblock.h> /* max_pfn */
+#include <linux/vmalloc.h>	/* vzalloc, vfree */
+#include <linux/hugetlb.h>	/* __flush_tlb_all */
+#include <linux/pagewalk.h>	/* walk_page_range_novma */
+#include <linux/stop_machine.h>	/* stop_machine */
+#include <asm/sections.h>	/* __per_cpu_* */
+#include <asm/set_memory.h> /* set_memory_4k */
+#include <asm/e820/api.h>	/* e820__mapped_any */
+#include <asm/e820/types.h>	/* E820_TYPE_RAM */
+
+#define PGBMP_LEN			PAGE_ALIGN(sizeof(long) * BITS_TO_LONGS(max_pfn))
+#define DIRECT_MAP_SIZE		(1UL << MAX_PHYSMEM_BITS)
+
+enum state {
+	WRPROTECT_STATE_UNINIT,
+	WRPROTECT_STATE_INITED,
+	WRPROTECT_STATE_STARTED,
+	WRPROTECT_STATE_SWEPT,
+};
+
+/* wrprotect's state */
+struct wrprotect_state {
+	enum state state;
+
+	/*
+	 * r/o bitmap after initialization
+	 * 0: no virt-address pointing at this pfn has ever been
+	 *    tracked by this module
+	 * 1: there exists a virt-address pointing at this pfn that
+	 *    wrprotect is interested in
+	 */
+	unsigned long *pgbmp_original;
+	/*
+	 * r/w bitmap
+	 * 0: content of this pfn has already been saved
+	 * 1: content of this pfn has not been saved yet
+	 */
+	unsigned long *pgbmp_save;
+
+	fn_handle_page_t handle_page;
+	fn_sm_init_t sm_init;
+} __aligned(PAGE_SIZE);
+
+int wrprotect_is_on;
+struct wrprotect_state wrprotect_state;
+
+static int split_large_pages_walk_pud(pud_t *pud, unsigned long addr, unsigned long next,
+	struct mm_walk *walk)
+{
+	int ret = 0;
+
+	if (pud_present(*pud) && pud_large(*pud))
+		ret = set_memory_4k(addr, 1);
+	if (ret)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int split_large_pages_walk_pmd(pmd_t *pmd, unsigned long addr, unsigned long next,
+	struct mm_walk *walk)
+{
+	int ret = 0;
+
+	if (pmd_present(*pmd) && pmd_large(*pmd))
+		ret = set_memory_4k(addr, 1);
+	if (ret)
+		return -EFAULT;
+
+	return 0;
+}
+
+/* split_large_pages
+ *
+ * This function splits all large pages in straight mapping area into 4K ones.
+ * Currently wrprotect supports only 4K pages, and so this is needed.
+ */
+static int split_large_pages(void)
+{
+	int ret;
+	struct mm_walk_ops split_large_pages_walk_ops;
+
+	memset(&split_large_pages_walk_ops, 0, sizeof(struct mm_walk_ops));
+	split_large_pages_walk_ops.pud_entry = split_large_pages_walk_pud;
+	split_large_pages_walk_ops.pmd_entry = split_large_pages_walk_pmd;
+
+	mmap_write_lock(&init_mm);
+	ret = walk_page_range_novma(&init_mm, PAGE_OFFSET, PAGE_OFFSET + DIRECT_MAP_SIZE,
+		&split_large_pages_walk_ops, init_mm.pgd, NULL);
+	mmap_write_unlock(&init_mm);
+
+	return ret;
+}
+
+struct sm_context {
+	int leader_cpu;
+	int leader_done;
+	int (*fn_leader)(void *arg);
+	int (*fn_follower)(void *arg);
+	void *arg;
+};
+
+static int call_leader_follower(void *data)
+{
+	int ret;
+	struct sm_context *ctx = data;
+
+	if (smp_processor_id() == ctx->leader_cpu) {
+		ret = ctx->fn_leader(ctx->arg);
+		ctx->leader_done = 1;
+	} else {
+		while (!ctx->leader_done)
+			cpu_relax();
+		ret = ctx->fn_follower(ctx->arg);
+	}
+
+	return ret;
+}
+
+/* stop_machine_leader_follower
+ *
+ * Calls stop_machine with a leader CPU and follower CPUs
+ * executing different codes.
+ * At first, the leader CPU is selected randomly and executes its code.
+ * After that, follower CPUs execute their codes.
+ */
+static int stop_machine_leader_follower(
+		int (*fn_leader)(void *),
+		int (*fn_follower)(void *),
+		void *arg)
+{
+	int cpu;
+	struct sm_context ctx;
+
+	preempt_disable();
+	cpu = smp_processor_id();
+	preempt_enable();
+
+	memset(&ctx, 0, sizeof(ctx));
+	ctx.leader_cpu = cpu;
+	ctx.leader_done = 0;
+	ctx.fn_leader = fn_leader;
+	ctx.fn_follower = fn_follower;
+	ctx.arg = arg;
+
+	return stop_machine(call_leader_follower, &ctx, cpu_online_mask);
+}
+
+/*
+ * This function converts a kernel address to its pfn in the most efficient way:
+ * direct mapping address -> __pa
+ * other address -> lookup_address -> pte_pfn
+ */
+static unsigned long kernel_address_to_pfn(unsigned long addr, unsigned int *level)
+{
+	pte_t *ptep;
+	unsigned long pfn;
+
+	if (addr >= PAGE_OFFSET && addr < PAGE_OFFSET + DIRECT_MAP_SIZE) {
+		// Direct-mapped addresses
+		pfn = __pa(addr) >> PAGE_SHIFT;
+	} else {
+		// Non-direct-mapped addresses
+		ptep = lookup_address((unsigned long)addr, level);
+		if (ptep && pte_present(*ptep))
+			pfn = pte_pfn(*ptep);
+		else
+			pfn = 0;
+	}
+
+	return pfn;
+}
+
+/* wrprotect_unselect_pages
+ *
+ * This function clears bits corresponding to pages that cover a range
+ * from start to start+len.
+ */
+void wrprotect_unselect_pages(
+		unsigned long start,
+		unsigned long len)
+{
+	unsigned long addr, pfn;
+	unsigned int level;
+
+	BUG_ON(start & ~PAGE_MASK);
+	BUG_ON(len & ~PAGE_MASK);
+
+	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
+		pfn = kernel_address_to_pfn(addr, &level);
+		clear_bit(pfn, wrprotect_state.pgbmp_original);
+	}
+}
+
+/* handle_addr_range
+ *
+ * This function executes wrprotect_state.handle_page in turns against pages that
+ * cover a range from start to start+len.
+ * At the same time, it clears bits corresponding to the pages.
+ */
+static void handle_addr_range(unsigned long start, unsigned long len)
+{
+	unsigned int level;
+	unsigned long end = start + len;
+	unsigned long pfn;
+
+	start &= PAGE_MASK;
+	while (start < end) {
+		pfn = kernel_address_to_pfn(start, &level);
+		if (test_bit(pfn, wrprotect_state.pgbmp_original)) {
+			wrprotect_state.handle_page(pfn, start, 0);
+			clear_bit(pfn, wrprotect_state.pgbmp_original);
+		}
+		start += PAGE_SIZE;
+	}
+}
+
+/* handle_task
+ *
+ * This function executes handle_addr_range against task_struct & thread_info.
+ */
+static void handle_task(struct task_struct *t)
+{
+	BUG_ON(!t);
+	BUG_ON(!t->stack);
+	BUG_ON((unsigned long)t->stack & ~PAGE_MASK);
+	handle_addr_range((unsigned long)t, sizeof(*t));
+	handle_addr_range((unsigned long)t->stack, THREAD_SIZE);
+}
+
+/* handle_tasks
+ *
+ * This function executes handle_task against all tasks (including idle_task).
+ */
+static void handle_tasks(void)
+{
+	struct task_struct *p, *t;
+	unsigned int cpu;
+
+	do_each_thread(p, t) {
+		handle_task(t);
+	} while_each_thread(p, t);
+
+	for_each_online_cpu(cpu)
+		handle_task(idle_task(cpu));
+}
+
+static void handle_pmd(pmd_t *pmd)
+{
+	unsigned long i;
+
+	handle_addr_range((unsigned long)pmd, PAGE_SIZE);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (pmd_present(pmd[i]) && !pmd_large(pmd[i]))
+			handle_addr_range(pmd_page_vaddr(pmd[i]), PAGE_SIZE);
+	}
+}
+
+static void handle_pud(pud_t *pud)
+{
+	unsigned long i;
+
+	handle_addr_range((unsigned long)pud, PAGE_SIZE);
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (pud_present(pud[i]) && !pud_large(pud[i]))
+			handle_pmd((pmd_t *)pud_pgtable(pud[i]));
+	}
+}
+
+static void handle_p4d(p4d_t *p4d)
+{
+	unsigned long i;
+
+	handle_addr_range((unsigned long)p4d, PAGE_SIZE);
+	for (i = 0; i < PTRS_PER_P4D; i++) {
+		if (p4d_present(p4d[i]))
+			handle_pud((pud_t *)p4d_pgtable(p4d[i]));
+	}
+}
+
+/* handle_page_table
+ *
+ * This function executes wrprotect_state.handle_page against all pages that make up
+ * page table structure and clears all bits corresponding to the pages.
+ */
+static void handle_page_table(void)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	unsigned long i;
+
+	pgd = init_mm.pgd;
+	handle_addr_range((unsigned long)pgd, PAGE_SIZE);
+	for (i = pgd_index(PAGE_OFFSET); i < PTRS_PER_PGD; i++) {
+		if (pgd_present(pgd[i])) {
+			if (!pgtable_l5_enabled())
+				p4d = (p4d_t *)(pgd+i);
+			else
+				p4d = (p4d_t *)pgd_page_vaddr(pgd[i]);
+			handle_p4d(p4d);
+		}
+	}
+}
+
+/* handle_sensitive_pages
+ *
+ * This function executes wrprotect_state.handle_page against the following pages and
+ * clears bits corresponding to them.
+ * - All pages that include task_struct & thread_info
+ * - All pages that make up page table structure
+ * - All pages that include per_cpu variables
+ * - All pages that cover kernel's data section
+ */
+static void handle_sensitive_pages(void)
+{
+	handle_tasks();
+	handle_page_table();
+	handle_addr_range((unsigned long)__per_cpu_offset[0], HPAGE_SIZE);
+	handle_addr_range((unsigned long)_sdata, _edata - _sdata);
+}
+
+/* protect_pte
+ *
+ * Changes a specified page's _PAGE_RW flag and _PAGE_SOFTW1 flag.
+ * If the argument protect is non-zero:
+ *  - _PAGE_RW flag is cleared
+ *  - _PAGE_SOFTW1 flag is set to original value of _PAGE_RW
+ * If the argument protect is zero:
+ *  - _PAGE_RW flag is set to _PAGE_SOFTW1
+ *
+ * The change is executed only when all the following are true.
+ *  - The page is mapped as 4K page.
+ *  - The page is originally writable.
+ *
+ * Returns 1 if the change is actually executed, otherwise returns 0.
+ */
+static int protect_pte(unsigned long addr, int protect)
+{
+	pte_t *ptep, pte;
+	unsigned int level;
+
+	ptep = lookup_address(addr, &level);
+	if (WARN(!ptep, "livedump: Page=%016lx isn't mapped.\n", addr) ||
+	    WARN(!pte_present(*ptep),
+		    "livedump: Page=%016lx isn't mapped.\n", addr) ||
+	    WARN(level == PG_LEVEL_NONE,
+		    "livedump: Page=%016lx isn't mapped.\n", addr) ||
+	    WARN(level == PG_LEVEL_2M,
+		    "livedump: Page=%016lx is consisted of 2M page.\n", addr) ||
+	    WARN(level == PG_LEVEL_1G,
+		    "livedump: Page=%016lx is consisted of 1G page.\n", addr)) {
+		return 0;
+	}
+
+	pte = *ptep;
+	if (protect) {
+		if (pte_write(pte)) {
+			pte = pte_wrprotect(pte);
+			pte = pte_set_flags(pte, _PAGE_SOFTW1);
+		} else
+			pte = pte_clear_flags(pte, _PAGE_SOFTW1);
+	} else if (pte_flags(pte) & _PAGE_SOFTW1)
+		pte = pte_mkwrite(pte);
+	*ptep = pte;
+
+	return 1;
+}
+
+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==	 0: no page found	1: protection fault
+ *   bit 1 ==	 0: read access		1: write access
+ *   bit 2 ==	 0: kernel-mode access	1: user-mode access
+ *   bit 3 ==				1: use of reserved bit detected
+ *   bit 4 ==				1: fault was an instruction fetch
+ */
+enum x86_pf_error_code {
+	PF_PROT		=		1 << 0,
+	PF_WRITE	=		1 << 1,
+	PF_USER		=		1 << 2,
+	PF_RSVD		=		1 << 3,
+	PF_INSTR	=		1 << 4,
+};
+
+int wrprotect_page_fault_handler(unsigned long error_code)
+{
+	unsigned int level;
+	unsigned long pfn, addr;
+
+	/*
+	 * Handle only kernel-mode write access
+	 *
+	 * error_code must be:
+	 *  (1) PF_PROT
+	 *  (2) PF_WRITE
+	 *  (3) not PF_USER
+	 *  (4) not PF_RSVD
+	 *  (5) not PF_INSTR
+	 */
+	if (!(PF_PROT  & error_code) ||
+	    !(PF_WRITE & error_code) ||
+	     (PF_USER  & error_code) ||
+	     (PF_RSVD  & error_code) ||
+	     (PF_INSTR & error_code))
+		goto not_processed;
+
+	addr = (unsigned long)read_cr2();
+	addr = addr & PAGE_MASK;
+
+	if (addr >= PAGE_OFFSET && addr < PAGE_OFFSET + DIRECT_MAP_SIZE) {
+		pfn = __pa(addr) >> PAGE_SHIFT;
+	} else {
+		pfn = kernel_address_to_pfn(addr, &level);
+		if (pfn == 0 || level != PG_LEVEL_4K)
+			goto not_processed;
+	}
+
+	if (!test_bit(pfn, wrprotect_state.pgbmp_original))
+		goto not_processed;
+
+	if (test_and_clear_bit(pfn, wrprotect_state.pgbmp_save))
+		wrprotect_state.handle_page(pfn, addr, 0);
+
+	protect_pte(addr, 0);
+
+	return true;
+
+not_processed:
+	return false;
+}
+
+static int generic_page_walk_pmd(pmd_t *pmd, unsigned long addr, unsigned long next,
+	struct mm_walk *walk)
+{
+	if (WARN(pmd_large(*pmd), "livedump: Page=%016lx is consisted of 2M page.\n", addr))
+		return 0;
+
+	return 0;
+}
+
+static int sm_leader_page_walk_pte(pte_t *pte, unsigned long addr, unsigned long next,
+	struct mm_walk *walk)
+{
+	unsigned long pfn;
+
+	if (!pte || !pte_present(*pte))
+		return 0;
+
+	pfn = pte_pfn(*pte);
+
+	if (test_bit(pfn, wrprotect_state.pgbmp_original)) {
+		if (!protect_pte(addr, 1))
+			clear_bit(pfn, wrprotect_state.pgbmp_original);
+	}
+
+	return 0;
+}
+
+/* sm_leader
+ *
+ * Is executed by a leader CPU during stop-machine.
+ *
+ * This function does the following:
+ * (1)Handle pages that must not be write-protected.
+ * (2)Turn on the callback in the page fault handler.
+ * (3)Write-protect pages which are specified by the bitmap.
+ * (4)Flush TLB cache of the leader CPU.
+ */
+static int sm_leader(void *arg)
+{
+	int ret;
+	struct mm_walk_ops sm_leader_walk_ops;
+
+	memset(&sm_leader_walk_ops, 0, sizeof(struct mm_walk_ops));
+	sm_leader_walk_ops.pmd_entry = generic_page_walk_pmd;
+	sm_leader_walk_ops.pte_entry = sm_leader_page_walk_pte;
+
+	handle_sensitive_pages();
+
+	wrprotect_state.sm_init();
+
+	wrprotect_is_on = true;
+
+	mmap_write_lock(&init_mm);
+	ret = walk_page_range_novma(&init_mm, PAGE_OFFSET, PAGE_OFFSET + DIRECT_MAP_SIZE,
+	    &sm_leader_walk_ops, init_mm.pgd, NULL);
+	mmap_write_unlock(&init_mm);
+
+	if (ret)
+		return ret;
+
+	memcpy(wrprotect_state.pgbmp_save, wrprotect_state.pgbmp_original,
+			PGBMP_LEN);
+
+	__flush_tlb_all();
+
+	return 0;
+}
+
+/* sm_follower
+ *
+ * Is executed by follower CPUs during stop-machine.
+ * Flushes TLB cache of each CPU.
+ */
+static int sm_follower(void *arg)
+{
+	__flush_tlb_all();
+	return 0;
+}
+
+/* wrprotect_start
+ *
+ * This function sets up write protection on the kernel space during the
+ * stop-machine state.
+ */
+int wrprotect_start(void)
+{
+	int ret;
+
+	if (wrprotect_state.state != WRPROTECT_STATE_INITED) {
+		pr_warn("livedump: wrprotect isn't initialized yet.\n");
+		return 0;
+	}
+
+	ret = stop_machine_leader_follower(sm_leader, sm_follower, NULL);
+	if (WARN(ret, "livedump: Failed to protect pages w/errno=%d.\n", ret))
+		return ret;
+
+	wrprotect_state.state = WRPROTECT_STATE_STARTED;
+	return 0;
+}
+
+static int sweep_page_walk_pte(pte_t *pte, unsigned long addr, unsigned long next,
+	struct mm_walk *walk)
+{
+	unsigned long pfn;
+
+	if (!pte || !pte_present(*pte))
+		return 0;
+
+	pfn = pte_pfn(*pte);
+
+	if (test_and_clear_bit(pfn, wrprotect_state.pgbmp_save))
+		wrprotect_state.handle_page(pfn, addr, 1);
+	if (test_bit(pfn, wrprotect_state.pgbmp_original))
+		protect_pte(addr, 0);
+	if (!(pfn & 0xffUL))
+		cond_resched();
+
+	return 0;
+}
+
+/* wrprotect_sweep
+ *
+ * On every page specified by the bitmap, this function executes the following.
+ *  - Handle the page by calling wrprotect_state.handle_page.
+ *  - Unprotect the page by calling protect_page.
+ *
+ * The above work may be executed on the same page at the same time
+ * by the page fault handler.
+ * test_and_clear_bit is used for exclusion control.
+ */
+int wrprotect_sweep(void)
+{
+	int ret;
+	struct mm_walk_ops sweep_walk_ops;
+
+	memset(&sweep_walk_ops, 0, sizeof(struct mm_walk_ops));
+	sweep_walk_ops.pmd_entry = generic_page_walk_pmd;
+	sweep_walk_ops.pte_entry = sweep_page_walk_pte;
+
+	if (wrprotect_state.state != WRPROTECT_STATE_STARTED) {
+		pr_warn("livedump: Pages aren't protected yet.\n");
+		return 0;
+	}
+
+	mmap_write_lock(&init_mm);
+	ret = walk_page_range_novma(&init_mm, PAGE_OFFSET, PAGE_OFFSET + DIRECT_MAP_SIZE,
+	    &sweep_walk_ops, init_mm.pgd, NULL);
+	mmap_write_unlock(&init_mm);
+	if (ret)
+		return ret;
+
+	wrprotect_state.state = WRPROTECT_STATE_SWEPT;
+	return 0;
+}
+
+/* wrprotect_create_page_bitmap
+ *
+ * This function creates bitmaps in which each bit corresponds to one
+ * physical page. Here, all RAM pages are selected to be write-protected.
+ */
+static int wrprotect_create_page_bitmap(void)
+{
+	unsigned long pfn;
+
+	/* allocate on vmap area */
+	wrprotect_state.pgbmp_original = vzalloc(PGBMP_LEN);
+	if (!wrprotect_state.pgbmp_original)
+		return -ENOMEM;
+	wrprotect_state.pgbmp_save = vzalloc(PGBMP_LEN);
+	if (!wrprotect_state.pgbmp_save)
+		return -ENOMEM;
+
+	/* select all ram pages */
+	for (pfn = 0; pfn < max_pfn; pfn++) {
+		if (e820__mapped_any(pfn << PAGE_SHIFT,
+				    (pfn + 1) << PAGE_SHIFT,
+				    E820_TYPE_RAM))
+			set_bit(pfn, wrprotect_state.pgbmp_original);
+		if (!(pfn & 0xffUL))
+			cond_resched();
+	}
+
+	return 0;
+}
+
+/* wrprotect_destroy_page_bitmap
+ *
+ * This function frees both page bitmaps created by wrprotect_create_page_bitmap.
+ */
+static void wrprotect_destroy_page_bitmap(void)
+{
+	vfree(wrprotect_state.pgbmp_original);
+	vfree(wrprotect_state.pgbmp_save);
+	wrprotect_state.pgbmp_original = NULL;
+	wrprotect_state.pgbmp_save = NULL;
+}
+
+static void default_handle_page(unsigned long pfn, unsigned long addr, int for_sweep)
+{
+}
+
+/* wrprotect_init
+ *
+ * fn_handle_page:
+ *   This callback is invoked to handle faulting pages.
+ *   This function takes 3 arguments.
+ *   The first one is the PFN telling where this address is physically located.
+ *   The second one is the address of the page that caused the fault.
+ *   The third one is a flag telling whether it's called in the sweep phase.
+ */
+int wrprotect_init(fn_handle_page_t fn_handle_page, fn_sm_init_t fn_sm_init)
+{
+	int ret;
+
+	if (wrprotect_state.state != WRPROTECT_STATE_UNINIT) {
+		pr_warn("livedump: wrprotect is already initialized.\n");
+		return 0;
+	}
+
+	ret = wrprotect_create_page_bitmap();
+	if (ret < 0) {
+		pr_warn("livedump: not enough memory for wrprotect bitmaps\n");
+		return -ENOMEM;
+	}
+
+	/* split all large pages in straight mapping area */
+	ret = split_large_pages();
+	if (ret)
+		goto err;
+
+	/* unselect wrprotect's internal data */
+	wrprotect_unselect_pages(
+			(unsigned long)&wrprotect_state, sizeof(wrprotect_state));
+	wrprotect_unselect_pages(
+			(unsigned long)wrprotect_state.pgbmp_original, PGBMP_LEN);
+	wrprotect_unselect_pages(
+			(unsigned long)wrprotect_state.pgbmp_save, PGBMP_LEN);
+
+	wrprotect_state.handle_page = fn_handle_page ?: default_handle_page;
+	wrprotect_state.sm_init = fn_sm_init;
+
+	wrprotect_state.state = WRPROTECT_STATE_INITED;
+	return 0;
+
+err:
+	return ret;
+}
+
+static int uninit_page_walk_pte(pte_t *pte, unsigned long addr, unsigned long next,
+	struct mm_walk *walk)
+{
+	unsigned long pfn;
+
+	if (!pte || !pte_present(*pte))
+		return 0;
+
+	pfn = pte_pfn(*pte);
+
+	if (!test_bit(pfn, wrprotect_state.pgbmp_original))
+		return 0;
+	protect_pte(addr, 0);
+	*pte = pte_clear_flags(*pte, _PAGE_SOFTW1);
+
+	if (!(pfn & 0xffUL))
+		cond_resched();
+
+	return 0;
+}
+
+void wrprotect_uninit(void)
+{
+	int ret;
+	struct mm_walk_ops uninit_walk_ops;
+
+	if (wrprotect_state.state == WRPROTECT_STATE_UNINIT)
+		return;
+
+	if (wrprotect_state.state == WRPROTECT_STATE_STARTED) {
+		memset(&uninit_walk_ops, 0, sizeof(struct mm_walk_ops));
+		uninit_walk_ops.pmd_entry = generic_page_walk_pmd;
+		uninit_walk_ops.pte_entry = uninit_page_walk_pte;
+
+		mmap_write_lock(&init_mm);
+		ret = walk_page_range_novma(&init_mm, PAGE_OFFSET, PAGE_OFFSET + DIRECT_MAP_SIZE,
+		    &uninit_walk_ops, init_mm.pgd, NULL);
+		mmap_write_unlock(&init_mm);
+
+		flush_tlb_all();
+	}
+
+	if (wrprotect_state.state >= WRPROTECT_STATE_STARTED)
+		wrprotect_is_on = false;
+
+	wrprotect_destroy_page_bitmap();
+
+	wrprotect_state.handle_page = NULL;
+	wrprotect_state.state = WRPROTECT_STATE_UNINIT;
+}
-- 
2.39.2



* [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality
  2023-11-10 15:00 [RFC PATCH 0/4 v1] LPC materials: livedump Lukas Hruska
  2023-11-10 15:00 ` [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel Lukas Hruska
  2023-11-10 15:00 ` [RFC PATCH 2/4 v1] livedump: Add write protection management Lukas Hruska
@ 2023-11-10 15:00 ` Lukas Hruska
  2023-11-11 15:06   ` kernel test robot
  2023-11-11 19:19   ` kernel test robot
  2023-11-10 15:00 ` [RFC PATCH 4/4 v1] livedump: Add tools to make livedump creation easier Lukas Hruska
  3 siblings, 2 replies; 10+ messages in thread
From: Lukas Hruska @ 2023-11-10 15:00 UTC
  To: linux-debuggers, linux-kernel; +Cc: Michal Koutny, YOSHIDA Masanori

From: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>

This patch implements memory dumping of kernel space. Faulting pages
are temporarily pushed into a kfifo, from which they are popped and
dumped by a kthread dedicated to livedump. At the moment, the only
supported target is a block device like /dev/sdb.

Memory dumping is executed as follows:
(1) The handler function is invoked and:
  - It pops a buffer page from the kfifo "pool".
  - It copies the faulting page into the buffer page.
  - It pushes the buffer page into the kfifo "pend".
(2) The kthread pops the buffer page from the kfifo "pend" and submits
    a bio to dump it.
(3) The endio returns the buffer page back to the kfifo "pool".

At step (1), if the kfifo "pool" is empty, processing varies depending
on whether the handler function is called in the sweep phase or not.
If it is, the handler function waits until the kfifo "pool" becomes
available again. If not, the livedump simply fails.
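
A minimal sketch of that policy (locking elided for brevity; the real
code keeps separate queues for the fault path and the sweep phase):

	/* pop a buffer page, honouring the pool-empty policy above */
	static int memdump_get_request(struct memdump_request *req, int for_sweep)
	{
		while (!kfifo_get(&memdump_req_queue.pool, req)) {
			if (!for_sweep)
				return -ENOMEM;	/* fault path must not sleep */
			/* the sweep phase may wait for endio to refill the pool */
			wait_event(memdump_thread.waiters,
				   !kfifo_is_empty(&memdump_req_queue.pool));
		}
		return 0;
	}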

For the ELF format, the already exported vmcore functions are used.
The ELF metadata is written to the block device during the finish
phase, with PFN 0 used for the ELF header and PFN 1 for the VMCOREINFO
string. These first 8KB of physical memory would never be saved
anyway, because the first 1MB of memory is reserved.
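
A minimal sketch of that finish-phase write, assuming the metadata
fits into a single page (PFN_ELF_0 and SECTOR_SHIFT are defined in
memdump.c below; error handling and the VMCOREINFO page are elided):

	/* write one page of ELF metadata at the device offset of PFN 0 */
	static int write_elf_hdr_page(struct block_device *bdev, void *page_buf)
	{
		struct bio *bio;
		int ret;

		bio = bio_alloc(bdev, 1, REQ_OP_WRITE, GFP_KERNEL);
		if (!bio)
			return -ENOMEM;
		bio->bi_iter.bi_sector = PFN_ELF_0 << (PAGE_SHIFT - SECTOR_SHIFT);
		__bio_add_page(bio, virt_to_page(page_buf), PAGE_SIZE, 0);
		ret = submit_bio_wait(bio);	/* synchronous write */
		bio_put(bio);
		return ret;
	}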

Signed-off-by: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
Signed-off-by: Lukas Hruska <lhruska@suse.cz>
---
 arch/x86/Kconfig                |  15 +
 kernel/Makefile                 |   1 +
 kernel/livedump/Makefile        |   2 +
 kernel/livedump/core.c          | 268 +++++++++++++++++
 kernel/livedump/memdump.c       | 516 ++++++++++++++++++++++++++++++++
 kernel/livedump/memdump.h       |  34 +++
 kernel/livedump/memdump_trace.h |  30 ++
 7 files changed, 866 insertions(+)
 create mode 100644 kernel/livedump/Makefile
 create mode 100644 kernel/livedump/core.c
 create mode 100644 kernel/livedump/memdump.c
 create mode 100644 kernel/livedump/memdump.h
 create mode 100644 kernel/livedump/memdump_trace.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ef3550697be1..8f0a660a6bbf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2462,6 +2462,21 @@ config WRPROTECT
 
 	  If in doubt, say N.
 
+config LIVEDUMP
+	bool "Live Dump support"
+	depends on WRPROTECT
+	help
+	  Set this option to 'Y' to allow the kernel to acquire a
+	  consistent snapshot of kernel space without stopping the system.
+
+	  This feature causes a small constant overhead on the kernel.
+
+	  Once this feature is initialized via its sysfs interface, it
+	  allocates a large amount of memory for itself and causes much
+	  more overhead on the kernel.
+
+	  If in doubt, say N.
+
 config MODIFY_LDT_SYSCALL
 	bool "Enable the LDT (local descriptor table)" if EXPERT
 	default y
diff --git a/kernel/Makefile b/kernel/Makefile
index 10ef068f598d..9368085e3817 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -47,6 +47,7 @@ obj-y += power/
 obj-y += printk/
 obj-y += irq/
 obj-y += rcu/
+obj-y += livedump/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
diff --git a/kernel/livedump/Makefile b/kernel/livedump/Makefile
new file mode 100644
index 000000000000..e23f6f28e624
--- /dev/null
+++ b/kernel/livedump/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_LIVEDUMP) += core.o memdump.o
diff --git a/kernel/livedump/core.c b/kernel/livedump/core.c
new file mode 100644
index 000000000000..fb90901fc1a1
--- /dev/null
+++ b/kernel/livedump/core.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* core.c - Live Dump's main
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Copyright (C) 2023 SUSE
+ * Author: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
+ * Author: Lukas Hruska <lhruska@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include "memdump.h"
+#include <asm/wrprotect.h>
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/miscdevice.h>
+#include <linux/printk.h>
+#include <linux/reboot.h>
+#include <linux/sysfs.h>
+#include <linux/memblock.h>
+
+#define DEVICE_NAME	"livedump"
+
+enum state {
+	LIVEDUMP_STATE_UNDEFINED,
+	LIVEDUMP_STATE_INIT,
+	LIVEDUMP_STATE_START,
+	LIVEDUMP_STATE_SWEEP,
+	LIVEDUMP_STATE_FINISH,
+	LIVEDUMP_STATE_UNINIT,
+};
+
+struct livedump_conf {
+	char bdevpath[PATH_MAX];
+} livedump_conf;
+
+enum state livedump_state;
+
+static void do_uninit(void)
+{
+	wrprotect_uninit();
+	livedump_memdump_uninit();
+}
+
+static int do_init(void)
+{
+	int ret;
+
+	if (strlen(livedump_conf.bdevpath) == 0) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = wrprotect_init(livedump_memdump_handle_page, livedump_memdump_sm_init);
+	if (ret) {
+		pr_warn("livedump: Failed to initialize Protection manager.\n");
+		goto err;
+	}
+
+	ret = livedump_memdump_init(livedump_conf.bdevpath);
+	if (ret) {
+		pr_warn("livedump: Failed to initialize Dump manager.\n");
+		goto err;
+	}
+
+	return 0;
+err:
+	do_uninit();
+	return ret;
+}
+
+static long livedump_change_state(unsigned int cmd)
+{
+	long ret = 0;
+
+	if (cmd == LIVEDUMP_STATE_UNDEFINED) {
+		pr_warn("livedump: you cannot change the livedump state into LIVEDUMP_STATE_UNDEFINED.\n");
+		return -EINVAL;
+	}
+
+	/* All states except LIVEDUMP_STATE_UNINIT must have an output set. */
+	switch (cmd) {
+	case LIVEDUMP_STATE_UNINIT:
+		break;
+	default:
+		if (!strlen(livedump_conf.bdevpath)) {
+			pr_warn("livedump: The output must be set first before changing the state.\n");
+			return -EINVAL;
+		}
+	}
+
+	switch (cmd) {
+	case LIVEDUMP_STATE_INIT:
+		if (livedump_state != LIVEDUMP_STATE_UNDEFINED &&
+		    livedump_state != LIVEDUMP_STATE_UNINIT) {
+			pr_warn("livedump: To initialize a livedump the current state must be "
+			    "LIVEDUMP_STATE_UNDEFINED or LIVEDUMP_STATE_UNINIT.\n");
+			return -EINVAL;
+		}
+		ret = do_init();
+		break;
+	case LIVEDUMP_STATE_START:
+		if (livedump_state != LIVEDUMP_STATE_INIT) {
+			pr_warn("livedump: To start a livedump the current state must be "
+			    "LIVEDUMP_STATE_INIT.\n");
+			return -EINVAL;
+		}
+		ret = wrprotect_start();
+		break;
+	case LIVEDUMP_STATE_SWEEP:
+		if (livedump_state != LIVEDUMP_STATE_START) {
+			pr_warn("livedump: To start sweep functionality of livedump the current state must "
+			    "be LIVEDUMP_STATE_START.\n");
+			return -EINVAL;
+		}
+		ret = wrprotect_sweep();
+		break;
+	case LIVEDUMP_STATE_FINISH:
+		if (livedump_state != LIVEDUMP_STATE_SWEEP) {
+			pr_warn("livedump: To finish a livedump the current state must be "
+			    "LIVEDUMP_STATE_SWEEP.\n");
+			return -EINVAL;
+		}
+		livedump_memdump_write_elf_hdr();
+		break;
+	case LIVEDUMP_STATE_UNINIT:
+		if (livedump_state < LIVEDUMP_STATE_INIT) {
+			pr_warn("livedump: To uninitialize livedump the current state must be at least "
+			    "LIVEDUMP_STATE_INIT.\n");
+			return -EINVAL;
+		}
+		do_uninit();
+		break;
+	default:
+		return -ENOIOCTLCMD;
+	}
+
+	if (ret == 0)
+		livedump_state = cmd;
+
+	return ret;
+}
+
+/* sysfs */
+
+static struct kobject *livedump_root_kobj;
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+				const char *buf, size_t count)
+{
+	int new_state, ret;
+
+	ret = kstrtoint(buf, 10, &new_state);
+	if (ret < 0)
+		return -EINVAL;
+
+	if (new_state < LIVEDUMP_STATE_UNDEFINED || new_state > LIVEDUMP_STATE_UNINIT)
+		return -ENOIOCTLCMD;
+
+	ret = livedump_change_state(new_state);
+	if (ret < 0)
+		return ret;
+
+	/* livedump_change_state already updated livedump_state */
+	return count;
+}
+
+static ssize_t state_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	ssize_t count = 0;
+
+	count += sprintf(buf, "%u\n\n", livedump_state);
+	count += sprintf(buf+count, "LIVEDUMP_STATE_UNDEFINED = 0\n");
+	count += sprintf(buf+count, "LIVEDUMP_STATE_INIT = 1\n");
+	count += sprintf(buf+count, "LIVEDUMP_STATE_START = 2\n");
+	count += sprintf(buf+count, "LIVEDUMP_STATE_SWEEP = 3\n");
+	count += sprintf(buf+count, "LIVEDUMP_STATE_FINISH = 4\n");
+	count += sprintf(buf+count, "LIVEDUMP_STATE_UNINIT = 5\n");
+	buf[count] = '\0';
+	return count;
+}
+
+static ssize_t output_store(struct kobject *kobj, struct kobj_attribute *attr,
+				const char *buf, size_t count)
+{
+	int len;
+
+	switch (livedump_state) {
+	case LIVEDUMP_STATE_UNDEFINED:
+	case LIVEDUMP_STATE_UNINIT:
+		break;
+	default:
+		pr_warn("livedump: you cannot change the output in current state of livedump.\n");
+		return -EINVAL;
+	}
+
+	len = strlcpy(livedump_conf.bdevpath, buf, sizeof(livedump_conf.bdevpath));
+	if (len == 0 || len >= sizeof(livedump_conf.bdevpath))
+		return -EINVAL;
+	if (livedump_conf.bdevpath[len-1] == '\n')	/* strip trailing newline */
+		livedump_conf.bdevpath[len-1] = '\0';
+
+	return count;
+}
+
+static ssize_t output_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%s\n", livedump_conf.bdevpath);
+}
+
+static struct kobj_attribute state_kobj_attr = __ATTR_RW(state);
+static struct kobj_attribute output_kobj_attr = __ATTR_RW(output);
+static struct attribute *livedump_attrs[] = {
+	&state_kobj_attr.attr,
+	&output_kobj_attr.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(livedump);
+
+static int livedump_exit(struct notifier_block *_, unsigned long __, void *___)
+{
+	if (livedump_root_kobj)
+		kobject_put(livedump_root_kobj);
+	do_uninit();
+	return NOTIFY_DONE;
+}
+static struct notifier_block livedump_nb = {
+	.notifier_call = livedump_exit
+};
+
+static int __init livedump_init(void)
+{
+	int ret;
+
+	livedump_root_kobj = kobject_create_and_add("livedump", kernel_kobj);
+	if (!livedump_root_kobj)
+		return -ENOMEM;
+
+	ret = sysfs_create_group(livedump_root_kobj, *livedump_groups);
+	if (ret) {
+		livedump_exit(NULL, 0, NULL);
+		return ret;
+	}
+
+	ret = register_reboot_notifier(&livedump_nb);
+	if (WARN_ON(ret)) {
+		livedump_exit(NULL, 0, NULL);
+		return ret;
+	}
+
+	livedump_conf.bdevpath[0] = '\0';
+	livedump_state = 0;
+
+	return 0;
+}
+
+module_init(livedump_init);
diff --git a/kernel/livedump/memdump.c b/kernel/livedump/memdump.c
new file mode 100644
index 000000000000..1df413ba8e12
--- /dev/null
+++ b/kernel/livedump/memdump.c
@@ -0,0 +1,516 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* memdump.c - Live Dump's memory dumping management
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Copyright (C) 2023 SUSE
+ * Author: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
+ * Author: Lukas Hruska <lhruska@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include "memdump.h"
+
+#define CREATE_TRACE_POINTS
+#include "memdump_trace.h"
+
+#include <asm/wrprotect.h>
+
+#include <linux/crash_core.h>
+#include <linux/crash_dump.h>
+#include <linux/kthread.h>
+#include <linux/slab.h>
+#include <linux/kexec.h>
+#include <linux/kfifo.h>
+#include <linux/delay.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/sizes.h>
+#include <linux/printk.h>
+#include <linux/tracepoint.h>
+
+#define MEMDUMP_KFIFO_SIZE	131072 /* in pages */
+#define SECTOR_SHIFT		9
+#define PFN_ELF_0			0
+#define PFN_ELF_1			1
+
+static const char THREAD_NAME[] = "livedump";
+static struct block_device *memdump_bdev;
+
+/* ELF metadata */
+static unsigned char *vmcoreinfo;
+static void *elf_data;
+static unsigned long elf_size;
+static struct crash_mem *cmem;
+
+/* ELF modification */
+static char *elfnotes_buf;
+static size_t elfnotes_sz;
+
+/***** State machine *****/
+enum MEMDUMP_STATE {
+	_MEMDUMP_INIT,
+	MEMDUMP_INACTIVE = _MEMDUMP_INIT,
+	MEMDUMP_ACTIVATING,
+	MEMDUMP_ACTIVE,
+	MEMDUMP_INACTIVATING,
+	_MEMDUMP_OVERFLOW,
+};
+
+static struct memdump_state {
+	atomic_t val;
+	atomic_t count;
+	spinlock_t lock;
+} __aligned(PAGE_SIZE) memdump_state = {
+	ATOMIC_INIT(_MEMDUMP_INIT),
+	ATOMIC_INIT(0),
+	__SPIN_LOCK_INITIALIZER(memdump_state.lock),
+};
+
+/* memdump_state_inc
+ *
+ * Increments ACTIVE state refcount.
+ * The refcount must be zero to transit to next state (INACTIVATING).
+ */
+static bool memdump_state_inc(void)
+{
+	bool ret;
+
+	spin_lock(&memdump_state.lock);
+	ret = (atomic_read(&memdump_state.val) == MEMDUMP_ACTIVE);
+	if (ret)
+		atomic_inc(&memdump_state.count);
+	spin_unlock(&memdump_state.lock);
+	return ret;
+}
+
+/* memdump_state_dec
+ *
+ * Decrements ACTIVE state refcount
+ */
+static void memdump_state_dec(void)
+{
+	atomic_dec(&memdump_state.count);
+}
+
+/* memdump_state_transit
+ *
+ * Transit to next state.
+ * If current state isn't assumed state, transition fails.
+ */
+static bool memdump_state_transit(enum MEMDUMP_STATE assumed)
+{
+	bool ret;
+
+	spin_lock(&memdump_state.lock);
+	ret = (atomic_read(&memdump_state.val) == assumed &&
+		atomic_read(&memdump_state.count) == 0);
+	if (ret) {
+		atomic_inc(&memdump_state.val);
+		if (atomic_read(&memdump_state.val) == _MEMDUMP_OVERFLOW)
+			atomic_set(&memdump_state.val, _MEMDUMP_INIT);
+	}
+	spin_unlock(&memdump_state.lock);
+	return ret;
+}
+
+static void memdump_state_transit_back(void)
+{
+	atomic_dec(&memdump_state.val);
+}
+
+/***** Request queue *****/
+
+/*
+ * Request queue consists of 2 kfifos: pend, pool
+ *
+ * Processing between the two kfifos:
+ *  (1) handle_page READs one request from POOL.
+ *  (2) handle_page fills in the request and WRITEs it to PEND.
+ *  (3) The kthread READs the request from PEND and submits a bio.
+ *  (4) endio WRITEs the request back to POOL.
+ *
+ * A kfifo permits parallel access by one reader and one writer.
+ * Therefore, (1), (2) and (4) must each be serialized.
+ * (3) needs no protection since livedump uses only one kthread.
+ *
+ * (1) is protected by pool_r_lock.
+ * (2) is protected by pend_w_lock.
+ * (4) is protected by pool_w_lock.
+ */
+
+struct memdump_request {
+	void *p; /* pointing to buffer (one page) */
+	unsigned long pfn;
+};
+
+static struct memdump_request_queue {
+	void *pages[MEMDUMP_KFIFO_SIZE];
+
+	STRUCT_KFIFO(struct memdump_request, MEMDUMP_KFIFO_SIZE) pool;
+	STRUCT_KFIFO(struct memdump_request, MEMDUMP_KFIFO_SIZE) pend;
+
+	spinlock_t pool_w_lock;
+	spinlock_t pool_r_lock;
+	spinlock_t pend_w_lock;
+} __aligned(PAGE_SIZE) memdump_req_queue, memdump_req_queue_for_sweep;
+
+static void free_req_queue(void)
+{
+	int i;
+
+	for (i = 0; i < MEMDUMP_KFIFO_SIZE; i++) {
+		if (memdump_req_queue.pages[i]) {
+			free_page((unsigned long)memdump_req_queue.pages[i]);
+			memdump_req_queue.pages[i] = NULL;
+		}
+	}
+	for (i = 0; i < MEMDUMP_KFIFO_SIZE; i++) {
+		if (memdump_req_queue_for_sweep.pages[i]) {
+			free_page((unsigned long)memdump_req_queue_for_sweep.pages[i]);
+			memdump_req_queue_for_sweep.pages[i] = NULL;
+		}
+	}
+}
+
+static long alloc_req_queue(void)
+{
+	long ret;
+	int i;
+	struct memdump_request req;
+
+	/* initialize spinlocks */
+	spin_lock_init(&memdump_req_queue.pool_w_lock);
+	spin_lock_init(&memdump_req_queue.pool_r_lock);
+	spin_lock_init(&memdump_req_queue.pend_w_lock);
+	spin_lock_init(&memdump_req_queue_for_sweep.pool_w_lock);
+	spin_lock_init(&memdump_req_queue_for_sweep.pool_r_lock);
+	spin_lock_init(&memdump_req_queue_for_sweep.pend_w_lock);
+
+	/* initialize kfifos */
+	INIT_KFIFO(memdump_req_queue.pend);
+	INIT_KFIFO(memdump_req_queue.pool);
+	INIT_KFIFO(memdump_req_queue_for_sweep.pend);
+	INIT_KFIFO(memdump_req_queue_for_sweep.pool);
+
+	/* allocate pages and push pages into pool */
+	for (i = 0; i < MEMDUMP_KFIFO_SIZE; i++) {
+		/* for normal queue */
+		memdump_req_queue.pages[i]
+			= (void *)__get_free_page(GFP_KERNEL);
+		if (!memdump_req_queue.pages[i]) {
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		req.p = memdump_req_queue.pages[i];
+		ret = kfifo_put(&memdump_req_queue.pool, req);
+		BUG_ON(!ret);
+
+		/* for sweep queue */
+		memdump_req_queue_for_sweep.pages[i]
+			= (void *)__get_free_page(GFP_KERNEL);
+		if (!memdump_req_queue_for_sweep.pages[i]) {
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		req.p = memdump_req_queue_for_sweep.pages[i];
+		ret = kfifo_put(&memdump_req_queue_for_sweep.pool, req);
+		BUG_ON(!ret);
+	}
+
+	return 0;
+
+err:
+	free_req_queue();
+	return ret;
+}
+
+/***** Kernel thread *****/
+static struct memdump_thread {
+	struct task_struct *tsk;
+	bool is_active;
+	struct completion completion;
+	wait_queue_head_t waiters;
+} __aligned(PAGE_SIZE) memdump_thread;
+
+static int memdump_thread_func(void *);
+
+static long start_memdump_thread(void)
+{
+	memdump_thread.is_active = true;
+	init_completion(&memdump_thread.completion);
+	init_waitqueue_head(&memdump_thread.waiters);
+	memdump_thread.tsk = kthread_run(
+			memdump_thread_func, NULL, THREAD_NAME);
+	if (IS_ERR(memdump_thread.tsk))
+		return PTR_ERR(memdump_thread.tsk);
+	return 0;
+}
+
+static void stop_memdump_thread(void)
+{
+	memdump_thread.is_active = false;
+	wait_for_completion(&memdump_thread.completion);
+}
+
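+/* Bio completion handler: returns the page buffer to the pool it came
+ * from (bi_private is non-NULL for sweep requests) and wakes up anyone
+ * waiting for a free buffer.
+ */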
+static void memdump_endio(struct bio *bio)
+{
+	struct memdump_request req = { .p = page_address(bio_page(bio)) };
+	struct memdump_request_queue *queue = (bio->bi_private ?
+			&memdump_req_queue_for_sweep : &memdump_req_queue);
+
+	spin_lock(&queue->pool_w_lock);
+	kfifo_put(&queue->pool, req);
+	spin_unlock(&queue->pool_w_lock);
+
+	wake_up(&memdump_thread.waiters);
+}
+
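+/* Dumper thread: drains both pend queues, turning each request into a
+ * one-page WRITE bio placed at sector pfn << (PAGE_SHIFT - SECTOR_SHIFT)
+ * of the dump device, then sleeps for 20 ms and polls again.
+ */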
+static int memdump_thread_func(void *_)
+{
+	struct bio *bio;
+	struct memdump_request req;
+
+	do {
+		/* Process request */
+		while (kfifo_get(&memdump_req_queue.pend, &req)) {
+			bio = bio_alloc(memdump_bdev, 1, REQ_OP_WRITE, GFP_KERNEL);
+
+			if (WARN_ON(!bio)) {
+				spin_lock(&memdump_req_queue.pool_w_lock);
+				kfifo_put(&memdump_req_queue.pool, req);
+				spin_unlock(&memdump_req_queue.pool_w_lock);
+				continue;
+			}
+
+			bio->bi_bdev = memdump_bdev;
+			bio->bi_end_io = memdump_endio;
+			bio->bi_iter.bi_sector = req.pfn << (PAGE_SHIFT - SECTOR_SHIFT);
+			/* A fresh one-vec bio always has room for one page. */
+			WARN_ON(bio_add_page(bio, virt_to_page(req.p),
+					     PAGE_SIZE, 0) != PAGE_SIZE);
+
+			trace_memdump_bio_submit(memdump_bdev, req.pfn);
+
+			submit_bio(bio);
+		}
+
+		/* Process requests for sweep */
+		while (kfifo_get(&memdump_req_queue_for_sweep.pend, &req)) {
+			bio = bio_alloc(memdump_bdev, 1, REQ_OP_WRITE, GFP_KERNEL);
+
+			if (WARN_ON(!bio)) {
+				spin_lock(&memdump_req_queue_for_sweep.pool_w_lock);
+				kfifo_put(&memdump_req_queue_for_sweep.pool, req);
+				spin_unlock(&memdump_req_queue_for_sweep.pool_w_lock);
+				continue;
+			}
+
+			bio->bi_bdev = memdump_bdev;
+			bio->bi_end_io = memdump_endio;
+			bio->bi_iter.bi_sector = req.pfn << (PAGE_SHIFT - SECTOR_SHIFT);
+			bio->bi_private = (void *)1; /* for sweep */
+			/* A fresh one-vec bio always has room for one page. */
+			WARN_ON(bio_add_page(bio, virt_to_page(req.p),
+					     PAGE_SIZE, 0) != PAGE_SIZE);
+
+			trace_memdump_bio_submit(memdump_bdev, req.pfn);
+
+			submit_bio(bio);
+		}
+
+		msleep(20);
+	} while (memdump_thread.is_active);
+
+	complete(&memdump_thread.completion);
+	return 0;
+}
+
+static int select_pages(void);
+
+int livedump_memdump_init(const char *bdevpath)
+{
+	long ret;
+
+	if (WARN(!memdump_state_transit(MEMDUMP_INACTIVE),
+				"livedump: memdump is already initialized.\n"))
+		return -EBUSY;
+
+	/* Get bdev */
+	memdump_bdev = blkdev_get_by_path(bdevpath,
+			BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_EXCL,
+			&memdump_bdev, NULL);
+	if (IS_ERR(memdump_bdev)) {
+		ret = PTR_ERR(memdump_bdev);
+		goto err;
+	}
+
+	/* Allocate request queue */
+	ret = alloc_req_queue();
+	if (ret)
+		goto err_bdev;
+
+	/* Start thread */
+	ret = start_memdump_thread();
+	if (ret)
+		goto err_freeq;
+
+	/* Select target pages */
+	select_pages();
+
+	/* Allocate space for vmcore info */
+	vmcoreinfo = vmalloc(PAGE_SIZE);
+	cmem = vzalloc(struct_size(cmem, ranges, 1));
+	if (WARN_ON(!vmcoreinfo || !cmem)) {
+		ret = -ENOMEM;
+		goto err_vmcoreinfo;
+	}
+
+	memdump_state_transit(MEMDUMP_ACTIVATING); /* always succeeds */
+	return 0;
+
+err_vmcoreinfo:
+	vfree(vmcoreinfo);
+	vfree(cmem);
+	stop_memdump_thread();
+err_freeq:
+	free_req_queue();
+err_bdev:
+	blkdev_put(memdump_bdev, &memdump_bdev);
+err:
+	memdump_state_transit_back();
+	return ret;
+}
+
+void livedump_memdump_uninit(void)
+{
+	if (!memdump_state_transit(MEMDUMP_ACTIVE))
+		return;
+
+	/* Stop thread */
+	stop_memdump_thread();
+
+	/* Free request queue */
+	free_req_queue();
+
+	/* Free the ELF metadata; vfree(NULL) is a no-op, and vmcoreinfo
+	 * was allocated with vmalloc(), so it must be vfree'd, not
+	 * vunmap'd.
+	 */
+	vfree(vmcoreinfo);
+	vfree(cmem);
+	vfree(elfnotes_buf);
+
+	/* Put bdev */
+	blkdev_put(memdump_bdev, &memdump_bdev);
+
+	memdump_state_transit(MEMDUMP_INACTIVATING); /* always succeeds */
+}
+
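+/* livedump_memdump_handle_page
+ *
+ * Copies one page into a free pool buffer and queues it for the dumper
+ * thread.  Called from the write-protection fault path (for_sweep == 0),
+ * which must not sleep and thus fails when the pool is empty, and from
+ * the sweep phase (for_sweep != 0), which may wait for a buffer.
+ */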
+void livedump_memdump_handle_page(unsigned long pfn, unsigned long addr, int for_sweep)
+{
+	int ret;
+	unsigned long flags;
+	struct memdump_request req;
+	struct memdump_request_queue *queue =
+		(for_sweep ? &memdump_req_queue_for_sweep : &memdump_req_queue);
+	DEFINE_WAIT(wait);
+
+	BUG_ON(addr & ~PAGE_MASK);
+
+	if (!memdump_state_inc())
+		return;
+
+	/* Get buffer */
+retry_after_wait:
+	spin_lock_irqsave(&queue->pool_r_lock, flags);
+	ret = kfifo_get(&queue->pool, &req);
+	spin_unlock_irqrestore(&queue->pool_r_lock, flags);
+
+	if (!ret) {
+		/* The fault path must never sleep, so fail instead. */
+		if (WARN_ON_ONCE(!for_sweep))
+			goto err;
+
+		/* The sweep path may wait for a buffer to be returned. */
+		prepare_to_wait(&memdump_thread.waiters, &wait,
+				TASK_UNINTERRUPTIBLE);
+		schedule();
+		finish_wait(&memdump_thread.waiters, &wait);
+		goto retry_after_wait;
+	}
+
+	/* Make request */
+	req.pfn = pfn;
+	if (pfn == PFN_ELF_0) {
+		memcpy(req.p, elf_data, elf_size);
+		memset(req.p + elf_size, 0, PAGE_SIZE - elf_size);
+	} else if (pfn == PFN_ELF_1)
+		memcpy(req.p, elfnotes_buf, PAGE_SIZE);
+	else
+		memcpy(req.p, (void *)addr, PAGE_SIZE);
+
+	/* Queue request */
+	spin_lock_irqsave(&queue->pend_w_lock, flags);
+	kfifo_put(&queue->pend, req);
+	spin_unlock_irqrestore(&queue->pend_w_lock, flags);
+
+err:
+	memdump_state_dec();
+}
+
+/* select_pages
+ *
+ * Excludes the pages holding memdump's own data structures from the
+ * write-protection bitmap, so touching them during a dump never faults.
+ */
+static int select_pages(void)
+{
+	unsigned long i;
+
+	/* Unselect memdump's own data structures */
+	wrprotect_unselect_pages(
+			(unsigned long)&memdump_state, sizeof(memdump_state));
+	wrprotect_unselect_pages(
+			(unsigned long)&memdump_req_queue,
+			sizeof(memdump_req_queue));
+	wrprotect_unselect_pages(
+			(unsigned long)&memdump_req_queue_for_sweep,
+			sizeof(memdump_req_queue_for_sweep));
+	wrprotect_unselect_pages(
+			(unsigned long)&memdump_thread, sizeof(memdump_thread));
+	for (i = 0; i < MEMDUMP_KFIFO_SIZE; i++) {
+		/* Each queue entry points to exactly one page. */
+		wrprotect_unselect_pages((unsigned long)memdump_req_queue.pages[i],
+		    PAGE_SIZE);
+		wrprotect_unselect_pages((unsigned long)memdump_req_queue_for_sweep.pages[i],
+		    PAGE_SIZE);
+		cond_resched();
+	}
+
+	return 0;
+}
+
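+/* livedump_memdump_sm_init
+ *
+ * Saves a dummy register snapshot for every present CPU and prepares the
+ * ELF metadata (headers, vmcoreinfo note and merged note buffer) that
+ * livedump_memdump_write_elf_hdr() later writes to the first two pages
+ * of the dump device.
+ */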
+void livedump_memdump_sm_init(void)
+{
+	unsigned int cpu;
+	struct pt_regs regs;
+
+	memset(&regs, 0, sizeof(struct pt_regs));
+	regs.ip = (unsigned long)memdump_thread_func;
+
+	for_each_present_cpu(cpu)
+		crash_save_cpu(&regs, cpu);
+
+	cmem->max_nr_ranges = 1;
+	cmem->nr_ranges = 1;
+	cmem->ranges[0].start = SZ_1M;
+	cmem->ranges[0].end = ((max_pfn + 1) << PAGE_SHIFT) - 1;
+	crash_update_vmcoreinfo_safecopy(vmcoreinfo);
+	crash_save_vmcoreinfo();
+	crash_prepare_elf64_headers(cmem, 1, &elf_data, &elf_size);
+	crash_update_vmcoreinfo_safecopy(NULL);
+	merge_note_headers_elf64((char *)elf_data, &elf_size, &elfnotes_buf, &elfnotes_sz);
+}
+
+void livedump_memdump_write_elf_hdr(void)
+{
+	livedump_memdump_handle_page(PFN_ELF_0, 0, 1);
+	livedump_memdump_handle_page(PFN_ELF_1, 0, 1);
+}
diff --git a/kernel/livedump/memdump.h b/kernel/livedump/memdump.h
new file mode 100644
index 000000000000..9df9b2fe9ae9
--- /dev/null
+++ b/kernel/livedump/memdump.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* memdump.h - Live Dump's memory dumping management
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Copyright (C) 2023 SUSE
+ * Author: YOSHIDA Masanori <masanori.yoshida.tv@hitachi.com>
+ * Author: Lukas Hruska <lhruska@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _LIVEDUMP_MEMDUMP_H
+#define _LIVEDUMP_MEMDUMP_H
+
+#include <linux/fs.h>
+
+extern int livedump_memdump_init(const char *bdevpath);
+
+extern void livedump_memdump_uninit(void);
+
+extern void livedump_memdump_handle_page(unsigned long pfn, unsigned long addr, int for_sweep);
+
+extern void livedump_memdump_sm_init(void);
+
+extern void livedump_memdump_write_elf_hdr(void);
+
+#endif /* _LIVEDUMP_MEMDUMP_H */
diff --git a/kernel/livedump/memdump_trace.h b/kernel/livedump/memdump_trace.h
new file mode 100644
index 000000000000..cd9fe08b034f
--- /dev/null
+++ b/kernel/livedump/memdump_trace.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM livedump
+
+#if !defined(_TRACE_LIVEDUMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_LIVEDUMP_H
+
+#include <linux/blk_types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(memdump_bio_submit,
+	TP_PROTO(struct block_device *bdev, unsigned long pfn),
+	TP_ARGS(bdev, pfn),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, pfn)
+	),
+	TP_fast_assign(
+		/* Record the device number, not the pointer: the bdev may
+		 * be gone by the time the trace buffer is read. */
+		__entry->dev = bdev ? bdev->bd_dev : 0;
+		__entry->pfn = pfn;
+	),
+	TP_printk("dev=%u, pfn=%lu", __entry->dev, __entry->pfn)
+);
+#endif /* _TRACE_LIVEDUMP_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../kernel/livedump/
+#define TRACE_INCLUDE_FILE memdump_trace
+#include <trace/define_trace.h>
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 4/4 v1] livedump: Add tools to make livedump creation easier
  2023-11-10 15:00 [RFC PATCH 0/4 v1] LPC materials: livedump Lukas Hruska
                   ` (2 preceding siblings ...)
  2023-11-10 15:00 ` [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality Lukas Hruska
@ 2023-11-10 15:00 ` Lukas Hruska
  3 siblings, 0 replies; 10+ messages in thread
From: Lukas Hruska @ 2023-11-10 15:00 UTC (permalink / raw
  To: linux-debuggers, linux-kernel; +Cc: Michal Koutny, YOSHIDA Masanori

Add a tool that wraps all handling of livedump's sysfs interface. Add a
second tool that extracts the dump from the block device, determining the
correct size of the dumped memory from the embedded ELF headers first, so
the extraction works even after the physical memory has been resized.

Signed-off-by: Lukas Hruska <lhruska@suse.cz>
---
 tools/livedump/livedump.sh         | 44 ++++++++++++++++++++++++++++++
 tools/livedump/livedump_extract.sh | 19 +++++++++++++
 2 files changed, 63 insertions(+)
 create mode 100755 tools/livedump/livedump.sh
 create mode 100755 tools/livedump/livedump_extract.sh

diff --git a/tools/livedump/livedump.sh b/tools/livedump/livedump.sh
new file mode 100755
index 000000000000..2cc67bbbc380
--- /dev/null
+++ b/tools/livedump/livedump.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# This is a wrapper for livedump's sysfs to make a complete memdump.
+# Usage: livedump block_device
+#
+# Author: Lukas Hruska <lhruska@suse.cz>
+#
+# This file has been put into the public domain.
+# You can do whatever you want with this file.
+#
+
+if [ $# -ne 1 ]; then
+	>&2 echo "Usage: livedump block_device"
+	>&2 echo "Expected exactly one argument"
+	exit 1
+fi
+
+DEV=$1
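+
+# Values written to /sys/kernel/livedump/state below
+# (sysfs API introduced by this patch set):
+#   1=init, 2=start, 3=sweep, 4=finish, 5=uninit/reset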
+
+write_and_check() {
+	NAME=$1
+	VAL=$2
+	# Do not call this variable PATH: that would clobber the shell's
+	# command search path for the rest of the script.
+	SYSFS_FILE=$3
+
+	echo -n "$NAME: "
+	if ! echo "$VAL" > "$SYSFS_FILE"; then
+		exit 1
+	fi
+	echo "OK"
+}
+
+CUR_STATE=`head -n 1 /sys/kernel/livedump/state`
+if [ $CUR_STATE -ne 0 ] && [ $CUR_STATE -ne 5 ]; then
+	write_and_check "reset" 5 /sys/kernel/livedump/state
+fi
+
+write_and_check "device" $DEV /sys/kernel/livedump/output
+write_and_check "init" 1 /sys/kernel/livedump/state
+write_and_check "start" 2 /sys/kernel/livedump/state
+write_and_check "sweep" 3 /sys/kernel/livedump/state
+write_and_check "finish" 4 /sys/kernel/livedump/state
+write_and_check "uninit" 5 /sys/kernel/livedump/state
diff --git a/tools/livedump/livedump_extract.sh b/tools/livedump/livedump_extract.sh
new file mode 100755
index 000000000000..c1dc69da7559
--- /dev/null
+++ b/tools/livedump/livedump_extract.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# This script extracts the ELF formatted livedump from block device with correct size.
+# Usage: livedump_extract block_device output_file
+#
+# Author: Lukas Hruska <lhruska@suse.cz>
+#
+# This file has been put into the public domain.
+# You can do whatever you want with this file.
+#
+if [ $# -ne 2 ]; then
+	>&2 echo "Usage: livedump_extract block_device output_file"
+	exit 1
+fi
+
+device=$1
+output=$2
+
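+# The dump begins with an ELF header; the file offset plus the size of
+# the last program header gives the total dump size in bytes, which is
+# then converted to 4 KiB blocks for dd.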
+head -c 4096 "$device" > /tmp/livedump_hdr
+size=$(readelf -l /tmp/livedump_hdr | tail -2 | tr '\n' ' ' | tr -s ' ' \
+	| cut -d ' ' -f 5,6 | xargs printf "%d + %d" | xargs expr)
+size=$(expr $size / 4096)
+dd if="$device" of="$output" count=$size bs=4096 status=progress
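+
+# Example session (the device and output paths are illustrative only):
+#
+#	./livedump.sh /dev/sdb
+#	./livedump_extract.sh /dev/sdb vmcore.elf
+#	crash vmlinux vmcore.elf	# inspect the result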
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel
  2023-11-10 15:00 ` [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel Lukas Hruska
@ 2023-11-10 21:56   ` kernel test robot
  2023-11-10 23:26   ` kernel test robot
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2023-11-10 21:56 UTC (permalink / raw
  To: Lukas Hruska; +Cc: oe-kbuild-all

Hi Lukas,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/x86/mm linus/master v6.6 next-20231110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukas-Hruska/crash-vmcore-VMCOREINFO-creation-from-non-kdump-kernel/20231111-022332
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20231110150057.15717-2-lhruska%40suse.cz
patch subject: [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel
config: loongarch-randconfig-002-20231111 (https://download.01.org/0day-ci/archive/20231111/202311110545.vOE0m9Y8-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311110545.vOE0m9Y8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311110545.vOE0m9Y8-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/crash_dump.c: In function 'elfcorehdr_read_notes':
   kernel/crash_dump.c:77:25: error: implicit declaration of function 'cc_platform_has' [-Werror=implicit-function-declaration]
      77 |                         cc_platform_has(CC_ATTR_MEM_ENCRYPT));
         |                         ^~~~~~~~~~~~~~~
   kernel/crash_dump.c:77:41: error: 'CC_ATTR_MEM_ENCRYPT' undeclared (first use in this function)
      77 |                         cc_platform_has(CC_ATTR_MEM_ENCRYPT));
         |                                         ^~~~~~~~~~~~~~~~~~~
   kernel/crash_dump.c:77:41: note: each undeclared identifier is reported only once for each function it appears in
>> kernel/crash_dump.c:78:1: warning: control reaches end of non-void function [-Wreturn-type]
      78 | }
         | ^
   cc1: some warnings being treated as errors


vim +78 kernel/crash_dump.c

    60	
    61	/*
    62	 * Architectures may override this function to read from notes sections
    63	 */
    64	ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
    65	{
    66		struct kvec kvec = { .iov_base = buf, .iov_len = count };
    67		struct iov_iter iter;
    68	
    69		if (!is_kdump_kernel()) {
    70			memcpy(buf, __va(*ppos), count);
    71			return count;
    72		}
    73	
    74		iov_iter_kvec(&iter, ITER_DEST, &kvec, 1, count);
    75	
    76		return read_from_oldmem(&iter, count, ppos,
  > 77				cc_platform_has(CC_ATTR_MEM_ENCRYPT));
  > 78	}
    79	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel
  2023-11-10 15:00 ` [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel Lukas Hruska
  2023-11-10 21:56   ` kernel test robot
@ 2023-11-10 23:26   ` kernel test robot
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2023-11-10 23:26 UTC (permalink / raw
  To: Lukas Hruska; +Cc: oe-kbuild-all

Hi Lukas,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/x86/mm linus/master v6.6 next-20231110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukas-Hruska/crash-vmcore-VMCOREINFO-creation-from-non-kdump-kernel/20231111-022332
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20231110150057.15717-2-lhruska%40suse.cz
patch subject: [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel
config: loongarch-randconfig-002-20231111 (https://download.01.org/0day-ci/archive/20231111/202311110746.PACzMVRq-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311110746.PACzMVRq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311110746.PACzMVRq-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/crash_dump.c: In function 'elfcorehdr_read_notes':
>> kernel/crash_dump.c:77:25: error: implicit declaration of function 'cc_platform_has' [-Werror=implicit-function-declaration]
      77 |                         cc_platform_has(CC_ATTR_MEM_ENCRYPT));
         |                         ^~~~~~~~~~~~~~~
>> kernel/crash_dump.c:77:41: error: 'CC_ATTR_MEM_ENCRYPT' undeclared (first use in this function)
      77 |                         cc_platform_has(CC_ATTR_MEM_ENCRYPT));
         |                                         ^~~~~~~~~~~~~~~~~~~
   kernel/crash_dump.c:77:41: note: each undeclared identifier is reported only once for each function it appears in
   kernel/crash_dump.c:78:1: warning: control reaches end of non-void function [-Wreturn-type]
      78 | }
         | ^
   cc1: some warnings being treated as errors


vim +/cc_platform_has +77 kernel/crash_dump.c

    60	
    61	/*
    62	 * Architectures may override this function to read from notes sections
    63	 */
    64	ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
    65	{
    66		struct kvec kvec = { .iov_base = buf, .iov_len = count };
    67		struct iov_iter iter;
    68	
    69		if (!is_kdump_kernel()) {
    70			memcpy(buf, __va(*ppos), count);
    71			return count;
    72		}
    73	
    74		iov_iter_kvec(&iter, ITER_DEST, &kvec, 1, count);
    75	
    76		return read_from_oldmem(&iter, count, ppos,
  > 77				cc_platform_has(CC_ATTR_MEM_ENCRYPT));
    78	}
    79	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/4 v1] livedump: Add write protection management
  2023-11-10 15:00 ` [RFC PATCH 2/4 v1] livedump: Add write protection management Lukas Hruska
@ 2023-11-11  5:20   ` kernel test robot
  0 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2023-11-11  5:20 UTC (permalink / raw
  To: Lukas Hruska; +Cc: oe-kbuild-all

Hi Lukas,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/x86/mm linus/master v6.6 next-20231110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukas-Hruska/crash-vmcore-VMCOREINFO-creation-from-non-kdump-kernel/20231111-022332
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20231110150057.15717-3-lhruska%40suse.cz
patch subject: [RFC PATCH 2/4 v1] livedump: Add write protection management
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20231111/202311111350.ddmhyoBh-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311111350.ddmhyoBh-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311111350.ddmhyoBh-lkp@intel.com/

All warnings (new ones prefixed by >>):

   arch/x86/mm/wrprotect.c: In function 'split_large_pages':
>> arch/x86/mm/wrprotect.c:103:13: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
     103 |         int ret;
         |             ^~~
   arch/x86/mm/wrprotect.c: In function 'handle_tasks':
   arch/x86/mm/wrprotect.c:263:9: error: implicit declaration of function 'do_each_thread'; did you mean 'for_each_thread'? [-Werror=implicit-function-declaration]
     263 |         do_each_thread(p, t) {
         |         ^~~~~~~~~~~~~~
         |         for_each_thread
   arch/x86/mm/wrprotect.c:263:29: error: expected ';' before '{' token
     263 |         do_each_thread(p, t) {
         |                             ^~
         |                             ;
   In file included from include/linux/pgtable.h:6,
                    from include/linux/mm.h:29,
                    from arch/x86/include/asm/wrprotect.h:23,
                    from arch/x86/mm/wrprotect.c:21:
   arch/x86/mm/wrprotect.c: In function 'protect_pte':
   arch/x86/include/asm/pgtable.h:461:21: error: too few arguments to function 'pte_mkwrite'
     461 | #define pte_mkwrite pte_mkwrite
         |                     ^~~~~~~~~~~
   arch/x86/mm/wrprotect.c:386:23: note: in expansion of macro 'pte_mkwrite'
     386 |                 pte = pte_mkwrite(pte);
         |                       ^~~~~~~~~~~
   arch/x86/include/asm/pgtable.h:460:7: note: declared here
     460 | pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma);
         |       ^~~~~~~~~~~
   arch/x86/mm/wrprotect.c: In function 'wrprotect_uninit':
   arch/x86/mm/wrprotect.c:728:13: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
     728 |         int ret;
         |             ^~~
   cc1: some warnings being treated as errors


vim +/ret +103 arch/x86/mm/wrprotect.c

    95	
    96	/* split_large_pages
    97	 *
    98	 * This function splits all large pages in straight mapping area into 4K ones.
    99	 * Currently wrprotect supports only 4K pages, and so this is needed.
   100	 */
   101	static int split_large_pages(void)
   102	{
 > 103		int ret;
   104		struct mm_walk_ops split_large_pages_walk_ops;
   105	
   106		memset(&split_large_pages_walk_ops, 0, sizeof(struct mm_walk_ops));
   107		split_large_pages_walk_ops.pud_entry = split_large_pages_walk_pud;
   108		split_large_pages_walk_ops.pmd_entry = split_large_pages_walk_pmd;
   109	
   110		mmap_write_lock(&init_mm);
   111		ret = walk_page_range_novma(&init_mm, PAGE_OFFSET, PAGE_OFFSET + DIRECT_MAP_SIZE,
   112			&split_large_pages_walk_ops, init_mm.pgd, NULL);
   113		mmap_write_unlock(&init_mm);
   114	
   115		return 0;
   116	}
   117	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality
  2023-11-10 15:00 ` [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality Lukas Hruska
@ 2023-11-11 15:06   ` kernel test robot
  2023-11-11 19:19   ` kernel test robot
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2023-11-11 15:06 UTC (permalink / raw
  To: Lukas Hruska; +Cc: oe-kbuild-all

Hi Lukas,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/x86/mm linus/master v6.6 next-20231110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukas-Hruska/crash-vmcore-VMCOREINFO-creation-from-non-kdump-kernel/20231111-022332
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20231110150057.15717-4-lhruska%40suse.cz
patch subject: [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20231111/202311112220.byee3qTQ-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311112220.byee3qTQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311112220.byee3qTQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/livedump/memdump.c: In function 'livedump_memdump_init':
   kernel/livedump/memdump.c:346:53: error: 'FMODE_EXCL' undeclared (first use in this function); did you mean 'FMODE_EXEC'?
     346 |         memdump_bdev = blkdev_get_by_path(bdevpath, FMODE_EXCL, &memdump_bdev);
         |                                                     ^~~~~~~~~~
         |                                                     FMODE_EXEC
   kernel/livedump/memdump.c:346:53: note: each undeclared identifier is reported only once for each function it appears in
   kernel/livedump/memdump.c:346:24: error: too few arguments to function 'blkdev_get_by_path'
     346 |         memdump_bdev = blkdev_get_by_path(bdevpath, FMODE_EXCL, &memdump_bdev);
         |                        ^~~~~~~~~~~~~~~~~~
   In file included from kernel/livedump/memdump.c:34:
   include/linux/blkdev.h:1484:22: note: declared here
    1484 | struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
         |                      ^~~~~~~~~~~~~~~~~~
>> kernel/livedump/memdump.c:347:26: warning: ordered comparison of pointer with integer zero [-Wextra]
     347 |         if (memdump_bdev < 0)
         |                          ^
   kernel/livedump/memdump.c: In function 'livedump_memdump_uninit':
   kernel/livedump/memdump.c:403:34: error: 'FMODE_EXCL' undeclared (first use in this function); did you mean 'FMODE_EXEC'?
     403 |         blkdev_put(memdump_bdev, FMODE_EXCL);
         |                                  ^~~~~~~~~~
         |                                  FMODE_EXEC
   kernel/livedump/memdump.c: In function 'memdump_thread_func':
>> kernel/livedump/memdump.c:298:25: warning: ignoring return value of 'bio_add_page' declared with attribute 'warn_unused_result' [-Wunused-result]
     298 |                         bio_add_page(bio, virt_to_page(req.p), PAGE_SIZE, 0);
         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/livedump/memdump.c:320:25: warning: ignoring return value of 'bio_add_page' declared with attribute 'warn_unused_result' [-Wunused-result]
     320 |                         bio_add_page(bio, virt_to_page(req.p), PAGE_SIZE, 0);
         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +347 kernel/livedump/memdump.c

   277	
   278	static int memdump_thread_func(void *_)
   279	{
   280		struct bio *bio;
   281		struct memdump_request req;
   282	
   283		do {
   284			/* Process request */
   285			while (kfifo_get(&memdump_req_queue.pend, &req)) {
   286				bio = bio_alloc(memdump_bdev, 1, REQ_OP_WRITE, GFP_KERNEL);
   287	
   288				if (WARN_ON(!bio)) {
   289					spin_lock(&memdump_req_queue.pool_w_lock);
   290					kfifo_put(&memdump_req_queue.pool, req);
   291					spin_unlock(&memdump_req_queue.pool_w_lock);
   292					continue;
   293				}
   294	
   295				bio->bi_bdev = memdump_bdev;
   296				bio->bi_end_io = memdump_endio;
   297				bio->bi_iter.bi_sector = req.pfn << (PAGE_SHIFT - SECTOR_SHIFT);
 > 298				bio_add_page(bio, virt_to_page(req.p), PAGE_SIZE, 0);
   299	
   300				trace_memdump_bio_submit(memdump_bdev, req.pfn);
   301	
   302				submit_bio(bio);
   303			}
   304	
   305			/* Process request for sweep*/
   306			while (kfifo_get(&memdump_req_queue_for_sweep.pend, &req)) {
   307				bio = bio_alloc(memdump_bdev, 1, REQ_OP_WRITE, GFP_KERNEL);
   308	
   309				if (WARN_ON(!bio)) {
   310					spin_lock(&memdump_req_queue_for_sweep.pool_w_lock);
   311					kfifo_put(&memdump_req_queue_for_sweep.pool, req);
   312					spin_unlock(&memdump_req_queue_for_sweep.pool_w_lock);
   313					continue;
   314				}
   315	
   316				bio->bi_bdev = memdump_bdev;
   317				bio->bi_end_io = memdump_endio;
   318				bio->bi_iter.bi_sector = req.pfn << (PAGE_SHIFT - SECTOR_SHIFT);
   319				bio->bi_private = (void *)1; /* for sweep */
   320				bio_add_page(bio, virt_to_page(req.p), PAGE_SIZE, 0);
   321	
   322				trace_memdump_bio_submit(memdump_bdev, req.pfn);
   323	
   324				submit_bio(bio);
   325			}
   326	
   327			msleep(20);
   328		} while (memdump_thread.is_active);
   329	
   330		complete(&memdump_thread.completion);
   331		return 0;
   332	}
   333	
   334	static int select_pages(void);
   335	
   336	int livedump_memdump_init(const char *bdevpath)
   337	{
   338		long ret;
   339	
   340		if (WARN(!memdump_state_transit(MEMDUMP_INACTIVE),
   341					"livedump: memdump is already initialized.\n"))
   342			return -EBUSY;
   343	
   344		/* Get bdev */
   345		ret = -ENOENT;
 > 346		memdump_bdev = blkdev_get_by_path(bdevpath, FMODE_EXCL, &memdump_bdev);
 > 347		if (memdump_bdev < 0)
   348			goto err;
   349	
   350		/* Allocate request queue */
   351		ret = alloc_req_queue();
   352		if (ret)
   353			goto err_bdev;
   354	
   355		/* Start thread */
   356		ret = start_memdump_thread();
   357		if (ret)
   358			goto err_freeq;
   359	
   360		/* Select target pages */
   361		select_pages();
   362	
   363		/* Allocate space for vmcore info */
   364		vmcoreinfo = vmalloc(PAGE_SIZE);
   365		cmem = vzalloc(struct_size(cmem, ranges, 1));
   366		if (WARN_ON(!vmcoreinfo || !cmem))
   367			return -ENOMEM;
   368	
   369		memdump_state_transit(MEMDUMP_ACTIVATING); /* always succeeds */
   370		return 0;
   371	
   372	err_freeq:
   373		free_req_queue();
   374	err_bdev:
   375		blkdev_put(memdump_bdev, FMODE_EXCL);
   376	err:
   377		memdump_state_transit_back();
   378		return ret;
   379	}
   380	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality
  2023-11-10 15:00 ` [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality Lukas Hruska
  2023-11-11 15:06   ` kernel test robot
@ 2023-11-11 19:19   ` kernel test robot
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2023-11-11 19:19 UTC (permalink / raw
  To: Lukas Hruska; +Cc: oe-kbuild-all

Hi Lukas,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/x86/mm linus/master v6.6 next-20231110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukas-Hruska/crash-vmcore-VMCOREINFO-creation-from-non-kdump-kernel/20231111-022332
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20231110150057.15717-4-lhruska%40suse.cz
patch subject: [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20231112/202311120256.zqaXBHnU-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231112/202311120256.zqaXBHnU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311120256.zqaXBHnU-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/livedump/memdump.c: In function 'livedump_memdump_init':
>> kernel/livedump/memdump.c:346:53: error: 'FMODE_EXCL' undeclared (first use in this function); did you mean 'FMODE_EXEC'?
     346 |         memdump_bdev = blkdev_get_by_path(bdevpath, FMODE_EXCL, &memdump_bdev);
         |                                                     ^~~~~~~~~~
         |                                                     FMODE_EXEC
   kernel/livedump/memdump.c:346:53: note: each undeclared identifier is reported only once for each function it appears in
>> kernel/livedump/memdump.c:346:24: error: too few arguments to function 'blkdev_get_by_path'
     346 |         memdump_bdev = blkdev_get_by_path(bdevpath, FMODE_EXCL, &memdump_bdev);
         |                        ^~~~~~~~~~~~~~~~~~
   In file included from kernel/livedump/memdump.c:34:
   include/linux/blkdev.h:1484:22: note: declared here
    1484 | struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
         |                      ^~~~~~~~~~~~~~~~~~
   kernel/livedump/memdump.c:347:26: warning: ordered comparison of pointer with integer zero [-Wextra]
     347 |         if (memdump_bdev < 0)
         |                          ^
   kernel/livedump/memdump.c: In function 'livedump_memdump_uninit':
   kernel/livedump/memdump.c:403:34: error: 'FMODE_EXCL' undeclared (first use in this function); did you mean 'FMODE_EXEC'?
     403 |         blkdev_put(memdump_bdev, FMODE_EXCL);
         |                                  ^~~~~~~~~~
         |                                  FMODE_EXEC
   kernel/livedump/memdump.c: In function 'memdump_thread_func':
   kernel/livedump/memdump.c:298:25: warning: ignoring return value of 'bio_add_page' declared with attribute 'warn_unused_result' [-Wunused-result]
     298 |                         bio_add_page(bio, virt_to_page(req.p), PAGE_SIZE, 0);
         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/livedump/memdump.c:320:25: warning: ignoring return value of 'bio_add_page' declared with attribute 'warn_unused_result' [-Wunused-result]
     320 |                         bio_add_page(bio, virt_to_page(req.p), PAGE_SIZE, 0);
         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +346 kernel/livedump/memdump.c

   335	
   336	int livedump_memdump_init(const char *bdevpath)
   337	{
   338		long ret;
   339	
   340		if (WARN(!memdump_state_transit(MEMDUMP_INACTIVE),
   341					"livedump: memdump is already initialized.\n"))
   342			return -EBUSY;
   343	
   344		/* Get bdev */
   345		ret = -ENOENT;
 > 346		memdump_bdev = blkdev_get_by_path(bdevpath, FMODE_EXCL, &memdump_bdev);
   347		if (memdump_bdev < 0)
   348			goto err;
   349	
   350		/* Allocate request queue */
   351		ret = alloc_req_queue();
   352		if (ret)
   353			goto err_bdev;
   354	
   355		/* Start thread */
   356		ret = start_memdump_thread();
   357		if (ret)
   358			goto err_freeq;
   359	
   360		/* Select target pages */
   361		select_pages();
   362	
   363		/* Allocate space for vmcore info */
   364		vmcoreinfo = vmalloc(PAGE_SIZE);
   365		cmem = vzalloc(struct_size(cmem, ranges, 1));
   366		if (WARN_ON(!vmcoreinfo || !cmem))
   367			return -ENOMEM;
   368	
   369		memdump_state_transit(MEMDUMP_ACTIVATING); /* always succeeds */
   370		return 0;
   371	
   372	err_freeq:
   373		free_req_queue();
   374	err_bdev:
   375		blkdev_put(memdump_bdev, FMODE_EXCL);
   376	err:
   377		memdump_state_transit_back();
   378		return ret;
   379	}
   380	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-11-11 19:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-10 15:00 [RFC PATCH 0/4 v1] LPC materials: livedump Lukas Hruska
2023-11-10 15:00 ` [RFC PATCH 1/4 v1] crash/vmcore: VMCOREINFO creation from non-kdump kernel Lukas Hruska
2023-11-10 21:56   ` kernel test robot
2023-11-10 23:26   ` kernel test robot
2023-11-10 15:00 ` [RFC PATCH 2/4 v1] livedump: Add write protection management Lukas Hruska
2023-11-11  5:20   ` kernel test robot
2023-11-10 15:00 ` [RFC PATCH 3/4 v1] livedump: Add memory dumping functionality Lukas Hruska
2023-11-11 15:06   ` kernel test robot
2023-11-11 19:19   ` kernel test robot
2023-11-10 15:00 ` [RFC PATCH 4/4 v1] livedump: Add tools to make livedump creation easier Lukas Hruska

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.