From: Shuai Xue <xueshuai@linux.alibaba.com>
To: keescook@chromium.org, tony.luck@intel.com, gpiccoli@igalia.com,
	rafael@kernel.org, lenb@kernel.org, james.morse@arm.com,
	bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	ardb@kernel.org, robert.moore@intel.com
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-hardening@vger.kernel.org,
	baolin.wang@linux.alibaba.com, xueshuai@linux.alibaba.com,
	acpica-devel@lists.linuxfoundation.org,
	linux-edac@vger.kernel.org
Subject: [Acpica-devel] [RFC PATCH v2 0/9] Use ERST for persistent storage of MCE and APEI errors
Date: Mon, 25 Sep 2023 07:44:40 -0000
Message-ID: <20230925074426.97856-1-xueshuai@linux.alibaba.com>

Changes since v1:
- fix a compile warning by dereferencing the rcd pointer before memset
- fix a compile error by adding CONFIG_X86_MCE
- Link: https://lore.kernel.org/all/20230916130316.65815-3-xueshuai@linux.alibaba.com/

In certain scenarios (e.g. hosts/guests with root filesystems on NFS/iSCSI
where networking software and/or hardware fails, and thus kdump fails), it
is necessary to serialize hardware error information so that it is
available for post-mortem debugging. Save the hardware error log into flash
via ERST before panicking; the log can then be read back from flash after
the system boots successfully again, which is very useful in production.

On x86 platforms, the kernel has supported serializing and deserializing
MCE error records since commit 482908b49ebf ("ACPI, APEI, Use ERST for
persistent storage of MCE"). The process involves two steps:

- MCE Producer: When a hardware error is detected, an MCE is raised and its
  handler writes the MCE error record into flash via ERST before panic
- MCE Consumer: After the system reboots, /sbin/mcelog runs and reads
  /dev/mcelog to check flash, via ERST, for error records of the previous
  boot
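
The two steps above can be modeled in user space as follows. This is only
a minimal sketch: struct demo_mce, struct demo_flash and the demo_*
functions are hypothetical names invented for illustration, not the
kernel's ERST or MCE interfaces, and the "flash" is a plain array that
stands in for storage surviving a reboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an MCE record; the real struct mce is larger. */
struct demo_mce {
	uint64_t status;	/* machine-check bank status value */
	uint64_t addr;		/* faulting address, if any */
	uint8_t bank;		/* reporting bank number */
};

#define DEMO_SLOTS 4

/* Simulated flash: its contents are assumed to survive a reboot. */
struct demo_flash {
	struct demo_mce rec[DEMO_SLOTS];
	int used[DEMO_SLOTS];
};

/*
 * Producer side: called from the (simulated) MCE handler before panic.
 * Returns the slot index used, or -1 when the store is full.
 */
int demo_flash_write(struct demo_flash *f, const struct demo_mce *m)
{
	assert(f != NULL && m != NULL);
	for (int i = 0; i < DEMO_SLOTS; i++) {
		if (!f->used[i]) {
			f->rec[i] = *m;
			f->used[i] = 1;
			return i;
		}
	}
	return -1;
}

/*
 * Consumer side: after reboot, fetch one saved record and clear its slot
 * so that it is reported only once. Returns 1 if a record was copied
 * into *out, 0 when nothing is stored.
 */
int demo_flash_read_and_clear(struct demo_flash *f, struct demo_mce *out)
{
	assert(f != NULL && out != NULL);
	for (int i = 0; i < DEMO_SLOTS; i++) {
		if (f->used[i]) {
			*out = f->rec[i];
			f->used[i] = 0;
			return 1;
		}
	}
	return 0;
}
```

A caller would zero a struct demo_flash, call demo_flash_write() once on
the "crashing" side, and then loop on demo_flash_read_and_clear() until it
returns 0 on the "next boot" side.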

Since the /dev/mcelog character device was deprecated by commit
5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"),
the serialized MCE error record of the previous boot in persistent storage
is no longer collected via APEI ERST.

This patch set includes two parts:

- PATCH 1-3: rework apei_{read,write}_mce to use the pstore data structure
  and emit the mce_record tracepoint, enabling the collection of MCE
  records by the rasdaemon tool.
- PATCH 4-9: use ERST for persistent storage of APEI errors, and emit
  tracepoints for CPER sections, enabling the collection of those error
  records by the rasdaemon tool.
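
As a rough illustration of how a deserialized record could be routed by
its section type, consider the sketch below. Everything in it is an
assumption made for illustration: struct demo_guid only stands in for
guid_t, the DEMO_SEC_* values are placeholders and not the real CPER
section GUIDs, and the enum merely names the reporting paths mentioned in
the patch titles rather than calling any kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte section-type identifier, standing in for the kernel's guid_t. */
struct demo_guid {
	uint8_t b[16];
};

/* Placeholder values only; these are NOT the real CPER section GUIDs. */
static const struct demo_guid DEMO_SEC_MCE = { { 0x01 } };
static const struct demo_guid DEMO_SEC_MEM = { { 0x02 } };
static const struct demo_guid DEMO_SEC_PCIE = { { 0x03 } };
static const struct demo_guid DEMO_SEC_ARM = { { 0x04 } };

/* Reporting paths named after the patch titles in this series. */
enum demo_action {
	DEMO_EMIT_MCE_TRACE,	/* PATCH 3: mce_record tracepoint */
	DEMO_NOTIFY_GHES_CHAIN,	/* PATCH 7: ghes_report_chain notifier */
	DEMO_PRINT_AER,		/* PATCH 8: print AER for PCIe errors */
	DEMO_LOG_ARM,		/* PATCH 9: log ARM processor error */
	DEMO_UNKNOWN,		/* section type not handled here */
};

static int demo_guid_equal(const struct demo_guid *a,
			   const struct demo_guid *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

/* Pick a reporting path for a record read back from persistent storage. */
enum demo_action demo_dispatch(const struct demo_guid *type)
{
	assert(type != NULL);
	if (demo_guid_equal(type, &DEMO_SEC_MCE))
		return DEMO_EMIT_MCE_TRACE;
	if (demo_guid_equal(type, &DEMO_SEC_MEM))
		return DEMO_NOTIFY_GHES_CHAIN;
	if (demo_guid_equal(type, &DEMO_SEC_PCIE))
		return DEMO_PRINT_AER;
	if (demo_guid_equal(type, &DEMO_SEC_ARM))
		return DEMO_LOG_ARM;
	return DEMO_UNKNOWN;
}
```

Comparing whole GUID values rather than a small integer tag mirrors the
move to guid_t for section_type in PATCH 4; the actual dispatch in the
series may be structured differently.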

Shuai Xue (9):
  pstore: move pstore creator id, section type and record struct to
    common header
  ACPI: APEI: Use common ERST struct to read/write serialized MCE record
  ACPI: APEI: ERST: Emit the mce_record tracepoint
  ACPI: tables: change section_type of generic error data as guid_t
  ACPI: APEI: GHES: Use ERST to serialize APEI generic error before
    panic
  ACPI: APEI: GHES: export ghes_report_chain
  ACPI: APEI: ESRT: kick ghes_report_chain notifier to report serialized
    memory errors
  ACPI: APEI: ESRT: print AER to report serialized PCIe errors
  ACPI: APEI: ESRT: log ARM processor error

 arch/x86/kernel/cpu/mce/apei.c | 82 +++++++++++++++-------------------
 drivers/acpi/acpi_extlog.c     |  2 +-
 drivers/acpi/apei/erst.c       | 55 ++++++++++++++---------
 drivers/acpi/apei/ghes.c       | 48 +++++++++++++++++++-
 drivers/firmware/efi/cper.c    |  2 +-
 fs/pstore/platform.c           |  3 ++
 include/acpi/actbl1.h          |  5 ++-
 include/acpi/ghes.h            |  2 +-
 include/linux/pstore.h         | 29 ++++++++++++
 9 files changed, 154 insertions(+), 74 deletions(-)

-- 
2.41.0


