From: Bitao Hu <yaoma@linux.alibaba.com>
To: dianders@chromium.org, akpm@linux-foundation.org,
liusong@linux.alibaba.com, tglx@linutronix.de, pmladek@suse.com,
kernelfans@gmail.com, deller@gmx.de, npiggin@gmail.com,
tsbogend@alpha.franken.de, James.Bottomley@HansenPartnership.com,
jan.kiszka@siemens.com
Cc: linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
yaoma@linux.alibaba.com
Subject: [PATCHv9 0/3] *** Detect interrupt storm in softlockup ***
Date: Thu, 22 Feb 2024 17:34:17 +0800
Message-ID: <20240222093420.13956-1-yaoma@linux.alibaba.com>
Hi, guys.
I have implemented a low-overhead method for detecting interrupt
storms in softlockup. Please review it; all comments are welcome.
Changes from v8 to v9:
- Patch #1 remains unchanged.
- From Thomas Gleixner, split patch #2 into two patches. Interrupt
infrastructure first and then the actual usage site in the
watchdog code.
Changes from v7 to v8:
- From Thomas Gleixner, implement statistics within the interrupt
core code and provide sensible interfaces for the watchdog code.
- Patch #1 remains unchanged. Patch #2 has significant changes
based on Thomas's suggestions, which is why I have removed
Liu Song and Douglas's Reviewed-by from patch #2. Please review
it again, and all comments are welcome.
Changes from v6 to v7:
- Remove "READ_ONCE" in "start_counting_irqs"
- Replace the hard-coded 5 with "NUM_SAMPLE_PERIODS" macro in
"set_sample_period".
- Add empty lines to help with reading the code.
- Remove the branch that processes IRQs where "counts_diff = 0".
- Add the Reviewed-by of Liu Song and Douglas.
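For readers unfamiliar with "set_sample_period", the role of the
"NUM_SAMPLE_PERIODS" macro can be illustrated with a small calculation.
This is a minimal sketch in Python rather than kernel C. It assumes the
mainline relationship where the softlockup threshold is twice
watchdog_thresh and the sample period is that threshold divided by five;
the function names below are illustrative and are not taken from the patch.

```python
NSEC_PER_SEC = 1_000_000_000
NUM_SAMPLE_PERIODS = 5  # macro named in the changelog; replaces the hard-coded 5


def softlockup_thresh(watchdog_thresh: int) -> int:
    """Assumption: the softlockup threshold is 2 * watchdog_thresh (seconds)."""
    return watchdog_thresh * 2


def sample_period_ns(watchdog_thresh: int) -> int:
    """Sample period in nanoseconds: the threshold split into NUM_SAMPLE_PERIODS."""
    return softlockup_thresh(watchdog_thresh) * (NSEC_PER_SEC // NUM_SAMPLE_PERIODS)


if __name__ == "__main__":
    # With watchdog_thresh = 10 s, the threshold is 20 s and each sample covers 4 s.
    print(sample_period_ns(10) / NSEC_PER_SEC, "seconds per sample")
```

Under these assumptions the watchdog timer fires five times per softlockup
window, which is why a five-entry sample history covers one full window.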
Changes from v5 to v6:
- Use "./scripts/checkpatch.pl --strict" to get a few extra
style nits and fix them.
- Squash patch #3 into patch #1, and wrap the help text to
80 columns.
- Sort existing headers alphabetically in watchdog.c
- Drop "softlockup_hardirq_cpus", just read "hardirq_counts"
and see if it's non-NULL.
- Store "nr_irqs" in a local variable.
- Simplify the calculation of "cpu_diff".
Changes from v4 to v5:
- Rearranging variable placement to make code look neater.
Changes from v3 to v4:
- Renaming some variable and function names to make the code logic
more readable.
- Change the code location to avoid predeclaring.
- Just swap rather than a double loop in tabulate_irq_count.
- Since nr_irqs has the potential to grow at runtime, bounds-check
logic has been implemented.
- Add SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob.
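The "just swap rather than a double loop" item can be pictured as a
single-pass insertion into a small, descending top-N table: the new entry
is carried down the table and swapped with every slot it exceeds. The
following Python sketch is one reading of that changelog line, not the
patch's code; the tuple layout, the "top_irqs" helper, and the sample
counts are made up for illustration.

```python
from typing import Dict, List, Tuple

IrqCount = Tuple[int, int]  # (irq number, interrupt count); hypothetical layout


def tabulate_irq_count(top: List[IrqCount], irq: int, count: int) -> None:
    """Insert (irq, count) into a descending top-N list using one pass of swaps."""
    carried = (irq, count)
    for i in range(len(top)):
        # Whenever the carried entry beats a slot, swap them and keep carrying
        # the displaced (smaller) entry down the table.
        if carried[1] > top[i][1]:
            top[i], carried = carried, top[i]
    # Whatever is still carried here fell off the end of the table.


def top_irqs(counts: Dict[int, int], rank: int = 3) -> List[IrqCount]:
    """Return up to `rank` busiest IRQs, busiest first."""
    top: List[IrqCount] = [(-1, 0)] * rank  # -1 marks an empty slot
    for irq, count in counts.items():
        tabulate_irq_count(top, irq, count)
    return [entry for entry in top if entry[0] >= 0]


if __name__ == "__main__":
    print(top_irqs({1: 5, 2: 50, 3: 7, 4: 20, 5: 0}))
```

Because the table stays sorted after every insertion, each new count costs
one pass over N slots, with no nested loop.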
Changes from v2 to v3:
- From Liu Song, using enum instead of macro for cpu_stats, shortening
the name 'idx_to_stat' to 'stats', adding 'get_16bit_precesion' instead
of using right shift operations, and using 'struct irq_counts'.
- From the kernel test robot, using '__this_cpu_read' and '__this_cpu_write'
instead of accessing a per-cpu array directly, in order to avoid
this warning.
'sparse: incorrect type in initializer (different modifiers)'
Changes from v1 to v2:
- From Douglas, optimize the memory of cpustats. With the maximum number
of CPUs (8192), the usage is now:
2 * 8192 * 4 + 1 * 8192 * 5 * 4 + 1 * 8192 = 237,568 bytes.
- From Liu Song, refactor the code format and add necessary comments.
- From Douglas, use interrupt counts instead of interrupt time to
determine the cause of softlockup.
- Remove the cmdline parameter added in PATCHv1.
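The cpustats memory figure in the v1-to-v2 notes can be checked with a
short calculation. The sketch below is in Python and only reproduces the
arithmetic. The meaning given to each term (a 2-byte previous value per
statistic, 1-byte differences for 5 samples of 4 statistics, and a 1-byte
index per CPU) is an interpretation of the formula and is not spelled out
in this mail; all names are illustrative.

```python
NR_CPUS_MAX = 8192   # maximum number of CPUs used in the changelog formula
NUM_STATS = 4        # assumed: statistics tracked per CPU
NUM_SAMPLES = 5      # assumed: samples kept per statistic


def cpustats_bytes(nr_cpus: int = NR_CPUS_MAX) -> int:
    """Total per-CPU bookkeeping, term by term as in the changelog formula."""
    old_values = 2 * nr_cpus * NUM_STATS             # 2 * 8192 * 4
    diffs = 1 * nr_cpus * NUM_SAMPLES * NUM_STATS    # 1 * 8192 * 5 * 4
    tail_index = 1 * nr_cpus                         # 1 * 8192
    return old_values + diffs + tail_index


if __name__ == "__main__":
    total = cpustats_bytes()
    print(f"{total:,} bytes ({total / 1024:.0f} KiB)")
```

The three terms are 65,536 + 163,840 + 8,192 bytes, which matches the
237,568 bytes quoted above, or 29 bytes per CPU.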
Bitao Hu (3):
watchdog/softlockup: low-overhead detection of interrupt storm
irq: use a struct for the kstat_irqs in the interrupt descriptor
watchdog/softlockup: report the most frequent interrupts
arch/mips/dec/setup.c | 2 +-
arch/parisc/kernel/smp.c | 2 +-
arch/powerpc/kvm/book3s_hv_rm_xics.c | 2 +-
include/linux/irqdesc.h | 9 +-
include/linux/kernel_stat.h | 3 +
kernel/irq/internals.h | 2 +-
kernel/irq/irqdesc.c | 34 ++++-
kernel/irq/proc.c | 9 +-
kernel/watchdog.c | 213 ++++++++++++++++++++++++++-
lib/Kconfig.debug | 13 ++
scripts/gdb/linux/interrupts.py | 6 +-
11 files changed, 268 insertions(+), 27 deletions(-)
--
2.37.1 (Apple Git-137.1)