From: Mathieu Desnoyers via lttng-dev <lttng-dev@lists.lttng.org>
To: Damien Berget <damien.berget@flyzipline.com>,
Kienan Stewart <kstewart@efficios.com>
Cc: lttng-dev@lists.lttng.org
Subject: Re: [lttng-dev] Capturing snapshot on kernel panic
Date: Thu, 16 May 2024 15:56:28 -0400
Message-ID: <5d6ac618-24ba-4470-834a-7fce7155459c@efficios.com>
In-Reply-To: <CAA1MA5e+qMEBGAtiuLPKsospQycRFFpOYKVkh66pOAvdX5TtTg@mail.gmail.com>
Hi Damien,
If kexec is not an option on your system, you might be able to
access the pmem+dax filesystem after a warm reboot, but that very
much depends on whether your BIOS clears memory on a warm reboot.
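
One rough way to find out whether memory survives a warm reboot on a given board is to leave a marker file on the pmem+dax mount, reboot, and look for it again. The sketch below assumes the filesystem is mounted at a hypothetical /mnt/pmem; the function names and the marker file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical mount point of the pmem+dax filesystem; adjust to your system.
PMEM_MOUNT="${PMEM_MOUNT:-/mnt/pmem}"

# Write a marker holding the token $2 into directory $1, then flush.
write_marker() {
    printf '%s\n' "$2" > "$1/warm-reboot-marker" && sync
}

# Succeed if the marker in directory $1 still holds the token $2.
marker_survived() {
    [ -f "$1/warm-reboot-marker" ] &&
        [ "$(cat "$1/warm-reboot-marker")" = "$2" ]
}
```

Before the warm reboot, run `write_marker "$PMEM_MOUNT" test-1`. After the reboot, mount the same region again without re-creating the filesystem and run `marker_survived "$PMEM_MOUNT" test-1`. If the mount itself fails, the memory was most likely cleared.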
Cheers,
Mathieu
On 2024-05-16 14:22, Damien Berget via lttng-dev wrote:
> Thanks Kienan for these quick suggestions,
> we'll investigate the pmem route (I was not aware of the lttng-crash
> utility, it's pretty nice). I'm not sure how fast it would burn
> through our SSD, but it might still be worth trying.
> As for kexec-tools, it's unfortunately not officially supported on our
> embedded modules, so we might be SOL there. We may have to try adding
> our own tracepoint in the kernel to use as a trigger.
> Cheers
> Damien
>
> On Thu, May 16, 2024 at 8:12 AM Kienan Stewart <kstewart@efficios.com> wrote:
>
> Hi Damien,
>
> I want to expand on one of the options that could work for your case.
>
> On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
> > Hi Damien,
> >
> >
> > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
> >> Good day,
> >> we have been using LTTng successfully to capture snapshots on
> >> user-defined tracepoints, and it has proved invaluable for debugging
> >> our issues. Thanks to all the contributors of this project!
> >>
> >> We'd like to know if it would be possible to trigger on a kernel
> >> panic? It might be only dubiously possible, as you would still need
> >> the file system working to write the results, but I should ask.
> >>
> >
> > For userspace tracing, I think the recommendation is usually to use a
> > dax/pmem device and have the buffers for the session mapped there. After
> > a panic, the contents of the buffers can be restored using lttng-crash[1].
> >
> > Note that dax/pmem isn't supported by the kernel space tracer at this time.
> >
> > If I recall, there are other ways to do things in the panic sequence
> > (that aren't LTTng-specific), but I'm personally not as familiar with the
> > details of that stage of Linux.
> >
>
> It's possible to use kexec-tools to load a new kernel post-panic[1]. If
> your system uses kexec, the contents of RAM aren't necessarily flushed,
> and if both the initial kernel and the post-panic kernel started by kexec
> have the same configuration for an emulated PMEM device using the memmap
> parameter[2,3], that region of memory can have a daxfs created in it
> after a clean boot.
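
As a small sketch of the "same configuration" check, the function below tests whether a kernel command line carries a `memmap=<size>!<offset>` entry, which is the form the pmem.io guide uses to emulate a PMEM region. The function names and the /dev/pmem0 device name are assumptions for illustration, and the pattern only covers simple numeric sizes and offsets.

```shell
#!/bin/sh
# Succeed if the kernel command line passed as $1 contains a
# memmap=<size>!<offset> entry, the form used to emulate a PMEM region.
has_pmem_memmap() {
    printf '%s\n' "$1" |
        grep -Eq '(^|[[:space:]])memmap=[0-9a-fA-Fx]+[KMGkmg]?![0-9a-fA-Fx]+[KMGkmg]?([[:space:]]|$)'
}

# Succeed only if this boot reserved the region and the kernel exposed
# it as a block device. /dev/pmem0 is an assumed device name.
pmem_ready() {
    has_pmem_memmap "$(cat /proc/cmdline)" && [ -b "${1:-/dev/pmem0}" ]
}
```

Running `pmem_ready` early in both the normal kernel and the kexec'd kernel is one way to notice a mismatch in the two command lines before relying on the region.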
>
> Note: some systems may not flush the memory during a warm reboot, but
> this is dependent on the BIOS.
>
> When your system boots you could do something like the following:
>
> * If it's a clean boot, create the daxfs.
> * If it's an "unclean" boot (e.g. the daxfs already exists, or a
>   kernel parameter informs you that it started post-panic), copy or
>   move the buffers to persistent storage, or run lttng-crash on them,
>   for analysis.
> * Start tracing using a snapshot session with the userspace buffers
>   on the daxfs.
>
> In this type of situation the "snapshot" command is never invoked
> directly, but the recovery of the buffers to create a snapshot is
> possible.
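
The boot-time steps above could be sketched roughly as follows. This assumes the daxfs is already mounted at a hypothetical /mnt/pmem, treats leftover buffer files as the sign of an unclean boot, and uses made-up directory and function names; only the `lttng` and `lttng-crash` invocations come from the LTTng tools themselves.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your system.
PMEM_MOUNT="${PMEM_MOUNT:-/mnt/pmem}"
SHM_DIR="${SHM_DIR:-$PMEM_MOUNT/lttng-shm}"
RECOVERY_DIR="${RECOVERY_DIR:-/application/trace/recovered}"

# Print "unclean" if directory $1 exists and still holds files from a
# previous boot, "clean" otherwise.
classify_boot() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo unclean
    else
        echo clean
    fi
}

# Recover leftover buffers if there are any, then start a fresh
# snapshot session whose userspace buffers live on the daxfs.
recover_and_trace() {
    if [ "$(classify_boot "$SHM_DIR")" = unclean ]; then
        mkdir -p "$RECOVERY_DIR"
        # Extract the previous boot's buffers into a trace directory.
        lttng-crash --extract="$RECOVERY_DIR/$(date +%Y%m%dT%H%M%S)" "$SHM_DIR"
        rm -rf -- "$SHM_DIR"
    fi
    mkdir -p "$SHM_DIR"
    lttng create snapshot-trace-session --snapshot --shm-path="$SHM_DIR"
    lttng enable-channel --userspace --num-subbuf=8 channelu
    lttng enable-event --userspace --all --channel channelu
    lttng start
}
```

Calling `recover_and_trace` from a boot script, after the daxfs is mounted, would cover both cases. Creating the filesystem on a clean boot is left out here because it depends on how the device is detected and mounted on your system.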
>
> [1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
> [2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
> [3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap
>
> thanks,
> kienan
>
> >> Looking at the available kernel syscalls, "reboot" seems like a
> >> good candidate; however, I was not able to capture a snapshot on it.
> >> I have tested the setup below with the "--name=chdir" syscall and it
> >> works: a "cd" to a directory creates a trace. But no dice with reboot.
> >>
> >
> > The details of how this works will depend on your system. For example,
> > my installations tend to use systemd as PID 1. The broad strokes seem to
> > be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
> > believe then kicks off reboot.service; PID 1 is swapped to
> > /usr/lib/systemd/systemd-shutdown, which sends SIGTERM then SIGKILL to
> > all processes, unmounts, syncs, and calls the reboot system call[2,3].
> >
> > As both the SIGTERM and the unmounts are done before the syscall,
> > lttng-sessiond and the consumers will have already shut down by the
> > time it is entered.
> >
> > While this doesn't necessarily help with your original question about
> > panics, if you want to snapshot before shutdown or reboot and are using
> > systemd, it's possible to leave a script or binary in a known directory
> > so that it's invoked prior to the rest of the shutdown sequence[4].
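
A minimal sketch of what such a script might contain, assuming the session name from the original setup. Where the script is installed, and whether lttng-sessiond is still running when it is invoked, are things to verify on your system; the function name and the `LTTNG` override variable are hypothetical.

```shell
#!/bin/sh
# Command used to talk to the session daemon; overridable for testing.
LTTNG="${LTTNG:-lttng}"
# Session name taken from the setup script in this thread.
SESSION="${SESSION:-snapshot-trace-session}"

# Record a snapshot of the session to its configured output.
record_shutdown_snapshot() {
    "$LTTNG" snapshot record --session="$SESSION"
}
```

The hook would then call `record_shutdown_snapshot` and ignore a failure, so that a missing session does not hold up the shutdown.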
> >
> > [1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
> > [2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
> > [3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
> > [4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
> >
> > hope this helps,
> > kienan
> >
> >> Would you have any suggestions?
> >> Thanks for your help,
> >> Cheers
> >> Damien
> >>
> >> ============================
> >>
> >> # Prep output dir
> >> mkdir /application/trace/
> >> rm -rf /application/trace/*
> >>
> >> # Create session
> >> sudo lttng destroy snapshot-trace-session
> >> sudo lttng create snapshot-trace-session --snapshot \
> >>     --output="/application/trace/"
> >> sudo lttng enable-channel --kernel --num-subbuf=8 channelk
> >> sudo lttng enable-channel --userspace --num-subbuf=8 channelu
> >>
> >> # Configure session
> >> sudo lttng enable-event --kernel --syscall --all --channel channelk
> >> sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
> >> sudo lttng enable-event --userspace --all --channel channelu
> >> sudo lttng add-context -u -t vtid -t procname
> >> sudo lttng remove-trigger trig_reboot
> >> sudo lttng add-trigger --name=trig_reboot \
> >>     --condition=event-rule-matches --type=kernel:syscall:entry \
> >>     --name=reboot \
> >>     --action=snapshot-session snapshot-trace-session \
> >>     --rate-policy=once-after:1
> >>
> >> # start & list info
> >> sudo lttng start
> >> sudo lttng list snapshot-trace-session
> >> sudo lttng list-triggers
> >>
> >> #======== test it...
> >> sudo reboot
> >>
> >> #======= reconnect and Nothing :(
> >> $ ls -alu /application/trace/
> >> drwxr-xr-x 2 u u 4096 May 15 2024 .
> >> drwxr-xr-x 10 u u 4096 May 15 2024 ..
> >>
> >>
> >> _______________________________________________
> >> lttng-dev mailing list
> >> lttng-dev@lists.lttng.org <mailto:lttng-dev@lists.lttng.org>
> >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> <https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev>
>
>
>
> --
> *Damien Berget*
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Thread overview: 5+ messages
2024-05-15 22:24 [lttng-dev] Capturing snapshot on kernel panic Damien Berget via lttng-dev
2024-05-16 13:37 ` Kienan Stewart via lttng-dev
2024-05-16 15:12 ` Kienan Stewart via lttng-dev
2024-05-16 18:22 ` Damien Berget via lttng-dev
2024-05-16 19:56 ` Mathieu Desnoyers via lttng-dev [this message]