All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Thorsten Leemhuis <linux@leemhuis.info>
To: Jonathan Corbet <corbet@lwn.net>
Cc: regressions@lists.linux.dev, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, workflows@vger.kernel.org
Subject: [RFC PATCH v1 1/2] docs: reporting-issue: rework the detailed guide
Date: Tue, 26 Mar 2024 13:21:28 +0100	[thread overview]
Message-ID: <ac847c1c539d5f2c3d55ab363a5038ce6d303424.1711455295.git.linux@leemhuis.info> (raw)
In-Reply-To: <cover.1711455295.git.linux@leemhuis.info>

Rework the detailed step-by-step guide for various reasons:

* Simplify the search with the help of lore.kernel.org/all/, which did
  not exist when the text was written.

* Make use of the recently added document
  Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst,
  which covers many steps this text partly covered way better.

* The 'quickly report a stable regression to the stable team' approach
  hardly worked out: most of the time the regression was not known yet.
  Try a different approach using the regressions list.

* Reports about stable/longterm regressions most of the time were
  greeted with a brief reply along the lines of 'Is mainline affected as
  well?'; this is needed to determine who is responsible, so we might as
  well make the reporter check that before sending the report (which
  verify-bugs-and-bisect-regressions.rst already tells them to do, too).

* A lot of fine tuning after seeing what people were struggling with.

FIXME: adjust the entries in the reference section to match these
changes.

Not-signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
---
 .../admin-guide/reporting-issues.rst          | 391 ++++++++++--------
 1 file changed, 210 insertions(+), 181 deletions(-)

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 2fd5a030235ad0..e6083946c146e8 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -48,187 +48,216 @@ Once the report is out, answer any questions that come up and help where you
 can. That includes keeping the ball rolling by occasionally retesting with newer
 releases and sending a status update afterwards.
 
-Step-by-step guide how to report issues to the kernel maintainers
-=================================================================
-
-The above TL;DR outlines roughly how to report issues to the Linux kernel
-developers. It might be all that's needed for people already familiar with
-reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
-everyone else there is this section. It is more detailed and uses a
-step-by-step approach. It still tries to be brief for readability and leaves
-out a lot of details; those are described below the step-by-step guide in a
-reference section, which explains each of the steps in more detail.
-
-Note: this section covers a few more aspects than the TL;DR and does things in
-a slightly different order. That's in your interest, to make sure you notice
-early if an issue that looks like a Linux kernel problem is actually caused by
-something else. These steps thus help to ensure the time you invest in this
-process won't feel wasted in the end:
-
- * Are you facing an issue with a Linux kernel a hardware or software vendor
-   provided? Then in almost all cases you are better off to stop reading this
-   document and reporting the issue to your vendor instead, unless you are
-   willing to install the latest Linux version yourself. Be aware the latter
-   will often be needed anyway to hunt down and fix issues.
-
- * Perform a rough search for existing reports with your favorite internet
-   search engine; additionally, check the archives of the `Linux Kernel Mailing
-   List (LKML) <https://lore.kernel.org/lkml/>`_. If you find matching reports,
-   join the discussion instead of sending a new one.
-
- * See if the issue you are dealing with qualifies as regression, security
-   issue, or a really severe problem: those are 'issues of high priority' that
-   need special handling in some steps that are about to follow.
-
- * Make sure it's not the kernel's surroundings that are causing the issue
-   you face.
-
- * Create a fresh backup and put system repair and restore tools at hand.
-
- * Ensure your system does not enhance its kernels by building additional
-   kernel modules on-the-fly, which solutions like DKMS might be doing locally
-   without your knowledge.
-
- * Check if your kernel was 'tainted' when the issue occurred, as the event
-   that made the kernel set this flag might be causing the issue you face.
-
- * Write down coarsely how to reproduce the issue. If you deal with multiple
-   issues at once, create separate notes for each of them and make sure they
-   work independently on a freshly booted system. That's needed, as each issue
-   needs to get reported to the kernel developers separately, unless they are
-   strongly entangled.
-
- * If you are facing a regression within a stable or longterm version line
-   (say something broke when updating from 5.10.4 to 5.10.5), scroll down to
-   'Dealing with regressions within a stable and longterm kernel line'.
-
- * Locate the driver or kernel subsystem that seems to be causing the issue.
-   Find out how and where its developers expect reports. Note: most of the
-   time this won't be bugzilla.kernel.org, as issues typically need to be sent
-   by mail to a maintainer and a public mailing list.
-
- * Search the archives of the bug tracker or mailing list in question
-   thoroughly for reports that might match your issue. If you find anything,
-   join the discussion instead of sending a new report.
-
-After these preparations you'll now enter the main part:
-
- * Unless you are already running the latest 'mainline' Linux kernel, better
-   go and install it for the reporting process. Testing and reporting with
-   the latest 'stable' Linux can be an acceptable alternative in some
-   situations; during the merge window that actually might be even the best
-   approach, but in that development phase it can be an even better idea to
-   suspend your efforts for a few days anyway. Whatever version you choose,
-   ideally use a 'vanilla' build. Ignoring these advices will dramatically
-   increase the risk your report will be rejected or ignored.
-
- * Ensure the kernel you just installed does not 'taint' itself when
-   running.
-
- * Reproduce the issue with the kernel you just installed. If it doesn't show
-   up there, scroll down to the instructions for issues only happening with
-   stable and longterm kernels.
-
- * Optimize your notes: try to find and write the most straightforward way to
-   reproduce your issue. Make sure the end result has all the important
-   details, and at the same time is easy to read and understand for others
-   that hear about it for the first time. And if you learned something in this
-   process, consider searching again for existing reports about the issue.
-
- * If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
-   decoding the kernel log to find the line of code that triggered the error.
-
- * If your problem is a regression, try to narrow down when the issue was
-   introduced as much as possible.
-
- * Start to compile the report by writing a detailed description about the
-   issue. Always mention a few things: the latest kernel version you installed
-   for reproducing, the Linux Distribution used, and your notes on how to
-   reproduce the issue. Ideally, make the kernel's build configuration
-   (.config) and the output from ``dmesg`` available somewhere on the net and
-   link to it. Include or upload all other information that might be relevant,
-   like the output/screenshot of an Oops or the output from ``lspci``. Once
-   you wrote this main part, insert a normal length paragraph on top of it
-   outlining the issue and the impact quickly. On top of this add one sentence
-   that briefly describes the problem and gets people to read on. Now give the
-   thing a descriptive title or subject that yet again is shorter. Then you're
-   ready to send or file the report like the MAINTAINERS file told you, unless
-   you are dealing with one of those 'issues of high priority': they need
-   special care which is explained in 'Special handling for high priority
-   issues' below.
-
- * Wait for reactions and keep the thing rolling until you can accept the
-   outcome in one way or the other. Thus react publicly and in a timely manner
-   to any inquiries. Test proposed fixes. Do proactive testing: retest with at
-   least every first release candidate (RC) of a new mainline version and
-   report your results. Send friendly reminders if things stall. And try to
-   help yourself, if you don't get any help or if it's unsatisfying.
-
-
-Reporting regressions within a stable and longterm kernel line
---------------------------------------------------------------
-
-This subsection is for you, if you followed above process and got sent here at
-the point about regression within a stable or longterm kernel version line. You
-face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
-switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
-regressions as quickly as possible, hence there is a streamlined process to
-report them:
-
- * Check if the kernel developers still maintain the Linux kernel version
-   line you care about: go to the  `front page of kernel.org
-   <https://kernel.org/>`_ and make sure it mentions
-   the latest release of the particular version line without an '[EOL]' tag.
-
- * Check the archives of the `Linux stable mailing list
-   <https://lore.kernel.org/stable/>`_ for existing reports.
-
- * Install the latest release from the particular version line as a vanilla
-   kernel. Ensure this kernel is not tainted and still shows the problem, as
-   the issue might have already been fixed there. If you first noticed the
-   problem with a vendor kernel, check a vanilla build of the last version
-   known to work performs fine as well.
-
- * Send a short problem report to the Linux stable mailing list
-   (stable@vger.kernel.org) and CC the Linux regressions mailing list
-   (regressions@lists.linux.dev); if you suspect the cause in a particular
-   subsystem, CC its maintainer and its mailing list. Roughly describe the
-   issue and ideally explain how to reproduce it. Mention the first version
-   that shows the problem and the last version that's working fine. Then
-   wait for further instructions.
-
-The reference section below explains each of these steps in more detail.
-
-
-Reporting issues only occurring in older kernel version lines
--------------------------------------------------------------
-
-This subsection is for you, if you tried the latest mainline kernel as outlined
-above, but failed to reproduce your issue there; at the same time you want to
-see the issue fixed in a still supported stable or longterm series or vendor
-kernels regularly rebased on those. If that the case, follow these steps:
-
- * Prepare yourself for the possibility that going through the next few steps
-   might not get the issue solved in older releases: the fix might be too big
-   or risky to get backported there.
-
- * Perform the first three steps in the section "Dealing with regressions
-   within a stable and longterm kernel line" above.
-
- * Search the Linux kernel version control system for the change that fixed
-   the issue in mainline, as its commit message might tell you if the fix is
-   scheduled for backporting already. If you don't find anything that way,
-   search the appropriate mailing lists for posts that discuss such an issue
-   or peer-review possible fixes; then check the discussions if the fix was
-   deemed unsuitable for backporting. If backporting was not considered at
-   all, join the newest discussion, asking if it's in the cards.
-
- * One of the former steps should lead to a solution. If that doesn't work
-   out, ask the maintainers for the subsystem that seems to be causing the
-   issue for advice; CC the mailing list for the particular subsystem as well
-   as the stable mailing list.
-
-The reference section below explains each of these steps in more detail.
+The detailed step-by-step guide on reporting Linux kernel issues
+================================================================
+
+The short guide above might be all needed for people already familiar
+with reporting issues to Free/Libre & Open Source Software projects. For
+everyone else there is this more detailed step-by-step guide. It still tries to
+be brief and leaves a lot of details occasionally relevant to a reference
+section, which holds additional information for almost all of the steps.
+
+Note: this step-by-step guide covers more aspects than the short guide above and
+does things in a slightly different order; that is done in the reader's interest,
+to make sure you notice early on when  on the wrong track.
+
+* Be aware you must have or install a fresh vanilla mainline kernel for
+  reporting; you furthermore must remove any software that builds or relies on
+  externally developed kernel modules possibly installed. There is also a decent
+  chance you will have to build a patched kernel yourself to help resolve the
+  issue.
+
+  In case that sounds do demanding to you, better report the issue to the vendor
+  who built your kernel (usually your Linux distributor or hardware manufacturer).
+
+* Skim the output of ``journalctl -k`` for any indicators of problems that might
+  lead to your bug.
+
+* Check if the kernel was already 'tainted' when the issue first occurred: the
+  event that led to this flag being set might cause your issue, even if it looks
+  totally unrelated.
+
+* Consider some glitch in your kernel's environment makes it misbehave -- like
+  a hardware defect, a mis-configured system firmware, an overclocked component,
+  a broken initramfs, an inconsistent file system, broken firmware files,
+  a pre-release compiler, or a malfunctioning/misconfigured Linux distribution.
+
+* If you deal with multiple issues at once, process them separately from now on.
+  If there is even a small chance they are related, briefly mention the other
+  issues in each of the reports later, ideally while linking to the others.
+
+* Search for fixes and earlier reports referring to an issue like yours. Start
+  by checking `lore <https://lore.kernel.org/all/>`_. Then perform a general
+  internet search. Consult :ref:`MAINTAINERS <maintainers>` to determine where
+  developers of the affected code expect bugs to be submitted to; if in a doubt,
+  use your best guess to determine the driver or kernel subsystem. If its
+  developers have a dedicated mailing list not archived on lore, search its
+  archives; when they are among the few that uses one of
+  various bug trackers, search it as well. Note, bugzilla.kernel.org
+  is the right place to file bugs only for a small percentage of the kernel; if
+  you submit bugs for other code there it most likely will be ignored.
+
+  If you find fixes, try them. If you find matching reports, evaluate whatever
+  is wiser: joining the discussion or reporting the problem anew. In the latter
+  case mention and link to the related report you found; after you submit it,
+  add a note to the related report along the lines of 'I have a problem that
+  might be the same or related, for details see <link_to_your_report>'.
+
+* Are you facing a regression? One still occurring with a less than two
+  (ideally: one) weeks old kernel from the affected series? A kernel that is
+  vanilla or close to it? Then send a brief (one or two short paragraphs) email
+  to <regressions@lists.linux.dev> asking if the problem is known already.
+  Consider proceeding with this guide immediately to confine the problem and
+  report it properly; definitely do so, if you don't receive any helpful
+  answer within three days.
+
+* Evaluate if the issue you are dealing with qualifies as regression, security
+  issue, or a really severe problem: those need special handling in some of the
+  following steps.
+
+* Write down coarsely how to reproduce the issue on a freshly booted system.
+
+* Verify the bug and potentially bisect any regression as described in
+  Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst;
+  alternatively handle the tasks it covers on your own:
+
+  * Verify the bug occurs with an up-to-date kernel. For regressions within a
+    still supported stable or longterm series this means the latest release from
+    that series. In all other cases, this means a mainline release, pre-release, or
+    snapshot ideally less than one week old and two at maximum; the latest release
+    from the newest stable series might work as well, especially if the series
+    is based on a mainline version released in the past two weeks.
+
+  * In case of a regression, consider bisecting it. If it is one within a stable
+    or longterm series, you must verify if current mainline is affected as well.
+
+  * All kernels used for verifying and reporting bugs must be free of externally
+    developed modules (like Nvidia's graphics drivers, OpenZFS, or VirtualBox's
+    host drivers). The kernels also should be built from pristine (aka 'vanilla')
+    Linux sources, but lightly patched might work, too. The kernels furthermore
+    should not be 'tainted' when the issue occurs.
+
+  Note, don't skip this step or take its demands lightheartedly, as there is a
+  decent chance your report otherwise will be ignored or welcomed brusquely.
+
+* If you learned anything new about the bug while following this guide so far,
+  consider searching once more for earlier reports and fixes.
+
+* Were you unable to reproduce a bug with a current mainline kernel you want to
+  see fixed in a stable or longterm series? A bug that is not a regression? Then
+  move over to ‘Resolving non-regressions only occurring in stable or longterm
+  kernels’.
+
+* Optional: if your failure involves a 'panic', 'Oops', 'warning', or 'BUG',
+  ideally decode the included stack trace.
+
+* Prepare the report by writing a detailed description of the issue.
+
+  Always mention the Linux distribution and the kernel version used for the
+  verification; also include your notes on how to reproduce the issue. If your
+  failure involves a 'panic', 'Oops', 'warning', or 'BUG', include a copy or
+  photo of it.
+
+  Most of the time you also want to describe relevant aspects of your
+  environment, like the machine's model name, the relevant hardware components,
+  or the version of related userspace drivers. Often you want to also save the
+  output of ``journalctl -k`` to a file you later attach to your report or
+  upload somewhere and link to.
+
+  If there other aspects about the environment likely are relevant, attach or
+  upload & link detailed information about is as well, like the output from
+  commands as ``lsblk``, ``lspci``, ``lsusb.py`` and
+  ``grep -s '' /sys/class/dmi/id/*``.
+
+  If anything in the attached or linked files is certainly relevant, ensure
+  to copy that part to the body of the report to make it easily accessible.
+  Furthermore make sure to not overload the report with many or huge
+  attachments: developers will ask for additional data when needed.
+
+  Ensure both the subject and the first sentence of the report outlines the core
+  of the problem and gets people interested enough to read on.
+
+  When finished, review and optimize the report once more to make it as
+  straightforward as possible and the core of the problem easy to grasp.
+
+* Submit your report in the appropriate way, which depends on the outcome of the
+  verification:
+
+  * In case you deal with a security issue, follow the instructions in
+    Documentation/process/security-bugs.rst.
+
+  * Are you facing a regression within a stable or longterm kernel series you
+    were unable to reproduce with a fresh mainline kernel? Then report it by
+    email to the stable team while CCing the regressions lists (To:
+    Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
+    Sasha Levin <sashal@kernel.org>; CC: stable@vger.kernel.org,
+    regressions@lists.linux.dev).
+
+  * In all other cases, submit the report as specified in MAINTAINERS. In case
+    of a regression you have to report by mail, CC the regressions list
+    (regressions@lists.linux.dev); when you know the culprit, also CC everyone
+    in its 'Signed-off-by' chain. In case of a regression you had to file in a
+    bug tracker, write a short heads-up email with a link to the report to the
+    list and everyone that signed the patch off, if the culprit is known.
+
+    Did you send the brief inquiry about a regression mentioned earlier? Then in
+    both of these cases keep it involved: either send your report as a reply to
+    the earlier inquiry while adding relevant recipients or send a quick note
+    with a link to the proper report.
+
+* Wait for reactions and keep the ball rolling until you can accept the outcome
+  in one way or the other. That among others means:
+
+  * React publicly and in a timely manner to any inquiries.
+
+  * Try to quickly test proposed fixes.
+
+  * Perform proactive testing: retest with at least every first release
+    candidate (e.g. -rc1) of a new mainline version and report your findings in
+    a reply to your report.
+
+  * If things stall for more than three or four weeks, check if that happened
+    due to an inadequate report of yours; if not, send a friendly inquiry.
+
+  * Be aware that nobody is obliged to help you, unless it is a recent
+    regression, a security issue, or a really severe problem; hence try to help
+    yourself, if you don't receive any or only unsatisfying help.
+
+Resolving non-regressions only occurring in stable or longterm kernels
+----------------------------------------------------------------------
+
+Are you facing an issue in a still supported stable or longterm series you were
+unable to reproduce with a fresh mainline kernel? An issue that is also not a
+regression and still happens in the series latest release? In that case follow
+these steps:
+
+* Prepare yourself for the possibility that trying to resolve the issue resolved
+  in the affected stable or longterm series might not work out: the fix might be
+  too big or risky to include there.
+
+* Search Linux' mainline Git repository or lore for the change that resolved the
+  issue; when unsuccessful, consider using a bisection to find it. Then check
+  the description of the fix for a 'stable tag', e.g, a line like
+  'Cc: <stable@vger.kernel.org>':
+
+ * In case there is such a tag the change is already scheduled for backporting.
+   Usually it will be picked up within two or three weeks after being merged to
+   mainline. Note, a version number after the tag might limit backporting to a
+   series that is newer than the one you care for; plans to backport a change
+   sometimes are also discarded. In such cases search lore or contact the
+   involved developers for details, but you likely are out of luck.
+
+ * If there was no stable tag, search the mailing list archives if backporting
+   nevertheless is in the works. If not, search for the review of the fix and
+   check if backporting to stable and longterm kernels is planned or was
+   rejected. If it's neither, send a reply asking the developers if backporting
+   to the series is an option. Note, they might greenlight it, but unwilling to
+   handle the job themselves -- in that case consider testing and submitting the
+   fix and everything it depends on as explained in
+   Documentation/process/stable-kernel-rules.rst.
+
+ In case you have trouble locating the fix or the discussion about it, consider
+ asking the maintainers and developers of the affected subsystem for advice.
 
 
 Reference section: Reporting issues to the kernel maintainers
-- 
2.44.0


  reply	other threads:[~2024-03-26 12:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 12:21 [RFC PATCH v1 0/2] docs: reporting-issues: rework while involving the 'verify bugs' text Thorsten Leemhuis
2024-03-26 12:21 ` Thorsten Leemhuis [this message]
2024-04-10 20:58   ` [RFC PATCH v1 1/2] docs: reporting-issue: rework the detailed guide Jonathan Corbet
2024-03-26 12:21 ` [RFC PATCH v1 2/2] docs: reporting-issue: rework the TLDR Thorsten Leemhuis
2024-04-10 21:00   ` Jonathan Corbet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac847c1c539d5f2c3d55ab363a5038ce6d303424.1711455295.git.linux@leemhuis.info \
    --to=linux@leemhuis.info \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    --cc=workflows@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.