LinuxPPC-Dev Archive mirror
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Brian Norris <briannorris@chromium.org>
Cc: manivannan.sadhasivam@oss.qualcomm.com,
	Bjorn Helgaas <bhelgaas@google.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Will Deacon <will@kernel.org>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Krzysztof Wilczynski <kwilczynski@kernel.org>,
	Manivannan Sadhasivam <mani@kernel.org>,
	Rob Herring <robh@kernel.org>, Heiko Stuebner <heiko@sntech.de>,
	Philipp Zabel <p.zabel@pengutronix.de>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org,
	linux-rockchip@lists.infradead.org,
	Niklas Cassel <cassel@kernel.org>,
	Wilfred Mallawa <wilfred.mallawa@wdc.com>,
	Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Subject: Re: [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports
Date: Fri, 29 Aug 2025 10:35:20 +0200	[thread overview]
Message-ID: <aLFmSFe5iyYDrIjt@wunner.de> (raw)
In-Reply-To: <aLC7KIoi-LoH2en4@google.com>

On Thu, Aug 28, 2025 at 01:25:12PM -0700, Brian Norris wrote:
> On the flip side: it's not clear
> PCI_ERS_RESULT_NEED_RESET+pci_channel_io_normal works as documented
> either. An endpoint might think it's requesting a slot reset, but
> pcie_do_recovery() will ignore that and skip reset_subordinates()
> (pci_host_reset_root_port()).
> 
> All in all, the docs sound like endpoints _should_ have control over
> whether we exercise a full port/slot reset for all types of errors. But
> in practice, we do not actually give it that control. i.e., your commit
> message is correct, and the docs are not.
> 
> I have half a mind to suggest the appended change, so the behavior
> matches (some of) the docs a little better [1].

A change similar to the one you're proposing is already queued on the
pci/aer topic branch for v6.18:

https://git.kernel.org/pci/pci/c/d0a2dee7d458

Here's the corresponding cover letter:

https://lore.kernel.org/r/cover.1755008151.git.lukas@wunner.de

There was a discussion why I didn't take the exact same approach you're
proposing, but only a similar one:

https://lore.kernel.org/r/aJ2uE6v46Zib30Jh@wunner.de
https://lore.kernel.org/r/aKHWf3L0NCl_CET5@wunner.de


> Specifically, I'm trying to see what's supposed to happen with
> PCI_ERS_RESULT_CAN_RECOVER. I see that for pci_channel_io_frozen, almost
> all endpoint drivers return PCI_ERS_RESULT_NEED_RESET, but if drivers
> actually return PCI_ERS_RESULT_CAN_RECOVER, it's unclear what should
> happen.
> 
> Today, we don't actually respect it; pcie_do_recovery() just calls
> reset_subordinates() (pci_host_reset_root_port()) unconditionally. The
> only thing that return code affects is whether we call
> report_mmio_enabled() vs report_slot_reset() afterward. This seems odd.

In the series queued on pci/aer, I've only allowed drivers to opt in
to a reset on Non-Fatal Errors.  I didn't dare also letting them opt
out of a reset on Fatal Errors.

These changes of behavior are always risky, so it seemed prudent to not
introduce too many changes at once.  There was no urgent need to also
change behavior for Fatal Errors for the use case at hand (the xe graphics
driver).  I went through all drivers with pci_error_handlers to avoid
breaking any of them.  It's very tedious work, takes weeks.  It would
be necessary to do that again when changing behavior for Fatal Errors.

pcieaer-howto.rst justifies the unconditional reset on Fatal Errors by
saying that the link is unreliable and that a reset is thus required.

On the other hand, pci-error-recovery.rst (which is a few months older
than pcieaer-howto.rst) says in section "STEP 3: Link Reset":
"This is a PCIe specific step and is done whenever a fatal error has been
detected"

I'm wondering if the authors of pcieaer-howto.rst took that at face value
and thought they'd *have* to reset the link on Fatal Errors.

Looking through the Fatal Errors in PCIe r7.0 sec 6.2.7, I think a reset
is justified for some of them, but optional for others.  Which leads me
to believe that the AER driver should actually enforce a reset only for
certain Fatal Errors, not all of them.  So this seems like something
worth revisiting in the future.


> All in all, the docs sound like endpoints _should_ have control over
> whether we exercise a full port/slot reset for all types of errors. But
> in practice, we do not actually give it that control. i.e., your commit
> message is correct, and the docs are not.

Indeed the documentation is no longer in sync with the code.  I've just
submitted a series to rectify that and cc'ed you:

https://lore.kernel.org/r/cover.1756451884.git.lukas@wunner.de

Thanks,

Lukas


  reply	other threads:[~2025-08-29  8:35 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-15 14:21 [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 1/4] PCI/ERR: " Manivannan Sadhasivam via B4 Relay
2025-07-17 18:28   ` [PATCH v6 1/4] PCI/ERR: Add support for resetting the Root Ports in a platform specific wayy Frank Li
2025-07-15 14:21 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
2025-07-17 18:31   ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Portsy Frank Li
2025-08-28 20:25   ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Brian Norris
2025-08-29  8:35     ` Lukas Wunner [this message]
2025-08-29 23:58       ` Brian Norris
2025-07-15 14:21 ` [PATCH v6 3/4] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 4/4] PCI: dw-rockchip: Add support to reset Root Port upon " Manivannan Sadhasivam via B4 Relay
2025-07-18  3:58 ` [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Krishna Chaitanya Chundru
2025-07-18 10:28 ` Niklas Cassel
2025-07-18 10:39   ` Niklas Cassel
2025-07-24  5:30     ` Manivannan Sadhasivam
2025-08-15  9:07       ` Niklas Cassel
2025-08-29 16:14         ` Manivannan Sadhasivam
2025-09-04 14:03           ` Niklas Cassel
2025-07-24  9:28 ` Hongxing Zhu
2025-08-28 20:01 ` Brian Norris
2025-08-29 13:56   ` Manivannan Sadhasivam
2025-09-23 13:06 ` David Bremner
2025-09-23 13:46   ` Niklas Cassel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLFmSFe5iyYDrIjt@wunner.de \
    --to=lukas@wunner.de \
    --cc=bhelgaas@google.com \
    --cc=briannorris@chromium.org \
    --cc=cassel@kernel.org \
    --cc=heiko@sntech.de \
    --cc=krishna.chundru@oss.qualcomm.com \
    --cc=kwilczynski@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lpieralisi@kernel.org \
    --cc=mahesh@linux.ibm.com \
    --cc=mani@kernel.org \
    --cc=manivannan.sadhasivam@oss.qualcomm.com \
    --cc=oohall@gmail.com \
    --cc=p.zabel@pengutronix.de \
    --cc=robh@kernel.org \
    --cc=wilfred.mallawa@wdc.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).