LinuxPPC-Dev Archive mirror
 help / color / mirror / Atom feed
From: Manivannan Sadhasivam <mani@kernel.org>
To: Niklas Cassel <cassel@kernel.org>
Cc: manivannan.sadhasivam@oss.qualcomm.com,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
	"Oliver O'Halloran" <oohall@gmail.com>,
	"Will Deacon" <will@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Philipp Zabel" <p.zabel@pengutronix.de>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org,
	linux-rockchip@lists.infradead.org,
	"Wilfred Mallawa" <wilfred.mallawa@wdc.com>,
	"Krishna Chaitanya Chundru" <krishna.chundru@oss.qualcomm.com>,
	"Lukas Wunner" <lukas@wunner.de>
Subject: Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way
Date: Fri, 29 Aug 2025 21:44:08 +0530	[thread overview]
Message-ID: <lakgphb7ym3cybwmpdqyipzi4tlkwbfijzhd4r6hvhho3pc7iu@6ludgw6wqkjh> (raw)
In-Reply-To: <aJ743hJw-T9y3waX@ryzen>

On Fri, Aug 15, 2025 at 11:07:42AM GMT, Niklas Cassel wrote:
> Hello Mani,
> 
> Sorry for the delayed reply.
> I just came back from vacation.
> 
> On Thu, Jul 24, 2025 at 11:00:05AM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Jul 18, 2025 at 12:39:50PM GMT, Niklas Cassel wrote:
> > > On Fri, Jul 18, 2025 at 12:28:44PM +0200, Niklas Cassel wrote:
> > > > On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > > 2) Testing link down reset:
> > > > 
> > > > selftests before link down reset:
> > > > # FAILED: 14 / 16 tests passed.
> > > > 
> > > > ## On EP side:
> > > > # echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
> > > >   sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start
> > > > 
> > > > 
> > > > [  111.137162] rockchip-dw-pcie a40000000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 0x4
> > > > [  111.137881] rockchip-dw-pcie a40000000.pcie: LTSSM_STATUS: 0x0
> > > > [  111.138432] rockchip-dw-pcie a40000000.pcie: hot reset or link-down reset
> > > > [  111.139067] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> > > > [  111.139686] pci-endpoint-test 0000:01:00.0: AER: can't recover (no error_detected callback)
> > > > [  111.255407] rockchip-dw-pcie a40000000.pcie: PCIe Gen.3 x4 link up
> > > > [  111.256019] rockchip-dw-pcie a40000000.pcie: Root Port reset completed
> > > > [  111.383401] pcieport 0000:00:00.0: Root Port has been reset
> > > > [  111.384060] pcieport 0000:00:00.0: AER: device recovery failed
> > > > [  111.384582] rockchip-dw-pcie a40000000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 0x3
> > > > [  111.385218] rockchip-dw-pcie a40000000.pcie: LTSSM_STATUS: 0x230011
> > > > [  111.385771] rockchip-dw-pcie a40000000.pcie: Received Link up event. Starting enumeration!
> > > > [  111.390866] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > > > [  111.391650] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> > > > 
> > > > Basically all tests timeout
> > > > # FAILED: 1 / 16 tests passed.
> > > > 
> > > > Which is the same as before this patch series.
> > > 
> > > The above was with CONFIG_PCIEAER=y
> > > 
> > 
> > This is kind of expected since the pci_endpoint_test driver doesn't have the AER
> > err_handlers defined.
> 
> I see.
> Would be nice if we could add them then, so that we can verify that this
> series is working as intended.
> 
> 
> > 
> > > Wilfred suggested that I tried without this config set.
> > > 
> > > However, doing so, I got the exact same result:
> > > # FAILED: 1 / 16 tests passed.
> > > 
> > 
> > Interesting. Could you please share the dmesg log like above.
> 
> It is looking exactly like the dmesg above
> 
> [   86.820059] rockchip-dw-pcie a40000000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 0x4
> [   86.820791] rockchip-dw-pcie a40000000.pcie: LTSSM_STATUS: 0x0
> [   86.821344] rockchip-dw-pcie a40000000.pcie: hot reset or link-down reset
> [   86.821978] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [   87.040551] rockchip-dw-pcie a40000000.pcie: PCIe Gen.3 x4 link up
> [   87.041138] rockchip-dw-pcie a40000000.pcie: Root Port reset completed
> [   87.168378] pcieport 0000:00:00.0: Root Port has been reset
> [   87.168882] rockchip-dw-pcie a40000000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 0x3
> [   87.169519] rockchip-dw-pcie a40000000.pcie: LTSSM_STATUS: 0x230011
> [   87.272463] rockchip-dw-pcie a40000000.pcie: Received Link up event. Starting enumeration!
> [   87.277552] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [   87.278314] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> 
> except that we don't get the:
> > [  111.139686] pci-endpoint-test 0000:01:00.0: AER: can't recover (no error_detected callback)
> > [  111.384060] pcieport 0000:00:00.0: AER: device recovery failed
> 

Ok, thanks for the logs. I guess what is happening here is that we are not
saving/restoring the config space of the devices under the Root Port if linkdown
is happens. TBH, we cannot do that from the PCI core since once linkdown
happens, we cannot access any devices underneath the Root Port. But if
err_handlers are available for drivers for all devices, they could do something
smart like below:

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index c4e5e2c977be..9aabf1fe902e 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -989,6 +989,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 
        pci_set_drvdata(pdev, test);
 
+       pci_save_state(pdev);
+
        id = ida_alloc(&pci_endpoint_test_ida, GFP_KERNEL);
        if (id < 0) {
                ret = id;
@@ -1140,12 +1142,31 @@ static const struct pci_device_id pci_endpoint_test_tbl[] = {
 };
 MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);
 
+static pci_ers_result_t pci_endpoint_test_error_detected(struct pci_dev *pdev,
+                                              pci_channel_state_t state)
+{
+       return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
+{
+       pci_restore_state(pdev);
+
+       return PCI_ERS_RESULT_RECOVERED;
+}
+
+static const struct pci_error_handlers pci_endpoint_test_err_handler = {
+       .error_detected = pci_endpoint_test_error_detected,
+       .slot_reset = pci_endpoint_test_slot_reset,
+};
+
 static struct pci_driver pci_endpoint_test_driver = {
        .name           = DRV_MODULE_NAME,
        .id_table       = pci_endpoint_test_tbl,
        .probe          = pci_endpoint_test_probe,
        .remove         = pci_endpoint_test_remove,
        .sriov_configure = pci_sriov_configure_simple,
+       .err_handler    = &pci_endpoint_test_err_handler,
 };
 module_pci_driver(pci_endpoint_test_driver);

This essentially saves the good known config space during probe and restores it
during the slot_reset callback. Ofc, the state would've been overwritten if
suspend/resume happens in-between, but the point I'm making is that unless all
device drivers restore their known config space, devices cannot be resumed
properly post linkdown recovery.

I can add a patch based on the above diff in next revision if that helps. Right
now, I do not have access to my endpoint test setup. So can't test anything.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


  reply	other threads:[~2025-08-29 16:14 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-15 14:21 [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 1/4] PCI/ERR: " Manivannan Sadhasivam via B4 Relay
2025-07-17 18:28   ` [PATCH v6 1/4] PCI/ERR: Add support for resetting the Root Ports in a platform specific wayy Frank Li
2025-07-15 14:21 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
2025-07-17 18:31   ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Portsy Frank Li
2025-08-28 20:25   ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Brian Norris
2025-08-29  8:35     ` Lukas Wunner
2025-08-29 23:58       ` Brian Norris
2025-07-15 14:21 ` [PATCH v6 3/4] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 4/4] PCI: dw-rockchip: Add support to reset Root Port upon " Manivannan Sadhasivam via B4 Relay
2025-07-18  3:58 ` [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Krishna Chaitanya Chundru
2025-07-18 10:28 ` Niklas Cassel
2025-07-18 10:39   ` Niklas Cassel
2025-07-24  5:30     ` Manivannan Sadhasivam
2025-08-15  9:07       ` Niklas Cassel
2025-08-29 16:14         ` Manivannan Sadhasivam [this message]
2025-09-04 14:03           ` Niklas Cassel
2025-07-24  9:28 ` Hongxing Zhu
2025-08-28 20:01 ` Brian Norris
2025-08-29 13:56   ` Manivannan Sadhasivam
2025-09-23 13:06 ` David Bremner
2025-09-23 13:46   ` Niklas Cassel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=lakgphb7ym3cybwmpdqyipzi4tlkwbfijzhd4r6hvhho3pc7iu@6ludgw6wqkjh \
    --to=mani@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=cassel@kernel.org \
    --cc=heiko@sntech.de \
    --cc=krishna.chundru@oss.qualcomm.com \
    --cc=kwilczynski@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lpieralisi@kernel.org \
    --cc=lukas@wunner.de \
    --cc=mahesh@linux.ibm.com \
    --cc=manivannan.sadhasivam@oss.qualcomm.com \
    --cc=oohall@gmail.com \
    --cc=p.zabel@pengutronix.de \
    --cc=robh@kernel.org \
    --cc=wilfred.mallawa@wdc.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).