Linux-PCI Archive mirror
 help / color / mirror / Atom feed
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
To: <linux-pci@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Lukas Wunner <lukas@wunner.de>,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>,
	Bowman Terry <terry.bowman@amd.com>,
	Hagan Billy <billy.hagan@amd.com>,
	"Simon Guinot" <simon.guinot@seagate.com>,
	"Maciej W . Rozycki" <macro@orcam.me.uk>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>
Subject: [PATCH] PCI: pciehp: Clear LBMS on hot-remove to prevent link speed reduction
Date: Wed, 24 Apr 2024 03:33:39 +0000	[thread overview]
Message-ID: <20240424033339.250385-1-Smita.KoralahalliChannabasappa@amd.com> (raw)

Clear Link Bandwidth Management Status (LBMS) if set, on a hot-remove event.

The hot-remove event could result in target link speed reduction if LBMS
is set, due to a delay in Presence Detect State Change (PDSC) happening
after a Data Link Layer State Change event (DLLSC).

In reality, PDSC and DLLSC events rarely come in simultaneously. Delay in
PDSC can sometimes be too late and the slot could have already been
powered down just by a DLLSC event. And the delayed PDSC could falsely be
interpreted as an interrupt raised to turn the slot on. This false process
of powering the slot on, without a link forces the kernel to retrain the
link if LBMS is set, to a lower speed to restablish the link thereby
bringing down the link speeds [2].

According to PCIe r6.2 sec 7.5.3.8 [1], it is derived that, LBMS cannot
be set for an unconnected link and if set, it serves the purpose of
indicating that there is actually a device down an inactive link.
However, hardware could have already set LBMS when the device was
connected to the port i.e when the state was DL_Up or DL_Active. Some
hardwares would have even attempted retrain going into recovery mode,
just before transitioning to DL_Down.

Thus the set LBMS is never cleared and might force software to cause link
speed drops when there is no link [2].

Dmesg before:
	pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
	pcieport 0000:20:01.1: pciehp: Slot(59): Card present
	pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
	pcieport 0000:20:01.1: retraining failed
	pcieport 0000:20:01.1: pciehp: Slot(59): No link

Dmesg after:
	pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
	pcieport 0000:20:01.1: pciehp: Slot(59): Card present
	pcieport 0000:20:01.1: pciehp: Slot(59): No link

[1] PCI Express Base Specification Revision 6.2, Jan 25 2024.
    https://members.pcisig.com/wg/PCI-SIG/document/20590
[2] Commit a89c82249c37 ("PCI: Work around PCIe link training failures")

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
1. Should be based on top of fixes for link retrain status in
pcie_wait_for_link_delay()
https://patchwork.kernel.org/project/linux-pci/list/?series=824858
https://lore.kernel.org/linux-pci/53b2239b-4a23-a948-a422-4005cbf76148@linux.intel.com/

Without the fixes patch output would be:
	pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
	pcieport 0000:20:01.1: pciehp: Slot(59): Card present
	pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
	pcieport 0000:20:01.1: retraining failed
	pcieport 0000:20:01.1: pciehp: Slot(59): No device found.

2. I initially attempted to wait for both events PDSC and DLLSC to happen
and then turn on the slot.
Similar to: https://lore.kernel.org/lkml/20190205210701.25387-1-mr.nuke.me@gmail.com/
but before turning on the slot.

Something like:
-		ctrl->state = POWERON_STATE;
-		mutex_unlock(&ctrl->state_lock);
-		if (present)
+		if (present && link_active) {
+			ctrl->state = POWERON_STATE;
+			mutex_unlock(&ctrl->state_lock);
			ctrl_info(ctrl, "Slot(%s): Card present\n",
				  slot_name(ctrl));
-		if (link_active)
			ctrl_info(ctrl, "Slot(%s): Link Up\n",
				  slot_name(ctrl));
-		ctrl->request_result = pciehp_enable_slot(ctrl);
-		break;
+			ctrl->request_result = pciehp_enable_slot(ctrl);
+			break;
+		}
+		else {
+			mutex_unlock(&ctrl->state_lock);
+			break;
+		}

This would also avoid printing the lines below on a remove event.
pcieport 0000:20:01.1: pciehp: Slot(59): Card present
pcieport 0000:20:01.1: pciehp: Slot(59): No link

I understand this would likely be not applicable in places where broken
devices hardwire PDS to zero and PDSC would never happen. But I'm open to
making changes if this is more applicable. Because, SW cannot directly
track the interference of HW in attempting link retrain and setting LBMS.

3. I tried introducing delay similar to pcie_wait_for_presence() but I
was not successful in picking the right numbers. Hence hit with the same
link speed drop.

4. For some reason I was unable to clear LBMS with:
	pcie_capability_clear_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
				   PCI_EXP_LNKSTA_LBMS);
---
 drivers/pci/hotplug/pciehp_pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ad12515a4a12..9155fdfd1d37 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -92,7 +92,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
 {
 	struct pci_dev *dev, *temp;
 	struct pci_bus *parent = ctrl->pcie->port->subordinate;
-	u16 command;
+	u16 command, lnksta;
 
 	ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
 		 __func__, pci_domain_nr(parent), parent->number);
@@ -134,4 +134,10 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
 	}
 
 	pci_unlock_rescan_remove();
+
+	/* Clear LBMS on removal */
+	pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA, &lnksta);
+	if (lnksta & PCI_EXP_LNKSTA_LBMS)
+		pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
+					   PCI_EXP_LNKSTA_LBMS);
 }
-- 
2.17.1


             reply	other threads:[~2024-04-24  3:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-24  3:33 Smita Koralahalli [this message]
2024-04-24  9:32 ` [PATCH] PCI: pciehp: Clear LBMS on hot-remove to prevent link speed reduction Ilpo Järvinen
2024-04-25 20:23   ` Smita Koralahalli
2024-04-26  8:42     ` Ilpo Järvinen
2024-04-25  3:10 ` Kuppuswamy Sathyanarayanan
2024-04-25 20:49   ` Smita Koralahalli
2024-05-06  7:29 ` Lukas Wunner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240424033339.250385-1-Smita.KoralahalliChannabasappa@amd.com \
    --to=smita.koralahallichannabasappa@amd.com \
    --cc=billy.hagan@amd.com \
    --cc=helgaas@kernel.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=macro@orcam.me.uk \
    --cc=mahesh@linux.ibm.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=simon.guinot@seagate.com \
    --cc=terry.bowman@amd.com \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).