Linux-PCI Archive mirror
From: "Maciej W. Rozycki" <macro@orcam.me.uk>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>,
	 linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/2] PCI: Rework error reporting with PCIe failed link retraining
Date: Sat, 10 Feb 2024 01:43:45 +0000 (GMT)
Message-ID: <alpine.DEB.2.21.2402092125070.2376@angie.orcam.me.uk>

Hi,

 This patch series addresses issues observed by Ilpo as reported here: 
<https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/>, 
one with excessive delays happening when `pcie_failed_link_retrain' is
called but link retraining has not actually been attempted, and another
one with an API misuse caused by a merge mistake.

 See the individual change descriptions for further details; 1/2 supersedes:
<https://patchwork.kernel.org/project/linux-pci/patch/20240202134108.4096-1-ilpo.jarvinen@linux.intel.com/>, 
and 2/2 supersedes: 
<https://patchwork.kernel.org/project/linux-pci/patch/20240208132205.4550-1-ilpo.jarvinen@linux.intel.com/>.

 Unfortunately I can no longer verify the changes beyond just checking
that the system `pcie_failed_link_retrain' was intended for still boots,
because something has happened that keeps the problematic link from
working at all.

 The system had been up for 88 days and the link was still working: I was
logged in over a serial line wired through a PCIe serial option card
further downstream and communicated over that line just fine to log out
in preparation for a reboot.  After the reboot the link did not respond,
and after several further attempts, including reboots and power cycles,
it still does not: LBMS is never set and I have never observed LT being
set either.  This affects U-Boot too, which previously reported:

PCIE-0: Link up (Gen1-x8, Bus0)
PCI Autoconfig: 02.03.00: Downstream link non-functional
PCI Autoconfig: 02.03.00: Retrying with speed restricted to 2.5GT/s...
PCI Autoconfig: 02.03.00: Succeeded!

and now it only reports:

PCIE-0: Link up (Gen1-x8, Bus0)

 Interestingly enough the system had its mainboard replaced those 3 months 
ago to deal with an unrelated problem, and with the new mainboard in place 
I already had issues with the option cards downstream from the PCIe switch 
immediately wired to 02.03.0.  I had to rewire and reseat the adapter and 
cards several times before it started working reliably.  Maybe something 
has happened to the adapter board with the PCIe switch that caused it to 
stop working, hopefully permanently.  Perhaps it has something to do with 
the power supply connection, which is via an FDC/Berg connector, not my 
favourite one.

 I have four such adapter boards total, so I can try and see if I am able 
to revive the original one or use a replacement one, but it won't happen 
right away, as I have the system installed in a remote lab ~1000mi/1600km 
away from me.  I'll try to bring the system back to fully working order at 
the next opportunity, but it is inconvenient for me to travel there right
now just to address this problem, so it'll be a couple of weeks and likely 
more before I am able to say something.  I hope it's not the new mainboard 
(PCIe devices in the other slots work just fine).

 Hopefully I'll be able to fix it one way or another and will be able to
report on LBMS behaviour too, that is whether it retriggers with every
link training iteration or not.

 Meanwhile the patches are hopefully obvious enough to apply.

  Maciej

Thread overview: 10+ messages
2024-02-10  1:43 Maciej W. Rozycki [this message]
2024-02-10  1:43 ` [PATCH 1/2] PCI: Correct error reporting with PCIe failed link retraining Maciej W. Rozycki
2024-02-12 12:01   ` Ilpo Järvinen
2024-03-14  7:19   ` Pengfei Xu
2024-03-14 11:27     ` Ilpo Järvinen
2024-03-15  3:10       ` Pengfei Xu
2024-04-24 22:13   ` Bjorn Helgaas
2024-02-10  1:43 ` [PATCH 2/2] PCI: Use an error code with PCIe failed link retraining Maciej W. Rozycki
2024-02-12 12:11   ` Ilpo Järvinen
2024-02-26 12:55 ` [PATCH 0/2] PCI: Rework error reporting with PCIe failed link retraining Maciej W. Rozycki
