All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: "Oliver O'Halloran" <oohall@gmail.com>
To: Krishna Kumar <krishnak@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	 linux-pci@vger.kernel.org, mahesh@linux.ibm.com,
	 Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	 Christophe Leroy <christophe.leroy@csgroup.eu>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	 Bjorn Helgaas <bhelgaas@google.com>,
	Gaurav Batra <gbatra@linux.ibm.com>,
	 Nathan Lynch <nathanl@linux.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support
Date: Fri, 17 May 2024 12:42:03 +1000	[thread overview]
Message-ID: <CAOSf1CFDCTMdmrjoSRdP09rJgtzPVDnCPXpfS-S+J7XKHzKRCw@mail.gmail.com> (raw)
In-Reply-To: <20240514135303.176134-3-krishnak@linux.ibm.com>

On Tue, May 14, 2024 at 11:54 PM Krishna Kumar <krishnak@linux.ibm.com> wrote:
>
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
>
> Root Cause Analysis: This behavior is due to missing code for the DPC
> switch/bridge.

I don't see anything touching DPC in this series?

> *snip*
>
> Command for reproducing the issue :
>
> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
> For hot plug/enable -    echo 1 > /sys/bus/pci/slots/C5/power
>
> where C5 is slot associated with bridge.
>
> Scenario/Tests:
> Output of lspci -nn before test is given below. This snippet contains
> devices used for testing on Powernv machine.
>
> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:08:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
> 0004:09:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>
> Output of lspci -tv before test is as follows:
>
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>  |                           |               +-01.0-[08]----00.0  Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |                           |               +-02.0-[09]----00.0  Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |                           |               \-03.0-[0a-0e]--
>  |                           \-00.1  PMC-Sierra Inc. Device 4052
>
> C5(bridge) and C6(End Point) slot address are as below:
> # cat /sys/bus/pci/slots/C5/address
> 0004:02:00
> # cat /sys/bus/pci/slots/C6/address
> 0004:09:00

Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.
I find it helps to look at the PCI topology in terms of where the
physical PCIe links are. Here we've got:

- A link between the PHB (0004:00:00.0) and the switch upstream port
(0004:01:00.0)
- A link from switch downstream port 0 (0004:02:00.0) to nothing
- A link from switch downstream port 1 (0004:02:01.0) to a SAS card
- A link from switch downstream port 2 (0004:02:02.0) to a SAS card
- A link from switch downstream port 2 (0004:02:03.0) to nothing

Note that there's no PCIe link between the switch upstream port
(0004:01:00.0) and the downstream ports on bus 0004:02. The connection
between those is invisible to us because it's custom bus logic
internal to the PCIe switch ASIC. What I think has happened here is
that system firmware has supplied bad PCIe slot information to OPAL
which has resulted in pnv_php advertising a slot in the wrong place.
Assuming this following the usual IBM convention I'd expect the bridge
device for C5 to be the PHB's root port and the bus should be 0004:01.
It might be worth adding some logic to pnv_php to verify the PCI
bridge upstream of the slot actually has the PCIe slot capability to
guard against this problem.

> Hot-unplug operation on slot associated with bridge:
> # echo 0 > /sys/bus/pci/slots/C5/power
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
>  |                           \-00.1  PMC-Sierra Inc. Device 4052

Yep, "powering off" C5 doesn't remove the upstream port device. This
would create problems if you physically removed the card from C5 since
the kernel would assume the switch device is still present.

> *snip*


> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 38561d6a2079..bea612759832 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
>         pdn = pci_get_pdn(pdev);
>         pdev->dev.archdata.pci_data = pdn;
>  }
> +
> +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus)
> +{
> +       struct device_node *dn;
> +       int slotno;
> +
> +       u32 class = 0;
> +
> +       if (!of_property_read_u32(start->child, "class-code", &class)) {
> +               /* Call of pci_scan_slot for non-bridge/EP case */
> +               if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) {
> +                       slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
> +                       pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +                       return;
> +               }
> +       }
> +
> +       /* Iterate all siblings */
> +       for_each_child_of_node(start, dn) {
> +               class = 0;
> +
> +               if (!of_property_read_u32(start->child, "class-code", &class)) {
> +                       /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
> +                       if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> +                               slotno = PCI_SLOT(PCI_DN(dn)->devfn);
> +                               pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +                       }
> +               }
> +       }

If you're going to iterate over all the DT nodes why not just scan all
of them rather than special casing bridges? IIRC current logic is the
way it is because PowerVM only puts single devices under a PHB and in
the PowerNV (pnv_php) case the PCIe spec guarantees that only device 0
will be present on the end of a link. If you want to handle the more
generic case then feel free, but do it properly.

WARNING: multiple messages have this Message-ID (diff)
From: "Oliver O'Halloran" <oohall@gmail.com>
To: Krishna Kumar <krishnak@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
	Gaurav Batra <gbatra@linux.ibm.com>,
	linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	mahesh@linux.ibm.com,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	Brian King <brking@linux.vnet.ibm.com>,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support
Date: Fri, 17 May 2024 12:42:03 +1000	[thread overview]
Message-ID: <CAOSf1CFDCTMdmrjoSRdP09rJgtzPVDnCPXpfS-S+J7XKHzKRCw@mail.gmail.com> (raw)
In-Reply-To: <20240514135303.176134-3-krishnak@linux.ibm.com>

On Tue, May 14, 2024 at 11:54 PM Krishna Kumar <krishnak@linux.ibm.com> wrote:
>
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
>
> Root Cause Analysis: This behavior is due to missing code for the DPC
> switch/bridge.

I don't see anything touching DPC in this series?

> *snip*
>
> Command for reproducing the issue :
>
> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
> For hot plug/enable -    echo 1 > /sys/bus/pci/slots/C5/power
>
> where C5 is slot associated with bridge.
>
> Scenario/Tests:
> Output of lspci -nn before test is given below. This snippet contains
> devices used for testing on Powernv machine.
>
> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:08:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
> 0004:09:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>
> Output of lspci -tv before test is as follows:
>
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>  |                           |               +-01.0-[08]----00.0  Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |                           |               +-02.0-[09]----00.0  Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |                           |               \-03.0-[0a-0e]--
>  |                           \-00.1  PMC-Sierra Inc. Device 4052
>
> C5(bridge) and C6(End Point) slot address are as below:
> # cat /sys/bus/pci/slots/C5/address
> 0004:02:00
> # cat /sys/bus/pci/slots/C6/address
> 0004:09:00

Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.
I find it helps to look at the PCI topology in terms of where the
physical PCIe links are. Here we've got:

- A link between the PHB (0004:00:00.0) and the switch upstream port
(0004:01:00.0)
- A link from switch downstream port 0 (0004:02:00.0) to nothing
- A link from switch downstream port 1 (0004:02:01.0) to a SAS card
- A link from switch downstream port 2 (0004:02:02.0) to a SAS card
- A link from switch downstream port 2 (0004:02:03.0) to nothing

Note that there's no PCIe link between the switch upstream port
(0004:01:00.0) and the downstream ports on bus 0004:02. The connection
between those is invisible to us because it's custom bus logic
internal to the PCIe switch ASIC. What I think has happened here is
that system firmware has supplied bad PCIe slot information to OPAL
which has resulted in pnv_php advertising a slot in the wrong place.
Assuming this following the usual IBM convention I'd expect the bridge
device for C5 to be the PHB's root port and the bus should be 0004:01.
It might be worth adding some logic to pnv_php to verify the PCI
bridge upstream of the slot actually has the PCIe slot capability to
guard against this problem.

> Hot-unplug operation on slot associated with bridge:
> # echo 0 > /sys/bus/pci/slots/C5/power
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
>  |                           \-00.1  PMC-Sierra Inc. Device 4052

Yep, "powering off" C5 doesn't remove the upstream port device. This
would create problems if you physically removed the card from C5 since
the kernel would assume the switch device is still present.

> *snip*


> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 38561d6a2079..bea612759832 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
>         pdn = pci_get_pdn(pdev);
>         pdev->dev.archdata.pci_data = pdn;
>  }
> +
> +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus)
> +{
> +       struct device_node *dn;
> +       int slotno;
> +
> +       u32 class = 0;
> +
> +       if (!of_property_read_u32(start->child, "class-code", &class)) {
> +               /* Call of pci_scan_slot for non-bridge/EP case */
> +               if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) {
> +                       slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
> +                       pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +                       return;
> +               }
> +       }
> +
> +       /* Iterate all siblings */
> +       for_each_child_of_node(start, dn) {
> +               class = 0;
> +
> +               if (!of_property_read_u32(start->child, "class-code", &class)) {
> +                       /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
> +                       if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> +                               slotno = PCI_SLOT(PCI_DN(dn)->devfn);
> +                               pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +                       }
> +               }
> +       }

If you're going to iterate over all the DT nodes why not just scan all
of them rather than special casing bridges? IIRC current logic is the
way it is because PowerVM only puts single devices under a PHB and in
the PowerNV (pnv_php) case the PCIe spec guarantees that only device 0
will be present on the end of a link. If you want to handle the more
generic case then feel free, but do it properly.

  reply	other threads:[~2024-05-17  2:42 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-14 13:52 [PATCH v2 0/2] PCI hotplug driver fixes Krishna Kumar
2024-05-14 13:52 ` Krishna Kumar
2024-05-14 13:52 ` [PATCH v2 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv Krishna Kumar
2024-05-14 13:52   ` Krishna Kumar
2024-05-14 13:52 ` [PATCH v2 2/2] powerpc: hotplug driver bridge support Krishna Kumar
2024-05-14 13:52   ` Krishna Kumar
2024-05-17  2:42   ` Oliver O'Halloran [this message]
2024-05-17  2:42     ` Oliver O'Halloran
2024-05-17 11:14     ` krishna kumar
2024-05-20 14:59       ` Oliver O'Halloran
2024-05-20 14:59         ` Oliver O'Halloran
2024-05-23 14:52         ` Krishna Kumar
2024-05-23 14:52           ` Krishna Kumar
2024-05-31 10:44           ` Krishna Kumar
2024-05-31 10:44             ` Krishna Kumar
2024-05-31 13:52             ` Krishna Kumar
2024-05-31 13:52               ` Krishna Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOSf1CFDCTMdmrjoSRdP09rJgtzPVDnCPXpfS-S+J7XKHzKRCw@mail.gmail.com \
    --to=oohall@gmail.com \
    --cc=aneesh.kumar@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=brking@linux.vnet.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=gbatra@linux.ibm.com \
    --cc=krishnak@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=nathanl@linux.ibm.com \
    --cc=npiggin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.