All the mail mirrored from lore.kernel.org
* [PATCH V6 00/10] VF EEH on Power8
@ 2015-05-19  1:35 ` Wei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

This patchset enables EEH on SRIOV VFs. The general idea is to create a
proper EEH device (edev) and PE for each VF and to handle them correctly
during error recovery.

Unlike a Bus PE, a VF PE contains just one VF. This makes EEH error handling
on a VF PE different in three ways.

First, removing and re-enumerating a VF relies on its PF. Because a VF is
tightly coupled to its PF, it is not proper to enumerate a VF with the usual
scan procedure. That's why virtfn_add/virtfn_remove are exported in this
patch set.

Second, the reset/restore of a VF is done in kernel space. FW is not aware of
the VF, so the usual reset function implemented in FW will not work. Patches
in this set imitate the reset/restore functions in kernel space.

Third, a VF may be removed during the PF's error_detected function. In this
case, the original error_detected->slot_reset->resume sequence is not
appropriate for the removed VFs, since they are re-created by the PF in a
fresh state. A flag is introduced in eeh_dev to mark that the eeh_dev is in
error state, which lets us track whether the device needs to be reset.

This has been tested both on the host and in a guest on Power8 with the
latest kernel version.
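
The third point can be illustrated with a small user-space sketch. It is
not the kernel code: the type sketch_edev, the flag SKETCH_DEV_IN_ERROR and
both functions are hypothetical names chosen for illustration. It only
shows the idea that devices present when the error is detected get marked,
while a VF re-created later by its PF starts without the mark and is
skipped by the reset step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: set when the device was present at error detection. */
#define SKETCH_DEV_IN_ERROR (1u << 0)

/* Hypothetical stand-in for the per-device EEH state. */
struct sketch_edev {
	unsigned int mode;	/* bit flags, e.g. SKETCH_DEV_IN_ERROR */
	int reset_count;	/* how many times this device was reset */
};

/* Mark every device that exists when the error is reported. */
static void sketch_mark_in_error(struct sketch_edev *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++)
		devs[i].mode |= SKETCH_DEV_IN_ERROR;
}

/*
 * Reset only the marked devices and clear the mark afterwards.
 * Returns the number of devices that were reset.
 */
static size_t sketch_reset_marked(struct sketch_edev *devs, size_t n)
{
	size_t i, done = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!(devs[i].mode & SKETCH_DEV_IN_ERROR))
			continue;	/* fresh device: nothing to recover */
		devs[i].reset_count++;
		devs[i].mode &= ~SKETCH_DEV_IN_ERROR;
		done++;
	}
	return done;
}
```

A caller would mark all devices once, let the PF driver remove and
re-create its VFs (which yields zeroed entries), and then call
sketch_reset_marked(); only the devices that survived are reset.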

v6:
   * code / commit log refactor by Gavin
v5:
   * remove the compound field, iterate on Master VF PE instead
   * refine the PCI config restore and reset code for VFs:
     the wait time for assert and deassert
     PCI device address format
     check edev->pcie_cap and edev->aer_cap before accessing them
v4:
   * refine the change logs, comment and code style
   * change pnv_pci_fixup_vf_eeh() to pnv_eeh_vf_final_fixup() and remove the
     CONFIG_PCI_IOV macro
   * reorder patch 5/6 to make the logic more reasonable
   * remove remove_dev_pci_data()
   * remove the EEH_DEV_VF flag, use edev->physfn to identify a VF EEH DEV and
     remove related CONFIG_PCI_IOV macro
   * add the option for VF reset
   * fix the pnv_eeh_cfg_blocked() logic
   * replace pnv_pci_cfg_{read,write} with eeh_ops->{read,write}_config in
     pnv_eeh_vf_restore_config()
   * rename pnv_eeh_vf_restore_config() to pnv_eeh_restore_vf_config()
   * rename pnv_pci_fixup_vf_caps() to pnv_pci_vf_header_fixup() and move it
     to arch/powerpc/platforms/powernv/pci.c
   * add a field compound in pnv_ioda_pe to link compound PEs
   * handle compound PE for VF PEs
v3:
   * add back vf_index in pci_dn to track the VF's index
   * rename ppdev in eeh_dev to physfn for consistency
   * move edev->physfn assignment before dev->dev.archdata.edev is set
   * move pnv_pci_fixup_vf_eeh() and pnv_pci_fixup_vf_caps() to eeh-powernv.c
   * make the commit logs and code comments clearer and more detailed
   * merge eeh_rmv_virt_device() with eeh_rmv_device()
   * move the cfg_blocked check logic from pnv_eeh_read/write_config() to
     pnv_eeh_cfg_blocked()
   * move the vf reset/restore logic into its own patch, two patches are
     created.
     powerpc/powernv: Support PCI config restore for VFs
     powerpc/powernv: Support EEH reset for VFs
   * simplify the vf reset logic
v2:
   * add prefix pci_iov_ to virtfn_add/virtfn_remove
   * use EEH_DEV_VF as a flag for a VF's eeh_dev
   * use eeh_dev instead of edev in change log
   * remove vf_index in eeh_dev, calculate it from pdn->busno and devfn
   * do eeh_add_device_late() and eeh_sysfs_add_device() both after pci_dev is
     well initialized
   * do FLR to reset a VF PE
   * imitate the restore function in FW for VF
   * remove the reverse order patch, since it is still under discussion

Wei Yang (10):
  PCI/IOV: Rename and export virtfn_add/virtfn_remove
  powerpc/pci: Cache VF index in pci_dn
  powerpc/pci: Remove VFs prior to PF
  powerpc/eeh: Trace first 7 BARs in address cache
  powerpc/powernv: EEH device for VF
  powerpc/eeh: Create PE for VFs
  powerpc/powernv: Support EEH reset for VF PE
  powerpc/powernv: Support PCI config restore for VFs
  powerpc/eeh: Support error recovery for VF PE
  powerpc/powernv: compound PE for VFs

 arch/powerpc/include/asm/eeh.h               |    4 +
 arch/powerpc/include/asm/pci-bridge.h        |    2 +
 arch/powerpc/kernel/eeh.c                    |    8 +
 arch/powerpc/kernel/eeh_cache.c              |    2 +-
 arch/powerpc/kernel/eeh_driver.c             |  100 +++++++++---
 arch/powerpc/kernel/eeh_pe.c                 |   13 +-
 arch/powerpc/kernel/pci-hotplug.c            |    2 +-
 arch/powerpc/kernel/pci_dn.c                 |   16 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  221 +++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c    |   46 +++++-
 arch/powerpc/platforms/powernv/pci.c         |   35 +++-
 drivers/pci/iov.c                            |   10 +-
 include/linux/pci.h                          |    2 +
 13 files changed, 420 insertions(+), 41 deletions(-)

-- 
1.7.9.5


* [PATCH V6 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

During EEH recovery, hotplug is applied to devices that have no driver
or whose driver doesn't support EEH. However, the hotplug, which was
implemented on a per-PCI-bus basis, can't be applied to a VF directly.

The patch renames virtfn_{add,remove}() and exports them so that they
can be used in PCI hotplug during EEH recovery.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/iov.c   |   10 +++++-----
 include/linux/pci.h |    2 ++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..cc941dd 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
 
-static int virtfn_add(struct pci_dev *dev, int id, int reset)
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
 {
 	int i;
 	int rc = -ENOMEM;
@@ -183,7 +183,7 @@ failed:
 	return rc;
 }
 
-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
 	char buf[VIRTFN_ID_LEN];
 	struct pci_dev *virtfn;
@@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	}
 
 	for (i = 0; i < initial; i++) {
-		rc = virtfn_add(dev, i, 0);
+		rc = pci_iov_virtfn_add(dev, i, 0);
 		if (rc)
 			goto failed;
 	}
@@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 
 failed:
 	for (j = 0; j < i; j++)
-		virtfn_remove(dev, j, 0);
+		pci_iov_virtfn_remove(dev, j, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
@@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
 		return;
 
 	for (i = 0; i < iov->num_VFs; i++)
-		virtfn_remove(dev, i, 0);
+		pci_iov_virtfn_remove(dev, i, 0);
 
 	pcibios_sriov_disable(dev);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..94bacfa 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1679,6 +1679,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
-- 
1.7.9.5


* [PATCH V6 02/10] powerpc/pci: Cache VF index in pci_dn
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

The patch caches the VF index in pci_dn, which can be used to calculate
the VF's bus, device and function numbers. That information helps to
locate the VF's PCI device instance when doing hotplug during EEH
recovery, if necessary.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |    1 +
 arch/powerpc/kernel/pci_dn.c          |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 1811c44..d78afe4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -199,6 +199,7 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
+	int     vf_index;		/* VF index in the PF */
 	int     offset;			/* PE# for the first VF PE */
 #define M64_PER_IOV 4
 	int     m64_per_iov;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..f771130 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
 #ifdef CONFIG_PCI_IOV
 static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 					   struct pci_dev *pdev,
+					   int vf_index,
 					   int busno, int devfn)
 {
 	struct pci_dn *pdn;
@@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 	pdn->parent = parent;
 	pdn->busno = busno;
 	pdn->devfn = devfn;
+	pdn->vf_index = vf_index;
 #ifdef CONFIG_PPC_POWERNV
 	pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 		return NULL;
 
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
-		pdn = add_one_dev_pci_data(parent, NULL,
+		pdn = add_one_dev_pci_data(parent, NULL, i,
 					   pci_iov_virtfn_bus(pdev, i),
 					   pci_iov_virtfn_devfn(pdev, i));
 		if (!pdn) {
-- 
1.7.9.5


* [PATCH V6 03/10] powerpc/pci: Remove VFs prior to PF
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,
VFs, which might be hooked to the same PCI bus as their PF, should be
removed before the PF. Otherwise, hot unplugging the PCI bus could
cause a kernel crash.

The patch applies the above pattern to the PowerPC PCI hotplug path.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
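
The effect of walking the list in reverse can be shown with a minimal
user-space sketch. It assumes VFs are linked onto the bus list after their
PF, so a tail-to-head walk reaches them first. The list type and helpers
below are hypothetical stand-ins, not the kernel's list API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked node, standing in for bus->devices. */
struct sketch_dev {
	struct sketch_dev *prev, *next;
	int is_vf;		/* 1 for a VF, 0 for a PF */
	int removal_order;	/* filled in when the device is removed */
};

static void sketch_list_init(struct sketch_dev *head)
{
	assert(head != NULL);
	head->prev = head->next = head;
}

/* Append at the tail, i.e. in discovery order. */
static void sketch_add_tail(struct sketch_dev *head, struct sketch_dev *dev)
{
	dev->prev = head->prev;
	dev->next = head;
	head->prev->next = dev;
	head->prev = dev;
}

/*
 * Remove every device, walking from tail to head. The predecessor is
 * saved before unlinking, so removing during the walk is safe.
 * Returns the number of devices removed.
 */
static int sketch_remove_all_reverse(struct sketch_dev *head)
{
	struct sketch_dev *dev, *tmp;
	int order = 0;

	for (dev = head->prev, tmp = dev->prev; dev != head;
	     dev = tmp, tmp = dev->prev) {
		dev->prev->next = dev->next;
		dev->next->prev = dev->prev;
		dev->prev = dev->next = dev;
		dev->removal_order = ++order;
	}
	return order;
}
```

Adding a PF followed by two VFs and then calling
sketch_remove_all_reverse() removes both VFs before the PF.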

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7ed85a6..98f84ed 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -50,7 +50,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 
 	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 		 pci_domain_nr(bus),  bus->number);
-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
 		pr_debug("   Removing %s...\n", pci_name(dev));
 		pci_stop_and_remove_bus_device(dev);
 	}
-- 
1.7.9.5


* [PATCH V6 04/10] powerpc/eeh: Trace first 7 BARs in address cache
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

The EEH address cache, which helps to locate the PCI device for a given
(physical) MMIO address, didn't cover PCI bridges. Also, it shouldn't
return the PF for an address in the PF's IOV BARs. Instead, the VF
should be returned.

The patch restricts the address cache to the first 7 resources (the six
standard BARs plus the expansion ROM) for the above purposes.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_cache.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
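
To make the new loop bound concrete, the sketch below lays out resource
indices the way a kernel built with CONFIG_PCI_IOV is assumed to number
them. The SKETCH_ names and their exact values are assumptions for this
example; include/linux/pci.h has the real definitions.

```c
#include <assert.h>

/*
 * Illustrative resource index layout. Values are assumptions that
 * mirror the usual ordering: standard BARs, ROM, IOV BARs, bridge
 * windows.
 */
enum {
	SKETCH_STD_RESOURCES = 0,	/* BAR0 */
	SKETCH_STD_RESOURCE_END = 5,	/* BAR5 */
	SKETCH_ROM_RESOURCE,		/* expansion ROM */
	SKETCH_IOV_RESOURCES,		/* first of six VF BARs */
	SKETCH_IOV_RESOURCE_END = SKETCH_IOV_RESOURCES + 5,
	SKETCH_BRIDGE_RESOURCES,	/* first of four bridge windows */
	SKETCH_BRIDGE_RESOURCE_END = SKETCH_BRIDGE_RESOURCES + 3,
	SKETCH_NUM_RESOURCES		/* old loop bound (exclusive) */
};

/* Returns 1 if index i is visited by the new loop (i <= ROM resource). */
static int sketch_walked_after_patch(int i)
{
	assert(i >= 0 && i < SKETCH_NUM_RESOURCES);
	return i <= SKETCH_ROM_RESOURCE;
}

/* Count how many resource slots the new loop visits. */
static int sketch_count_walked(void)
{
	int i, n = 0;

	for (i = 0; i < SKETCH_NUM_RESOURCES; i++)
		n += sketch_walked_after_patch(i);
	return n;
}
```

Under these assumptions the new bound visits seven slots and stops before
the PF's IOV BARs, so an address inside an IOV BAR can only match a VF's
own entry in the cache.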

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index eeabeab..f6c5f05 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
 	}
 
 	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
 		unsigned long start = pci_resource_start(dev,i);
 		unsigned long end = pci_resource_end(dev,i);
 		unsigned int flags = pci_resource_flags(dev,i);
-- 
1.7.9.5


* [PATCH V6 05/10] powerpc/powernv: EEH device for VF
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

VFs and their corresponding pci_dn instances are created and released
dynamically as their PF's SRIOV capability is enabled and disabled.
The patch creates and releases EEH devices for VFs when creating and
releasing their pci_dn instances, which means EEH devices and pci_dn
instances have the same life cycle. Also, a VF's EEH device is identified
by (struct eeh_dev::physfn).

[gwshan: changelog and removed CONFIG_PCI_IOV]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/pci_dn.c   |   12 ++++++++++++
 2 files changed, 13 insertions(+)
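
The shared life cycle and the physfn check can be sketched in user space
as follows. All types and functions here are hypothetical stand-ins for
pci_dn and eeh_dev, using calloc()/free() in place of kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the PF's pci_dev. */
struct sketch_pf {
	int id;
};

/* Hypothetical stand-in for eeh_dev. */
struct sketch_edev {
	struct sketch_pf *physfn;	/* non-NULL only for a VF */
};

/* Hypothetical stand-in for pci_dn. */
struct sketch_pdn {
	struct sketch_edev *edev;	/* created and freed with the pdn */
};

/* Create a pdn and its edev together; physfn is NULL for a non-VF. */
static struct sketch_pdn *sketch_pdn_create(struct sketch_pf *physfn)
{
	struct sketch_pdn *pdn = calloc(1, sizeof(*pdn));

	if (!pdn)
		return NULL;
	pdn->edev = calloc(1, sizeof(*pdn->edev));
	if (!pdn->edev) {
		free(pdn);
		return NULL;
	}
	pdn->edev->physfn = physfn;
	return pdn;
}

/* Release the edev first, then the pdn that owned it. */
static void sketch_pdn_release(struct sketch_pdn *pdn)
{
	if (!pdn)
		return;
	assert(pdn->edev != NULL);	/* every created pdn has an edev */
	free(pdn->edev);
	pdn->edev = NULL;
	free(pdn);
}

/* A device is treated as a VF when it points back at a PF. */
static int sketch_edev_is_vf(const struct sketch_edev *edev)
{
	return edev && edev->physfn != NULL;
}
```

Using the back pointer as the VF test avoids a separate flag that would
have to be kept in sync with it.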

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a52db28..1b3614d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -138,6 +138,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f771130..f0ddde7 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 {
 #ifdef CONFIG_PCI_IOV
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pci_dn *parent, *pdn;
+	struct eeh_dev *edev;
 	int i;
 
 	/* Only support IOV for now */
@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 				 __func__, i);
 			return NULL;
 		}
+		eeh_dev_init(pdn, hose);
+		edev = pdn_to_eeh_dev(pdn);
+		edev->physfn = pdev;
 	}
 #endif /* CONFIG_PCI_IOV */
 
@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
 		list_for_each_entry_safe(pdn, tmp,
 			&parent->child_list, list) {
+			struct eeh_dev *edev;
 			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
 			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
 				continue;
 
+			edev = pdn_to_eeh_dev(pdn);
+			if (edev) {
+				pdn->edev = NULL;
+				kfree(edev);
+			}
+
 			if (!list_empty(&pdn->list))
 				list_del(&pdn->list);
 
-- 
1.7.9.5


* [PATCH V6 06/10] powerpc/eeh: Create PE for VFs
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

The current EEH recovery code assumes that every PE has a primary bus.
Unfortunately, that's not true for VF PEs, which generally contain one
or multiple VFs (for the VF group case). The patch creates PEs for VFs
at PCI final fixup time. Those PEs are identified by the newly
introduced flag EEH_PE_VF so that they can be handled differently
during EEH recovery.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
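Reader's note (not part of the commit): a minimal user-space sketch of
the selection logic described in the changelog above. The flag values
are copied from the patch; struct model_dev, pick_pe_type() and
pick_parent() are hypothetical stand-ins, not kernel APIs. It only
illustrates that a non-NULL physfn routes a device to a VF PE and makes
its PF the starting point of the parent PE search.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values copied from the patch; everything else is a stand-in. */
#define EEH_PE_DEVICE	(1 << 2)
#define EEH_PE_VF	(1 << 4)

/* Hypothetical, stripped-down model of struct eeh_dev / struct pci_dn. */
struct model_dev {
	struct model_dev *bus_parent;	/* upstream device in the PCI tree */
	struct model_dev *physfn;	/* owning PF, NULL unless this is a VF */
};

/* A VF gets its own PE type so that recovery can treat it differently. */
int pick_pe_type(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->physfn ? EEH_PE_VF : EEH_PE_DEVICE;
}

/* Start the parent search at the PF for a VF, else at the bus parent. */
struct model_dev *pick_parent(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->physfn ? dev->physfn : dev->bus_parent;
}
```

With a PF behind a bridge and a VF whose physfn points at that PF, the
VF is typed EEH_PE_VF and its parent search starts at the PF, while the
PF itself stays EEH_PE_DEVICE and keeps using its bus parent.
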
 arch/powerpc/include/asm/eeh.h               |    1 +
 arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
 arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 1b3614d..c1fde48 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -70,6 +70,7 @@ struct pci_dn;
 #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
 #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
 #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
+#define EEH_PE_VF	(1 << 4)	/* VF PE     */
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 35f0b62..260a701 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 	 * EEH device already having associated PE, but
 	 * the direct parent EEH device doesn't have yet.
 	 */
-	pdn = pdn ? pdn->parent : NULL;
+	if (edev->physfn)
+		pdn = pci_get_pdn(edev->physfn);
+	else
+		pdn = pdn ? pdn->parent : NULL;
 	while (pdn) {
 		/* We're poking out of PCI territory */
 		parent = pdn_to_eeh_dev(pdn);
@@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 	}
 
 	/* Create a new EEH PE */
-	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
+	if (edev->physfn)
+		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
+	else
+		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
 	if (!pe) {
 		pr_err("%s: out of memory!\n", __func__);
 		return -ENOMEM;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ce738ab..c505036 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
 	.restore_config		= pnv_eeh_restore_config
 };
 
+static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+
+	if (!pdev->is_virtfn)
+		return;
+
+	/*
+	 * The following operations will fail if VF's sysfs files
+	 * aren't created or its resources aren't finalized.
+	 */
+	eeh_add_device_early(pdn);
+	eeh_add_device_late(pdev);
+	eeh_sysfs_add_device(pdev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
+
 /**
  * eeh_powernv_init - Register platform dependent EEH operations
  *
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V6 07/10] powerpc/powernv: Support EEH reset for VF PE
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

PEs for VFs don't have a primary bus, so they need their own reset
backend, which is used during EEH recovery. The patch implements the
reset backend for VF PEs by issuing FLR or AF FLR to the VFs contained
in the PE.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
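Reader's note (not part of the commit): a minimal user-space sketch of
the "FLR or AF FLR" behaviour the changelog describes, plus the
worst-case delay of the Transaction Pending poll (four tries, sleeping
100 ms << i after each). struct vf_caps and all function names here are
hypothetical; real config space accesses are replaced by two booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VF capability summary; not a kernel structure. */
struct vf_caps {
	bool has_pcie_flr;	/* PCIe capability advertises FLR */
	bool has_af_flr;	/* AF capability advertises TP and FLR */
};

/* Total sleep if Transaction Pending never clears: 100+200+400+800 ms. */
int max_pending_wait_ms(void)
{
	int i, total = 0;

	for (i = 0; i < 4; i++)
		total += (1 << i) * 100;
	return total;
}

int try_flr(const struct vf_caps *caps)
{
	return caps->has_pcie_flr ? 0 : -ENOTTY;
}

int try_af_flr(const struct vf_caps *caps)
{
	return caps->has_af_flr ? 0 : -ENOTTY;
}

/* "FLR or AF FLR": stop at the first method the function supports. */
int reset_vf(const struct vf_caps *caps)
{
	assert(caps != NULL);
	if (try_flr(caps) == 0)
		return 0;
	return try_af_flr(caps);
}
```

In this model a VF that supports only one of the two methods still
resets successfully, and -ENOTTY is returned only when neither is
available.
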
 arch/powerpc/include/asm/eeh.h               |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |  134 +++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c1fde48..3d64cf3 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -134,6 +134,7 @@ struct eeh_dev {
 	int pcix_cap;			/* Saved PCIx capability	*/
 	int pcie_cap;			/* Saved PCIe capability	*/
 	int aer_cap;			/* Saved AER capability		*/
+	int af_cap;			/* Saved AF capability		*/
 	struct eeh_pe *pe;		/* Associated PE		*/
 	struct list_head list;		/* Form link list in the PE	*/
 	struct pci_controller *phb;	/* Associated PHB		*/
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c505036..7af3c1e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -402,6 +402,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
 	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
 	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
+	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
 	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
 		edev->mode |= EEH_DEV_BRIDGE;
 		if (edev->pcie_cap) {
@@ -891,6 +892,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
+				     u16 mask, bool af_flr_rst)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	int status, i;
+
+	/* Wait for Transaction Pending bit to be cleared */
+	for (i = 0; i < 4; i++) {
+		eeh_ops->read_config(pdn, pos, 2, &status);
+		if (!(status & mask))
+			return;
+
+		msleep((1 << i) * 100);
+	}
+
+	pr_warn("%s: Pending transaction while issuing %s FLR to "
+		"%04x:%02x:%02x.%01x\n",
+		__func__, af_flr_rst ? "AF" : "",
+		edev->phb->global_number, pdn->busno,
+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+}
+
+static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 reg;
+
+	if (!edev->pcie_cap)
+		return -ENOTTY;
+
+	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
+	if (!(reg & PCI_EXP_DEVCAP_FLR))
+		return -ENOTTY;
+
+	switch (option) {
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
+					 PCI_EXP_DEVSTA_TRPND, false);
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     4, &reg);
+		reg |= PCI_EXP_DEVCTL_BCR_FLR;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      4, reg);
+		msleep(EEH_PE_RST_HOLD_TIME);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     4, &reg);
+		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      4, reg);
+		msleep(EEH_PE_RST_SETTLE_TIME);
+		break;
+	}
+
+	return 0;
+}
+
+static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 cap;
+
+	if (!edev->af_cap)
+		return -ENOTTY;
+
+	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
+	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
+		return -ENOTTY;
+
+	switch (option) {
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		/*
+		 * Wait for Transaction Pending bit to clear. A word-aligned
+		 * test is used, so we use the control offset rather than status
+		 * and shift the test bit to match.
+		 */
+		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
+					 PCI_AF_STATUS_TP << 8, true);
+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
+				      1, PCI_AF_CTRL_FLR);
+		msleep(EEH_PE_RST_HOLD_TIME);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
+		msleep(EEH_PE_RST_SETTLE_TIME);
+		break;
+	}
+
+	return 0;
+}
+
+static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
+{
+	int ret;
+
+	ret = pnv_eeh_do_flr(pdn, option);
+	if (!ret)
+		return ret;
+
+	return pnv_eeh_do_af_flr(pdn, option);
+}
+
+static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
+{
+	struct eeh_dev *edev, *tmp;
+	struct pci_dn *pdn;
+	int ret;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		pdn = eeh_dev_to_pdn(edev);
+		ret = pnv_eeh_reset_vf(pdn, option);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *hose;
@@ -966,7 +1088,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 		}
 
 		bus = eeh_pe_bus_get(pe);
-		if (pci_is_root_bus(bus) ||
+		if (pe->type & EEH_PE_VF)
+			ret = pnv_eeh_vf_pe_reset(pe, option);
+		else if (pci_is_root_bus(bus) ||
 			pci_is_root_bus(bus->parent))
 			ret = pnv_eeh_root_reset(hose, option);
 		else
@@ -1106,6 +1230,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
 	if (!edev || !edev->pe)
 		return false;
 
+	/*
+	 * We will issue FLR or AF FLR to all VFs, which are contained
+	 * in VF PE. It relies on the EEH PCI config accessors. So we
+	 * can't block them during the window.
+	 */
+	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))
+		return false;
+
 	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
 		return true;
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V6 08/10] powerpc/powernv: Support PCI config restore for VFs
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

After a PE reset, the OPAL API opal_pci_reinit() is called on all
devices contained in the PE to reinitialize them. However, VFs can't be
seen by the skiboot firmware. We have to implement functions, similar to
those in the skiboot firmware, to reinitialize VFs after a reset on a
VF PE.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
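Reader's note (not part of the commit): a minimal user-space sketch of
how a saved MPS in bytes maps onto the Max_Payload_Size field of the
PCIe Device Control register, which is what the "(ffs(mps) - 8) << 5"
expression in the patch computes. The helper names are hypothetical and
the config space write is left out; PCI_EXP_DEVCTL_PAYLOAD is the mask
for bits 7:5.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size field */

/* Encode an MPS in bytes (128..4096, power of two) into DEVCTL bits 7:5. */
unsigned int mps_to_devctl_bits(int mps)
{
	assert(mps >= 128 && mps <= 4096 && (mps & (mps - 1)) == 0);
	/* 128 bytes is encoded as 0, 256 as 1, and so on up to 4096 as 5. */
	return (unsigned int)(ffs(mps) - 8) << 5;
}

/* Replace only the payload field, leaving the other DEVCTL bits alone. */
unsigned int devctl_set_mps(unsigned int devctl, int mps)
{
	devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
	return devctl | mps_to_devctl_bits(mps);
}
```

For example, an MPS of 256 bytes yields field bits 0x20, so restoring it
into a DEVCTL value of 0x0010 gives 0x0030.
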
 arch/powerpc/include/asm/pci-bridge.h        |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |   70 +++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.c         |   18 +++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index d78afe4..168b991 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -205,6 +205,7 @@ struct pci_dn {
 	int     m64_per_iov;
 #define IODA_INVALID_M64        (-1)
 	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+	int	mps;
 #endif /* CONFIG_PCI_IOV */
 #endif
 	struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 7af3c1e..33deb78 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1612,6 +1612,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
 	return ret;
 }
 
+static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 devctl, cmd, cap2, aer_capctl;
+	int old_mps;
+
+	/* Restore MPS */
+	if (edev->pcie_cap) {
+		old_mps = (ffs(pdn->mps) - 8) << 5;
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     2, &devctl);
+		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
+		devctl |= old_mps;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      2, devctl);
+	}
+
+	/* Disable Completion Timeout */
+	if (edev->pcie_cap) {
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
+				     4, &cap2);
+		if (cap2 & 0x10) {
+			eeh_ops->read_config(pdn,
+					edev->pcie_cap + PCI_EXP_DEVCTL2,
+					4, &cap2);
+			cap2 |= 0x10;
+			eeh_ops->write_config(pdn,
+					edev->pcie_cap + PCI_EXP_DEVCTL2,
+					4, cap2);
+		}
+	}
+
+	/* Enable SERR and parity checking */
+	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
+	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
+	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
+
+	/* Enable reporting of various errors */
+	if (edev->pcie_cap) {
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				2, &devctl);
+		devctl &= ~PCI_EXP_DEVCTL_CERE;
+		devctl |= (PCI_EXP_DEVCTL_NFERE |
+			   PCI_EXP_DEVCTL_FERE |
+			   PCI_EXP_DEVCTL_URRE);
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				2, devctl);
+	}
+
+	/* Enable ECRC generation and check */
+	if (edev->pcie_cap && edev->aer_cap) {
+		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+				4, &aer_capctl);
+		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
+		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+				4, aer_capctl);
+	}
+
+	return 0;
+}
+
 static int pnv_eeh_restore_config(struct pci_dn *pdn)
 {
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -1622,7 +1683,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
 		return -EEXIST;
 
 	phb = edev->phb->private_data;
-	ret = opal_pci_reinit(phb->opal_id,
+	/*
+	 * We have to restore the PCI config space after reset since the
+	 * firmware can't see SRIOV VFs.
+	 */
+	if (edev->physfn)
+		ret = pnv_eeh_restore_vf_config(pdn);
+	else
+		ret = opal_pci_reinit(phb->opal_id,
 			      OPAL_REINIT_PCI_DEV, edev->config_addr);
 	if (ret) {
 		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index bca2aeb..10bc8c3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -729,6 +729,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	int parent_mps;
+
+	if (!pdev->is_virtfn)
+		return;
+
+	/* Synchronize MPS for VF and PF */
+	parent_mps = pcie_get_mps(pdev->physfn);
+	if ((128 << pdev->pcie_mpss) >= parent_mps)
+		pcie_set_mps(pdev, parent_mps);
+	pdn->mps = pcie_get_mps(pdev);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
+#endif /* CONFIG_PCI_IOV */
+
 void __init pnv_pci_init(void)
 {
 	struct device_node *np;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V6 09/10] powerpc/eeh: Support error recovery for VF PE
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

Unlike a PCI bus dependent PE, a PE for VFs doesn't have a primary
bus, on which PCI hotplug is implemented. The patch supports error
recovery, especially PCI hotplug, for VF PEs. Hotplug on a VF PE is
done per VF rather than per PCI bus.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
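Reader's note (not part of the commit): a minimal user-space sketch of
how an "in_error" flag can gate the slot_reset and resume callbacks for
a VF that was removed and re-created during recovery. struct model_edev
and the function names are hypothetical stand-ins for the eeh_dev field
and the eeh_report_*() helpers; driver callbacks are modelled as
counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct eeh_dev used here. */
struct model_edev {
	bool in_error;		/* set once the error has been reported */
	int slot_reset_calls;	/* counts simulated driver callbacks */
	int resume_calls;
};

void report_error(struct model_edev *edev)
{
	assert(edev != NULL);
	edev->in_error = true;
}

/* A VF re-created by its PF comes back in a fresh state. */
void remove_and_readd(struct model_edev *edev)
{
	edev->in_error = false;
}

void report_reset(struct model_edev *edev)
{
	if (!edev->in_error)
		return;		/* fresh device: nothing to reset */
	edev->slot_reset_calls++;
}

void report_resume(struct model_edev *edev)
{
	if (edev->in_error)
		edev->resume_calls++;
	edev->in_error = false;	/* recovery is over either way */
}
```

A device that stays present through recovery sees both callbacks once,
while a device that was removed and re-added after the error report
sees neither; both end up with the flag cleared.
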
 arch/powerpc/include/asm/eeh.h   |    1 +
 arch/powerpc/kernel/eeh.c        |    8 +++
 arch/powerpc/kernel/eeh_driver.c |  100 ++++++++++++++++++++++++++++++--------
 arch/powerpc/kernel/eeh_pe.c     |    3 +-
 4 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 3d64cf3..d24382c 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -140,6 +140,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	int    in_error;		/* Error flag for eeh_dev	*/
 	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9ee61d1..1207547 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1229,6 +1229,14 @@ void eeh_remove_device(struct pci_dev *dev)
 	 * from the parent PE during the BAR resotre.
 	 */
 	edev->pdev = NULL;
+
+	/*
+	 * The flag "in_error" tracks whether an EEH device for a VF
+	 * is in error state. It's set in eeh_report_error(). If it's
+	 * not set, the driver's slot_reset() and resume() handlers
+	 * won't be called for the VF EEH device.
+	 */
+	edev->in_error = 0;
 	dev->dev.archdata.edev = NULL;
 	if (!(edev->pe->state & EEH_PE_KEEP))
 		eeh_rmv_from_parent_pe(edev);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 24768ff..63a2c33 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
+	edev->in_error = 1;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		eeh_pcid_put(dev);
 		return NULL;
 	}
@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		edev->mode &= ~EEH_DEV_NO_HANDLER;
-		eeh_pcid_put(dev);
-		return NULL;
+		goto out;
 	}
 
 	driver->err_handler->resume(dev);
 
+out:
+	edev->in_error = 0;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
 	return NULL;
 }
 
+static void *eeh_add_virt_device(void *data, void *userdata)
+{
+	struct pci_driver *driver;
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
+
+	if (!(edev->physfn)) {
+		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
+			__func__, edev->phb->global_number, pdn->busno,
+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+		return NULL;
+	}
+
+	driver = eeh_pcid_get(dev);
+	if (driver) {
+		eeh_pcid_put(dev);
+		if (driver->err_handler)
+			return NULL;
+	}
+
+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
+	return NULL;
+}
+
 static void *eeh_rmv_device(void *data, void *userdata)
 {
 	struct pci_driver *driver;
 	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	int *removed = (int *)userdata;
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	/*
 	 * Actually, we should remove the PCI bridges as well.
@@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (removed && driver->err_handler)
 			return NULL;
 	}
 
@@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
 		 pci_name(dev));
 	edev->bus = dev->bus;
 	edev->mode |= EEH_DEV_DISCONNECTED;
-	(*removed)++;
+	if (removed)
+		(*removed)++;
 
-	pci_lock_rescan_remove();
-	pci_stop_and_remove_bus_device(dev);
-	pci_unlock_rescan_remove();
+	if (edev->physfn) {
+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
+		edev->pdev = NULL;
+
+		/*
+		 * Set the VF's PE number to an invalid one, which is
+		 * required to plug the VF back in successfully.
+		 */
+		pdn->pe_number = IODA_INVALID_PE;
+	} else {
+		pci_lock_rescan_remove();
+		pci_stop_and_remove_bus_device(dev);
+		pci_unlock_rescan_remove();
+	}
 
 	return NULL;
 }
@@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
 	struct timeval tstamp;
 	int cnt, rc, removed = 0;
+	struct eeh_dev *edev;
 
 	/* pcibios will clear the counter; save the value */
 	cnt = pe->freeze_count;
@@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(bus);
-		pci_unlock_rescan_remove();
-	} else if (frozen_bus) {
+		if (pe->type & EEH_PE_VF)
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+		else {
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(bus);
+			pci_unlock_rescan_remove();
+		}
+	} else if (frozen_bus)
 		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
-	}
 
 	/*
 	 * Reset the pci controller. (Asserts RST#; resets config space).
@@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		 * PE. We should disconnect it so the binding can be
 		 * rebuilt when adding PCI devices.
 		 */
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(bus);
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+			pcibios_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(frozen_bus);
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+			pcibios_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -792,11 +846,15 @@ perm_error:
 	 * the their PCI config any more.
 	 */
 	if (frozen_bus) {
-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
-
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(frozen_bus);
-		pci_unlock_rescan_remove();
+		if (pe->type & EEH_PE_VF) {
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+		} else {
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(frozen_bus);
+			pci_unlock_rescan_remove();
+		}
 	}
 }
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 260a701..5cde950 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 	if (pe->type & EEH_PE_PHB) {
 		bus = pe->phb->bus;
 	} else if (pe->type & EEH_PE_BUS ||
-		   pe->type & EEH_PE_DEVICE) {
+		   pe->type & EEH_PE_DEVICE ||
+		   pe->type & EEH_PE_VF) {
 		if (pe->bus) {
 			bus = pe->bus;
 			goto out;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V6 10/10] powerpc/powernv: compound PE for VFs
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19  1:35   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  1:35 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

When the VF BAR size is larger than 64MB, VFs are grouped per M64 BAR,
which means the VFs in a group should form a compound PE.

In this case, the patch links those VF PEs into a compound PE.

[gwshan: code refactoring for a bit]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
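Reader's note: the master/slave grouping can be tried out in user space.
The block below is a minimal sketch under stated assumptions, not the
kernel code. MODEL_M64_PER_IOV = 4 is an assumed value for M64_PER_IOV,
the model_* names are hypothetical, and each PE records its master by VF
index instead of being linked into the master's "slaves" list.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_M64_PER_IOV 4   /* assumption: value of M64_PER_IOV */
#define MODEL_MAX_VFS     64  /* arbitrary bound for the sketch */

enum model_role { MODEL_PE_MASTER, MODEL_PE_SLAVE };

struct model_vf_pe {
	enum model_role role;
	int master;             /* VF index of the group's master PE */
};

/* Smallest power of two >= n (n >= 1); stands in for roundup_pow_of_two(). */
unsigned int model_roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Split num_vfs VFs into MODEL_M64_PER_IOV groups. The first VF of each
 * non-empty group becomes the master, the rest become its slaves.
 * Returns the number of VFs per group.
 */
unsigned int model_group_vfs(struct model_vf_pe *pes, unsigned int num_vfs)
{
	unsigned int vf_per_group, group, vf;

	assert(pes != NULL);
	assert(num_vfs > MODEL_M64_PER_IOV && num_vfs <= MODEL_MAX_VFS);

	vf_per_group = model_roundup_pow_of_two(num_vfs) / MODEL_M64_PER_IOV;

	for (group = 0; group < MODEL_M64_PER_IOV; group++) {
		int master = -1;

		for (vf = group * vf_per_group;
		     vf < (group + 1) * vf_per_group && vf < num_vfs;
		     vf++) {
			if (master < 0) {
				master = (int)vf;
				pes[vf].role = MODEL_PE_MASTER;
			} else {
				pes[vf].role = MODEL_PE_SLAVE;
			}
			pes[vf].master = master;
		}
	}

	return vf_per_group;
}
```

For example, with 6 VFs the rounded-up count is 8, so each group holds 2
VFs: VF 0, 2 and 4 become masters, VF 1, 3 and 5 their slaves, and the
fourth group stays empty.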
 arch/powerpc/platforms/powernv/pci-ioda.c |   46 +++++++++++++++++++++++++----
 arch/powerpc/platforms/powernv/pci.c      |   17 +++++++++--
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f8bc950..56e7b65 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1363,9 +1363,20 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	}
 
 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
+		struct pnv_ioda_pe *s, *sn;
 		if (pe->parent_dev != pdev)
 			continue;
 
+		if ((pe->flags & PNV_IODA_PE_MASTER) &&
+		    (pe->flags & PNV_IODA_PE_VF)) {
+			list_for_each_entry_safe(s, sn, &pe->slaves, list) {
+				pnv_pci_ioda2_release_dma_pe(pdev, s);
+				list_del(&s->list);
+				pnv_ioda_deconfigure_pe(phb, s);
+				pnv_ioda_free_pe(phb, s->pe_number);
+			}
+		}
+
 		pnv_pci_ioda2_release_dma_pe(pdev, pe);
 
 		/* Remove from list */
@@ -1418,7 +1429,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
-	struct pnv_ioda_pe    *pe;
+	struct pnv_ioda_pe    *pe, *master_pe;
 	int                    pe_num;
 	u16                    vf_index;
 	struct pci_dn         *pdn;
@@ -1464,10 +1475,13 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 				GFP_KERNEL, hose->node);
 		pe->tce32_table->data = pe;
 
-		/* Put PE to the list */
-		mutex_lock(&phb->ioda.pe_list_mutex);
-		list_add_tail(&pe->list, &phb->ioda.pe_list);
-		mutex_unlock(&phb->ioda.pe_list_mutex);
+		/* Put PE to the list, or postpone it for compound PEs */
+		if ((pdn->m64_per_iov != M64_PER_IOV) ||
+		    (num_vfs <= M64_PER_IOV)) {
+			mutex_lock(&phb->ioda.pe_list_mutex);
+			list_add_tail(&pe->list, &phb->ioda.pe_list);
+			mutex_unlock(&phb->ioda.pe_list_mutex);
+		}
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
 	}
@@ -1480,10 +1494,32 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
 
 		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
+			master_pe = NULL;
+
 			for (vf_index = vf_group * vf_per_group;
 			     vf_index < (vf_group + 1) * vf_per_group &&
 			     vf_index < num_vfs;
 			     vf_index++) {
+
+				/*
+				 * Figure out the master PE and add all slave
+				 * PEs to the master PE's list.
+				 */
+				pe = &phb->ioda.pe_array[pdn->offset + vf_index];
+				if (!master_pe) {
+					pe->flags |= PNV_IODA_PE_MASTER;
+					INIT_LIST_HEAD(&pe->slaves);
+					master_pe = pe;
+					mutex_lock(&phb->ioda.pe_list_mutex);
+					list_add_tail(&pe->list, &phb->ioda.pe_list);
+					mutex_unlock(&phb->ioda.pe_list_mutex);
+				} else {
+					pe->flags |= PNV_IODA_PE_SLAVE;
+					pe->master = master_pe;
+					list_add_tail(&pe->list,
+						&master_pe->slaves);
+				}
+
 				for (vf_index1 = vf_group * vf_per_group;
 				     vf_index1 < (vf_group + 1) * vf_per_group &&
 				     vf_index1 < num_vfs;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 10bc8c3..717a58e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -667,7 +667,7 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
 #ifdef CONFIG_PCI_IOV
-	struct pnv_ioda_pe *pe;
+	struct pnv_ioda_pe *pe, *slave;
 	struct pci_dn *pdn;
 
 	/* Fix the VF pdn PE number */
@@ -679,10 +679,23 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
 			    (pdev->devfn & 0xff))) {
 				pdn->pe_number = pe->pe_number;
 				pe->pdev = pdev;
-				break;
+				goto found;
+			}
+
+			if ((pe->flags & PNV_IODA_PE_MASTER) &&
+			    (pe->flags & PNV_IODA_PE_VF)) {
+				list_for_each_entry(slave, &pe->slaves, list) {
+					if (slave->rid == ((pdev->bus->number << 8)
+					   | (pdev->devfn & 0xff))) {
+						pdn->pe_number = slave->pe_number;
+						slave->pdev = pdev;
+						goto found;
+					}
+				}
 			}
 		}
 	}
+found:
 #endif /* CONFIG_PCI_IOV */
 
 	if (phb && phb->dma_dev_setup)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH V6 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-05-19  1:35   ` Wei Yang
@ 2015-05-19  5:24     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19  5:24 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, bhelgaas, linuxppc-dev, linux-pci

Bjorn,

This patch set is dedicated to VF EEH on Power. Most of the patches are in
the powerpc arch code, while this first one is PCI core related. The following
patches have been reviewed and acked by Gavin, so your ack on this one is
important.

I'd appreciate it a lot if you have time to take a look at this one.

Thanks a lot.

On Tue, May 19, 2015 at 09:35:03AM +0800, Wei Yang wrote:
>During EEH recovery, hotplug is applied to the devices which don't
>have drivers or their drivers don't support EEH. However, the hotplug,
>which was implemented based on PCI bus, can't be applied to VF directly.
>
>The patch renames virtfn_{add,remove}() and exports them so that they
>can be used in PCI hotplug during EEH recovery.
>
>[gwshan: changelog]
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>---
> drivers/pci/iov.c   |   10 +++++-----
> include/linux/pci.h |    2 ++
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index ee0ebff..cc941dd 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
> 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
> }
>
>-static int virtfn_add(struct pci_dev *dev, int id, int reset)
>+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
> {
> 	int i;
> 	int rc = -ENOMEM;
>@@ -183,7 +183,7 @@ failed:
> 	return rc;
> }
>
>-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
>+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
> {
> 	char buf[VIRTFN_ID_LEN];
> 	struct pci_dev *virtfn;
>@@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> 	}
>
> 	for (i = 0; i < initial; i++) {
>-		rc = virtfn_add(dev, i, 0);
>+		rc = pci_iov_virtfn_add(dev, i, 0);
> 		if (rc)
> 			goto failed;
> 	}
>@@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>
> failed:
> 	for (j = 0; j < i; j++)
>-		virtfn_remove(dev, j, 0);
>+		pci_iov_virtfn_remove(dev, j, 0);
>
> 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> 	pci_cfg_access_lock(dev);
>@@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
> 		return;
>
> 	for (i = 0; i < iov->num_VFs; i++)
>-		virtfn_remove(dev, i, 0);
>+		pci_iov_virtfn_remove(dev, i, 0);
>
> 	pcibios_sriov_disable(dev);
>
>diff --git a/include/linux/pci.h b/include/linux/pci.h
>index 353db8d..94bacfa 100644
>--- a/include/linux/pci.h
>+++ b/include/linux/pci.h
>@@ -1679,6 +1679,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
>
> int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
> void pci_disable_sriov(struct pci_dev *dev);
>+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
>+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
> int pci_num_vf(struct pci_dev *dev);
> int pci_vfs_assigned(struct pci_dev *dev);
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
>-- 
>1.7.9.5

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH V7 00/10] VF EEH on Power8
  2015-05-19  1:35 ` Wei Yang
@ 2015-05-19 10:50   ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

This patchset enables EEH on SRIOV VFs. The general idea is to create a proper
VF edev and VF PE and handle them properly.

Unlike a Bus PE, a VF PE contains just one VF. This makes EEH error handling
on a VF PE differ in several ways.

First, a VF's removal and re-enumeration rely on its PF. A VF is tightly
coupled to its PF, so it is not proper to enumerate a VF with the usual scan
procedure. That's why virtfn_add/virtfn_remove are exported in this patch
set.

Second, the reset/restore of a VF is done in kernel space. FW is not aware of
the VF, which means the usual reset function done in FW will not work. One of
the patches imitates the reset/restore function in kernel space.

Third, the VF may be removed during the PF's error_detected function. In this
case, the original error_detected->slot_reset->resume sequence is not proper
for those removed VFs, since they are re-created by the PF in a fresh state. A
flag in eeh_dev is introduced to mark that the eeh_dev is in error state. By
doing so, we track whether this device needs to be reset or not.

This has been tested both on host and in guest on Power8 with latest kernel
version.

v7:
   * fix compile error when PCI_IOV is not set
v6:
   * code / commit log refactor by Gavin
v5:
   * remove the compound field, iterate on Master VF PE instead
   * refine the code for PCI config restore and reset on VF:
     the wait time for assert and deassert
     PCI device address format
     check edev->pcie_cap and edev->aer_cap before accessing them
v4:
   * refine the change logs, comment and code style
   * change pnv_pci_fixup_vf_eeh() to pnv_eeh_vf_final_fixup() and remove the
     CONFIG_PCI_IOV macro
   * reorder patch 5/6 to make the logic more reasonable
   * remove remove_dev_pci_data()
   * remove the EEH_DEV_VF flag, use edev->physfn to identify a VF EEH DEV and
     remove related CONFIG_PCI_IOV macro
   * add the option for VF reset
   * fix the pnv_eeh_cfg_blocked() logic
   * replace pnv_pci_cfg_{read,write} with eeh_ops->{read,write}_config in
     pnv_eeh_vf_restore_config()
   * rename pnv_eeh_vf_restore_config() to pnv_eeh_restore_vf_config()
   * rename pnv_pci_fixup_vf_caps() to pnv_pci_vf_header_fixup() and move it
     to arch/powerpc/platforms/powernv/pci.c
   * add a field compound in pnv_ioda_pe to link compound PEs
   * handle compound PE for VF PEs
v3:
   * add back vf_index in pci_dn to track the VF's index
   * rename ppdev in eeh_dev to physfn for consistency
   * move edev->physfn assignment before dev->dev.archdata.edev is set
   * move pnv_pci_fixup_vf_eeh() and pnv_pci_fixup_vf_caps() to eeh-powernv.c
   * clearer and more detailed commit logs and code comments
   * merge eeh_rmv_virt_device() with eeh_rmv_device()
   * move the cfg_blocked check logic from pnv_eeh_read/write_config() to
     pnv_eeh_cfg_blocked()
   * move the vf reset/restore logic into its own patch, two patches are
     created.
     powerpc/powernv: Support PCI config restore for VFs
     powerpc/powernv: Support EEH reset for VFs
   * simplify the vf reset logic
v2:
   * add prefix pci_iov_ to virtfn_add/virtfn_remove
   * use EEH_DEV_VF as a flag for a VF's eeh_dev
   * use eeh_dev instead of edev in change log
   * remove vf_index in eeh_dev, calculate it from pdn->busno and devfn
   * do eeh_add_device_late() and eeh_sysfs_add_device() both after pci_dev is
     well initialized
   * do FLR to reset a VF PE
   * imitate the restore function in FW for VF
   * remove the reverse order patch, since it is still under discussion

Wei Yang (10):
  PCI/IOV: Rename and export virtfn_add/virtfn_remove
  powerpc/pci: Cache VF index in pci_dn
  powerpc/pci: Remove VFs prior to PF
  powerpc/eeh: Trace first 7 BARs in address cache
  powerpc/powernv: EEH device for VF
  powerpc/eeh: Create PE for VFs
  powerpc/powernv: Support EEH reset for VF PE
  powerpc/powernv: Support PCI config restore for VFs
  powerpc/eeh: Support error recovery for VF PE
  powerpc/powernv: compound PE for VFs

 arch/powerpc/include/asm/eeh.h               |    4 +
 arch/powerpc/include/asm/pci-bridge.h        |    2 +
 arch/powerpc/kernel/eeh.c                    |    8 +
 arch/powerpc/kernel/eeh_cache.c              |    2 +-
 arch/powerpc/kernel/eeh_driver.c             |  100 +++++++++---
 arch/powerpc/kernel/eeh_pe.c                 |   13 +-
 arch/powerpc/kernel/pci-hotplug.c            |    2 +-
 arch/powerpc/kernel/pci_dn.c                 |   16 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  221 +++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c    |   46 +++++-
 arch/powerpc/platforms/powernv/pci.c         |   35 +++-
 drivers/pci/iov.c                            |   10 +-
 include/linux/pci.h                          |    8 +
 13 files changed, 426 insertions(+), 41 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH V7 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

During EEH recovery, hotplug is applied to devices which don't have
drivers or whose drivers don't support EEH. However, the hotplug, which
was implemented based on PCI bus, can't be applied to a VF directly.

The patch renames virtfn_{add,remove}() to pci_iov_virtfn_{add,remove}()
and exports them (drops static and declares them in include/linux/pci.h)
so that they can be used in PCI hotplug during EEH recovery.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/iov.c   |   10 +++++-----
 include/linux/pci.h |    8 ++++++++
 2 files changed, 13 insertions(+), 5 deletions(-)
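
Note (illustration only, not part of the patch): the pci.h hunk pairs each
new declaration with a static inline stub for the !CONFIG_PCI_IOV case, so
callers compile either way. The standalone C below is a minimal sketch of
that pattern. SKETCH_HAVE_IOV, sketch_virtfn_add() and sketch_recover_vf()
are hypothetical names; only the -ENOSYS return value is taken from the
patch.

```c
#include <errno.h>

/*
 * Hypothetical feature switch standing in for CONFIG_PCI_IOV.
 * It is left undefined here, so the stub branch is compiled.
 */
#ifdef SKETCH_HAVE_IOV
int sketch_virtfn_add(int id);	/* real implementation lives elsewhere */
#else
static inline int sketch_virtfn_add(int id)
{
	(void)id;
	return -ENOSYS;		/* same error code as the stub in the patch */
}
#endif

/* A caller needs no #ifdef of its own; it only checks the return value. */
static int sketch_recover_vf(int id)
{
	return sketch_virtfn_add(id);
}
```

With the switch undefined, sketch_recover_vf() simply reports -ENOSYS, so
a caller can fall back to another recovery path.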

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..cc941dd 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
 
-static int virtfn_add(struct pci_dev *dev, int id, int reset)
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
 {
 	int i;
 	int rc = -ENOMEM;
@@ -183,7 +183,7 @@ failed:
 	return rc;
 }
 
-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
 	char buf[VIRTFN_ID_LEN];
 	struct pci_dev *virtfn;
@@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	}
 
 	for (i = 0; i < initial; i++) {
-		rc = virtfn_add(dev, i, 0);
+		rc = pci_iov_virtfn_add(dev, i, 0);
 		if (rc)
 			goto failed;
 	}
@@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 
 failed:
 	for (j = 0; j < i; j++)
-		virtfn_remove(dev, j, 0);
+		pci_iov_virtfn_remove(dev, j, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
@@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
 		return;
 
 	for (i = 0; i < iov->num_VFs; i++)
-		virtfn_remove(dev, i, 0);
+		pci_iov_virtfn_remove(dev, i, 0);
 
 	pcibios_sriov_disable(dev);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..06aa5dd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1679,6 +1679,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
@@ -1696,6 +1698,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void pci_disable_sriov(struct pci_dev *dev) { }
+static inline int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
+{
+	return -ENOSYS;
+}
+static inline void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
+{ }
 static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
 static inline int pci_vfs_assigned(struct pci_dev *dev)
 { return 0; }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V7 02/10] powerpc/pci: Cache VF index in pci_dn
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

The patch caches the VF index in pci_dn, which can be used to calculate
the VF's bus, device and function number. That information helps to
locate the VF's PCI device instance when doing hotplug during EEH
recovery if necessary.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |    1 +
 arch/powerpc/kernel/pci_dn.c          |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
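
Note (illustration only, not part of the patch): the changelog says the
VF index can be turned back into a bus/devfn pair. A minimal sketch of
that arithmetic follows, assuming the SR-IOV rule that a VF's routing ID
is the PF's routing ID plus First VF Offset plus VF Stride times the
index. struct sketch_pf and the sketch_* helpers are hypothetical and
are not the kernel's pci_iov_virtfn_bus()/pci_iov_virtfn_devfn().

```c
#include <stdint.h>

/* Hypothetical, simplified view of a PF; not the kernel's struct pci_dev. */
struct sketch_pf {
	uint8_t  bus;		/* PF bus number */
	uint8_t  devfn;		/* PF device/function, packed in one byte */
	uint16_t vf_offset;	/* First VF Offset from the SR-IOV capability */
	uint16_t vf_stride;	/* VF Stride from the SR-IOV capability */
};

/* Routing ID of VF number vf_index: PF RID + offset + stride * index. */
static unsigned int sketch_vf_rid(const struct sketch_pf *pf,
				  unsigned int vf_index)
{
	unsigned int pf_rid = ((unsigned int)pf->bus << 8) | pf->devfn;

	return pf_rid + pf->vf_offset + pf->vf_stride * vf_index;
}

/* Upper byte of the routing ID is the bus number. */
static unsigned int sketch_vf_bus(const struct sketch_pf *pf,
				  unsigned int vf_index)
{
	return (sketch_vf_rid(pf, vf_index) >> 8) & 0xff;
}

/* Lower byte of the routing ID is the packed device/function number. */
static unsigned int sketch_vf_devfn(const struct sketch_pf *pf,
				    unsigned int vf_index)
{
	return sketch_vf_rid(pf, vf_index) & 0xff;
}
```

For a PF at 01:00.0 with offset 128 and stride 2, VF 0 lands on bus 1
with devfn 0x80, and VF 64 carries over onto bus 2 with devfn 0.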

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 1811c44..c324882 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -196,6 +196,7 @@ struct pci_dn {
 #define IODA_INVALID_PE		(-1)
 #ifdef CONFIG_PPC_POWERNV
 	int	pe_number;
+	int     vf_index;		/* VF index in the PF */
 #ifdef CONFIG_PCI_IOV
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..f771130 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
 #ifdef CONFIG_PCI_IOV
 static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 					   struct pci_dev *pdev,
+					   int vf_index,
 					   int busno, int devfn)
 {
 	struct pci_dn *pdn;
@@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 	pdn->parent = parent;
 	pdn->busno = busno;
 	pdn->devfn = devfn;
+	pdn->vf_index = vf_index;
 #ifdef CONFIG_PPC_POWERNV
 	pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 		return NULL;
 
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
-		pdn = add_one_dev_pci_data(parent, NULL,
+		pdn = add_one_dev_pci_data(parent, NULL, i,
 					   pci_iov_virtfn_bus(pdev, i),
 					   pci_iov_virtfn_devfn(pdev, i));
 		if (!pdn) {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V7 03/10] powerpc/pci: Remove VFs prior to PF
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,
VFs, which might be hooked to the same PCI bus as their PF, should be
removed before the PF. Otherwise, PCI hot unplugging on the PCI bus would
cause a kernel crash.

The patch applies the above pattern to the PowerPC PCI hotplug path by
walking the bus's device list in reverse.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
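
Note (illustration only, not part of the patch): a userspace sketch of
why the walk direction matters. It assumes that removing a PF also
removes its VFs from the same list, and that VFs sit after their PF
because they are added later. All sketch_* names are hypothetical, and
nodes are only flagged as removed instead of being freed so the sketch
stays safe to run.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device on a bus list; not struct pci_dev. */
struct sketch_dev {
	struct sketch_dev *prev, *next;	/* circular list, head is a sentinel */
	struct sketch_dev *physfn;	/* non-NULL for a VF: its PF */
	int removed;			/* set once unlinked; a kernel would free it */
};

static void sketch_add_tail(struct sketch_dev *head, struct sketch_dev *d)
{
	d->prev = head->prev;
	d->next = head;
	head->prev->next = d;
	head->prev = d;
}

static void sketch_unlink(struct sketch_dev *d)
{
	if (d->removed)
		return;
	d->prev->next = d->next;
	d->next->prev = d->prev;
	d->removed = 1;
}

/* Removing a PF also tears down every VF of it that is still listed. */
static void sketch_remove(struct sketch_dev *head, struct sketch_dev *d)
{
	struct sketch_dev *p, *n;

	if (!d->physfn) {
		for (p = head->next; p != head; p = n) {
			n = p->next;
			if (p->physfn == d)
				sketch_unlink(p);
		}
	}
	sketch_unlink(d);
}

/* Bus with one PF followed by two VFs, the order in which they are added. */
static void sketch_build(struct sketch_dev *head, struct sketch_dev devs[3])
{
	int i;

	head->prev = head->next = head;
	head->physfn = NULL;
	head->removed = 0;
	for (i = 0; i < 3; i++) {
		devs[i].physfn = i ? &devs[0] : NULL;
		devs[i].removed = 0;
		sketch_add_tail(head, &devs[i]);
	}
}

/*
 * Remove every entry, caching the neighbour first as the _safe iterators
 * do. Returns how many cached entries had already been removed.
 */
static int sketch_walk(struct sketch_dev *head, int reverse)
{
	struct sketch_dev *d, *tmp;
	int stale = 0;

	for (d = reverse ? head->prev : head->next; d != head; d = tmp) {
		tmp = reverse ? d->prev : d->next;
		if (d->removed) {
			stale++;	/* a kernel would touch freed memory here */
			continue;
		}
		sketch_remove(head, d);
	}
	return stale;
}
```

Walking forward removes the PF first and then steps onto VFs that are
already gone; walking in reverse removes the VFs before the PF is
reached, so no stale entry is visited.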

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7ed85a6..98f84ed 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -50,7 +50,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 
 	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 		 pci_domain_nr(bus),  bus->number);
-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
 		pr_debug("   Removing %s...\n", pci_name(dev));
 		pci_stop_and_remove_bus_device(dev);
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V7 04/10] powerpc/eeh: Trace first 7 BARs in address cache
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

The EEH address cache, which helps to locate the PCI device according to
the given (physical) MMIO address, doesn't cover PCI bridges. Also, it
shouldn't return the PF for an address within the PF's IOV BARs. Instead,
the VFs should be returned.

The patch restricts the address cache to the first 7 resources (the six
standard BARs plus the expansion ROM, i.e. indices 0 through
PCI_ROM_RESOURCE) for the above purposes.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_cache.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
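
Note (illustration only, not part of the patch): a small sketch of which
resource slots the old and new loop bounds visit. The SK_* indices are
meant to mirror the resource enum in include/linux/pci.h with
CONFIG_PCI_IOV enabled; they are reproduced from memory, so treat the
exact values as an assumption. The sk_* helpers are hypothetical.

```c
/* Assumed resource slot layout of a PCI device with CONFIG_PCI_IOV set. */
enum {
	SK_STD_RESOURCES       = 0,	/* BAR0..BAR5 */
	SK_STD_RESOURCE_END    = 5,
	SK_ROM_RESOURCE        = 6,	/* expansion ROM */
	SK_IOV_RESOURCES       = 7,	/* PF's IOV BARs */
	SK_IOV_RESOURCE_END    = 12,
	SK_BRIDGE_RESOURCES    = 13,	/* bridge windows */
	SK_BRIDGE_RESOURCE_END = 16,
	SK_COUNT_RESOURCE      = 17
};

/* Old bound: i < DEVICE_COUNT_RESOURCE. Does the walk visit slot idx? */
static int sk_old_walk_visits(int idx)
{
	int i;

	for (i = 0; i < SK_COUNT_RESOURCE; i++)
		if (i == idx)
			return 1;
	return 0;
}

/* New bound: i <= PCI_ROM_RESOURCE. Does the walk visit slot idx? */
static int sk_new_walk_visits(int idx)
{
	int i;

	for (i = 0; i <= SK_ROM_RESOURCE; i++)
		if (i == idx)
			return 1;
	return 0;
}

/* Number of slots the new loop walks. */
static int sk_new_walk_count(void)
{
	int i, n = 0;

	for (i = 0; i <= SK_ROM_RESOURCE; i++)
		n++;
	return n;
}
```

Under that layout the new bound stops right before the IOV BARs, so an
MMIO address inside an IOV BAR is no longer attributed to the PF.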

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index eeabeab..f6c5f05 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
 	}
 
 	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
 		unsigned long start = pci_resource_start(dev,i);
 		unsigned long end = pci_resource_end(dev,i);
 		unsigned int flags = pci_resource_flags(dev,i);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V7 05/10] powerpc/powernv: EEH device for VF
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

VFs and their corresponding pci_dn instances are created and released
dynamically as their PF's SRIOV capability is enabled and disabled.
The patch creates and releases EEH devices for VFs when creating and
releasing their pci_dn instances, which means EEH devices and pci_dn
instances have the same life cycle. Also, a VF's EEH device is identified
by (struct eeh_dev::physfn), which points to its PF.

[gwshan: changelog and removed CONFIG_PCI_IOV]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/pci_dn.c   |   12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a52db28..1b3614d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -138,6 +138,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f771130..f0ddde7 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 {
 #ifdef CONFIG_PCI_IOV
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pci_dn *parent, *pdn;
+	struct eeh_dev *edev;
 	int i;
 
 	/* Only support IOV for now */
@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 				 __func__, i);
 			return NULL;
 		}
+		eeh_dev_init(pdn, hose);
+		edev = pdn_to_eeh_dev(pdn);
+		edev->physfn = pdev;
 	}
 #endif /* CONFIG_PCI_IOV */
 
@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
 		list_for_each_entry_safe(pdn, tmp,
 			&parent->child_list, list) {
+			struct eeh_dev *edev;
 			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
 			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
 				continue;
 
+			edev = pdn_to_eeh_dev(pdn);
+			if (edev) {
+				pdn->edev = NULL;
+				kfree(edev);
+			}
+
 			if (!list_empty(&pdn->list))
 				list_del(&pdn->list);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V7 05/10] powerpc/powernv: EEH device for VF
@ 2015-05-19 10:50     ` Wei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linux-pci, Wei Yang, linuxppc-dev

VFs and their corresponding pci_dn instances are created and released
dynamically as their PF's SRIOV capability is enabled and disabled.
The patch creates and releases EEH devices for VFs when creating and
releasing their pci_dn instances, which means EEH devices and pci_dn
instances have same life cycle. Also, VF's EEH device is identified
by (struct eeh_dev::physfn).

[gwshan: changelog and removed CONFIG_PCI_IOV]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/pci_dn.c   |   12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a52db28..1b3614d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -138,6 +138,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f771130..f0ddde7 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 {
 #ifdef CONFIG_PCI_IOV
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pci_dn *parent, *pdn;
+	struct eeh_dev *edev;
 	int i;
 
 	/* Only support IOV for now */
@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 				 __func__, i);
 			return NULL;
 		}
+		eeh_dev_init(pdn, hose);
+		edev = pdn_to_eeh_dev(pdn);
+		edev->physfn = pdev;
 	}
 #endif /* CONFIG_PCI_IOV */
 
@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
 		list_for_each_entry_safe(pdn, tmp,
 			&parent->child_list, list) {
+			struct eeh_dev *edev;
 			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
 			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
 				continue;
 
+			edev = pdn_to_eeh_dev(pdn);
+			if (edev) {
+				pdn->edev = NULL;
+				kfree(edev);
+			}
+
 			if (!list_empty(&pdn->list))
 				list_del(&pdn->list);
 
-- 
1.7.9.5


* [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

The current EEH recovery code assumes that every PE has a primary bus.
Unfortunately, that's not true for VF PEs, which generally contain one
or multiple VFs (the VF group case). The patch creates PEs for VFs at
PCI final fixup time. Those PEs are identified by the newly introduced
flag EEH_PE_VF so that we can handle them differently during EEH
recovery.
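
As a rough illustration of the choice this patch adds to
eeh_add_to_parent_pe(), the stand-alone sketch below models it in user
space. The EEH_PE_* values are copied from the eeh.h hunk of this patch;
struct model_edev and model_pe_type() are hypothetical stand-ins for
illustration only, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* PE type flags, values as in the eeh.h hunk of this patch */
#define EEH_PE_PHB	(1 << 1)
#define EEH_PE_DEVICE	(1 << 2)
#define EEH_PE_BUS	(1 << 3)
#define EEH_PE_VF	(1 << 4)

/* Hypothetical stand-in for struct eeh_dev: only the field used here */
struct model_edev {
	const void *physfn;	/* non-NULL when the device is a VF */
};

/* Type of a newly created PE: VFs get EEH_PE_VF, others EEH_PE_DEVICE */
static int model_pe_type(const struct model_edev *edev)
{
	assert(edev != NULL);
	return edev->physfn ? EEH_PE_VF : EEH_PE_DEVICE;
}
```

Since the type is a distinct bit, later code can test for a VF PE with
(type & EEH_PE_VF) without looking at the devices inside the PE.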

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |    1 +
 arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
 arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 1b3614d..c1fde48 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -70,6 +70,7 @@ struct pci_dn;
 #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
 #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
 #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
+#define EEH_PE_VF	(1 << 4)	/* VF PE     */
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 35f0b62..260a701 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 	 * EEH device already having associated PE, but
 	 * the direct parent EEH device doesn't have yet.
 	 */
-	pdn = pdn ? pdn->parent : NULL;
+	if (edev->physfn)
+		pdn = pci_get_pdn(edev->physfn);
+	else
+		pdn = pdn ? pdn->parent : NULL;
 	while (pdn) {
 		/* We're poking out of PCI territory */
 		parent = pdn_to_eeh_dev(pdn);
@@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 	}
 
 	/* Create a new EEH PE */
-	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
+	if (edev->physfn)
+		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
+	else
+		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
 	if (!pe) {
 		pr_err("%s: out of memory!\n", __func__);
 		return -ENOMEM;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ce738ab..c505036 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
 	.restore_config		= pnv_eeh_restore_config
 };
 
+static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+
+	if (!pdev->is_virtfn)
+		return;
+
+	/*
+	 * The following operations will fail if VF's sysfs files
+	 * aren't created or its resources aren't finalized.
+	 */
+	eeh_add_device_early(pdn);
+	eeh_add_device_late(pdev);
+	eeh_sysfs_add_device(pdev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
+
 /**
  * eeh_powernv_init - Register platform dependent EEH operations
  *
-- 
1.7.9.5



* [PATCH V7 07/10] powerpc/powernv: Support EEH reset for VF PE
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

PEs for VFs don't have a primary bus, so they need their own reset
backend for use during EEH recovery. The patch implements that backend
by issuing FLR or AF FLR to the VFs contained in the PE.
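
Two details of the reset backend may be easier to see in isolation: the
back-off used while waiting for the Transaction Pending bit, and the
word-aligned test of the AF status byte. The sketch below is a user-space
model under stated assumptions. The PCI_AF_* values are the ones from
pci_regs.h; the model_* helpers are hypothetical and only mirror the
arithmetic, they do no config space access.

```c
#include <assert.h>
#include <stdint.h>

/* PCI Advanced Features capability layout (values from pci_regs.h) */
#define PCI_AF_CTRL		4	/* control byte */
#define PCI_AF_STATUS		5	/* status byte, right after control */
#define PCI_AF_STATUS_TP	0x01	/* Transaction Pending */

/* Total sleep in ms of a poll loop that waits (1 << i) * 100 ms on try i */
static unsigned int model_total_wait_ms(unsigned int tries)
{
	unsigned int i, total = 0;

	assert(tries < 16);	/* keep the shift well inside unsigned int */
	for (i = 0; i < tries; i++)
		total += (1u << i) * 100;
	return total;
}

/*
 * A 2-byte little-endian read at PCI_AF_CTRL returns the control byte
 * in bits 7:0 and the status byte in bits 15:8.
 */
static uint16_t model_read_af_word(uint8_t ctrl, uint8_t status)
{
	return (uint16_t)(ctrl | (status << 8));
}

/* Test Transaction Pending in such a word, hence the shift by 8 */
static int model_af_pending(uint16_t word)
{
	return (word & (PCI_AF_STATUS_TP << 8)) != 0;
}
```

With four tries, as in pnv_eeh_wait_for_pending(), the loop sleeps
100 + 200 + 400 + 800 ms before the warning is printed.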

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |  134 +++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c1fde48..3d64cf3 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -134,6 +134,7 @@ struct eeh_dev {
 	int pcix_cap;			/* Saved PCIx capability	*/
 	int pcie_cap;			/* Saved PCIe capability	*/
 	int aer_cap;			/* Saved AER capability		*/
+	int af_cap;			/* Saved AF capability		*/
 	struct eeh_pe *pe;		/* Associated PE		*/
 	struct list_head list;		/* Form link list in the PE	*/
 	struct pci_controller *phb;	/* Associated PHB		*/
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c505036..7af3c1e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -402,6 +402,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
 	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
 	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
+	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
 	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
 		edev->mode |= EEH_DEV_BRIDGE;
 		if (edev->pcie_cap) {
@@ -891,6 +892,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
+				     u16 mask, bool af_flr_rst)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	int status, i;
+
+	/* Wait for Transaction Pending bit to be cleared */
+	for (i = 0; i < 4; i++) {
+		eeh_ops->read_config(pdn, pos, 2, &status);
+		if (!(status & mask))
+			return;
+
+		msleep((1 << i) * 100);
+	}
+
+	pr_warn("%s: Pending transaction while issuing %s FLR to "
+		"%04x:%02x:%02x.%01x\n",
+		__func__, af_flr_rst ? "AF" : "",
+		edev->phb->global_number, pdn->busno,
+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+}
+
+static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 reg;
+
+	if (!edev->pcie_cap)
+		return -ENOTTY;
+
+	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
+	if (!(reg & PCI_EXP_DEVCAP_FLR))
+		return -ENOTTY;
+
+	switch (option) {
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
+					 PCI_EXP_DEVSTA_TRPND, false);
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     4, &reg);
+		reg |= PCI_EXP_DEVCTL_BCR_FLR;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      4, reg);
+		msleep(EEH_PE_RST_HOLD_TIME);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     4, &reg);
+		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      4, reg);
+		msleep(EEH_PE_RST_SETTLE_TIME);
+		break;
+	}
+
+	return 0;
+}
+
+static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 cap;
+
+	if (!edev->af_cap)
+		return -ENOTTY;
+
+	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
+	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
+		return -ENOTTY;
+
+	switch (option) {
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		/*
+		 * Wait for Transaction Pending bit to clear. A word-aligned
+		 * test is used, so we use the control offset rather than status
+		 * and shift the test bit to match.
+		 */
+		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
+					 PCI_AF_STATUS_TP << 8, true);
+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
+				      1, PCI_AF_CTRL_FLR);
+		msleep(EEH_PE_RST_HOLD_TIME);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
+		msleep(EEH_PE_RST_SETTLE_TIME);
+		break;
+	}
+
+	return 0;
+}
+
+static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
+{
+	int ret;
+
+	ret = pnv_eeh_do_flr(pdn, option);
+	if (!ret)
+		return ret;
+
+	return pnv_eeh_do_af_flr(pdn, option);
+}
+
+static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
+{
+	struct eeh_dev *edev, *tmp;
+	struct pci_dn *pdn;
+	int ret;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		pdn = eeh_dev_to_pdn(edev);
+		ret = pnv_eeh_reset_vf(pdn, option);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *hose;
@@ -966,7 +1088,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 		}
 
 		bus = eeh_pe_bus_get(pe);
-		if (pci_is_root_bus(bus) ||
+		if (pe->type & EEH_PE_VF)
+			ret = pnv_eeh_vf_pe_reset(pe, option);
+		else if (pci_is_root_bus(bus) ||
 			pci_is_root_bus(bus->parent))
 			ret = pnv_eeh_root_reset(hose, option);
 		else
@@ -1106,6 +1230,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
 	if (!edev || !edev->pe)
 		return false;
 
+	/*
+	 * We will issue FLR or AF FLR to all VFs, which are contained
+	 * in VF PE. It relies on the EEH PCI config accessors. So we
+	 * can't block them during the window.
+	 */
+	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))
+		return false;
+
 	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
 		return true;
 
-- 
1.7.9.5



* [PATCH V7 08/10] powerpc/powernv: Support PCI config restore for VFs
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

After a PE reset, the OPAL API opal_pci_reinit() is called on all
devices contained in the PE to reinitialize them. However, the skiboot
firmware can't see VFs. We have to implement functions in the kernel,
similar to those in skiboot firmware, to reinitialize VFs after
resetting a VF PE.
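
The MPS restore relies on the encoding of the Max_Payload_Size field in
the PCIe Device Control register: 128 bytes is encoded as 0, 256 as 1,
and so on, in bits 7:5. The sketch below is a user-space model of the
expression (ffs(pdn->mps) - 8) << 5 used in the patch. model_ffs(),
model_mps_to_devctl() and model_set_mps() are hypothetical helpers, and
PCI_EXP_DEVCTL_PAYLOAD carries its pci_regs.h value.

```c
#include <assert.h>

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size, bits 7:5 */

/* Like ffs(): 1-based index of the lowest set bit, 0 if no bit is set */
static int model_ffs(unsigned int v)
{
	int pos = 1;

	if (!v)
		return 0;
	while (!(v & 1)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/* Encode an MPS in bytes (power of two, 128..4096) as DEVCTL bits 7:5 */
static unsigned int model_mps_to_devctl(unsigned int mps)
{
	assert(mps >= 128 && mps <= 4096 && (mps & (mps - 1)) == 0);
	return (unsigned int)(model_ffs(mps) - 8) << 5;
}

/* Replace only the payload field of a DEVCTL value */
static unsigned int model_set_mps(unsigned int devctl, unsigned int mps)
{
	devctl &= ~(unsigned int)PCI_EXP_DEVCTL_PAYLOAD;
	return devctl | model_mps_to_devctl(mps);
}
```

For example, an MPS of 512 bytes gives ffs(512) = 10, so the field value
is (10 - 8) << 5 = 0x40.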

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h        |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |   70 +++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.c         |   18 +++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c324882..ad60263 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -206,6 +206,7 @@ struct pci_dn {
 #define IODA_INVALID_M64        (-1)
 	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
+	int	mps;
 #endif
 	struct list_head child_list;
 	struct list_head list;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 7af3c1e..33deb78 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1612,6 +1612,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
 	return ret;
 }
 
+static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 devctl, cmd, cap2, aer_capctl;
+	int old_mps;
+
+	/* Restore MPS */
+	if (edev->pcie_cap) {
+		old_mps = (ffs(pdn->mps) - 8) << 5;
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     2, &devctl);
+		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
+		devctl |= old_mps;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      2, devctl);
+	}
+
+	/* Disable Completion Timeout */
+	if (edev->pcie_cap) {
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
+				     4, &cap2);
+		if (cap2 & 0x10) {
+			eeh_ops->read_config(pdn,
+					edev->pcie_cap + PCI_EXP_DEVCTL2,
+					4, &cap2);
+			cap2 |= 0x10;
+			eeh_ops->write_config(pdn,
+					edev->pcie_cap + PCI_EXP_DEVCTL2,
+					4, cap2);
+		}
+	}
+
+	/* Enable SERR and parity checking */
+	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
+	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
+	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
+
+	/* Enable reporting of various errors */
+	if (edev->pcie_cap) {
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				2, &devctl);
+		devctl &= ~PCI_EXP_DEVCTL_CERE;
+		devctl |= (PCI_EXP_DEVCTL_NFERE |
+			   PCI_EXP_DEVCTL_FERE |
+			   PCI_EXP_DEVCTL_URRE);
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				2, devctl);
+	}
+
+	/* Enable ECRC generation and check */
+	if (edev->pcie_cap && edev->aer_cap) {
+		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+				4, &aer_capctl);
+		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
+		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+				4, aer_capctl);
+	}
+
+	return 0;
+}
+
 static int pnv_eeh_restore_config(struct pci_dn *pdn)
 {
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -1622,7 +1683,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
 		return -EEXIST;
 
 	phb = edev->phb->private_data;
-	ret = opal_pci_reinit(phb->opal_id,
+	/*
+	 * We have to restore the PCI config space after reset since the
+	 * firmware can't see SRIOV VFs.
+	 */
+	if (edev->physfn)
+		ret = pnv_eeh_restore_vf_config(pdn);
+	else
+		ret = opal_pci_reinit(phb->opal_id,
 			      OPAL_REINIT_PCI_DEV, edev->config_addr);
 	if (ret) {
 		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index bca2aeb..10bc8c3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -729,6 +729,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	int parent_mps;
+
+	if (!pdev->is_virtfn)
+		return;
+
+	/* Synchronize MPS for VF and PF */
+	parent_mps = pcie_get_mps(pdev->physfn);
+	if ((128 << pdev->pcie_mpss) >= parent_mps)
+		pcie_set_mps(pdev, parent_mps);
+	pdn->mps = pcie_get_mps(pdev);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
+#endif /* CONFIG_PCI_IOV */
+
 void __init pnv_pci_init(void)
 {
 	struct device_node *np;
-- 
1.7.9.5



* [PATCH V7 09/10] powerpc/eeh: Support error recovery for VF PE
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

Unlike a PCI bus dependent PE, a PE for VFs doesn't have a primary bus,
on which PCI hotplug is normally implemented. The patch supports error
recovery, especially PCI hotplug, for VF PEs. Hotplug on a VF PE
operates on the VFs themselves instead of on a PCI bus.
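
The eeh.c and eeh_driver.c hunks of this patch also add an "in_error"
flag so that a VF which was removed and re-created during recovery does
not get slot_reset/resume callbacks for an error it never saw. The
sketch below is a user-space model of that flag's life cycle, assuming
the behaviour described in the patch comments. struct model_edev and the
model_* functions are hypothetical stand-ins, not the kernel functions
they are named after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eeh_dev, reduced to the new flag */
struct model_edev {
	int in_error;		/* set once the error was reported */
	int slot_resets;	/* how often the modeled slot_reset ran */
	int resumes;		/* how often the modeled resume ran */
};

/* Models eeh_report_error(): the device has seen error_detected */
static void model_report_error(struct model_edev *edev)
{
	assert(edev != NULL);
	edev->in_error = 1;
}

/* Models eeh_report_reset(): skip devices that never saw the error */
static bool model_report_reset(struct model_edev *edev)
{
	assert(edev != NULL);
	if (!edev->in_error)
		return false;
	edev->slot_resets++;
	return true;
}

/* Models eeh_report_resume(): same check, the flag is always cleared */
static bool model_report_resume(struct model_edev *edev)
{
	bool run;

	assert(edev != NULL);
	run = edev->in_error != 0;
	if (run)
		edev->resumes++;
	edev->in_error = 0;
	return run;
}

/* Models eeh_remove_device(): a removed device starts over clean */
static void model_remove_device(struct model_edev *edev)
{
	assert(edev != NULL);
	edev->in_error = 0;
}
```

A device that stays in place goes through error, reset and resume, while
a device removed after the error report skips the last two steps.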

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |    1 +
 arch/powerpc/kernel/eeh.c        |    8 +++
 arch/powerpc/kernel/eeh_driver.c |  100 ++++++++++++++++++++++++++++++--------
 arch/powerpc/kernel/eeh_pe.c     |    3 +-
 4 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 3d64cf3..d24382c 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -140,6 +140,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	int    in_error;		/* Error flag for eeh_dev	*/
 	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9ee61d1..1207547 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1229,6 +1229,14 @@ void eeh_remove_device(struct pci_dev *dev)
 	 * from the parent PE during the BAR resotre.
 	 */
 	edev->pdev = NULL;
+
+	/*
+	 * The flag "in_error" tracks whether the EEH device for a VF
+	 * is in error state. It's set in eeh_report_error(). If it's
+	 * not set, eeh_report_{reset,resume}() won't be called for
+	 * the VF EEH device.
+	 */
+	edev->in_error = 0;
 	dev->dev.archdata.edev = NULL;
 	if (!(edev->pe->state & EEH_PE_KEEP))
 		eeh_rmv_from_parent_pe(edev);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 24768ff..63a2c33 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
+	edev->in_error = 1;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		eeh_pcid_put(dev);
 		return NULL;
 	}
@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		edev->mode &= ~EEH_DEV_NO_HANDLER;
-		eeh_pcid_put(dev);
-		return NULL;
+		goto out;
 	}
 
 	driver->err_handler->resume(dev);
 
+out:
+	edev->in_error = 0;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
 	return NULL;
 }
 
+static void *eeh_add_virt_device(void *data, void *userdata)
+{
+	struct pci_driver *driver;
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
+
+	if (!(edev->physfn)) {
+		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
+			__func__, edev->phb->global_number, pdn->busno,
+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+		return NULL;
+	}
+
+	driver = eeh_pcid_get(dev);
+	if (driver) {
+		eeh_pcid_put(dev);
+		if (driver->err_handler)
+			return NULL;
+	}
+
+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
+	return NULL;
+}
+
 static void *eeh_rmv_device(void *data, void *userdata)
 {
 	struct pci_driver *driver;
 	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	int *removed = (int *)userdata;
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	/*
 	 * Actually, we should remove the PCI bridges as well.
@@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (removed && driver->err_handler)
 			return NULL;
 	}
 
@@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
 		 pci_name(dev));
 	edev->bus = dev->bus;
 	edev->mode |= EEH_DEV_DISCONNECTED;
-	(*removed)++;
+	if (removed)
+		(*removed)++;
 
-	pci_lock_rescan_remove();
-	pci_stop_and_remove_bus_device(dev);
-	pci_unlock_rescan_remove();
+	if (edev->physfn) {
+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
+		edev->pdev = NULL;
+
+		/*
+		 * We have to set the VF PE number to an invalid one, which
+		 * is required to plug the VF again successfully.
+		 */
+		pdn->pe_number = IODA_INVALID_PE;
+	} else {
+		pci_lock_rescan_remove();
+		pci_stop_and_remove_bus_device(dev);
+		pci_unlock_rescan_remove();
+	}
 
 	return NULL;
 }
@@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
 	struct timeval tstamp;
 	int cnt, rc, removed = 0;
+	struct eeh_dev *edev;
 
 	/* pcibios will clear the counter; save the value */
 	cnt = pe->freeze_count;
@@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(bus);
-		pci_unlock_rescan_remove();
-	} else if (frozen_bus) {
+		if (pe->type & EEH_PE_VF)
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+		else {
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(bus);
+			pci_unlock_rescan_remove();
+		}
+	} else if (frozen_bus)
 		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
-	}
 
 	/*
 	 * Reset the pci controller. (Asserts RST#; resets config space).
@@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		 * PE. We should disconnect it so the binding can be
 		 * rebuilt when adding PCI devices.
 		 */
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(bus);
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+			pcibios_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(frozen_bus);
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+			pcibios_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -792,11 +846,15 @@ perm_error:
 	 * the their PCI config any more.
 	 */
 	if (frozen_bus) {
-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
-
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(frozen_bus);
-		pci_unlock_rescan_remove();
+		if (pe->type & EEH_PE_VF) {
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+		} else {
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(frozen_bus);
+			pci_unlock_rescan_remove();
+		}
 	}
 }
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 260a701..5cde950 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 	if (pe->type & EEH_PE_PHB) {
 		bus = pe->phb->bus;
 	} else if (pe->type & EEH_PE_BUS ||
-		   pe->type & EEH_PE_DEVICE) {
+		   pe->type & EEH_PE_DEVICE ||
+		   pe->type & EEH_PE_VF) {
 		if (pe->bus) {
 			bus = pe->bus;
 			goto out;
-- 
1.7.9.5
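
The effect of the in_error flag in this patch can be illustrated with a minimal userspace sketch. It is not kernel code: struct model_edev and the model_*() helpers are hypothetical stand-ins for struct eeh_dev, eeh_report_error(), eeh_remove_device(), eeh_report_reset() and eeh_report_resume(), and the two counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eeh_dev; only what the sketch needs. */
struct model_edev {
	bool in_error;     /* mirrors edev->in_error */
	int  reset_calls;  /* times the driver's slot_reset would have run */
	int  resume_calls; /* times the driver's resume would have run */
};

/* Like eeh_report_error(): the device has seen the error, so mark it. */
static void model_report_error(struct model_edev *edev)
{
	assert(edev != NULL);
	edev->in_error = true;
}

/* Like eeh_remove_device(): a removed VF comes back in a fresh state. */
static void model_remove_device(struct model_edev *edev)
{
	assert(edev != NULL);
	edev->in_error = false;
}

/* Like eeh_report_reset(): skip the handler unless the flag is set. */
static void model_report_reset(struct model_edev *edev)
{
	assert(edev != NULL);
	if (!edev->in_error)
		return;
	edev->reset_calls++;
}

/* Like eeh_report_resume(): the flag is cleared on both paths. */
static void model_report_resume(struct model_edev *edev)
{
	assert(edev != NULL);
	if (edev->in_error)
		edev->resume_calls++;
	edev->in_error = false;
}
```

A device that stays plugged goes through error, reset and resume once each. A VF that is removed after the error report skips both later handlers, which is the case the flag is meant to cover.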



* [PATCH V7 10/10] powerpc/powernv: compound PE for VFs
  2015-05-19 10:50   ` Wei Yang
@ 2015-05-19 10:50     ` Wei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-05-19 10:50 UTC (permalink / raw)
  To: gwshan, bhelgaas; +Cc: linuxppc-dev, linux-pci, Wei Yang

When the VF BAR size is larger than 64MB, VFs are grouped per M64 BAR,
so the VFs sharing one M64 BAR should form a compound PE.

This patch links those VF PEs into a compound PE in that case.

[gwshan: code refactoring for a bit]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   46 +++++++++++++++++++++++++----
 arch/powerpc/platforms/powernv/pci.c      |   17 +++++++++--
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f8bc950..56e7b65 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1363,9 +1363,20 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	}
 
 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
+		struct pnv_ioda_pe *s, *sn;
 		if (pe->parent_dev != pdev)
 			continue;
 
+		if ((pe->flags & PNV_IODA_PE_MASTER) &&
+		    (pe->flags & PNV_IODA_PE_VF)) {
+			list_for_each_entry_safe(s, sn, &pe->slaves, list) {
+				pnv_pci_ioda2_release_dma_pe(pdev, s);
+				list_del(&s->list);
+				pnv_ioda_deconfigure_pe(phb, s);
+				pnv_ioda_free_pe(phb, s->pe_number);
+			}
+		}
+
 		pnv_pci_ioda2_release_dma_pe(pdev, pe);
 
 		/* Remove from list */
@@ -1418,7 +1429,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
-	struct pnv_ioda_pe    *pe;
+	struct pnv_ioda_pe    *pe, *master_pe;
 	int                    pe_num;
 	u16                    vf_index;
 	struct pci_dn         *pdn;
@@ -1464,10 +1475,13 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 				GFP_KERNEL, hose->node);
 		pe->tce32_table->data = pe;
 
-		/* Put PE to the list */
-		mutex_lock(&phb->ioda.pe_list_mutex);
-		list_add_tail(&pe->list, &phb->ioda.pe_list);
-		mutex_unlock(&phb->ioda.pe_list_mutex);
+		/* Put PE to the list, or postpone it for compound PEs */
+		if ((pdn->m64_per_iov != M64_PER_IOV) ||
+		    (num_vfs <= M64_PER_IOV)) {
+			mutex_lock(&phb->ioda.pe_list_mutex);
+			list_add_tail(&pe->list, &phb->ioda.pe_list);
+			mutex_unlock(&phb->ioda.pe_list_mutex);
+		}
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
 	}
@@ -1480,10 +1494,32 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
 
 		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
+			master_pe = NULL;
+
 			for (vf_index = vf_group * vf_per_group;
 			     vf_index < (vf_group + 1) * vf_per_group &&
 			     vf_index < num_vfs;
 			     vf_index++) {
+
+				/*
+				 * Figure out the master PE and put all slave
+				 * PEs to master PE's list.
+				 */
+				pe = &phb->ioda.pe_array[pdn->offset + vf_index];
+				if (!master_pe) {
+					pe->flags |= PNV_IODA_PE_MASTER;
+					INIT_LIST_HEAD(&pe->slaves);
+					master_pe = pe;
+					mutex_lock(&phb->ioda.pe_list_mutex);
+					list_add_tail(&pe->list, &phb->ioda.pe_list);
+					mutex_unlock(&phb->ioda.pe_list_mutex);
+				} else {
+					pe->flags |= PNV_IODA_PE_SLAVE;
+					pe->master = master_pe;
+					list_add_tail(&pe->list,
+						&master_pe->slaves);
+				}
+
 				for (vf_index1 = vf_group * vf_per_group;
 				     vf_index1 < (vf_group + 1) * vf_per_group &&
 				     vf_index1 < num_vfs;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 10bc8c3..717a58e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -667,7 +667,7 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
 #ifdef CONFIG_PCI_IOV
-	struct pnv_ioda_pe *pe;
+	struct pnv_ioda_pe *pe, *slave;
 	struct pci_dn *pdn;
 
 	/* Fix the VF pdn PE number */
@@ -679,10 +679,23 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
 			    (pdev->devfn & 0xff))) {
 				pdn->pe_number = pe->pe_number;
 				pe->pdev = pdev;
-				break;
+				goto found;
+			}
+
+			if ((pe->flags & PNV_IODA_PE_MASTER) &&
+			    (pe->flags & PNV_IODA_PE_VF)) {
+				list_for_each_entry(slave, &pe->slaves, list) {
+					if (slave->rid == ((pdev->bus->number << 8)
+					   | (pdev->devfn & 0xff))) {
+						pdn->pe_number = slave->pe_number;
+						slave->pdev = pdev;
+						goto found;
+					}
+				}
 			}
 		}
 	}
+found:
 #endif /* CONFIG_PCI_IOV */
 
 	if (phb && phb->dma_dev_setup)
-- 
1.7.9.5
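
The master/slave assignment in pnv_ioda_setup_vf_PE() can be sketched in userspace as follows. This is an illustration only: the loop bounds are copied from the hunk above, but assign_compound_pes(), enum pe_role and roundup_pow_of_two_u() are hypothetical names, M64_PER_IOV is assumed to be 4, and m64_per_iov is assumed to equal M64_PER_IOV, which is the only case in which the compound path runs.

```c
#include <assert.h>

#define M64_PER_IOV 4	/* assumed value; the kernel defines its own */

enum pe_role { PE_ROLE_MASTER, PE_ROLE_SLAVE };

/* Userspace replacement for the kernel's roundup_pow_of_two(). */
static unsigned int roundup_pow_of_two_u(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * For each VF index, record whether its PE is a master or a slave and
 * which VF index owns the master PE of its group. Returns the number
 * of groups that received a master.
 */
static int assign_compound_pes(unsigned int num_vfs, enum pe_role *role,
			       unsigned int *master_of)
{
	unsigned int vf_per_group, vf_group, vf_index;
	int masters = 0;

	/* The compound path is only taken for more than M64_PER_IOV VFs. */
	assert(num_vfs > M64_PER_IOV);
	vf_per_group = roundup_pow_of_two_u(num_vfs) / M64_PER_IOV;

	for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
		int have_master = 0;
		unsigned int master = 0;

		for (vf_index = vf_group * vf_per_group;
		     vf_index < (vf_group + 1) * vf_per_group &&
		     vf_index < num_vfs;
		     vf_index++) {
			if (!have_master) {
				/* First VF in the group owns the master PE. */
				role[vf_index] = PE_ROLE_MASTER;
				master = vf_index;
				have_master = 1;
				masters++;
			} else {
				role[vf_index] = PE_ROLE_SLAVE;
			}
			master_of[vf_index] = master;
		}
	}
	return masters;
}
```

With 6 VFs the group size is 8 / 4 = 2, so VFs 0, 2 and 4 own master PEs and the fourth group stays empty. With 16 VFs each of the four groups holds four VFs.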



* Re: [PATCH V7 03/10] powerpc/pci: Remove VFs prior to PF
  2015-05-19 10:50     ` Wei Yang
  (?)
@ 2015-06-01 23:20     ` Bjorn Helgaas
  2015-06-02  3:44       ` Wei Yang
  -1 siblings, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-01 23:20 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, linuxppc-dev, linux-pci

On Tue, May 19, 2015 at 06:50:05PM +0800, Wei Yang wrote:
> As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,

The conventional reference is:

  ac205b7bb72f ("PCI: make sriov work with hotplug remove")

> VFs, which might be hooked to same PCI bus as their PF should be removed
> before the PF. Otherwise, the PCI hot unplugging on the PCI bus would
> cause kernel crash.
> 
> The patch applies the above pattern to PowerPC PCI hotplug path.
> 
> [gwshan: changelog]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/pci-hotplug.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 7ed85a6..98f84ed 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -50,7 +50,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>  
>  	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>  		 pci_domain_nr(bus),  bus->number);
> -	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
> +	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
>  		pr_debug("   Removing %s...\n", pci_name(dev));
>  		pci_stop_and_remove_bus_device(dev);
>  	}
> -- 
> 1.7.9.5
> 
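
For reference, the ordering argument behind the reverse walk can be sketched in userspace C. The sketch assumes that VFs are linked on the bus device list after their PF, so walking from the tail reaches them first. struct model_dev, removal_order() and vfs_removed_first() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one entry on bus->devices. */
struct model_dev {
	int pf_index;	/* array index of the owning PF, or -1 if not a VF */
};

/* Fill order[] with list indices, head to tail or tail to head. */
static void removal_order(size_t n, int reverse, size_t *order)
{
	assert(order != NULL);
	for (size_t i = 0; i < n; i++)
		order[i] = reverse ? n - 1 - i : i;
}

/* Return 1 if no PF is removed before one of its VFs, 0 otherwise. */
static int vfs_removed_first(const struct model_dev *devs, size_t n,
			     const size_t *order)
{
	for (size_t pos = 0; pos < n; pos++) {
		size_t idx = order[pos];

		if (devs[idx].pf_index < 0)
			continue;	/* not a VF */

		/* Was the owning PF already removed at an earlier step? */
		for (size_t p = 0; p < pos; p++)
			if ((int)order[p] == devs[idx].pf_index)
				return 0;
	}
	return 1;
}
```

With one PF followed by two VFs, the forward order removes the PF first and fails the check, while the reverse order passes it.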


* Re: [PATCH V7 04/10] powerpc/eeh: Trace first 7 BARs in address cache
  2015-05-19 10:50     ` Wei Yang
  (?)
@ 2015-06-01 23:32     ` Bjorn Helgaas
  2015-06-02  3:51       ` Wei Yang
  -1 siblings, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-01 23:32 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, linuxppc-dev, linux-pci

The subject says "Trace first 7 BARs..."  I think maybe you meant "Track
first 7 BARs" or maybe "Cache only BARs, not windows or IOV BARs".

On Tue, May 19, 2015 at 06:50:06PM +0800, Wei Yang wrote:
> EEH address cache, which helps to locate the PCI device according to
> the given (physical) MMIO address, didn't cover PCI bridges. 

"doesn't contain PCI bridge windows"?

I see that eeh_addr_cache_insert_dev() ignores bridges because it never
calls __eeh_addr_cache_insert_dev() when "(dev->class >> 16) ==
PCI_BASE_CLASS_BRIDGE".  I think it would be more technically correct if
you removed that test and relied on the "i <= PCI_ROM_RESOURCE" test in
this patch, because it is legal (though rare) for bridge devices to have
two BARs, and I assume you would want to put those in your cache if they
exist.

> Also, it
> shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
> should be returned.
> The patch restricts the address cache to cover first 7 BARs for the
> above purposes.
> 
> [gwshan: changelog]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/eeh_cache.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index eeabeab..f6c5f05 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
>  	}
>  
>  	/* Walk resources on this device, poke them into the tree */
> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>  		unsigned long start = pci_resource_start(dev,i);
>  		unsigned long end = pci_resource_end(dev,i);
>  		unsigned int flags = pci_resource_flags(dev,i);
> -- 
> 1.7.9.5
> 
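
To make the new loop bound concrete, here is a small userspace sketch of the resource index layout. The MODEL_* enum is a simplified copy written for this example and assumes six SR-IOV BARs and four bridge windows; the authoritative layout is the enum in include/linux/pci.h, which may differ between kernel versions and configurations.

```c
#include <assert.h>

#define MODEL_SRIOV_NUM_BARS		6	/* assumed */
#define MODEL_BRIDGE_RESOURCE_NUM	4	/* assumed */

/* Simplified, hypothetical mirror of the kernel's resource indices. */
enum model_resource_index {
	MODEL_STD_RESOURCES = 0,	/* BAR0 */
	MODEL_STD_RESOURCE_END = 5,	/* BAR5 */
	MODEL_ROM_RESOURCE,		/* expansion ROM */
	MODEL_IOV_RESOURCES,		/* first IOV BAR of a PF */
	MODEL_IOV_RESOURCE_END =
		MODEL_IOV_RESOURCES + MODEL_SRIOV_NUM_BARS - 1,
	MODEL_BRIDGE_RESOURCES,		/* first bridge window */
	MODEL_BRIDGE_RESOURCE_END =
		MODEL_BRIDGE_RESOURCES + MODEL_BRIDGE_RESOURCE_NUM - 1,
	MODEL_NUM_RESOURCES
};

/* Same test as the patched loop condition: i <= PCI_ROM_RESOURCE. */
static int cached_by_patch(int i)
{
	assert(i >= 0 && i < MODEL_NUM_RESOURCES);
	return i <= MODEL_ROM_RESOURCE;
}

/* Count how many resource slots the patched loop visits. */
static int count_cached(void)
{
	int i, n = 0;

	for (i = 0; i < MODEL_NUM_RESOURCES; i++)
		if (cached_by_patch(i))
			n++;
	return n;
}
```

Under these assumptions the loop covers seven slots, the six standard BARs plus the ROM, and stops before the IOV BARs and the bridge windows.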


* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-05-19 10:50     ` Wei Yang
  (?)
@ 2015-06-01 23:46     ` Bjorn Helgaas
  2015-06-03  3:31       ` Wei Yang
  -1 siblings, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-01 23:46 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, linuxppc-dev, linux-pci

On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
> Current EEH recovery code works with the assumption: the PE has primary
> bus. Unfortunately, that's not true to VF PEs, which generally contains
> one or multiple VFs (for VF group case). The patch creates PEs for VFs
> at PCI final fixup time. Those PEs for VFs are indentified with newly
> introduced flag EEH_PE_VF so that we handle them differently during
> EEH recovery.
> 
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/eeh.h               |    1 +
>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 1b3614d..c1fde48 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -70,6 +70,7 @@ struct pci_dn;
>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>  
>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index 35f0b62..260a701 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>  	 * EEH device already having associated PE, but
>  	 * the direct parent EEH device doesn't have yet.
>  	 */
> -	pdn = pdn ? pdn->parent : NULL;
> +	if (edev->physfn)
> +		pdn = pci_get_pdn(edev->physfn);
> +	else
> +		pdn = pdn ? pdn->parent : NULL;
>  	while (pdn) {
>  		/* We're poking out of PCI territory */
>  		parent = pdn_to_eeh_dev(pdn);
> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>  	}
>  
>  	/* Create a new EEH PE */
> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> +	if (edev->physfn)
> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
> +	else
> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>  	if (!pe) {
>  		pr_err("%s: out of memory!\n", __func__);
>  		return -ENOMEM;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index ce738ab..c505036 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
>  	.restore_config		= pnv_eeh_restore_config
>  };
>  
> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
> +{
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +
> +	if (!pdev->is_virtfn)
> +		return;
> +
> +	/*
> +	 * The following operations will fail if VF's sysfs files
> +	 * aren't created or its resources aren't finalized.
> +	 */

I don't understand this comment.  "The following operations" seems to refer
to eeh_add_device_early() and eeh_add_device_late(), and
"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().

So the comment suggests that eeh_add_device_early() and
eeh_add_device_late() will fail because they're called before
eeh_sysfs_add_device().  So I think you must be talking about some other
"following operations," not eeh_add_device_early() and
eeh_add_device_late().

> +	eeh_add_device_early(pdn);
> +	eeh_add_device_late(pdev);
> +	eeh_sysfs_add_device(pdev);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);

Ugh.  This is powerpc code, but I don't like using fixups as a hook like
this.  There is a pcibios_add_device() -- could this be done there?

If not, what happens after pcibios_add_device() that is required for this
code?  Maybe we need a pcibios_bus_add_device() hook?

> +
>  /**
>   * eeh_powernv_init - Register platform dependent EEH operations
>   *
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-05-19 10:50     ` Wei Yang
  (?)
  (?)
@ 2015-06-01 23:49     ` Bjorn Helgaas
  2015-06-03  3:39       ` Wei Yang
  -1 siblings, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-01 23:49 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, linuxppc-dev, linux-pci

On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
> Current EEH recovery code works with the assumption: the PE has primary
> bus. Unfortunately, that's not true to VF PEs, which generally contains

"Primary bus" normally means the bus on the upstream side of a PCI bridge.
But a PE is not a bridge, so I don't know what it means here.

s/not true to VF PEs/not true for VF PEs/

> one or multiple VFs (for VF group case). The patch creates PEs for VFs
> at PCI final fixup time. Those PEs for VFs are indentified with newly

s/indentified/identified/

> introduced flag EEH_PE_VF so that we handle them differently during
> EEH recovery.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 08/10] powerpc/powernv: Support PCI config restore for VFs
  2015-05-19 10:50     ` Wei Yang
  (?)
@ 2015-06-02  0:01     ` Bjorn Helgaas
  2015-06-03  1:37       ` Wei Yang
  -1 siblings, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-02  0:01 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, linuxppc-dev, linux-pci

On Tue, May 19, 2015 at 06:50:10PM +0800, Wei Yang wrote:
> After PE reset, OPAL API opal_pci_reinit() is called on all devices
> contained in the PE to reinitialize them. However, VFs can't be seen
> from skiboot firmware. We have to implement the functions, similar
> those in skiboot firmware, to reinitialize VFs after reset on PE
> for VFs.
> 
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/pci-bridge.h        |    1 +
>  arch/powerpc/platforms/powernv/eeh-powernv.c |   70 +++++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci.c         |   18 +++++++
>  3 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index c324882..ad60263 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -206,6 +206,7 @@ struct pci_dn {
>  #define IODA_INVALID_M64        (-1)
>  	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>  #endif /* CONFIG_PCI_IOV */
> +	int	mps;
>  #endif
>  	struct list_head child_list;
>  	struct list_head list;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 7af3c1e..33deb78 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1612,6 +1612,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
>  	return ret;
>  }
>  
> +static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
> +{
> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> +	u32 devctl, cmd, cap2, aer_capctl;
> +	int old_mps;
> +
> +	/* Restore MPS */
> +	if (edev->pcie_cap) {
> +		old_mps = (ffs(pdn->mps) - 8) << 5;
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				     2, &devctl);
> +		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
> +		devctl |= old_mps;
> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				      2, devctl);
> +	}
> +
> +	/* Disable Completion Timeout */
> +	if (edev->pcie_cap) {
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
> +				     4, &cap2);
> +		if (cap2 & 0x10) {
> +			eeh_ops->read_config(pdn,
> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
> +					4, &cap2);
> +			cap2 |= 0x10;
> +			eeh_ops->write_config(pdn,
> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
> +					4, cap2);
> +		}
> +	}
> +
> +	/* Enable SERR and parity checking */
> +	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
> +	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
> +	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
> +
> +	/* Enable report various errors */
> +	if (edev->pcie_cap) {
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				2, &devctl);
> +		devctl &= ~PCI_EXP_DEVCTL_CERE;
> +		devctl |= (PCI_EXP_DEVCTL_NFERE |
> +			   PCI_EXP_DEVCTL_FERE |
> +			   PCI_EXP_DEVCTL_URRE);
> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				2, devctl);
> +	}
> +
> +	/* Enable ECRC generation and check */
> +	if (edev->pcie_cap && edev->aer_cap) {
> +		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
> +				4, &aer_capctl);
> +		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
> +		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
> +				4, aer_capctl);
> +	}

Where is all this stuff set up the first time?  It'd be nice if we could
use the same path that did it the first time.

Are we setting up a PF or a VF here?  The function is called
pnv_eeh_restore_vf_config(), but it's called when "edev->physfn", so it's a
little confusing.

> +
> +	return 0;
> +}
> +
>  static int pnv_eeh_restore_config(struct pci_dn *pdn)
>  {
>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> @@ -1622,7 +1683,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
>  		return -EEXIST;
>  
>  	phb = edev->phb->private_data;
> -	ret = opal_pci_reinit(phb->opal_id,
> +	/*
> +	 * We have to restore the PCI config space after reset since the
> +	 * firmware can't see SRIOV VFs.
> +	 */
> +	if (edev->physfn)
> +		ret = pnv_eeh_restore_vf_config(pdn);
> +	else
> +		ret = opal_pci_reinit(phb->opal_id,
>  			      OPAL_REINIT_PCI_DEV, edev->config_addr);
>  	if (ret) {
>  		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index bca2aeb..10bc8c3 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -729,6 +729,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
>  
> +#ifdef CONFIG_PCI_IOV
> +static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
> +{
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +	int parent_mps;
> +
> +	if (!pdev->is_virtfn)
> +		return;
> +
> +	/* Synchronize MPS for VF and PF */
> +	parent_mps = pcie_get_mps(pdev->physfn);
> +	if ((128 << pdev->pcie_mpss) >= parent_mps)
> +		pcie_set_mps(pdev, parent_mps);
> +	pdn->mps = pcie_get_mps(pdev);
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);

Same comment as before -- I don't like this usage of fixups.  Would it work
to do this in pcibios_add_device()?

I assume you need this to happen when you hot-remove and hot-add a VF
during EEH recovery?  Where does this happen in the normal hotplug path,
e.g., pciehp, and can you do it in a corresponding place for EEH hotplug?


> +#endif /* CONFIG_PCI_IOV */
> +
>  void __init pnv_pci_init(void)
>  {
>  	struct device_node *np;
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 03/10] powerpc/pci: Remove VFs prior to PF
  2015-06-01 23:20     ` Bjorn Helgaas
@ 2015-06-02  3:44       ` Wei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-06-02  3:44 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Wei Yang, gwshan, linuxppc-dev, linux-pci

On Mon, Jun 01, 2015 at 06:20:05PM -0500, Bjorn Helgaas wrote:
>On Tue, May 19, 2015 at 06:50:05PM +0800, Wei Yang wrote:
>> As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,
>
>The conventional reference is:
>
>  ac205b7bb72f ("PCI: make sriov work with hotplug remove")
>

Thanks, I will change it in the next version.

>> VFs, which might be hooked to same PCI bus as their PF should be removed
>> before the PF. Otherwise, the PCI hot unplugging on the PCI bus would
>> cause kernel crash.
>> 
>> The patch applies the above pattern to PowerPC PCI hotplug path.
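
To illustrate why the walk direction matters, here is a small user-space
model (not kernel code). It assumes the PF sits at position 0 of the bus
list and its VFs were appended after it; removal_order() and the position
layout are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill out[] with the order in which bus-list positions get removed.
 * Position 0 is assumed to be the PF, positions 1..n-1 its VFs.
 * reverse != 0 models list_for_each_entry_safe_reverse().
 */
static void removal_order(size_t n, int reverse, size_t *out)
{
	size_t k;

	assert(out != NULL);
	for (k = 0; k < n; k++)
		out[k] = reverse ? n - 1 - k : k;
}
```

With the reverse walk the PF (position 0) comes out last, so its VFs are
already gone by the time it is removed.
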
>> 
>> [gwshan: changelog]
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/pci-hotplug.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>> index 7ed85a6..98f84ed 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -50,7 +50,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>>  
>>  	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>>  		 pci_domain_nr(bus),  bus->number);
>> -	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
>> +	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
>>  		pr_debug("   Removing %s...\n", pci_name(dev));
>>  		pci_stop_and_remove_bus_device(dev);
>>  	}
>> -- 
>> 1.7.9.5
>> 

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 04/10] powerpc/eeh: Trace first 7 BARs in address cache
  2015-06-01 23:32     ` Bjorn Helgaas
@ 2015-06-02  3:51       ` Wei Yang
  2015-06-02  4:11         ` Gavin Shan
  0 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2015-06-02  3:51 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Wei Yang, gwshan, linuxppc-dev, linux-pci

On Mon, Jun 01, 2015 at 06:32:33PM -0500, Bjorn Helgaas wrote:
>The subject says "Trace first 7 BARs..."  I think maybe you meant "Track
>first 7 BARs" or maybe "Cache only BARs, not windows or IOV BARs"
>

Agreed, "Track" is more accurate.

Gavin,

Which subject do you prefer?

>On Tue, May 19, 2015 at 06:50:06PM +0800, Wei Yang wrote:
>> EEH address cache, which helps to locate the PCI device according to
>> the given (physical) MMIO address, didn't cover PCI bridges. 
>
>"doesn't contain PCI bridge windows"?
>
>I see that eeh_addr_cache_insert_dev() ignores bridges because it never
>calls __eeh_addr_cache_insert_dev() when "(dev->class >> 16) ==
>PCI_BASE_CLASS_BRIDGE".  I think it would be more technically correct if
>you removed that test and relied on the "i <= PCI_ROM_RESOURCE" test in
>this patch, because it is legal (though rare) for bridge devices to have
>two BARs, and I assume you would want to put those in your cache if they
>exist.
>

I think it is fine to remove the "(dev->class >> 16) ==
PCI_BASE_CLASS_BRIDGE" test for bridges and rely on the
"i <= PCI_ROM_RESOURCE" test instead.
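
To make the effect of the new loop bound concrete, here is a user-space
sketch of the resource index layout. The values are what I assume
include/linux/pci.h uses with CONFIG_PCI_IOV enabled; the MODEL_ names and
both helpers exist only in this example.

```c
#include <assert.h>

/*
 * Model of the per-device resource index layout (assumed values, see
 * above). Only the ordering matters for the argument.
 */
enum {
	MODEL_STD_RESOURCES       = 0,	/* BAR0 */
	MODEL_STD_RESOURCE_END    = 5,	/* BAR5 */
	MODEL_ROM_RESOURCE        = 6,	/* expansion ROM */
	MODEL_IOV_RESOURCES       = 7,	/* first IOV BAR (PF only) */
	MODEL_IOV_RESOURCE_END    = 12,
	MODEL_BRIDGE_RESOURCES    = 13,	/* first bridge window */
	MODEL_BRIDGE_RESOURCE_END = 16,
	MODEL_NUM_RESOURCES       = 17
};

/* Non-zero if index i is visited by "for (i = 0; i <= PCI_ROM_RESOURCE; i++)". */
static int walked_by_new_loop(int i)
{
	assert(i >= 0 && i < MODEL_NUM_RESOURCES);
	return i <= MODEL_ROM_RESOURCE;
}

/* Count how many resource slots the new loop bound covers. */
static int count_walked(void)
{
	int i, n = 0;

	for (i = 0; i < MODEL_NUM_RESOURCES; i++)
		n += walked_by_new_loop(i);
	return n;
}
```

Under these assumptions the loop covers seven slots (six BARs plus the ROM),
so a bridge's own BAR0/BAR1 would be cached while its windows and a PF's IOV
BARs would not.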

Gavin,

Do you think this is fine?

>> Also, it
>> shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
>> should be returned.
>> The patch restricts the address cache to cover first 7 BARs for the
>> above purposes.
>> 
>> [gwshan: changelog]
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh_cache.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
>> index eeabeab..f6c5f05 100644
>> --- a/arch/powerpc/kernel/eeh_cache.c
>> +++ b/arch/powerpc/kernel/eeh_cache.c
>> @@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
>>  	}
>>  
>>  	/* Walk resources on this device, poke them into the tree */
>> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
>> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>  		unsigned long start = pci_resource_start(dev,i);
>>  		unsigned long end = pci_resource_end(dev,i);
>>  		unsigned int flags = pci_resource_flags(dev,i);
>> -- 
>> 1.7.9.5
>> 

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 04/10] powerpc/eeh: Trace first 7 BARs in address cache
  2015-06-02  3:51       ` Wei Yang
@ 2015-06-02  4:11         ` Gavin Shan
  2015-06-03  1:47           ` Wei Yang
  0 siblings, 1 reply; 68+ messages in thread
From: Gavin Shan @ 2015-06-02  4:11 UTC (permalink / raw)
  To: Wei Yang; +Cc: Bjorn Helgaas, gwshan, linuxppc-dev, linux-pci

On Tue, Jun 02, 2015 at 11:51:15AM +0800, Wei Yang wrote:
>On Mon, Jun 01, 2015 at 06:32:33PM -0500, Bjorn Helgaas wrote:
>>The subject says "Trace first 7 BARs..."  I think maybe you meant "Track
>>first 7 BARs" or maybe "Cache only BARs, not windows or IOV BARs"
>>
>
>Agree, Track is more accurate.
>
>Gavin,
>
>Which subject you prefer?
>

"Cache only BARs, not windows or IOV BARs" is better.

>>On Tue, May 19, 2015 at 06:50:06PM +0800, Wei Yang wrote:
>>> EEH address cache, which helps to locate the PCI device according to
>>> the given (physical) MMIO address, didn't cover PCI bridges. 
>>
>>"doesn't contain PCI bridge windows"?
>>
>>I see that eeh_addr_cache_insert_dev() ignores bridges because it never
>>calls __eeh_addr_cache_insert_dev() when "(dev->class >> 16) ==
>>PCI_BASE_CLASS_BRIDGE".  I think it would be more technically correct if
>>you removed that test and relied on the "i <= PCI_ROM_RESOURCE" test in
>>this patch, because it is legal (though rare) for bridge devices to have
>>two BARs, and I assume you would want to put those in your cache if they
>>exist.
>>
>
>I think this is fine to remove the test "(dev->class >> 16) ==
>PCI_BASE_CLASS_BRIDGE" for a bridge and rely on the "i <= PCI_ROM_RESOURCE"
>
>Gavin,
>
>Do you thinks this is fine?
>

Fine with me.

>>> Also, it
>>> shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
>>> should be returned.
>>> The patch restricts the address cache to cover first 7 BARs for the
>>> above purposes.
>>> 
>>> [gwshan: changelog]
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/kernel/eeh_cache.c |    2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
>>> index eeabeab..f6c5f05 100644
>>> --- a/arch/powerpc/kernel/eeh_cache.c
>>> +++ b/arch/powerpc/kernel/eeh_cache.c
>>> @@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
>>>  	}
>>>  
>>>  	/* Walk resources on this device, poke them into the tree */
>>> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
>>> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>>  		unsigned long start = pci_resource_start(dev,i);
>>>  		unsigned long end = pci_resource_end(dev,i);
>>>  		unsigned int flags = pci_resource_flags(dev,i);
>>> -- 
>>> 1.7.9.5
>>> 
>
>-- 
>Richard Yang
>Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-05-19 10:50     ` Wei Yang
  (?)
@ 2015-06-02 17:19     ` Bjorn Helgaas
  2015-06-03  1:38       ` Wei Yang
  -1 siblings, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-02 17:19 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, linuxppc-dev, linux-pci

On Tue, May 19, 2015 at 06:50:03PM +0800, Wei Yang wrote:
> During EEH recovery, hotplug is applied to the devices which don't
> have drivers or their drivers don't support EEH. However, the hotplug,
> which was implemented based on PCI bus, can't be applied to VF directly.
> 
> The patch renames virtn_{add,remove}() and exports them so that they
> can be used in PCI hotplug during EEH recovery.
> 
> [gwshan: changelog]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

I assume you'll merge this along with the rest of this series via the
powerpc tree.

> ---
>  drivers/pci/iov.c   |   10 +++++-----
>  include/linux/pci.h |    8 ++++++++
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index ee0ebff..cc941dd 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
>  	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
>  }
>  
> -static int virtfn_add(struct pci_dev *dev, int id, int reset)
> +int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
>  {
>  	int i;
>  	int rc = -ENOMEM;
> @@ -183,7 +183,7 @@ failed:
>  	return rc;
>  }
>  
> -static void virtfn_remove(struct pci_dev *dev, int id, int reset)
> +void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
>  {
>  	char buf[VIRTFN_ID_LEN];
>  	struct pci_dev *virtfn;
> @@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  	}
>  
>  	for (i = 0; i < initial; i++) {
> -		rc = virtfn_add(dev, i, 0);
> +		rc = pci_iov_virtfn_add(dev, i, 0);
>  		if (rc)
>  			goto failed;
>  	}
> @@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  
>  failed:
>  	for (j = 0; j < i; j++)
> -		virtfn_remove(dev, j, 0);
> +		pci_iov_virtfn_remove(dev, j, 0);
>  
>  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>  	pci_cfg_access_lock(dev);
> @@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
>  		return;
>  
>  	for (i = 0; i < iov->num_VFs; i++)
> -		virtfn_remove(dev, i, 0);
> +		pci_iov_virtfn_remove(dev, i, 0);
>  
>  	pcibios_sriov_disable(dev);
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 353db8d..06aa5dd 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1679,6 +1679,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
>  
>  int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>  void pci_disable_sriov(struct pci_dev *dev);
> +int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
> +void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
>  int pci_num_vf(struct pci_dev *dev);
>  int pci_vfs_assigned(struct pci_dev *dev);
>  int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
> @@ -1696,6 +1698,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
>  static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
>  { return -ENODEV; }
>  static inline void pci_disable_sriov(struct pci_dev *dev) { }
> +static inline int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
> +{
> +	return -ENOSYS;
> +}
> +static inline void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
> +{ }
>  static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
>  static inline int pci_vfs_assigned(struct pci_dev *dev)
>  { return 0; }
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 08/10] powerpc/powernv: Support PCI config restore for VFs
  2015-06-02  0:01     ` Bjorn Helgaas
@ 2015-06-03  1:37       ` Wei Yang
  2015-06-03  5:14         ` Gavin Shan
  0 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2015-06-03  1:37 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Wei Yang, gwshan, linuxppc-dev, linux-pci

On Mon, Jun 01, 2015 at 07:01:36PM -0500, Bjorn Helgaas wrote:
>On Tue, May 19, 2015 at 06:50:10PM +0800, Wei Yang wrote:
>> After PE reset, OPAL API opal_pci_reinit() is called on all devices
>> contained in the PE to reinitialize them. However, VFs can't be seen
>> from skiboot firmware. We have to implement the functions, similar
>> those in skiboot firmware, to reinitialize VFs after reset on PE
>> for VFs.
>> 
>> [gwshan: changelog and code refactoring]
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/pci-bridge.h        |    1 +
>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   70 +++++++++++++++++++++++++-
>>  arch/powerpc/platforms/powernv/pci.c         |   18 +++++++
>>  3 files changed, 88 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>> index c324882..ad60263 100644
>> --- a/arch/powerpc/include/asm/pci-bridge.h
>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>> @@ -206,6 +206,7 @@ struct pci_dn {
>>  #define IODA_INVALID_M64        (-1)
>>  	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>  #endif /* CONFIG_PCI_IOV */
>> +	int	mps;
>>  #endif
>>  	struct list_head child_list;
>>  	struct list_head list;
>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> index 7af3c1e..33deb78 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> @@ -1612,6 +1612,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
>>  	return ret;
>>  }
>>  
>> +static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
>> +{
>> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>> +	u32 devctl, cmd, cap2, aer_capctl;
>> +	int old_mps;
>> +
>> +	/* Restore MPS */
>> +	if (edev->pcie_cap) {
>> +		old_mps = (ffs(pdn->mps) - 8) << 5;
>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>> +				     2, &devctl);
>> +		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
>> +		devctl |= old_mps;
>> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>> +				      2, devctl);
>> +	}
>> +
>> +	/* Disable Completion Timeout */
>> +	if (edev->pcie_cap) {
>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
>> +				     4, &cap2);
>> +		if (cap2 & 0x10) {
>> +			eeh_ops->read_config(pdn,
>> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
>> +					4, &cap2);
>> +			cap2 |= 0x10;
>> +			eeh_ops->write_config(pdn,
>> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
>> +					4, cap2);
>> +		}
>> +	}
>> +
>> +	/* Enable SERR and parity checking */
>> +	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
>> +	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
>> +	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
>> +
>> +	/* Enable report various errors */
>> +	if (edev->pcie_cap) {
>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>> +				2, &devctl);
>> +		devctl &= ~PCI_EXP_DEVCTL_CERE;
>> +		devctl |= (PCI_EXP_DEVCTL_NFERE |
>> +			   PCI_EXP_DEVCTL_FERE |
>> +			   PCI_EXP_DEVCTL_URRE);
>> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>> +				2, devctl);
>> +	}
>> +
>> +	/* Enable ECRC generation and check */
>> +	if (edev->pcie_cap && edev->aer_cap) {
>> +		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
>> +				4, &aer_capctl);
>> +		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
>> +		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
>> +				4, aer_capctl);
>> +	}
>
>Where is all this stuff set up the first time?  It'd be nice if we could
>use the same path that did it the first time.
>

The steps in this function set up the device's PCI config space. For PFs,
they are done in skiboot firmware, invoked through opal_pci_reinit(). For
VFs, skiboot firmware is not aware of them, so we need to do the setup in
the kernel.

In other words, this logic originally lives in skiboot firmware, and this is
the first time it appears in the kernel.
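
As one concrete piece of that setup, the "Restore MPS" step can be modelled
in user space as below. This is only a sketch: the helper names are invented
for the example, and DEVCTL_PAYLOAD_MASK is assumed to carry the same value
as PCI_EXP_DEVCTL_PAYLOAD (0x00e0).

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define DEVCTL_PAYLOAD_MASK 0x00e0u	/* assumed == PCI_EXP_DEVCTL_PAYLOAD */

/*
 * Encode an MPS given in bytes (128, 256, ... 4096) into bits 7:5 of the
 * Device Control register, using the same arithmetic as the patch:
 * 128 bytes has ffs() == 8, so it encodes as 0.
 */
static uint16_t mps_to_devctl_field(int mps_bytes)
{
	assert(mps_bytes >= 128 && mps_bytes <= 4096);
	assert((mps_bytes & (mps_bytes - 1)) == 0);	/* power of two */
	return (uint16_t)((ffs(mps_bytes) - 8) << 5);
}

/* Replace only the payload-size field, leaving the other bits alone. */
static uint16_t devctl_restore_mps(uint16_t devctl, int mps_bytes)
{
	devctl &= (uint16_t)~DEVCTL_PAYLOAD_MASK;
	devctl |= mps_to_devctl_field(mps_bytes);
	return devctl;
}
```

For example, an MPS of 256 bytes encodes as 0x0020, and restoring 512 bytes
into a register value of 0x28ff gives 0x285f.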

>Are we setting up a PF or a VF here?  The function is called
>pnv_eeh_restore_vf_config(), but it's called when "edev->physfn", so it's a
>little confusing.
>

Yes, this is called for VFs. A non-NULL "edev->physfn" means the edev has a
PF, i.e. this edev is a VF.
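
The check can be read as the following user-space sketch. struct model_edev,
enum restore_path and pick_restore_path() are hypothetical stand-ins for
this example, not the real eeh_dev or any kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eeh_dev; only the field used here. */
struct model_edev {
	struct model_edev *physfn;	/* non-NULL only for a VF */
};

enum restore_path {
	RESTORE_BY_FIRMWARE,	/* opal_pci_reinit() in the patch */
	RESTORE_BY_KERNEL	/* pnv_eeh_restore_vf_config() in the patch */
};

/* A device that points at a PF is a VF, which firmware cannot see. */
static enum restore_path pick_restore_path(const struct model_edev *edev)
{
	assert(edev != NULL);
	return edev->physfn ? RESTORE_BY_KERNEL : RESTORE_BY_FIRMWARE;
}
```

So the PF itself still goes through firmware, and only devices that point
back at a PF take the in-kernel restore path.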

>> +
>> +	return 0;
>> +}
>> +
>>  static int pnv_eeh_restore_config(struct pci_dn *pdn)
>>  {
>>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>> @@ -1622,7 +1683,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
>>  		return -EEXIST;
>>  
>>  	phb = edev->phb->private_data;
>> -	ret = opal_pci_reinit(phb->opal_id,
>> +	/*
>> +	 * We have to restore the PCI config space after reset since the
>> +	 * firmware can't see SRIOV VFs.
>> +	 */
>> +	if (edev->physfn)
>> +		ret = pnv_eeh_restore_vf_config(pdn);
>> +	else
>> +		ret = opal_pci_reinit(phb->opal_id,
>>  			      OPAL_REINIT_PCI_DEV, edev->config_addr);
>>  	if (ret) {
>>  		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index bca2aeb..10bc8c3 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -729,6 +729,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
>>  }
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
>>  
>> +#ifdef CONFIG_PCI_IOV
>> +static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
>> +{
>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>> +	int parent_mps;
>> +
>> +	if (!pdev->is_virtfn)
>> +		return;
>> +
>> +	/* Synchronize MPS for VF and PF */
>> +	parent_mps = pcie_get_mps(pdev->physfn);
>> +	if ((128 << pdev->pcie_mpss) >= parent_mps)
>> +		pcie_set_mps(pdev, parent_mps);
>> +	pdn->mps = pcie_get_mps(pdev);
>> +}
>> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
>
>Same comment as before -- I don't like this usage of fixups.  Would it work
>to do this in pcibios_add_device()?
>
>I assume you need this to happen when you hot-remove and hot-add a VF
>during EEH recovery?  Where does this happen in the normal hotplug path,
>e.g., pciehp, and can you do it in a corresponding place for EEH hotplug?
>

Yes, this is called during EEH recovery, but not only in the hot-add case.
It is also reached when we do a normal EEH reset:

eeh_reset_device()
    eeh_pe_restore_bars()
        eeh_restore_one_device_bars()
            eeh_ops->restore_config()
                pnv_eeh_restore_config()

eeh_reset_device() handles both the hot-add and the non-hot-add cases, so it
is not appropriate to move this into the hot-plug path.

>
>> +#endif /* CONFIG_PCI_IOV */
>> +
>>  void __init pnv_pci_init(void)
>>  {
>>  	struct device_node *np;
>> -- 
>> 1.7.9.5
>> 

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-06-02 17:19     ` Bjorn Helgaas
@ 2015-06-03  1:38       ` Wei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-06-03  1:38 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Wei Yang, gwshan, linuxppc-dev, linux-pci

On Tue, Jun 02, 2015 at 12:19:07PM -0500, Bjorn Helgaas wrote:
>On Tue, May 19, 2015 at 06:50:03PM +0800, Wei Yang wrote:
>> During EEH recovery, hotplug is applied to the devices which don't
>> have drivers or their drivers don't support EEH. However, the hotplug,
>> which was implemented based on PCI bus, can't be applied to VF directly.
>> 
>> The patch renames virtn_{add,remove}() and exports them so that they
>> can be used in PCI hotplug during EEH recovery.
>> 
>> [gwshan: changelog]
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
>I assume you'll merge this along with the rest of this series via the
>powerpc tree.
>

Thanks, I think so.

>> ---
>>  drivers/pci/iov.c   |   10 +++++-----
>>  include/linux/pci.h |    8 ++++++++
>>  2 files changed, 13 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> index ee0ebff..cc941dd 100644
>> --- a/drivers/pci/iov.c
>> +++ b/drivers/pci/iov.c
>> @@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
>>  	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
>>  }
>>  
>> -static int virtfn_add(struct pci_dev *dev, int id, int reset)
>> +int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
>>  {
>>  	int i;
>>  	int rc = -ENOMEM;
>> @@ -183,7 +183,7 @@ failed:
>>  	return rc;
>>  }
>>  
>> -static void virtfn_remove(struct pci_dev *dev, int id, int reset)
>> +void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
>>  {
>>  	char buf[VIRTFN_ID_LEN];
>>  	struct pci_dev *virtfn;
>> @@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>>  	}
>>  
>>  	for (i = 0; i < initial; i++) {
>> -		rc = virtfn_add(dev, i, 0);
>> +		rc = pci_iov_virtfn_add(dev, i, 0);
>>  		if (rc)
>>  			goto failed;
>>  	}
>> @@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>>  
>>  failed:
>>  	for (j = 0; j < i; j++)
>> -		virtfn_remove(dev, j, 0);
>> +		pci_iov_virtfn_remove(dev, j, 0);
>>  
>>  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>>  	pci_cfg_access_lock(dev);
>> @@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
>>  		return;
>>  
>>  	for (i = 0; i < iov->num_VFs; i++)
>> -		virtfn_remove(dev, i, 0);
>> +		pci_iov_virtfn_remove(dev, i, 0);
>>  
>>  	pcibios_sriov_disable(dev);
>>  
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 353db8d..06aa5dd 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1679,6 +1679,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
>>  
>>  int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>>  void pci_disable_sriov(struct pci_dev *dev);
>> +int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
>> +void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
>>  int pci_num_vf(struct pci_dev *dev);
>>  int pci_vfs_assigned(struct pci_dev *dev);
>>  int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
>> @@ -1696,6 +1698,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
>>  static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
>>  { return -ENODEV; }
>>  static inline void pci_disable_sriov(struct pci_dev *dev) { }
>> +static inline int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
>> +{
>> +	return -ENOSYS;
>> +}
>> +static inline void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
>> +{ }
>>  static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
>>  static inline int pci_vfs_assigned(struct pci_dev *dev)
>>  { return 0; }
>> -- 
>> 1.7.9.5
>> 

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V7 04/10] powerpc/eeh: Trace first 7 BARs in address cache
  2015-06-02  4:11         ` Gavin Shan
@ 2015-06-03  1:47           ` Wei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-06-03  1:47 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, Bjorn Helgaas, linuxppc-dev, linux-pci

On Tue, Jun 02, 2015 at 02:11:24PM +1000, Gavin Shan wrote:
>On Tue, Jun 02, 2015 at 11:51:15AM +0800, Wei Yang wrote:
>>On Mon, Jun 01, 2015 at 06:32:33PM -0500, Bjorn Helgaas wrote:
>>>The subject says "Trace first 7 BARs..."  I think maybe you meant "Track
>>>first 7 BARs" or maybe "Cache only BARs, not windows or IOV BARs"
>>>
>>
>>Agree, Track is more accurate.
>>
>>Gavin,
>>
>>Which subject you prefer?
>>
>
>"Cache only BARs, not windows or IOV BARs" is better.
>
>>>On Tue, May 19, 2015 at 06:50:06PM +0800, Wei Yang wrote:
>>>> EEH address cache, which helps to locate the PCI device according to
>>>> the given (physical) MMIO address, didn't cover PCI bridges. 
>>>
>>>"doesn't contain PCI bridge windows"?
>>>
>>>I see that eeh_addr_cache_insert_dev() ignores bridges because it never
>>>calls __eeh_addr_cache_insert_dev() when "(dev->class >> 16) ==
>>>PCI_BASE_CLASS_BRIDGE".  I think it would be more technically correct if
>>>you removed that test and relied on the "i <= PCI_ROM_RESOURCE" test in
>>>this patch, because it is legal (though rare) for bridge devices to have
>>>two BARs, and I assume you would want to put those in your cache if they
>>>exist.
>>>
>>
>>I think this is fine to remove the test "(dev->class >> 16) ==
>>PCI_BASE_CLASS_BRIDGE" for a bridge and rely on the "i <= PCI_ROM_RESOURCE"
>>
>>Gavin,
>>
>>Do you thinks this is fine?
>>
>
>Fine to me.

Thanks, I will change accordingly and do some tests.
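
To make the effect of the new loop bound concrete, here is a small
stand-alone sketch (not kernel code). The enum is a simplified copy of the
resource index layout in include/linux/pci.h with CONFIG_PCI_IOV enabled,
written from memory, so please check it against the header in your tree.
The helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified resource index layout (assumption: CONFIG_PCI_IOV=y,
 * 6 IOV BARs, 4 bridge windows). Not copied from a real header.
 */
enum {
	PCI_STD_RESOURCES       = 0,	/* BAR0 */
	PCI_STD_RESOURCE_END    = 5,	/* BAR5 */
	PCI_ROM_RESOURCE        = 6,	/* expansion ROM */
	PCI_IOV_RESOURCES       = 7,	/* first IOV BAR */
	PCI_IOV_RESOURCE_END    = 12,	/* last IOV BAR */
	PCI_BRIDGE_RESOURCES    = 13,	/* first bridge window */
	PCI_BRIDGE_RESOURCE_END = 16,	/* last bridge window */
	PCI_NUM_RESOURCES       = 17,
	DEVICE_COUNT_RESOURCE   = PCI_NUM_RESOURCES,
};

/* Loop bound before the patch: every resource slot is considered. */
bool cached_before_patch(int i)
{
	return i >= 0 && i < DEVICE_COUNT_RESOURCE;
}

/* Loop bound after the patch: BAR0..BAR5 plus the expansion ROM. */
bool cached_after_patch(int i)
{
	return i >= 0 && i <= PCI_ROM_RESOURCE;
}

/* Count how many resource slots a given loop bound would visit. */
int count_cached(bool (*covered)(int))
{
	int i, n = 0;

	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
		if (covered(i))
			n++;
	return n;
}
```

With this layout the new bound visits 7 slots (6 BARs and the ROM), which
is where the "first 7 BARs" in the subject comes from, and it skips the IOV
BARs and bridge windows.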

>
>>>> Also, it
>>>> shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
>>>> should be returned.
>>>> The patch restricts the address cache to cover first 7 BARs for the
>>>> above purposes.
>>>> 
>>>> [gwshan: changelog]
>>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/kernel/eeh_cache.c |    2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
>>>> index eeabeab..f6c5f05 100644
>>>> --- a/arch/powerpc/kernel/eeh_cache.c
>>>> +++ b/arch/powerpc/kernel/eeh_cache.c
>>>> @@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
>>>>  	}
>>>>  
>>>>  	/* Walk resources on this device, poke them into the tree */
>>>> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
>>>> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>>>  		unsigned long start = pci_resource_start(dev,i);
>>>>  		unsigned long end = pci_resource_end(dev,i);
>>>>  		unsigned int flags = pci_resource_flags(dev,i);
>>>> -- 
>>>> 1.7.9.5
>>>> 
>>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-01 23:46     ` Bjorn Helgaas
@ 2015-06-03  3:31       ` Wei Yang
  2015-06-03  5:10         ` Gavin Shan
  0 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2015-06-03  3:31 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Wei Yang, gwshan, linuxppc-dev, linux-pci

On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
>> Current EEH recovery code works with the assumption: the PE has primary
>> bus. Unfortunately, that's not true to VF PEs, which generally contains
>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
>> at PCI final fixup time. Those PEs for VFs are indentified with newly
>> introduced flag EEH_PE_VF so that we handle them differently during
>> EEH recovery.
>> 
>> [gwshan: changelog and code refactoring]
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/eeh.h               |    1 +
>>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
>>  3 files changed, 26 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> index 1b3614d..c1fde48 100644
>> --- a/arch/powerpc/include/asm/eeh.h
>> +++ b/arch/powerpc/include/asm/eeh.h
>> @@ -70,6 +70,7 @@ struct pci_dn;
>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
>> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>>  
>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> index 35f0b62..260a701 100644
>> --- a/arch/powerpc/kernel/eeh_pe.c
>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>>  	 * EEH device already having associated PE, but
>>  	 * the direct parent EEH device doesn't have yet.
>>  	 */
>> -	pdn = pdn ? pdn->parent : NULL;
>> +	if (edev->physfn)
>> +		pdn = pci_get_pdn(edev->physfn);
>> +	else
>> +		pdn = pdn ? pdn->parent : NULL;
>>  	while (pdn) {
>>  		/* We're poking out of PCI territory */
>>  		parent = pdn_to_eeh_dev(pdn);
>> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>>  	}
>>  
>>  	/* Create a new EEH PE */
>> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>> +	if (edev->physfn)
>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
>> +	else
>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>  	if (!pe) {
>>  		pr_err("%s: out of memory!\n", __func__);
>>  		return -ENOMEM;
>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> index ce738ab..c505036 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
>>  	.restore_config		= pnv_eeh_restore_config
>>  };
>>  
>> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
>> +{
>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>> +
>> +	if (!pdev->is_virtfn)
>> +		return;
>> +
>> +	/*
>> +	 * The following operations will fail if VF's sysfs files
>> +	 * aren't created or its resources aren't finalized.
>> +	 */
>
>I don't understand this comment.  "The following operations" seems to refer
>to eeh_add_device_early() and eeh_add_device_late(), and
>"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().
>
>So the comment suggests that eeh_add_device_early() and
>eeh_add_device_late() will fail because they're called before
>eeh_sysfs_add_device().  So I think you must be talking about some other
>"following operations," not eeh_add_device_early() and
>eeh_add_device_late().

Sorry for the confusion.

The comment is trying to say that eeh_sysfs_add_device() will fail if the VF's
sysfs entries have not been created yet, and that eeh_add_device_late() will
fail if the VF's resources are not set up properly, since it caches the VF's
BARs.

Gavin,

Please let me know if my understanding is not correct.

>
>> +	eeh_add_device_early(pdn);
>> +	eeh_add_device_late(pdev);
>> +	eeh_sysfs_add_device(pdev);
>> +}
>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>
>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>this.  There is a pcibios_add_device() -- could this be done there?
>

I don't like it either :-) But it looks like we can't put it in
pcibios_add_device().

>If not, what happens after pcibios_add_device() that is required for this
>code?  Maybe we need a pcibios_bus_add_device() hook?

pnv_eeh_vf_final_fixup() tries to create the EEH sysfs entries for VFs. This
requires that the VF's sysfs kobject (kobj) is already initialized. If we put
this code into pcibios_add_device(), eeh_sysfs_add_device() would fail.

Below is the call flow for your reference:

pci_device_add()
    pcibios_add_device()
    device_add()                <--- kobj initialized here
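
The ordering problem can be modelled with a toy example. This is only a
sketch: struct toy_dev and all toy_* functions are invented for
illustration, and the -ENOENT return value is an arbitrary choice, not what
eeh_sysfs_add_device() really returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct pci_dev: only tracks sysfs readiness. */
struct toy_dev {
	bool kobj_ready;	/* set by the toy device_add() */
	bool eeh_sysfs;		/* set once EEH attributes are attached */
};

/* Models eeh_sysfs_add_device(): needs the kobject to exist. */
int toy_eeh_sysfs_add(struct toy_dev *dev)
{
	if (!dev->kobj_ready)
		return -ENOENT;
	dev->eeh_sysfs = true;
	return 0;
}

/* Models device_add(): this is where the kobject is initialized. */
void toy_device_add(struct toy_dev *dev)
{
	dev->kobj_ready = true;
}

/* EEH setup done from the pcibios_add_device() position: too early. */
int toy_add_from_early_hook(struct toy_dev *dev)
{
	int ret = toy_eeh_sysfs_add(dev);	/* runs before device_add() */

	toy_device_add(dev);
	return ret;
}

/* EEH setup done after device_add(), e.g. from the final fixup. */
int toy_add_from_late_hook(struct toy_dev *dev)
{
	toy_device_add(dev);
	return toy_eeh_sysfs_add(dev);
}
```

In this model the early hook fails and leaves the device without EEH sysfs
attributes, while the late hook succeeds.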

>
>> +
>>  /**
>>   * eeh_powernv_init - Register platform dependent EEH operations
>>   *
>> -- 
>> 1.7.9.5
>> 

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-01 23:49     ` Bjorn Helgaas
@ 2015-06-03  3:39       ` Wei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2015-06-03  3:39 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Wei Yang, linux-pci, linuxppc-dev, gwshan

On Mon, Jun 01, 2015 at 06:49:58PM -0500, Bjorn Helgaas wrote:
>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
>> Current EEH recovery code works with the assumption: the PE has primary
>> bus. Unfortunately, that's not true to VF PEs, which generally contains
>
>"Primary bus" normally means the bus on the upstream side of a PCI bridge.
>But a PE is not a bridge, so I don't know what it means here.
>

Before the VF PE was introduced, a PE was a "Bus PE", which contains a PCI bus.
Yes, "primary bus" may be a little confusing, since it can refer to the
upstream side of a PCI bridge. I think the log here tries to emphasize that a
Bus PE contains a whole PCI bus.

>s/not true to VF PEs/not true for VF PEs/

Thanks, changed

>
>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
>> at PCI final fixup time. Those PEs for VFs are indentified with newly
>
>s/indentified/identified/

Thanks, changed

>
>> introduced flag EEH_PE_VF so that we handle them differently during
>> EEH recovery.

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-03  3:31       ` Wei Yang
@ 2015-06-03  5:10         ` Gavin Shan
  2015-06-03 15:46           ` Bjorn Helgaas
  0 siblings, 1 reply; 68+ messages in thread
From: Gavin Shan @ 2015-06-03  5:10 UTC (permalink / raw)
  To: Wei Yang; +Cc: Bjorn Helgaas, gwshan, linuxppc-dev, linux-pci

On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
>On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
>>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
>>> Current EEH recovery code works with the assumption: the PE has primary
>>> bus. Unfortunately, that's not true to VF PEs, which generally contains
>>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
>>> at PCI final fixup time. Those PEs for VFs are indentified with newly
>>> introduced flag EEH_PE_VF so that we handle them differently during
>>> EEH recovery.
>>> 
>>> [gwshan: changelog and code refactoring]
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/include/asm/eeh.h               |    1 +
>>>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
>>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
>>>  3 files changed, 26 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>> index 1b3614d..c1fde48 100644
>>> --- a/arch/powerpc/include/asm/eeh.h
>>> +++ b/arch/powerpc/include/asm/eeh.h
>>> @@ -70,6 +70,7 @@ struct pci_dn;
>>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
>>> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>>>  
>>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
>>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>>> index 35f0b62..260a701 100644
>>> --- a/arch/powerpc/kernel/eeh_pe.c
>>> +++ b/arch/powerpc/kernel/eeh_pe.c
>>> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>>>  	 * EEH device already having associated PE, but
>>>  	 * the direct parent EEH device doesn't have yet.
>>>  	 */
>>> -	pdn = pdn ? pdn->parent : NULL;
>>> +	if (edev->physfn)
>>> +		pdn = pci_get_pdn(edev->physfn);
>>> +	else
>>> +		pdn = pdn ? pdn->parent : NULL;
>>>  	while (pdn) {
>>>  		/* We're poking out of PCI territory */
>>>  		parent = pdn_to_eeh_dev(pdn);
>>> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>>>  	}
>>>  
>>>  	/* Create a new EEH PE */
>>> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>> +	if (edev->physfn)
>>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
>>> +	else
>>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>>  	if (!pe) {
>>>  		pr_err("%s: out of memory!\n", __func__);
>>>  		return -ENOMEM;
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index ce738ab..c505036 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
>>>  	.restore_config		= pnv_eeh_restore_config
>>>  };
>>>  
>>> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
>>> +{
>>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>> +
>>> +	if (!pdev->is_virtfn)
>>> +		return;
>>> +
>>> +	/*
>>> +	 * The following operations will fail if VF's sysfs files
>>> +	 * aren't created or its resources aren't finalized.
>>> +	 */
>>
>>I don't understand this comment.  "The following operations" seems to refer
>>to eeh_add_device_early() and eeh_add_device_late(), and
>>"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().
>>
>>So the comment suggests that eeh_add_device_early() and
>>eeh_add_device_late() will fail because they're called before
>>eeh_sysfs_add_device().  So I think you must be talking about some other
>>"following operations," not eeh_add_device_early() and
>>eeh_add_device_late().
>
>Sorry for this confusion.
>
>The comment here wants to say the eeh_sysfs_add_device() will fail if the VF's
>sysfs is not created well. Or it will fail if the VF's resources are not set
>properly, since we would cache the VF's BAR in eeh_add_device_late().
>
>Gavin,
>
>If my understanding is not correct please let me know.
>

It's correct. "The following operations" refers to eeh_add_device_late()
and eeh_sysfs_add_device(). The former requires that the resources for the
particular PCI device (the VF here) are finalized (assigned).
eeh_sysfs_add_device() will fail if the sysfs entry for the PCI device isn't
populated yet.

>>
>>> +	eeh_add_device_early(pdn);
>>> +	eeh_add_device_late(pdev);
>>> +	eeh_sysfs_add_device(pdev);
>>> +}
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>>
>>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>>this.  There is a pcibios_add_device() -- could this be done there?
>>
>
>I don't like it neither :-) But looks we can't put it in the
>pcibios_add_device().
>
>>If not, what happens after pcibios_add_device() that is required for this
>>code?  Maybe we need a pcibios_bus_add_device() hook?
>
>The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
>the VF sysfs(kobj) is initialized properly. If we put these into
>pcibios_add_device(), the eeh_sysfs_add_device() would fail.
>
>Below is the call flow for your reference:
>
>pci_device_add()
>    pcibios_add_device()
>    device_add()                <--- kobj initialized here
>

We can put it into pcibios_bus_add_device(), but we don't have it currently.
If Bjorn agrees to add pcibios_bus_add_device(), I'm fine with moving the code
block there.

Thanks,
Gavin

>>
>>> +
>>>  /**
>>>   * eeh_powernv_init - Register platform dependent EEH operations
>>>   *
>>> -- 
>>> 1.7.9.5
>>> 
>
>-- 
>Richard Yang
>Help you, Help me



* Re: [PATCH V7 08/10] powerpc/powernv: Support PCI config restore for VFs
  2015-06-03  1:37       ` Wei Yang
@ 2015-06-03  5:14         ` Gavin Shan
  0 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2015-06-03  5:14 UTC (permalink / raw)
  To: Wei Yang; +Cc: Bjorn Helgaas, gwshan, linuxppc-dev, linux-pci

On Wed, Jun 03, 2015 at 09:37:53AM +0800, Wei Yang wrote:
>On Mon, Jun 01, 2015 at 07:01:36PM -0500, Bjorn Helgaas wrote:
>>On Tue, May 19, 2015 at 06:50:10PM +0800, Wei Yang wrote:
>>> After PE reset, OPAL API opal_pci_reinit() is called on all devices
>>> contained in the PE to reinitialize them. However, VFs can't be seen
>>> from skiboot firmware. We have to implement the functions, similar
>>> those in skiboot firmware, to reinitialize VFs after reset on PE
>>> for VFs.
>>> 
>>> [gwshan: changelog and code refactoring]
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/include/asm/pci-bridge.h        |    1 +
>>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   70 +++++++++++++++++++++++++-
>>>  arch/powerpc/platforms/powernv/pci.c         |   18 +++++++
>>>  3 files changed, 88 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>> index c324882..ad60263 100644
>>> --- a/arch/powerpc/include/asm/pci-bridge.h
>>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>>> @@ -206,6 +206,7 @@ struct pci_dn {
>>>  #define IODA_INVALID_M64        (-1)
>>>  	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>>  #endif /* CONFIG_PCI_IOV */
>>> +	int	mps;
>>>  #endif
>>>  	struct list_head child_list;
>>>  	struct list_head list;
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index 7af3c1e..33deb78 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -1612,6 +1612,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
>>>  	return ret;
>>>  }
>>>  
>>> +static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
>>> +{
>>> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>> +	u32 devctl, cmd, cap2, aer_capctl;
>>> +	int old_mps;
>>> +
>>> +	/* Restore MPS */
>>> +	if (edev->pcie_cap) {
>>> +		old_mps = (ffs(pdn->mps) - 8) << 5;
>>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				     2, &devctl);
>>> +		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
>>> +		devctl |= old_mps;
>>> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				      2, devctl);
>>> +	}
>>> +
>>> +	/* Disable Completion Timeout */
>>> +	if (edev->pcie_cap) {
>>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
>>> +				     4, &cap2);
>>> +		if (cap2 & 0x10) {
>>> +			eeh_ops->read_config(pdn,
>>> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
>>> +					4, &cap2);
>>> +			cap2 |= 0x10;
>>> +			eeh_ops->write_config(pdn,
>>> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
>>> +					4, cap2);
>>> +		}
>>> +	}
>>> +
>>> +	/* Enable SERR and parity checking */
>>> +	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
>>> +	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
>>> +	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
>>> +
>>> +	/* Enable report various errors */
>>> +	if (edev->pcie_cap) {
>>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				2, &devctl);
>>> +		devctl &= ~PCI_EXP_DEVCTL_CERE;
>>> +		devctl |= (PCI_EXP_DEVCTL_NFERE |
>>> +			   PCI_EXP_DEVCTL_FERE |
>>> +			   PCI_EXP_DEVCTL_URRE);
>>> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				2, devctl);
>>> +	}
>>> +
>>> +	/* Enable ECRC generation and check */
>>> +	if (edev->pcie_cap && edev->aer_cap) {
>>> +		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
>>> +				4, &aer_capctl);
>>> +		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
>>> +		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
>>> +				4, aer_capctl);
>>> +	}
>>
>>Where is all this stuff set up the first time?  It'd be nice if we could
>>use the same path that did it the first time.
>>
>
>Those steps in this function are called to setup the pci_dev. For those PFs,
>they are done in the skiboot firmware, invoked by opal_pci_reinit(). For VFs,
>since skiboot firmware is not aware of those VFs, we need to setup it in
>kernel.
>
>This means originally, those stuffs are in skiboot firmware. This is the first
>time appears in kernel.
>
>>Are we setting up a PF or a VF here?  The function is called
>>pnv_eeh_restore_vf_config(), but it's called when "edev->physfn", so it's a
>>little confusing.
>>
>
>Yes, this is called for VFs. "edev->physfn" means the edev has a PF, so that
>this edev is a VF.
>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static int pnv_eeh_restore_config(struct pci_dn *pdn)
>>>  {
>>>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>> @@ -1622,7 +1683,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
>>>  		return -EEXIST;
>>>  
>>>  	phb = edev->phb->private_data;
>>> -	ret = opal_pci_reinit(phb->opal_id,
>>> +	/*
>>> +	 * We have to restore the PCI config space after reset since the
>>> +	 * firmware can't see SRIOV VFs.
>>> +	 */
>>> +	if (edev->physfn)
>>> +		ret = pnv_eeh_restore_vf_config(pdn);
>>> +	else
>>> +		ret = opal_pci_reinit(phb->opal_id,
>>>  			      OPAL_REINIT_PCI_DEV, edev->config_addr);
>>>  	if (ret) {
>>>  		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
>>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>> index bca2aeb..10bc8c3 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.c
>>> +++ b/arch/powerpc/platforms/powernv/pci.c
>>> @@ -729,6 +729,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
>>>  }
>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
>>>  
>>> +#ifdef CONFIG_PCI_IOV
>>> +static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
>>> +{
>>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>> +	int parent_mps;
>>> +
>>> +	if (!pdev->is_virtfn)
>>> +		return;
>>> +
>>> +	/* Synchronize MPS for VF and PF */
>>> +	parent_mps = pcie_get_mps(pdev->physfn);
>>> +	if ((128 << pdev->pcie_mpss) >= parent_mps)
>>> +		pcie_set_mps(pdev, parent_mps);
>>> +	pdn->mps = pcie_get_mps(pdev);
>>> +}
>>> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
>>
>>Same comment as before -- I don't like this usage of fixups.  Would it work
>>to do this in pcibios_add_device()?
>>
>>I assume you need this to happen when you hot-remove and hot-add a VF
>>during EEH recovery?  Where does this happen in the normal hotplug path,
>>e.g., pciehp, and can you do it in a corresponding place for EEH hotplug?
>>
>
>Yes, this is called in the EEH recovery. While not only in the hot-add case,
>when we do the normal EEH reset, 
>
>eeh_reset_device()
>    eeh_pe_restore_bars()
>        eeh_restore_one_device_bars()
>	    eeh_ops->restore_config()
>	        pnv_eeh_restore_config()
>
>eeh_reset_device() would handle both hot-add and non-hot-add cases. So this is
>not proper to move it to the hot-plug path.
>

Unless I'm missing something, the code does have the problem Bjorn raised:
when a VF is brought up for the first time (without EEH involved yet), the AER
and PCIe completion timeout settings are not initialized to the expected
values.

For non-VFs, the device has been initialized properly by firmware before it's
exposed to the OS.
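
As a side note on the MPS restore in the quoted
pnv_eeh_restore_vf_config(): the expression "(ffs(pdn->mps) - 8) << 5"
converts an MPS in bytes into the Max_Payload_Size field of Device Control.
The stand-alone sketch below works through that arithmetic. toy_ffs() is a
portable stand-in for ffs(), and the helper names are invented for this
illustration; only PCI_EXP_DEVCTL_PAYLOAD (0x00e0) is a real kernel
constant.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size, bits 7:5 */

/* Stand-in for ffs(): 1-based index of the lowest set bit, 0 if none. */
int toy_ffs(unsigned int v)
{
	int pos = 1;

	if (!v)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/* Encode an MPS in bytes (power of two, 128..4096) as a DEVCTL field. */
uint16_t mps_to_devctl_field(unsigned int mps_bytes)
{
	/* 128 bytes has its lowest set bit at position 8, encoded as 0. */
	return (uint16_t)((toy_ffs(mps_bytes) - 8) << 5);
}

/* Replace only the MPS field, leaving the other DEVCTL bits alone. */
uint16_t devctl_set_mps(uint16_t devctl, unsigned int mps_bytes)
{
	devctl &= (uint16_t)~PCI_EXP_DEVCTL_PAYLOAD;
	devctl |= mps_to_devctl_field(mps_bytes);
	return devctl;
}
```

For example, 128 bytes encodes as 0x00, 256 bytes as 0x20 and 4096 bytes as
0xa0. The sketch assumes pdn->mps holds a valid power of two; a zero value
would produce a negative shift operand in the original expression.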

Thanks,
Gavin

>>
>>> +#endif /* CONFIG_PCI_IOV */
>>> +
>>>  void __init pnv_pci_init(void)
>>>  {
>>>  	struct device_node *np;
>>> -- 
>>> 1.7.9.5
>>> 
>
>-- 
>Richard Yang
>Help you, Help me



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-03  5:10         ` Gavin Shan
@ 2015-06-03 15:46           ` Bjorn Helgaas
  2015-06-04  1:25             ` Gavin Shan
                               ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-03 15:46 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, linuxppc-dev, linux-pci

On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
> On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
> >On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
> >>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
> >>> Current EEH recovery code works with the assumption: the PE has primary
> >>> bus. Unfortunately, that's not true to VF PEs, which generally contains
> >>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
> >>> at PCI final fixup time. Those PEs for VFs are indentified with newly
> >>> introduced flag EEH_PE_VF so that we handle them differently during
> >>> EEH recovery.
> >>> 
> >>> [gwshan: changelog and code refactoring]
> >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> >>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >>> ---
> >>>  arch/powerpc/include/asm/eeh.h               |    1 +
> >>>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
> >>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
> >>>  3 files changed, 26 insertions(+), 2 deletions(-)
> >>> 
> >>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> >>> index 1b3614d..c1fde48 100644
> >>> --- a/arch/powerpc/include/asm/eeh.h
> >>> +++ b/arch/powerpc/include/asm/eeh.h
> >>> @@ -70,6 +70,7 @@ struct pci_dn;
> >>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
> >>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
> >>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
> >>> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
> >>>  
> >>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
> >>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
> >>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> >>> index 35f0b62..260a701 100644
> >>> --- a/arch/powerpc/kernel/eeh_pe.c
> >>> +++ b/arch/powerpc/kernel/eeh_pe.c
> >>> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
> >>>  	 * EEH device already having associated PE, but
> >>>  	 * the direct parent EEH device doesn't have yet.
> >>>  	 */
> >>> -	pdn = pdn ? pdn->parent : NULL;
> >>> +	if (edev->physfn)
> >>> +		pdn = pci_get_pdn(edev->physfn);
> >>> +	else
> >>> +		pdn = pdn ? pdn->parent : NULL;
> >>>  	while (pdn) {
> >>>  		/* We're poking out of PCI territory */
> >>>  		parent = pdn_to_eeh_dev(pdn);
> >>> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
> >>>  	}
> >>>  
> >>>  	/* Create a new EEH PE */
> >>> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> >>> +	if (edev->physfn)
> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
> >>> +	else
> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> >>>  	if (!pe) {
> >>>  		pr_err("%s: out of memory!\n", __func__);
> >>>  		return -ENOMEM;
> >>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> >>> index ce738ab..c505036 100644
> >>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> >>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> >>> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
> >>>  	.restore_config		= pnv_eeh_restore_config
> >>>  };
> >>>  
> >>> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
> >>> +{
> >>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> >>> +
> >>> +	if (!pdev->is_virtfn)
> >>> +		return;
> >>> +
> >>> +	/*
> >>> +	 * The following operations will fail if VF's sysfs files
> >>> +	 * aren't created or its resources aren't finalized.
> >>> +	 */
> >>
> >>I don't understand this comment.  "The following operations" seems to refer
> >>to eeh_add_device_early() and eeh_add_device_late(), and
> >>"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().
> >>
> >>So the comment suggests that eeh_add_device_early() and
> >>eeh_add_device_late() will fail because they're called before
> >>eeh_sysfs_add_device().  So I think you must be talking about some other
> >>"following operations," not eeh_add_device_early() and
> >>eeh_add_device_late().
> >
> >Sorry for this confusion.
> >
> >The comment here wants to say the eeh_sysfs_add_device() will fail if the VF's
> >sysfs is not created well. Or it will fail if the VF's resources are not set
> >properly, since we would cache the VF's BAR in eeh_add_device_late().
> >
> >Gavin,
> >
> >If my understanding is not correct please let me know.
> >
> 
> It's correct. "The following operations" refers to eeh_add_device_late()
> and eeh_sysfs_add_device(). The former one requires the resources for
> one particular PCI device (VF here) are finalized (assigned). eeh_sysfs_add_device()
> will fail if the sysfs entry for the PCI device isn't populated yet.

eeh_add_device_late() contains several things that read config space:
eeh_save_bars() caches the entire config header, and
eeh_addr_cache_insert_dev() looks at the device resources (which are
determined by BARs in config space).  I think this is an error-prone
approach.  I think it would be simpler and safer for you to capture what
you need in your PCI config accessors.
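
One possible reading of that suggestion, as a toy sketch only: the write
accessor keeps a shadow copy of whatever is written to the config header,
so a later restore doesn't depend on taking a snapshot at exactly the right
moment. struct toy_edev and the toy_* functions are invented for this
illustration and don't correspond to existing EEH code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_HDR_DWORDS	16	/* 64-byte standard config header */

struct toy_edev {
	uint32_t hw[CFG_HDR_DWORDS];		/* stands in for the device */
	uint32_t shadow[CFG_HDR_DWORDS];	/* captured by the accessor */
};

/* Write accessor: forward to "hardware" and remember what was written. */
int toy_write_config(struct toy_edev *e, unsigned int off, uint32_t val)
{
	if (off % 4 || off / 4 >= CFG_HDR_DWORDS)
		return -1;	/* only aligned dword writes in this toy */
	e->hw[off / 4] = val;
	e->shadow[off / 4] = val;
	return 0;
}

/* Models a reset that wipes the device's config header. */
void toy_reset(struct toy_edev *e)
{
	memset(e->hw, 0, sizeof(e->hw));
}

/* Restore from the shadow copy kept by the write accessor. */
void toy_restore(struct toy_edev *e)
{
	memcpy(e->hw, e->shadow, sizeof(e->hw));
}
```

The trade-off in this model is that values set by firmware before the OS
takes over never pass through the accessor, so they would still need to be
read once or restored some other way.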

eeh_add_device_late() also contains code to deal with an EEH cache that
"might not be removed correctly because of unbalanced kref to the device
during unplug time."  That's unrelated to this patch series, but it sounds
... like a hacky workaround for some bug in the unplug path.

> >>> +	eeh_add_device_early(pdn);
> >>> +	eeh_add_device_late(pdev);
> >>> +	eeh_sysfs_add_device(pdev);
> >>> +}
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
> >>
> >>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
> >>this.  There is a pcibios_add_device() -- could this be done there?
> >>
> >
> >I don't like it neither :-) But looks we can't put it in the
> >pcibios_add_device().
> >
> >>If not, what happens after pcibios_add_device() that is required for this
> >>code?  Maybe we need a pcibios_bus_add_device() hook?
> >
> >The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
> >the VF sysfs(kobj) is initialized properly. If we put these into
> >pcibios_add_device(), the eeh_sysfs_add_device() would fail.
> >
> >Below is the call flow for your reference:
> >
> >pci_device_add()
> >    pcibios_add_device()
> >    device_add()                <--- kobj initialized here
> >
> 
> We can put it into pcibios_bus_add_device(), but we don't it currently. If
> Bjorn agree to add pcibios_bus_add_device(), I'm fine to move the block code
> there.

I think I'm OK with adding a pcibios_bus_add_device().  I think that would
be better than using the fixup mechanism for this.

Bjorn


* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-03 15:46           ` Bjorn Helgaas
@ 2015-06-04  1:25             ` Gavin Shan
  2015-06-04  5:46             ` Wei Yang
  2015-06-16  8:50             ` Wei Yang
  2 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2015-06-04  1:25 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Gavin Shan, Wei Yang, linuxppc-dev, linux-pci

On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
>On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
>> On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
>> >On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
>> >>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
>> >>> Current EEH recovery code works with the assumption: the PE has primary
>> >>> bus. Unfortunately, that's not true to VF PEs, which generally contains
>> >>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
>> >>> at PCI final fixup time. Those PEs for VFs are indentified with newly
>> >>> introduced flag EEH_PE_VF so that we handle them differently during
>> >>> EEH recovery.
>> >>> 
>> >>> [gwshan: changelog and code refactoring]
>> >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> >>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >>> ---
>> >>>  arch/powerpc/include/asm/eeh.h               |    1 +
>> >>>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
>> >>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
>> >>>  3 files changed, 26 insertions(+), 2 deletions(-)
>> >>> 
>> >>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> >>> index 1b3614d..c1fde48 100644
>> >>> --- a/arch/powerpc/include/asm/eeh.h
>> >>> +++ b/arch/powerpc/include/asm/eeh.h
>> >>> @@ -70,6 +70,7 @@ struct pci_dn;
>> >>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>> >>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>> >>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
>> >>> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>> >>>  
>> >>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>> >>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
>> >>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> >>> index 35f0b62..260a701 100644
>> >>> --- a/arch/powerpc/kernel/eeh_pe.c
>> >>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> >>> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>> >>>  	 * EEH device already having associated PE, but
>> >>>  	 * the direct parent EEH device doesn't have yet.
>> >>>  	 */
>> >>> -	pdn = pdn ? pdn->parent : NULL;
>> >>> +	if (edev->physfn)
>> >>> +		pdn = pci_get_pdn(edev->physfn);
>> >>> +	else
>> >>> +		pdn = pdn ? pdn->parent : NULL;
>> >>>  	while (pdn) {
>> >>>  		/* We're poking out of PCI territory */
>> >>>  		parent = pdn_to_eeh_dev(pdn);
>> >>> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>> >>>  	}
>> >>>  
>> >>>  	/* Create a new EEH PE */
>> >>> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>> >>> +	if (edev->physfn)
>> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
>> >>> +	else
>> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>> >>>  	if (!pe) {
>> >>>  		pr_err("%s: out of memory!\n", __func__);
>> >>>  		return -ENOMEM;
>> >>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> >>> index ce738ab..c505036 100644
>> >>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> >>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> >>> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
>> >>>  	.restore_config		= pnv_eeh_restore_config
>> >>>  };
>> >>>  
>> >>> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
>> >>> +{
>> >>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>> >>> +
>> >>> +	if (!pdev->is_virtfn)
>> >>> +		return;
>> >>> +
>> >>> +	/*
>> >>> +	 * The following operations will fail if VF's sysfs files
>> >>> +	 * aren't created or its resources aren't finalized.
>> >>> +	 */
>> >>
>> >>I don't understand this comment.  "The following operations" seems to refer
>> >>to eeh_add_device_early() and eeh_add_device_late(), and
>> >>"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().
>> >>
>> >>So the comment suggests that eeh_add_device_early() and
>> >>eeh_add_device_late() will fail because they're called before
>> >>eeh_sysfs_add_device().  So I think you must be talking about some other
>> >>"following operations," not eeh_add_device_early() and
>> >>eeh_add_device_late().
>> >
>> >Sorry for this confusion.
>> >
>> >The comment here wants to say the eeh_sysfs_add_device() will fail if the VF's
>> >sysfs is not created well. Or it will fail if the VF's resources are not set
>> >properly, since we would cache the VF's BAR in eeh_add_device_late().
>> >
>> >Gavin,
>> >
>> >If my understanding is not correct please let me know.
>> >
>> 
>> It's correct. "The following operations" refers to eeh_add_device_late()
>> and eeh_sysfs_add_device(). The former one requires the resources for
>> one particular PCI device (VF here) are finalized (assigned). eeh_sysfs_add_device()
>> will fail if the sysfs entry for the PCI device isn't populated yet.
>
>eeh_add_device_late() contains several things that read config space:
>eeh_save_bars() caches the entire config header, and
>eeh_addr_cache_insert_dev() looks at the device resources (which are
>determined by BARs in config space).  I think this is an error-prone
>approach.  I think it would be simpler and safer for you to capture what
>you need in your PCI config accessors.
>

I don't follow you very well. I think you're saying the source of all
information should be config space exclusively. The code is shared by
multiple platforms, one of which is pSeries running on top of the PowerVM
hypervisor or KVM/QEMU. There, the device resources are derived from the
device tree, not from config space.

>eeh_add_device_late() also contains code to deal with an EEH cache that
>"might not be removed correctly because of unbalanced kref to the device
>during unplug time."  That's unrelated to this patch series, but it sounds
>... like a hacky workaround for some bug in the unplug path.
>

Yes. We depend on pcibios_release_device() to disconnect the EEH device
from the PCI device. pcibios_release_device() might not be called because
of an unbalanced refcount. The workaround here is to disconnect the EEH
device and the PCI device lazily, so that the EEH device is then connected
to the right PCI device and the EEH address cache is updated accordingly.

>> >>> +	eeh_add_device_early(pdn);
>> >>> +	eeh_add_device_late(pdev);
>> >>> +	eeh_sysfs_add_device(pdev);
>> >>> +}
>> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>> >>
>> >>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>> >>this.  There is a pcibios_add_device() -- could this be done there?
>> >>
>> >
>> >I don't like it neither :-) But looks we can't put it in the
>> >pcibios_add_device().
>> >
>> >>If not, what happens after pcibios_add_device() that is required for this
>> >>code?  Maybe we need a pcibios_bus_add_device() hook?
>> >
>> >The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
>> >the VF sysfs(kobj) is initialized properly. If we put these into
>> >pcibios_add_device(), the eeh_sysfs_add_device() would fail.
>> >
>> >Below is the call flow for your reference:
>> >
>> >pci_device_add()
>> >    pcibios_add_device()
>> >    device_add()                <--- kobj initialized here
>> >
>> 
>> We can put it into pcibios_bus_add_device(), but we don't it currently. If
>> Bjorn agree to add pcibios_bus_add_device(), I'm fine to move the block code
>> there.
>
>I think I'm OK with adding a pcibios_bus_add_device().  I think that would
>be better than using the fixup mechanism for this.
>

Ok. Thanks for confirming.

Thanks,
Gavin



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-03 15:46           ` Bjorn Helgaas
  2015-06-04  1:25             ` Gavin Shan
@ 2015-06-04  5:46             ` Wei Yang
  2015-06-04  7:10               ` Gavin Shan
  2015-06-16  8:50             ` Wei Yang
  2 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2015-06-04  5:46 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Gavin Shan, Wei Yang, linuxppc-dev, linux-pci

On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
>On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
>> On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
>> >On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
>> >>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
>> >>> Current EEH recovery code works with the assumption: the PE has primary
>> >>> bus. Unfortunately, that's not true to VF PEs, which generally contains
>> >>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
>> >>> at PCI final fixup time. Those PEs for VFs are indentified with newly
>> >>> introduced flag EEH_PE_VF so that we handle them differently during
>> >>> EEH recovery.
>> >>> 
>> >>> [gwshan: changelog and code refactoring]
>> >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> >>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >>> ---
>> >>>  arch/powerpc/include/asm/eeh.h               |    1 +
>> >>>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
>> >>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
>> >>>  3 files changed, 26 insertions(+), 2 deletions(-)
>> >>> 
>> >>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> >>> index 1b3614d..c1fde48 100644
>> >>> --- a/arch/powerpc/include/asm/eeh.h
>> >>> +++ b/arch/powerpc/include/asm/eeh.h
>> >>> @@ -70,6 +70,7 @@ struct pci_dn;
>> >>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>> >>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>> >>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
>> >>> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>> >>>  
>> >>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>> >>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
>> >>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> >>> index 35f0b62..260a701 100644
>> >>> --- a/arch/powerpc/kernel/eeh_pe.c
>> >>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> >>> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>> >>>  	 * EEH device already having associated PE, but
>> >>>  	 * the direct parent EEH device doesn't have yet.
>> >>>  	 */
>> >>> -	pdn = pdn ? pdn->parent : NULL;
>> >>> +	if (edev->physfn)
>> >>> +		pdn = pci_get_pdn(edev->physfn);
>> >>> +	else
>> >>> +		pdn = pdn ? pdn->parent : NULL;
>> >>>  	while (pdn) {
>> >>>  		/* We're poking out of PCI territory */
>> >>>  		parent = pdn_to_eeh_dev(pdn);
>> >>> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>> >>>  	}
>> >>>  
>> >>>  	/* Create a new EEH PE */
>> >>> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>> >>> +	if (edev->physfn)
>> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
>> >>> +	else
>> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>> >>>  	if (!pe) {
>> >>>  		pr_err("%s: out of memory!\n", __func__);
>> >>>  		return -ENOMEM;
>> >>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> >>> index ce738ab..c505036 100644
>> >>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> >>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> >>> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
>> >>>  	.restore_config		= pnv_eeh_restore_config
>> >>>  };
>> >>>  
>> >>> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
>> >>> +{
>> >>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>> >>> +
>> >>> +	if (!pdev->is_virtfn)
>> >>> +		return;
>> >>> +
>> >>> +	/*
>> >>> +	 * The following operations will fail if VF's sysfs files
>> >>> +	 * aren't created or its resources aren't finalized.
>> >>> +	 */
>> >>
>> >>I don't understand this comment.  "The following operations" seems to refer
>> >>to eeh_add_device_early() and eeh_add_device_late(), and
>> >>"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().
>> >>
>> >>So the comment suggests that eeh_add_device_early() and
>> >>eeh_add_device_late() will fail because they're called before
>> >>eeh_sysfs_add_device().  So I think you must be talking about some other
>> >>"following operations," not eeh_add_device_early() and
>> >>eeh_add_device_late().
>> >
>> >Sorry for this confusion.
>> >
>> >The comment here wants to say the eeh_sysfs_add_device() will fail if the VF's
>> >sysfs is not created well. Or it will fail if the VF's resources are not set
>> >properly, since we would cache the VF's BAR in eeh_add_device_late().
>> >
>> >Gavin,
>> >
>> >If my understanding is not correct please let me know.
>> >
>> 
>> It's correct. "The following operations" refers to eeh_add_device_late()
>> and eeh_sysfs_add_device(). The former one requires the resources for
>> one particular PCI device (VF here) are finalized (assigned). eeh_sysfs_add_device()
>> will fail if the sysfs entry for the PCI device isn't populated yet.
>
>eeh_add_device_late() contains several things that read config space:
>eeh_save_bars() caches the entire config header, and
>eeh_addr_cache_insert_dev() looks at the device resources (which are
>determined by BARs in config space).  I think this is an error-prone
>approach.  I think it would be simpler and safer for you to capture what
>you need in your PCI config accessors.
>
>eeh_add_device_late() also contains code to deal with an EEH cache that
>"might not be removed correctly because of unbalanced kref to the device
>during unplug time."  That's unrelated to this patch series, but it sounds
>... like a hacky workaround for some bug in the unplug path.
>
>> >>> +	eeh_add_device_early(pdn);
>> >>> +	eeh_add_device_late(pdev);
>> >>> +	eeh_sysfs_add_device(pdev);
>> >>> +}
>> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>> >>
>> >>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>> >>this.  There is a pcibios_add_device() -- could this be done there?
>> >>
>> >
>> >I don't like it neither :-) But looks we can't put it in the
>> >pcibios_add_device().
>> >
>> >>If not, what happens after pcibios_add_device() that is required for this
>> >>code?  Maybe we need a pcibios_bus_add_device() hook?
>> >
>> >The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
>> >the VF sysfs(kobj) is initialized properly. If we put these into
>> >pcibios_add_device(), the eeh_sysfs_add_device() would fail.
>> >
>> >Below is the call flow for your reference:
>> >
>> >pci_device_add()
>> >    pcibios_add_device()
>> >    device_add()                <--- kobj initialized here
>> >
>> 
>> We can put it into pcibios_bus_add_device(), but we don't it currently. If
>> Bjorn agree to add pcibios_bus_add_device(), I'm fine to move the block code
>> there.
>
>I think I'm OK with adding a pcibios_bus_add_device().  I think that would
>be better than using the fixup mechanism for this.
>

Thanks for your confirmation.

However, I'm a little lost: where should I put the
pcibios_bus_add_device()?

Gavin,

After we have this, we should move all the EEH probe related code to this
place, right? Then both PF and VF have the same place to initialize EEH,
right?

If my understanding is not correct, please let me know:)

>Bjorn

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-04  5:46             ` Wei Yang
@ 2015-06-04  7:10               ` Gavin Shan
  0 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2015-06-04  7:10 UTC (permalink / raw)
  To: Wei Yang; +Cc: Bjorn Helgaas, Gavin Shan, linuxppc-dev, linux-pci

On Thu, Jun 04, 2015 at 01:46:15PM +0800, Wei Yang wrote:
>On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
>>On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
>>> On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
>>> >On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
>>> >>On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
>>> >>> Current EEH recovery code works with the assumption: the PE has primary
>>> >>> bus. Unfortunately, that's not true to VF PEs, which generally contains
>>> >>> one or multiple VFs (for VF group case). The patch creates PEs for VFs
>>> >>> at PCI final fixup time. Those PEs for VFs are indentified with newly
>>> >>> introduced flag EEH_PE_VF so that we handle them differently during
>>> >>> EEH recovery.
>>> >>> 
>>> >>> [gwshan: changelog and code refactoring]
>>> >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> >>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> >>> ---
>>> >>>  arch/powerpc/include/asm/eeh.h               |    1 +
>>> >>>  arch/powerpc/kernel/eeh_pe.c                 |   10 ++++++++--
>>> >>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++++++++++++++
>>> >>>  3 files changed, 26 insertions(+), 2 deletions(-)
>>> >>> 
>>> >>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>> >>> index 1b3614d..c1fde48 100644
>>> >>> --- a/arch/powerpc/include/asm/eeh.h
>>> >>> +++ b/arch/powerpc/include/asm/eeh.h
>>> >>> @@ -70,6 +70,7 @@ struct pci_dn;
>>> >>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>>> >>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>>> >>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
>>> >>> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>>> >>>  
>>> >>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>>> >>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
>>> >>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>>> >>> index 35f0b62..260a701 100644
>>> >>> --- a/arch/powerpc/kernel/eeh_pe.c
>>> >>> +++ b/arch/powerpc/kernel/eeh_pe.c
>>> >>> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>>> >>>  	 * EEH device already having associated PE, but
>>> >>>  	 * the direct parent EEH device doesn't have yet.
>>> >>>  	 */
>>> >>> -	pdn = pdn ? pdn->parent : NULL;
>>> >>> +	if (edev->physfn)
>>> >>> +		pdn = pci_get_pdn(edev->physfn);
>>> >>> +	else
>>> >>> +		pdn = pdn ? pdn->parent : NULL;
>>> >>>  	while (pdn) {
>>> >>>  		/* We're poking out of PCI territory */
>>> >>>  		parent = pdn_to_eeh_dev(pdn);
>>> >>> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>>> >>>  	}
>>> >>>  
>>> >>>  	/* Create a new EEH PE */
>>> >>> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>> >>> +	if (edev->physfn)
>>> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
>>> >>> +	else
>>> >>> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>> >>>  	if (!pe) {
>>> >>>  		pr_err("%s: out of memory!\n", __func__);
>>> >>>  		return -ENOMEM;
>>> >>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> >>> index ce738ab..c505036 100644
>>> >>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> >>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> >>> @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
>>> >>>  	.restore_config		= pnv_eeh_restore_config
>>> >>>  };
>>> >>>  
>>> >>> +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
>>> >>> +{
>>> >>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>> >>> +
>>> >>> +	if (!pdev->is_virtfn)
>>> >>> +		return;
>>> >>> +
>>> >>> +	/*
>>> >>> +	 * The following operations will fail if VF's sysfs files
>>> >>> +	 * aren't created or its resources aren't finalized.
>>> >>> +	 */
>>> >>
>>> >>I don't understand this comment.  "The following operations" seems to refer
>>> >>to eeh_add_device_early() and eeh_add_device_late(), and
>>> >>"VF's sysfs files being created" seems to refer to eeh_sysfs_add_device().
>>> >>
>>> >>So the comment suggests that eeh_add_device_early() and
>>> >>eeh_add_device_late() will fail because they're called before
>>> >>eeh_sysfs_add_device().  So I think you must be talking about some other
>>> >>"following operations," not eeh_add_device_early() and
>>> >>eeh_add_device_late().
>>> >
>>> >Sorry for this confusion.
>>> >
>>> >The comment here wants to say the eeh_sysfs_add_device() will fail if the VF's
>>> >sysfs is not created well. Or it will fail if the VF's resources are not set
>>> >properly, since we would cache the VF's BAR in eeh_add_device_late().
>>> >
>>> >Gavin,
>>> >
>>> >If my understanding is not correct please let me know.
>>> >
>>> 
>>> It's correct. "The following operations" refers to eeh_add_device_late()
>>> and eeh_sysfs_add_device(). The former one requires the resources for
>>> one particular PCI device (VF here) are finalized (assigned). eeh_sysfs_add_device()
>>> will fail if the sysfs entry for the PCI device isn't populated yet.
>>
>>eeh_add_device_late() contains several things that read config space:
>>eeh_save_bars() caches the entire config header, and
>>eeh_addr_cache_insert_dev() looks at the device resources (which are
>>determined by BARs in config space).  I think this is an error-prone
>>approach.  I think it would be simpler and safer for you to capture what
>>you need in your PCI config accessors.
>>
>>eeh_add_device_late() also contains code to deal with an EEH cache that
>>"might not be removed correctly because of unbalanced kref to the device
>>during unplug time."  That's unrelated to this patch series, but it sounds
>>... like a hacky workaround for some bug in the unplug path.
>>
>>> >>> +	eeh_add_device_early(pdn);
>>> >>> +	eeh_add_device_late(pdev);
>>> >>> +	eeh_sysfs_add_device(pdev);
>>> >>> +}
>>> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>>> >>
>>> >>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>>> >>this.  There is a pcibios_add_device() -- could this be done there?
>>> >>
>>> >
>>> >I don't like it neither :-) But looks we can't put it in the
>>> >pcibios_add_device().
>>> >
>>> >>If not, what happens after pcibios_add_device() that is required for this
>>> >>code?  Maybe we need a pcibios_bus_add_device() hook?
>>> >
>>> >The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
>>> >the VF sysfs(kobj) is initialized properly. If we put these into
>>> >pcibios_add_device(), the eeh_sysfs_add_device() would fail.
>>> >
>>> >Below is the call flow for your reference:
>>> >
>>> >pci_device_add()
>>> >    pcibios_add_device()
>>> >    device_add()                <--- kobj initialized here
>>> >
>>> 
>>> We can put it into pcibios_bus_add_device(), but we don't it currently. If
>>> Bjorn agree to add pcibios_bus_add_device(), I'm fine to move the block code
>>> there.
>>
>>I think I'm OK with adding a pcibios_bus_add_device().  I think that would
>>be better than using the fixup mechanism for this.
>>
>
>Thanks for your confirmation.
>
>While I am a little out of the page, where should I put the
>pcibios_bus_add_device()?
>
>Gavin,
>
>After we have this, we should move the EEH probe related code all to this
>place, right? Then both PF and VF has the same place to initialized the EEH,
>right?
>
>If my understanding is not correct, please let me know:)
>

No, we can't do things as you suggested, for various reasons. The EEH device
is probed based on the device tree (or pdn on pSeries) or the PCI device (on
PowerNV), though eeh_ops->probe() takes "pdn" as its argument. That means the
time for probing the EEH device differs between PowerNV and pSeries, and we
can't unify them by simply putting the logic into pcibios_bus_add_device().

So I'm expecting something like below:

- (A) Introduce weak pcibios_bus_add_device() as Bjorn suggested.
- (B) Introduce pci_controller_ops::bus_add_device(), whose PowerNV backend
  does those things here (EEH device probing, building EEH addr cache ...)
  for VF only.
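
To make (A) and (B) concrete, here is a rough userspace model of the
dispatch, not kernel code. The struct layouts are simplified stand-ins, the
eeh_initialized field and the pnv_pci_bus_add_device() name are made up for
this sketch, and the weak generic default from (A) is only described in a
comment because a weak and a strong definition cannot share one file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pci_dev;

/* (B): per-controller hook; only the member relevant here is modelled. */
struct pci_controller_ops {
	void (*bus_add_device)(struct pci_dev *pdev);
};

struct pci_controller {
	struct pci_controller_ops controller_ops;
};

/* Simplified stand-in for struct pci_dev. */
struct pci_dev {
	bool is_virtfn;
	bool eeh_initialized;		/* hypothetical marker for this sketch */
	struct pci_controller *hose;	/* owning host bridge */
};

/* Hypothetical PowerNV backend: EEH setup is done for VFs only. */
static void pnv_pci_bus_add_device(struct pci_dev *pdev)
{
	assert(pdev != NULL);
	if (!pdev->is_virtfn)
		return;
	/*
	 * The real backend would call eeh_add_device_early(),
	 * eeh_add_device_late() and eeh_sysfs_add_device() here.
	 */
	pdev->eeh_initialized = true;
}

/*
 * (A): arch override of the proposed hook. The generic PCI core would
 * carry an empty __weak default with the same name.
 */
void pcibios_bus_add_device(struct pci_dev *pdev)
{
	struct pci_controller *hose;

	assert(pdev != NULL);
	hose = pdev->hose;
	if (hose && hose->controller_ops.bus_add_device)
		hose->controller_ops.bus_add_device(pdev);
}
```

With this shape a PF passes through untouched and only a VF gets the extra
EEH setup, which keeps the pSeries path unchanged.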

(B) potentially conflicts with the current code if the PF is involved in
hotplug. I'm not sure if you tested this case or not.

- The PF is removed at hot unplug time.
- The PF is added at hot plug time. The PF's driver is loaded and VFs are
  enabled.
- pcibios_bus_add_device() is called for the VFs and initializes the EEH stuff.
- At a later point, pcibios_finish_adding_to_bus() is called and initializes
  the EEH stuff for the VFs again.

The current code should already avoid the conflict, but it's worth testing to
see if there are any problems:

- At probe time, the EEH device is skipped if it already has a parent PE
  connected.
- sysfs won't be populated again when the flag EEH_DEV_SYSFS is seen.
- The PCI device shouldn't be added to the address cache if it is already there.
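
The three guards above can be modelled in userspace roughly as follows.
This is only a sketch of the idea: struct eeh_dev_model, its fields and
eeh_setup_once() are invented for illustration, and the EEH_DEV_SYSFS value
used here is arbitrary rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EEH_DEV_SYSFS	(1 << 0)	/* illustrative value only */

/* Invented model of the per-device EEH state that the guards look at. */
struct eeh_dev_model {
	bool has_parent_pe;	/* already attached to a PE */
	int mode;		/* flag word, carries EEH_DEV_SYSFS */
	bool in_addr_cache;	/* already in the address cache */
	int probes;		/* how often each step really ran */
	int sysfs_adds;
	int cache_inserts;
};

/* Safe to call from both hooks: every step is skipped if already done. */
static void eeh_setup_once(struct eeh_dev_model *edev)
{
	assert(edev != NULL);

	if (!edev->has_parent_pe) {
		edev->has_parent_pe = true;
		edev->probes++;
	}

	if (!(edev->mode & EEH_DEV_SYSFS)) {
		edev->mode |= EEH_DEV_SYSFS;
		edev->sysfs_adds++;
	}

	if (!edev->in_addr_cache) {
		edev->in_addr_cache = true;
		edev->cache_inserts++;
	}
}
```

Calling eeh_setup_once() a second time, as pcibios_finish_adding_to_bus()
would effectively do after pcibios_bus_add_device(), leaves all three
counters at one.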


Thanks,
Gavin



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-03 15:46           ` Bjorn Helgaas
  2015-06-04  1:25             ` Gavin Shan
  2015-06-04  5:46             ` Wei Yang
@ 2015-06-16  8:50             ` Wei Yang
  2015-06-16 13:22               ` Bjorn Helgaas
  2 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2015-06-16  8:50 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Gavin Shan, Wei Yang, linuxppc-dev, linux-pci

On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
>On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
>> It's correct. "The following operations" refers to eeh_add_device_late()
>> and eeh_sysfs_add_device(). The former one requires the resources for
>> one particular PCI device (VF here) are finalized (assigned). eeh_sysfs_add_device()
>> will fail if the sysfs entry for the PCI device isn't populated yet.
>
>eeh_add_device_late() contains several things that read config space:
>eeh_save_bars() caches the entire config header, and
>eeh_addr_cache_insert_dev() looks at the device resources (which are
>determined by BARs in config space).  I think this is an error-prone
>approach.  I think it would be simpler and safer for you to capture what
>you need in your PCI config accessors.
>
>eeh_add_device_late() also contains code to deal with an EEH cache that
>"might not be removed correctly because of unbalanced kref to the device
>during unplug time."  That's unrelated to this patch series, but it sounds
>... like a hacky workaround for some bug in the unplug path.
>
>> >>> +	eeh_add_device_early(pdn);
>> >>> +	eeh_add_device_late(pdev);
>> >>> +	eeh_sysfs_add_device(pdev);
>> >>> +}
>> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>> >>
>> >>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>> >>this.  There is a pcibios_add_device() -- could this be done there?
>> >>
>> >
>> >I don't like it neither :-) But looks we can't put it in the
>> >pcibios_add_device().
>> >
>> >>If not, what happens after pcibios_add_device() that is required for this
>> >>code?  Maybe we need a pcibios_bus_add_device() hook?
>> >
>> >The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
>> >the VF sysfs(kobj) is initialized properly. If we put these into
>> >pcibios_add_device(), the eeh_sysfs_add_device() would fail.
>> >
>> >Below is the call flow for your reference:
>> >
>> >pci_device_add()
>> >    pcibios_add_device()
>> >    device_add()                <--- kobj initialized here
>> >
>> 
>> We can put it into pcibios_bus_add_device(), but we don't it currently. If
>> Bjorn agree to add pcibios_bus_add_device(), I'm fine to move the block code
>> there.
>
>I think I'm OK with adding a pcibios_bus_add_device().  I think that would
>be better than using the fixup mechanism for this.
>

Hi, Bjorn, Gavin,

I've been working on some bugs recently and only just got a chance to
return to this one.

Would you mind giving me a hint about where you suggest putting
pcibios_bus_add_device()?

>Bjorn

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs
  2015-06-16  8:50             ` Wei Yang
@ 2015-06-16 13:22               ` Bjorn Helgaas
  0 siblings, 0 replies; 68+ messages in thread
From: Bjorn Helgaas @ 2015-06-16 13:22 UTC (permalink / raw)
  To: Wei Yang; +Cc: Gavin Shan, linuxppc-dev, linux-pci@vger.kernel.org

On Tue, Jun 16, 2015 at 3:50 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
>>On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
>>> It's correct. "The following operations" refers to eeh_add_device_late()
>>> and eeh_sysfs_add_device(). The former one requires the resources for
>>> one particular PCI device (VF here) are finalized (assigned). eeh_sysfs_add_device()
>>> will fail if the sysfs entry for the PCI device isn't populated yet.
>>
>>eeh_add_device_late() contains several things that read config space:
>>eeh_save_bars() caches the entire config header, and
>>eeh_addr_cache_insert_dev() looks at the device resources (which are
>>determined by BARs in config space).  I think this is an error-prone
>>approach.  I think it would be simpler and safer for you to capture what
>>you need in your PCI config accessors.
>>
>>eeh_add_device_late() also contains code to deal with an EEH cache that
>>"might not be removed correctly because of unbalanced kref to the device
>>during unplug time."  That's unrelated to this patch series, but it sounds
>>... like a hacky workaround for some bug in the unplug path.
>>
>>> >>> +        eeh_add_device_early(pdn);
>>> >>> +        eeh_add_device_late(pdev);
>>> >>> +        eeh_sysfs_add_device(pdev);
>>> >>> +}
>>> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pnv_eeh_vf_final_fixup);
>>> >>
>>> >>Ugh.  This is powerpc code, but I don't like using fixups as a hook like
>>> >>this.  There is a pcibios_add_device() -- could this be done there?
>>> >>
>>> >
>>> >I don't like it neither :-) But looks we can't put it in the
>>> >pcibios_add_device().
>>> >
>>> >>If not, what happens after pcibios_add_device() that is required for this
>>> >>code?  Maybe we need a pcibios_bus_add_device() hook?
>>> >
>>> >The pnv_eeh_vf_final_fixup() will try to create sysfs for VFs. This requires
>>> >the VF sysfs(kobj) is initialized properly. If we put these into
>>> >pcibios_add_device(), the eeh_sysfs_add_device() would fail.
>>> >
>>> >Below is the call flow for your reference:
>>> >
>>> >pci_device_add()
>>> >    pcibios_add_device()
>>> >    device_add()                <--- kobj initialized here
>>> >
>>>
>>> We can put it into pcibios_bus_add_device(), but we don't it currently. If
>>> Bjorn agree to add pcibios_bus_add_device(), I'm fine to move the block code
>>> there.
>>
>>I think I'm OK with adding a pcibios_bus_add_device().  I think that would
>>be better than using the fixup mechanism for this.
>>
>
> Hi, Bjorn, Gavin,
>
> Been working on some bug recently, just got a chance to this one.
>
> Would you mind giving me some hint, where you suggest to put the
> pcibios_bus_add_device()?

I would expect it to be called from pci_bus_add_device().
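
A toy userspace model of why that placement works, under the assumption
that the only thing that matters is whether device_add() has already run.
All types and the early_rc/late_rc fields are invented for this sketch; the
function names just mirror the call flow quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-in for struct pci_dev. */
struct pci_dev_model {
	bool kobj_ready;	/* set once device_add() has run */
	int early_rc;		/* result when tried from pcibios_add_device() */
	int late_rc;		/* result when tried from pcibios_bus_add_device() */
};

/* Models eeh_sysfs_add_device(): fails until the kobject exists. */
static int eeh_sysfs_add(struct pci_dev_model *dev)
{
	return dev->kobj_ready ? 0 : -1;
}

static void pcibios_add_device(struct pci_dev_model *dev)
{
	dev->early_rc = eeh_sysfs_add(dev);	/* too early: no kobj yet */
}

static void device_add(struct pci_dev_model *dev)
{
	dev->kobj_ready = true;			/* kobj initialized here */
}

static void pcibios_bus_add_device(struct pci_dev_model *dev)
{
	dev->late_rc = eeh_sysfs_add(dev);	/* kobj exists by now */
}

static void pci_device_add(struct pci_dev_model *dev)
{
	assert(dev != NULL);
	pcibios_add_device(dev);
	device_add(dev);
}

static void pci_bus_add_device(struct pci_dev_model *dev)
{
	assert(dev != NULL);
	pcibios_bus_add_device(dev);		/* proposed call site */
}
```

Running pci_device_add() followed by pci_bus_add_device() on one device
gives a failing early attempt and a successful late one.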

Bjorn


Thread overview: 68+ messages
2015-05-19  1:35 [PATCH V6 00/10] VF EEH on Power8 Wei Yang
2015-05-19  1:35 ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  5:24   ` Wei Yang
2015-05-19  5:24     ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 02/10] powerpc/pci: Cache VF index in pci_dn Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 03/10] powerpc/pci: Remove VFs prior to PF Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 04/10] powerpc/eeh: Trace first 7 BARs in address cache Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 05/10] powerpc/powernv: EEH device for VF Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 06/10] powerpc/eeh: Create PE for VFs Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 07/10] powerpc/powernv: Support EEH reset for VF PE Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 08/10] powerpc/powernv: Support PCI config restore for VFs Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 09/10] powerpc/eeh: Support error recovery for VF PE Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19  1:35 ` [PATCH V6 10/10] powerpc/powernv: compound PE for VFs Wei Yang
2015-05-19  1:35   ` Wei Yang
2015-05-19 10:50 ` [PATCH V7 00/10] VF EEH on Power8 Wei Yang
2015-05-19 10:50   ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 01/10] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-06-02 17:19     ` Bjorn Helgaas
2015-06-03  1:38       ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 02/10] powerpc/pci: Cache VF index in pci_dn Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 03/10] powerpc/pci: Remove VFs prior to PF Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-06-01 23:20     ` Bjorn Helgaas
2015-06-02  3:44       ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 04/10] powerpc/eeh: Trace first 7 BARs in address cache Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-06-01 23:32     ` Bjorn Helgaas
2015-06-02  3:51       ` Wei Yang
2015-06-02  4:11         ` Gavin Shan
2015-06-03  1:47           ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 05/10] powerpc/powernv: EEH device for VF Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 06/10] powerpc/eeh: Create PE for VFs Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-06-01 23:46     ` Bjorn Helgaas
2015-06-03  3:31       ` Wei Yang
2015-06-03  5:10         ` Gavin Shan
2015-06-03 15:46           ` Bjorn Helgaas
2015-06-04  1:25             ` Gavin Shan
2015-06-04  5:46             ` Wei Yang
2015-06-04  7:10               ` Gavin Shan
2015-06-16  8:50             ` Wei Yang
2015-06-16 13:22               ` Bjorn Helgaas
2015-06-01 23:49     ` Bjorn Helgaas
2015-06-03  3:39       ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 07/10] powerpc/powernv: Support EEH reset for VF PE Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 08/10] powerpc/powernv: Support PCI config restore for VFs Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-06-02  0:01     ` Bjorn Helgaas
2015-06-03  1:37       ` Wei Yang
2015-06-03  5:14         ` Gavin Shan
2015-05-19 10:50   ` [PATCH V7 09/10] powerpc/eeh: Support error recovery for VF PE Wei Yang
2015-05-19 10:50     ` Wei Yang
2015-05-19 10:50   ` [PATCH V7 10/10] powerpc/powernv: compound PE for VFs Wei Yang
2015-05-19 10:50     ` Wei Yang
