[PATCHv3 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* [PATCHv3 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
@ 2014-05-15  9:18 ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring
  Cc: Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

Russell, Will, Catalin,

This patch series adresses a problem that affects the newer Marvell
Armada 375 and 38x SOCs, based on Cortex-A9+PL310, combined with the
Marvell PCIe hardware unit. When the hardware I/O coherency is
enabled, the combination of Cortex-A9/PL310/Marvell PCIe hardware unit
will quickly cause a deadlock when the PCIe bus is stressed.

The workaround for this problem has been suggested by ARM, and
consists in two things:

 (1) Map the PCIe regions as strongly-ordered

 (2) Disable the outer cache sync of the PL310 when hardware I/O
     coherency is used, since it is unneeded and causes the deadlock.

The following three patches address the problem in the following way:

 * PATCH 1/3 adds the necessary infrastructure to allow
   sub-architecture to override the memory type used to map PCIe I/O
   regions. It has been Acked by Catalin, and should be routed through
   Russell's tree.

 * PATCH 2/3 extends the l2x0 cache driver with a new property
   "dma-coherent", valid for the PL310, which makes the driver use a
   different set of L2 cache maintenance operations having ->sync set
   to NULL. This patch should be routed through Russell's tree.

 * PATCH 3/3 uses both of the added infrastructures, as well as the
   existing infrastructure to customize the behavior of ioremap() on a
   per-platform basis, to implement the workaround for the Armada 375
   and 38x SOCs. This patch should go through the mvebu maintainers
   tree. However, it has a build dependency on PATCH 1/3 that needs to
   be taken into account.

Changes since v2:

 - Added Acked-by from Catalin on "ARM: mm: allow sub-architectures to
   override PCI I/O memory type".

 - Dropped the patch fixing the of_update_property() function, since
   we're no longer using it.

 - Instead of using a different compatible string to identify PL310
   used in an I/O coherent configuration, use a separate boolean
   property. Suggested by Catalin.

 - Rework the mach-mvebu/coherency.c to add the boolean property
   "dma-coherent" when needed instead of updating the compatible
   string of the cache controller.

Changes since v1:

 - Instead of introducing separate l2x0 initialization functions, rely
   on a separate compatible string to identify whether we're coherent
   or not. The compatible string *has* to be modified at runtime,
   because Armada 375 and 38x are only I/O coherent when in SMP
   mode. In non-SMP mode, they are not I/O coherent, so we cannot
   change the DT to 'arm,pl310-coherent-cache'.

 - Addition of the drivers/of fix to be able to use
   of_update_property() early and fix up the PL310 compatible string,
   as explained in the previous item.

Thanks!

Thomas

Thomas Petazzoni (3):
  ARM: mm: allow sub-architectures to override PCI I/O memory type
  ARM: mm: add support for HW coherent systems in PL310
  ARM: mvebu: implement L2/PCIe deadlock workaround

 Documentation/devicetree/bindings/arm/l2cc.txt |  3 ++
 arch/arm/include/asm/io.h                      |  6 +++
 arch/arm/mach-mvebu/board-v7.c                 | 51 ++++++++++++++++++++++++++
 arch/arm/mm/cache-l2x0.c                       | 24 ++++++++++++
 arch/arm/mm/ioremap.c                          |  9 ++++-
 5 files changed, 92 insertions(+), 1 deletion(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
@ 2014-05-15  9:18 ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: linux-arm-kernel

Russell, Will, Catalin,

This patch series adresses a problem that affects the newer Marvell
Armada 375 and 38x SOCs, based on Cortex-A9+PL310, combined with the
Marvell PCIe hardware unit. When the hardware I/O coherency is
enabled, the combination of Cortex-A9/PL310/Marvell PCIe hardware unit
will quickly cause a deadlock when the PCIe bus is stressed.

The workaround for this problem has been suggested by ARM, and
consists in two things:

 (1) Map the PCIe regions as strongly-ordered

 (2) Disable the outer cache sync of the PL310 when hardware I/O
     coherency is used, since it is unneeded and causes the deadlock.

The following three patches address the problem in the following way:

 * PATCH 1/3 adds the necessary infrastructure to allow
   sub-architecture to override the memory type used to map PCIe I/O
   regions. It has been Acked by Catalin, and should be routed through
   Russell's tree.

 * PATCH 2/3 extends the l2x0 cache driver with a new property
   "dma-coherent", valid for the PL310, which makes the driver use a
   different set of L2 cache maintenance operations having ->sync set
   to NULL. This patch should be routed through Russell's tree.

 * PATCH 3/3 uses both of the added infrastructures, as well as the
   existing infrastructure to customize the behavior of ioremap() on a
   per-platform basis, to implement the workaround for the Armada 375
   and 38x SOCs. This patch should go through the mvebu maintainers
   tree. However, it has a build dependency on PATCH 1/3 that needs to
   be taken into account.

Changes since v2:

 - Added Acked-by from Catalin on "ARM: mm: allow sub-architectures to
   override PCI I/O memory type".

 - Dropped the patch fixing the of_update_property() function, since
   we're no longer using it.

 - Instead of using a different compatible string to identify PL310
   used in an I/O coherent configuration, use a separate boolean
   property. Suggested by Catalin.

 - Rework the mach-mvebu/coherency.c to add the boolean property
   "dma-coherent" when needed instead of updating the compatible
   string of the cache controller.

Changes since v1:

 - Instead of introducing separate l2x0 initialization functions, rely
   on a separate compatible string to identify whether we're coherent
   or not. The compatible string *has* to be modified at runtime,
   because Armada 375 and 38x are only I/O coherent when in SMP
   mode. In non-SMP mode, they are not I/O coherent, so we cannot
   change the DT to 'arm,pl310-coherent-cache'.

 - Addition of the drivers/of fix to be able to use
   of_update_property() early and fix up the PL310 compatible string,
   as explained in the previous item.

Thanks!

Thomas

Thomas Petazzoni (3):
  ARM: mm: allow sub-architectures to override PCI I/O memory type
  ARM: mm: add support for HW coherent systems in PL310
  ARM: mvebu: implement L2/PCIe deadlock workaround

 Documentation/devicetree/bindings/arm/l2cc.txt |  3 ++
 arch/arm/include/asm/io.h                      |  6 +++
 arch/arm/mach-mvebu/board-v7.c                 | 51 ++++++++++++++++++++++++++
 arch/arm/mm/cache-l2x0.c                       | 24 ++++++++++++
 arch/arm/mm/ioremap.c                          |  9 ++++-
 5 files changed, 92 insertions(+), 1 deletion(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15  9:18 ` Thomas Petazzoni
@ 2014-05-15  9:18     ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring
  Cc: Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

Due to a design incompatibility between the PCIe Marvell controller
and the Cortex-A9, stressing PCIe devices with a lot of traffic
quickly causes a deadlock.

One part of the workaround for this is to have all PCIe regions mapped
as strongly-ordered (MT_UNCACHED) instead of the default
MT_DEVICE. While the arch_ioremap_caller() mechanism allows
sub-architecture code to override ioremap(), used to map PCIe memory
regions, there isn't such a mechanism to override the behavior of
pci_ioremap_io().

This commit adds the arch_pci_ioremap_mem_type variable, initialized
to MT_DEVICE by default, and that sub-architecture code can
override. We have chosen to expose a single variable rather than
offering the possibility of overriding the entire pci_ioremap_io(),
because implementing pci_ioremap_io() requires calling functions
(get_mem_type()) that are private to the arch/arm/mm/ code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
Acked-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
---
 arch/arm/include/asm/io.h | 6 ++++++
 arch/arm/mm/ioremap.c     | 9 ++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 8aa4cca..3d23418 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -179,6 +179,12 @@ static inline void __iomem *__typesafe_io(unsigned long addr)
 /* PCI fixed i/o mapping */
 #define PCI_IO_VIRT_BASE	0xfee00000
 
+#if defined(CONFIG_PCI)
+void pci_ioremap_set_mem_type(int mem_type);
+#else
+static inline void pci_ioremap_set_mem_type(int mem_type) {}
+#endif
+
 extern int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr);
 
 /*
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f9c32ba..d1e5ad7 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -438,6 +438,13 @@ void __arm_iounmap(volatile void __iomem *io_addr)
 EXPORT_SYMBOL(__arm_iounmap);
 
 #ifdef CONFIG_PCI
+static int pci_ioremap_mem_type = MT_DEVICE;
+
+void pci_ioremap_set_mem_type(int mem_type)
+{
+	pci_ioremap_mem_type = mem_type;
+}
+
 int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
 {
 	BUG_ON(offset + SZ_64K > IO_SPACE_LIMIT);
@@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
 	return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
 				  PCI_IO_VIRT_BASE + offset + SZ_64K,
 				  phys_addr,
-				  __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
+				  __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
 }
 EXPORT_SYMBOL_GPL(pci_ioremap_io);
 #endif
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15  9:18     ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: linux-arm-kernel

Due to a design incompatibility between the PCIe Marvell controller
and the Cortex-A9, stressing PCIe devices with a lot of traffic
quickly causes a deadlock.

One part of the workaround for this is to have all PCIe regions mapped
as strongly-ordered (MT_UNCACHED) instead of the default
MT_DEVICE. While the arch_ioremap_caller() mechanism allows
sub-architecture code to override ioremap(), used to map PCIe memory
regions, there isn't such a mechanism to override the behavior of
pci_ioremap_io().

This commit adds the arch_pci_ioremap_mem_type variable, initialized
to MT_DEVICE by default, and that sub-architecture code can
override. We have chosen to expose a single variable rather than
offering the possibility of overriding the entire pci_ioremap_io(),
because implementing pci_ioremap_io() requires calling functions
(get_mem_type()) that are private to the arch/arm/mm/ code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/io.h | 6 ++++++
 arch/arm/mm/ioremap.c     | 9 ++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 8aa4cca..3d23418 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -179,6 +179,12 @@ static inline void __iomem *__typesafe_io(unsigned long addr)
 /* PCI fixed i/o mapping */
 #define PCI_IO_VIRT_BASE	0xfee00000

+#if defined(CONFIG_PCI)
+void pci_ioremap_set_mem_type(int mem_type);
+#else
+static inline void pci_ioremap_set_mem_type(int mem_type) {}
+#endif
+
 extern int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr);

 /*
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f9c32ba..d1e5ad7 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -438,6 +438,13 @@ void __arm_iounmap(volatile void __iomem *io_addr)
 EXPORT_SYMBOL(__arm_iounmap);

 #ifdef CONFIG_PCI
+static int pci_ioremap_mem_type = MT_DEVICE;
+
+void pci_ioremap_set_mem_type(int mem_type)
+{
+	pci_ioremap_mem_type = mem_type;
+}
+
 int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
 {
 	BUG_ON(offset + SZ_64K > IO_SPACE_LIMIT);
@@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
 	return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
 				  PCI_IO_VIRT_BASE + offset + SZ_64K,
 				  phys_addr,
-				  __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
+				  __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
 }
 EXPORT_SYMBOL_GPL(pci_ioremap_io);
 #endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
  2014-05-15  9:18 ` Thomas Petazzoni
@ 2014-05-15  9:18     ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring
  Cc: Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.

To avoid this, this commit introduces a new Device Tree property
'dma-coherent' for the L2 cache controller node, valid only for the
PL310 cache. It identifies the usage of the PL310 cache in an I/O
coherent configuration. Internally, it makes the driver use a
different set of l2x0_of_data, in which the ->sync operation is NULL.

Note that technically speaking, a fully coherent system wouldn't
require any of the other .outer_cache operations. However, in
practice, when booting secondary CPUs, these are not yet coherent, and
therefore a set of cache maintenance operations are necessary at this
point. This explains why we keep the other .outer_cache operations and
only ->sync is disabled.

While in theory any write to a PL310 register could cause the
deadlock, in practice, disabling ->sync is sufficient to workaround
the deadlock, since the other cache maintenance operations are only
used in very specific situations.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
---
 Documentation/devicetree/bindings/arm/l2cc.txt |  3 +++
 arch/arm/mm/cache-l2x0.c                       | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
index b513cb8..077d837 100644
--- a/Documentation/devicetree/bindings/arm/l2cc.txt
+++ b/Documentation/devicetree/bindings/arm/l2cc.txt
@@ -40,6 +40,9 @@ Optional properties:
 - arm,filter-ranges : <start length> Starting address and length of window to
   filter. Addresses in the filter window are directed to the M1 port. Other
   addresses will go to the M0 port.
+- dma-coherent : indicates that the system is operating in an hardware
+  I/O coherent mode. Valid only when the arm,pl310-cache compatible
+  string is used.
 - interrupts : 1 combined interrupt.
 - cache-id-part: cache id part number to be used if it is not present
   on hardware
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 7abde2ce..cf0c037 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -889,6 +889,26 @@ static const struct l2x0_of_data pl310_data = {
 	},
 };
 
+/*
+ * PL310 operations used on I/O coherent systems. Theoretically, no
+ * outer cache operations would be needed, except that for secondary
+ * processors bring up, a few cache maintenance operations are needed
+ * because secondary processors are not directly coherent with the L2
+ * cache when they start up.
+ */
+static const struct l2x0_of_data pl310_coherent_data = {
+	.setup = pl310_of_setup,
+	.save  = pl310_save,
+	.outer_cache = {
+		.resume      = pl310_resume,
+		.inv_range   = l2x0_inv_range,
+		.clean_range = l2x0_clean_range,
+		.flush_range = l2x0_flush_range,
+		.flush_all   = l2x0_flush_all,
+		.inv_all     = l2x0_inv_all,
+	},
+};
+
 static const struct l2x0_of_data l2x0_data = {
 	.setup = l2x0_of_setup,
 	.save  = NULL,
@@ -989,6 +1009,10 @@ int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
 
 	data = of_match_node(l2x0_ids, np)->data;
 
+	if (of_device_is_compatible(np, "arm,pl310-cache") &&
+	    of_property_read_bool(np, "dma-coherent"))
+		data = &pl310_coherent_data;
+
 	/* L2 configuration can only be changed if the cache is disabled */
 	if (!(readl_relaxed(l2x0_base + L2X0_CTRL) & L2X0_CTRL_EN)) {
 		if (data->setup)
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
@ 2014-05-15  9:18     ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: linux-arm-kernel

When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.

To avoid this, this commit introduces a new Device Tree property
'dma-coherent' for the L2 cache controller node, valid only for the
PL310 cache. It identifies the usage of the PL310 cache in an I/O
coherent configuration. Internally, it makes the driver use a
different set of l2x0_of_data, in which the ->sync operation is NULL.

Note that technically speaking, a fully coherent system wouldn't
require any of the other .outer_cache operations. However, in
practice, when booting secondary CPUs, these are not yet coherent, and
therefore a set of cache maintenance operations are necessary at this
point. This explains why we keep the other .outer_cache operations and
only ->sync is disabled.

While in theory any write to a PL310 register could cause the
deadlock, in practice, disabling ->sync is sufficient to workaround
the deadlock, since the other cache maintenance operations are only
used in very specific situations.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 Documentation/devicetree/bindings/arm/l2cc.txt |  3 +++
 arch/arm/mm/cache-l2x0.c                       | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
index b513cb8..077d837 100644
--- a/Documentation/devicetree/bindings/arm/l2cc.txt
+++ b/Documentation/devicetree/bindings/arm/l2cc.txt
@@ -40,6 +40,9 @@ Optional properties:
 - arm,filter-ranges : <start length> Starting address and length of window to
   filter. Addresses in the filter window are directed to the M1 port. Other
   addresses will go to the M0 port.
+- dma-coherent : indicates that the system is operating in an hardware
+  I/O coherent mode. Valid only when the arm,pl310-cache compatible
+  string is used.
 - interrupts : 1 combined interrupt.
 - cache-id-part: cache id part number to be used if it is not present
   on hardware
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 7abde2ce..cf0c037 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -889,6 +889,26 @@ static const struct l2x0_of_data pl310_data = {
 	},
 };

+/*
+ * PL310 operations used on I/O coherent systems. Theoretically, no
+ * outer cache operations would be needed, except that for secondary
+ * processors bring up, a few cache maintenance operations are needed
+ * because secondary processors are not directly coherent with the L2
+ * cache when they start up.
+ */
+static const struct l2x0_of_data pl310_coherent_data = {
+	.setup = pl310_of_setup,
+	.save  = pl310_save,
+	.outer_cache = {
+		.resume      = pl310_resume,
+		.inv_range   = l2x0_inv_range,
+		.clean_range = l2x0_clean_range,
+		.flush_range = l2x0_flush_range,
+		.flush_all   = l2x0_flush_all,
+		.inv_all     = l2x0_inv_all,
+	},
+};
+
 static const struct l2x0_of_data l2x0_data = {
 	.setup = l2x0_of_setup,
 	.save  = NULL,
@@ -989,6 +1009,10 @@ int __init l2x0_of_init(u32 aux_val, u32 aux_mask)

 	data = of_match_node(l2x0_ids, np)->data;

+	if (of_device_is_compatible(np, "arm,pl310-cache") &&
+	    of_property_read_bool(np, "dma-coherent"))
+		data = &pl310_coherent_data;
+
 	/* L2 configuration can only be changed if the cache is disabled */
 	if (!(readl_relaxed(l2x0_base + L2X0_CTRL) & L2X0_CTRL_EN)) {
 		if (data->setup)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15  9:18 ` Thomas Petazzoni
@ 2014-05-15  9:18     ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring
  Cc: Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
CPU core, the PL310 cache and the Marvell PCIe hardware block are
affected a L2/PCIe deadlock caused by a system erratum when hardware
I/O coherency is used.

This deadlock can be avoided by mapping the PCIe memory areas as
strongly-ordered (note: MT_UNCACHED is strongly-ordered), and by
removing the outer cache sync done in software. This is done in this
patch, thanks to the new bits of infrastructure added in 'ARM: mm:
allow sub-architectures to override PCI I/O memory type' and 'ARM: mm:
add support for HW coherent systems in PL310' respectively.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
---
 arch/arm/mach-mvebu/board-v7.c | 51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
index 01cfce6..5016c58 100644
--- a/arch/arm/mach-mvebu/board-v7.c
+++ b/arch/arm/mach-mvebu/board-v7.c
@@ -71,6 +71,49 @@ static int armada_375_external_abort_wa(unsigned long addr, unsigned int fsr,
 	return 1;
 }
 
+/*
+ * This ioremap hook is used on Armada 375/38x to ensure that PCIe
+ * memory areas are mapped as MT_MEMORY_RW_SO instead of
+ * MT_DEVICE. This is needed as a workaround for a deadlock issue
+ * between the PCIe interface and the cache controller.
+ */
+static void __iomem *
+armada_pcie_wa_ioremap_caller(phys_addr_t phys_addr, size_t size,
+			      unsigned int mtype, void *caller)
+{
+	struct resource pcie_mem;
+
+	mvebu_mbus_get_pcie_mem_aperture(&pcie_mem);
+
+	if (pcie_mem.start <= phys_addr && (phys_addr + size) <= pcie_mem.end)
+		mtype = MT_UNCACHED;
+
+	return __arm_ioremap_caller(phys_addr, size, mtype, caller);
+}
+
+/*
+ * When we are I/O coherent and we use the PL310 cache controller, we
+ * add the PL310 property "dma-coherent". This makes sure the outer
+ * sync operation is not used, which allows to workaround the system
+ * erratum that causes deadlocks when doing PCIe in an SMP situation
+ * on Armada 375 and Armada 38x.
+ */
+static void __init mvebu_l2x0_pl310_coherent(void)
+{
+	struct device_node *np;
+
+	if (!coherency_available())
+		return;
+
+	for_each_compatible_node(np, NULL, "arm,pl310-cache") {
+		struct property *p;
+
+		p = kzalloc(sizeof(*p), GFP_KERNEL);
+		p->name = kstrdup("dma-coherent", GFP_KERNEL);
+		of_add_property(np, p);
+	}
+}
+
 static void __init mvebu_timer_and_clk_init(void)
 {
 	of_clk_init(NULL);
@@ -78,6 +121,14 @@ static void __init mvebu_timer_and_clk_init(void)
 	mvebu_scu_enable();
 	coherency_init();
 	BUG_ON(mvebu_mbus_dt_init(coherency_available()));
+
+	if (of_machine_is_compatible("marvell,armada375") ||
+	    of_machine_is_compatible("marvell,armada38x")) {
+		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
+		pci_ioremap_set_mem_type(MT_UNCACHED);
+		mvebu_l2x0_pl310_coherent();
+	}
+
 	l2x0_of_init(0, ~0UL);
 
 	if (of_machine_is_compatible("marvell,armada375"))
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15  9:18     ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:18 UTC (permalink / raw
  To: linux-arm-kernel

The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
CPU core, the PL310 cache and the Marvell PCIe hardware block are
affected a L2/PCIe deadlock caused by a system erratum when hardware
I/O coherency is used.

This deadlock can be avoided by mapping the PCIe memory areas as
strongly-ordered (note: MT_UNCACHED is strongly-ordered), and by
removing the outer cache sync done in software. This is done in this
patch, thanks to the new bits of infrastructure added in 'ARM: mm:
allow sub-architectures to override PCI I/O memory type' and 'ARM: mm:
add support for HW coherent systems in PL310' respectively.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/board-v7.c | 51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
index 01cfce6..5016c58 100644
--- a/arch/arm/mach-mvebu/board-v7.c
+++ b/arch/arm/mach-mvebu/board-v7.c
@@ -71,6 +71,49 @@ static int armada_375_external_abort_wa(unsigned long addr, unsigned int fsr,
 	return 1;
 }
 
+/*
+ * This ioremap hook is used on Armada 375/38x to ensure that PCIe
+ * memory areas are mapped as MT_MEMORY_RW_SO instead of
+ * MT_DEVICE. This is needed as a workaround for a deadlock issue
+ * between the PCIe interface and the cache controller.
+ */
+static void __iomem *
+armada_pcie_wa_ioremap_caller(phys_addr_t phys_addr, size_t size,
+			      unsigned int mtype, void *caller)
+{
+	struct resource pcie_mem;
+
+	mvebu_mbus_get_pcie_mem_aperture(&pcie_mem);
+
+	if (pcie_mem.start <= phys_addr && (phys_addr + size) <= pcie_mem.end)
+		mtype = MT_UNCACHED;
+
+	return __arm_ioremap_caller(phys_addr, size, mtype, caller);
+}
+
+/*
+ * When we are I/O coherent and we use the PL310 cache controller, we
+ * add the PL310 property "dma-coherent". This makes sure the outer
+ * sync operation is not used, which allows to workaround the system
+ * erratum that causes deadlocks when doing PCIe in an SMP situation
+ * on Armada 375 and Armada 38x.
+ */
+static void __init mvebu_l2x0_pl310_coherent(void)
+{
+	struct device_node *np;
+
+	if (!coherency_available())
+		return;
+
+	for_each_compatible_node(np, NULL, "arm,pl310-cache") {
+		struct property *p;
+
+		p = kzalloc(sizeof(*p), GFP_KERNEL);
+		p->name = kstrdup("dma-coherent", GFP_KERNEL);
+		of_add_property(np, p);
+	}
+}
+
 static void __init mvebu_timer_and_clk_init(void)
 {
 	of_clk_init(NULL);
@@ -78,6 +121,14 @@ static void __init mvebu_timer_and_clk_init(void)
 	mvebu_scu_enable();
 	coherency_init();
 	BUG_ON(mvebu_mbus_dt_init(coherency_available()));
+
+	if (of_machine_is_compatible("marvell,armada375") ||
+	    of_machine_is_compatible("marvell,armada38x")) {
+		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
+		pci_ioremap_set_mem_type(MT_UNCACHED);
+		mvebu_l2x0_pl310_coherent();
+	}
+
 	l2x0_of_init(0, ~0UL);
 
 	if (of_machine_is_compatible("marvell,armada375"))
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15  9:36         ` Catalin Marinas
  -1 siblings, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2014-05-15  9:36 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Russell King, Will Deacon,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Albin Tonnerre,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

On Thu, May 15, 2014 at 10:18:38AM +0100, Thomas Petazzoni wrote:
> When a PL310 cache is used on a system that provides hardware
> coherency, the outer cache sync operation is useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
> 
> To avoid this, this commit introduces a new Device Tree property
> 'dma-coherent' for the L2 cache controller node, valid only for the
> PL310 cache. It identifies the usage of the PL310 cache in an I/O
> coherent configuration. Internally, it makes the driver use a
> different set of l2x0_of_data, in which the ->sync operation is NULL.
> 
> Note that technically speaking, a fully coherent system wouldn't
> require any of the other .outer_cache operations. However, in
> practice, when booting secondary CPUs, these are not yet coherent, and
> therefore a set of cache maintenance operations are necessary at this
> point. This explains why we keep the other .outer_cache operations and
> only ->sync is disabled.
> 
> While in theory any write to a PL310 register could cause the
> deadlock, in practice, disabling ->sync is sufficient to workaround
> the deadlock, since the other cache maintenance operations are only
> used in very specific situations.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>

Acked-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
@ 2014-05-15  9:36         ` Catalin Marinas
  0 siblings, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2014-05-15  9:36 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 10:18:38AM +0100, Thomas Petazzoni wrote:
> When a PL310 cache is used on a system that provides hardware
> coherency, the outer cache sync operation is useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
> 
> To avoid this, this commit introduces a new Device Tree property
> 'dma-coherent' for the L2 cache controller node, valid only for the
> PL310 cache. It identifies the usage of the PL310 cache in an I/O
> coherent configuration. Internally, it makes the driver use a
> different set of l2x0_of_data, in which the ->sync operation is NULL.
> 
> Note that technically speaking, a fully coherent system wouldn't
> require any of the other .outer_cache operations. However, in
> practice, when booting secondary CPUs, these are not yet coherent, and
> therefore a set of cache maintenance operations are necessary at this
> point. This explains why we keep the other .outer_cache operations and
> only ->sync is disabled.
> 
> While in theory any write to a PL310 register could cause the
> deadlock, in practice, disabling ->sync is sufficient to workaround
> the deadlock, since the other cache maintenance operations are only
> used in very specific situations.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15  9:36       ` Catalin Marinas
  -1 siblings, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2014-05-15  9:36 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Lior Amsalem, devicetree@vger.kernel.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Will Deacon,
	Grant Likely, Gregory Clement, Nadav Haklai, Rob Herring,
	Ezequiel Garcia, Albin Tonnerre,
	linux-arm-kernel@lists.infradead.org, Sebastian Hesselbarth

On Thu, May 15, 2014 at 10:18:39AM +0100, Thomas Petazzoni wrote:
> The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
> CPU core, the PL310 cache and the Marvell PCIe hardware block are
> affected a L2/PCIe deadlock caused by a system erratum when hardware
> I/O coherency is used.
> 
> This deadlock can be avoided by mapping the PCIe memory areas as
> strongly-ordered (note: MT_UNCACHED is strongly-ordered), and by
> removing the outer cache sync done in software. This is done in this
> patch, thanks to the new bits of infrastructure added in 'ARM: mm:
> allow sub-architectures to override PCI I/O memory type' and 'ARM: mm:
> add support for HW coherent systems in PL310' respectively.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15  9:36       ` Catalin Marinas
  0 siblings, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2014-05-15  9:36 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 10:18:39AM +0100, Thomas Petazzoni wrote:
> The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
> CPU core, the PL310 cache and the Marvell PCIe hardware block are
> affected a L2/PCIe deadlock caused by a system erratum when hardware
> I/O coherency is used.
> 
> This deadlock can be avoided by mapping the PCIe memory areas as
> strongly-ordered (note: MT_UNCACHED is strongly-ordered), and by
> removing the outer cache sync done in software. This is done in this
> patch, thanks to the new bits of infrastructure added in 'ARM: mm:
> allow sub-architectures to override PCI I/O memory type' and 'ARM: mm:
> add support for HW coherent systems in PL310' respectively.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
  2014-05-15  9:36         ` Catalin Marinas
@ 2014-05-15 11:39           ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 11:39 UTC (permalink / raw
  To: Catalin Marinas
  Cc: Russell King, Will Deacon,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Albin Tonnerre,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

Dear Catalin Marinas,

On Thu, 15 May 2014 10:36:39 +0100, Catalin Marinas wrote:

> > While in theory any write to a PL310 register could cause the
> > deadlock, in practice, disabling ->sync is sufficient to workaround
> > the deadlock, since the other cache maintenance operations are only
> > used in very specific situations.
> > 
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
> 
> Acked-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>

Thanks a lot for having reviewed these patches! I will submit PATCH 1/3
and PATCH 2/3 in Russell's patch system now.

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
@ 2014-05-15 11:39           ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 11:39 UTC (permalink / raw
  To: linux-arm-kernel

Dear Catalin Marinas,

On Thu, 15 May 2014 10:36:39 +0100, Catalin Marinas wrote:

> > While in theory any write to a PL310 register could cause the
> > deadlock, in practice, disabling ->sync is sufficient to workaround
> > the deadlock, since the other cache maintenance operations are only
> > used in very specific situations.
> > 
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks a lot for having reviewed these patches! I will submit PATCH 1/3
and PATCH 2/3 in Russell's patch system now.

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15 13:21         ` Jason Cooper
  -1 siblings, 0 replies; 66+ messages in thread
From: Jason Cooper @ 2014-05-15 13:21 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

On Thu, May 15, 2014 at 11:18:39AM +0200, Thomas Petazzoni wrote:
> The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
> CPU core, the PL310 cache and the Marvell PCIe hardware block are
> affected a L2/PCIe deadlock caused by a system erratum when hardware
> I/O coherency is used.
> 
> This deadlock can be avoided by mapping the PCIe memory areas as
> strongly-ordered (note: MT_UNCACHED is strongly-ordered), and by
> removing the outer cache sync done in software. This is done in this
> patch, thanks to the new bits of infrastructure added in 'ARM: mm:
> allow sub-architectures to override PCI I/O memory type' and 'ARM: mm:
> add support for HW coherent systems in PL310' respectively.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
> ---
>  arch/arm/mach-mvebu/board-v7.c | 51 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
> index 01cfce6..5016c58 100644
> --- a/arch/arm/mach-mvebu/board-v7.c
> +++ b/arch/arm/mach-mvebu/board-v7.c
> @@ -71,6 +71,49 @@ static int armada_375_external_abort_wa(unsigned long addr, unsigned int fsr,
>  	return 1;
>  }
>  
> +/*
> + * This ioremap hook is used on Armada 375/38x to ensure that PCIe
> + * memory areas are mapped as MT_MEMORY_RW_SO instead of
> + * MT_DEVICE. This is needed as a workaround for a deadlock issue
> + * between the PCIe interface and the cache controller.
> + */
> +static void __iomem *
> +armada_pcie_wa_ioremap_caller(phys_addr_t phys_addr, size_t size,
> +			      unsigned int mtype, void *caller)
> +{
> +	struct resource pcie_mem;
> +
> +	mvebu_mbus_get_pcie_mem_aperture(&pcie_mem);
> +
> +	if (pcie_mem.start <= phys_addr && (phys_addr + size) <= pcie_mem.end)
> +		mtype = MT_UNCACHED;
> +
> +	return __arm_ioremap_caller(phys_addr, size, mtype, caller);
> +}
> +
> +/*
> + * When we are I/O coherent and we use the PL310 cache controller, we
> + * add the PL310 property "dma-coherent". This makes sure the outer
> + * sync operation is not used, which allows to workaround the system
> + * erratum that causes deadlocks when doing PCIe in an SMP situation
> + * on Armada 375 and Armada 38x.
> + */
> +static void __init mvebu_l2x0_pl310_coherent(void)
> +{
> +	struct device_node *np;
> +
> +	if (!coherency_available())
> +		return;
> +
> +	for_each_compatible_node(np, NULL, "arm,pl310-cache") {
> +		struct property *p;
> +
> +		p = kzalloc(sizeof(*p), GFP_KERNEL);
> +		p->name = kstrdup("dma-coherent", GFP_KERNEL);
> +		of_add_property(np, p);
> +	}
> +}
> +
>  static void __init mvebu_timer_and_clk_init(void)
>  {
>  	of_clk_init(NULL);
> @@ -78,6 +121,14 @@ static void __init mvebu_timer_and_clk_init(void)
>  	mvebu_scu_enable();
>  	coherency_init();
>  	BUG_ON(mvebu_mbus_dt_init(coherency_available()));
> +
> +	if (of_machine_is_compatible("marvell,armada375") ||
> +	    of_machine_is_compatible("marvell,armada38x")) {
> +		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> +		pci_ioremap_set_mem_type(MT_UNCACHED);

iiuc, this patch depends on 1/3.  So how would you like to handle it?

thx,

Jason.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15 13:21         ` Jason Cooper
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Cooper @ 2014-05-15 13:21 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 11:18:39AM +0200, Thomas Petazzoni wrote:
> The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
> CPU core, the PL310 cache and the Marvell PCIe hardware block are
> affected a L2/PCIe deadlock caused by a system erratum when hardware
> I/O coherency is used.
> 
> This deadlock can be avoided by mapping the PCIe memory areas as
> strongly-ordered (note: MT_UNCACHED is strongly-ordered), and by
> removing the outer cache sync done in software. This is done in this
> patch, thanks to the new bits of infrastructure added in 'ARM: mm:
> allow sub-architectures to override PCI I/O memory type' and 'ARM: mm:
> add support for HW coherent systems in PL310' respectively.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  arch/arm/mach-mvebu/board-v7.c | 51 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
> index 01cfce6..5016c58 100644
> --- a/arch/arm/mach-mvebu/board-v7.c
> +++ b/arch/arm/mach-mvebu/board-v7.c
> @@ -71,6 +71,49 @@ static int armada_375_external_abort_wa(unsigned long addr, unsigned int fsr,
>  	return 1;
>  }
>  
> +/*
> + * This ioremap hook is used on Armada 375/38x to ensure that PCIe
> + * memory areas are mapped as MT_MEMORY_RW_SO instead of
> + * MT_DEVICE. This is needed as a workaround for a deadlock issue
> + * between the PCIe interface and the cache controller.
> + */
> +static void __iomem *
> +armada_pcie_wa_ioremap_caller(phys_addr_t phys_addr, size_t size,
> +			      unsigned int mtype, void *caller)
> +{
> +	struct resource pcie_mem;
> +
> +	mvebu_mbus_get_pcie_mem_aperture(&pcie_mem);
> +
> +	if (pcie_mem.start <= phys_addr && (phys_addr + size) <= pcie_mem.end)
> +		mtype = MT_UNCACHED;
> +
> +	return __arm_ioremap_caller(phys_addr, size, mtype, caller);
> +}
> +
> +/*
> + * When we are I/O coherent and we use the PL310 cache controller, we
> + * add the PL310 property "dma-coherent". This makes sure the outer
> + * sync operation is not used, which allows to workaround the system
> + * erratum that causes deadlocks when doing PCIe in an SMP situation
> + * on Armada 375 and Armada 38x.
> + */
> +static void __init mvebu_l2x0_pl310_coherent(void)
> +{
> +	struct device_node *np;
> +
> +	if (!coherency_available())
> +		return;
> +
> +	for_each_compatible_node(np, NULL, "arm,pl310-cache") {
> +		struct property *p;
> +
> +		p = kzalloc(sizeof(*p), GFP_KERNEL);
> +		p->name = kstrdup("dma-coherent", GFP_KERNEL);
> +		of_add_property(np, p);
> +	}
> +}
> +
>  static void __init mvebu_timer_and_clk_init(void)
>  {
>  	of_clk_init(NULL);
> @@ -78,6 +121,14 @@ static void __init mvebu_timer_and_clk_init(void)
>  	mvebu_scu_enable();
>  	coherency_init();
>  	BUG_ON(mvebu_mbus_dt_init(coherency_available()));
> +
> +	if (of_machine_is_compatible("marvell,armada375") ||
> +	    of_machine_is_compatible("marvell,armada38x")) {
> +		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> +		pci_ioremap_set_mem_type(MT_UNCACHED);

iiuc, this patch depends on 1/3.  So how would you like to handle it?

thx,

Jason.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15 13:21         ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 13:21 UTC (permalink / raw
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Thomas Petazzoni, Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Lior Amsalem, Andrew Lunn, Jason Cooper, Tawfik Bayouk,
	Nadav Haklai, Gregory Clement, Ezequiel Garcia, Albin Tonnerre,
	Sebastian Hesselbarth

On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
>         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
>                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
>                                   phys_addr,
> -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
>  }
>  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> 

As discussed on IRC, I think we'd be better off making this a strong-ordered
mapping for all platforms unconditionally. The PCI I/O space semantics
require non-posted writes, which is the main difference between device
and SO mappings, so the same fix is required both for mvebu as a workaround
for the deadlock as well as for everyone else as a fix for an incorrect
PCI behavior.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 13:21         ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 13:21 UTC (permalink / raw
  To: linux-arm-kernel

On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
>         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
>                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
>                                   phys_addr,
> -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
>  }
>  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> 

As discussed on IRC, I think we'd be better off making this a strong-ordered
mapping for all platforms unconditionally. The PCI I/O space semantics
require non-posted writes, which is the main difference between device
and SO mappings, so the same fix is required both for mvebu as a workaround
for the deadlock as well as for everyone else as a fix for an incorrect
PCI behavior.

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15 13:23         ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 13:23 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

On Thursday 15 May 2014 11:18:38 Thomas Petazzoni wrote:
> When a PL310 cache is used on a system that provides hardware
> coherency, the outer cache sync operation is useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
> 
> To avoid this, this commit introduces a new Device Tree property
> 'dma-coherent' for the L2 cache controller node, valid only for the
> PL310 cache. It identifies the usage of the PL310 cache in an I/O
> coherent configuration. Internally, it makes the driver use a
> different set of l2x0_of_data, in which the ->sync operation is NULL.
> 
> Note that technically speaking, a fully coherent system wouldn't
> require any of the other .outer_cache operations. However, in
> practice, when booting secondary CPUs, these are not yet coherent, and
> therefore a set of cache maintenance operations are necessary at this
> point. This explains why we keep the other .outer_cache operations and
> only ->sync is disabled.
> 
> While in theory any write to a PL310 register could cause the
> deadlock, in practice, disabling ->sync is sufficient to workaround
> the deadlock, since the other cache maintenance operations are only
> used in very specific situations.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
> 

Acked-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
@ 2014-05-15 13:23         ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 13:23 UTC (permalink / raw
  To: linux-arm-kernel

On Thursday 15 May 2014 11:18:38 Thomas Petazzoni wrote:
> When a PL310 cache is used on a system that provides hardware
> coherency, the outer cache sync operation is useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
> 
> To avoid this, this commit introduces a new Device Tree property
> 'dma-coherent' for the L2 cache controller node, valid only for the
> PL310 cache. It identifies the usage of the PL310 cache in an I/O
> coherent configuration. Internally, it makes the driver use a
> different set of l2x0_of_data, in which the ->sync operation is NULL.
> 
> Note that technically speaking, a fully coherent system wouldn't
> require any of the other .outer_cache operations. However, in
> practice, when booting secondary CPUs, these are not yet coherent, and
> therefore a set of cache maintenance operations are necessary at this
> point. This explains why we keep the other .outer_cache operations and
> only ->sync is disabled.
> 
> While in theory any write to a PL310 register could cause the
> deadlock, in practice, disabling ->sync is sufficient to workaround
> the deadlock, since the other cache maintenance operations are only
> used in very specific situations.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> 

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15 13:26         ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 13:26 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

On Thursday 15 May 2014 11:18:39 Thomas Petazzoni wrote:
> @@ -78,6 +121,14 @@ static void __init mvebu_timer_and_clk_init(void)
>         mvebu_scu_enable();
>         coherency_init();
>         BUG_ON(mvebu_mbus_dt_init(coherency_available()));
> +
> +       if (of_machine_is_compatible("marvell,armada375") ||
> +           of_machine_is_compatible("marvell,armada38x")) {
> +               arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> +               pci_ioremap_set_mem_type(MT_UNCACHED);
> +               mvebu_l2x0_pl310_coherent();
> +       }
> +
>         l2x0_of_init(0, ~0UL);


I wonder whether this should just be move to the existing
armada_375_380_coherency_init() function to avoid the
of_machine_is_compatible() check. Other than that,

Acked-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15 13:26         ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 13:26 UTC (permalink / raw
  To: linux-arm-kernel

On Thursday 15 May 2014 11:18:39 Thomas Petazzoni wrote:
> @@ -78,6 +121,14 @@ static void __init mvebu_timer_and_clk_init(void)
>         mvebu_scu_enable();
>         coherency_init();
>         BUG_ON(mvebu_mbus_dt_init(coherency_available()));
> +
> +       if (of_machine_is_compatible("marvell,armada375") ||
> +           of_machine_is_compatible("marvell,armada38x")) {
> +               arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> +               pci_ioremap_set_mem_type(MT_UNCACHED);
> +               mvebu_l2x0_pl310_coherent();
> +       }
> +
>         l2x0_of_init(0, ~0UL);


I wonder whether this should just be move to the existing
armada_375_380_coherency_init() function to avoid the
of_machine_is_compatible() check. Other than that,

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
  2014-05-15  9:18     ` Thomas Petazzoni
@ 2014-05-15 13:35         ` Rob Herring
  -1 siblings, 0 replies; 66+ messages in thread
From: Rob Herring @ 2014-05-15 13:35 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Albin Tonnerre,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

On Thu, May 15, 2014 at 4:18 AM, Thomas Petazzoni
<thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org> wrote:
> When a PL310 cache is used on a system that provides hardware
> coherency, the outer cache sync operation is useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
>
> To avoid this, this commit introduces a new Device Tree property
> 'dma-coherent' for the L2 cache controller node, valid only for the
> PL310 cache. It identifies the usage of the PL310 cache in an I/O
> coherent configuration. Internally, it makes the driver use a
> different set of l2x0_of_data, in which the ->sync operation is NULL.
>
> Note that technically speaking, a fully coherent system wouldn't
> require any of the other .outer_cache operations. However, in
> practice, when booting secondary CPUs, these are not yet coherent, and
> therefore a set of cache maintenance operations are necessary at this
> point. This explains why we keep the other .outer_cache operations and
> only ->sync is disabled.
>
> While in theory any write to a PL310 register could cause the
> deadlock, in practice, disabling ->sync is sufficient to workaround
> the deadlock, since the other cache maintenance operations are only
> used in very specific situations.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/arm/l2cc.txt |  3 +++
>  arch/arm/mm/cache-l2x0.c                       | 24 ++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
> index b513cb8..077d837 100644
> --- a/Documentation/devicetree/bindings/arm/l2cc.txt
> +++ b/Documentation/devicetree/bindings/arm/l2cc.txt
> @@ -40,6 +40,9 @@ Optional properties:
>  - arm,filter-ranges : <start length> Starting address and length of window to
>    filter. Addresses in the filter window are directed to the M1 port. Other
>    addresses will go to the M0 port.
> +- dma-coherent : indicates that the system is operating in an hardware
> +  I/O coherent mode. Valid only when the arm,pl310-cache compatible
> +  string is used.

I don't like this because it creates 2 different meanings for
dma-coherent. dma-coherent is meant to be a property of DMA masters
and that is not really what the L2 is. Perhaps "arm,io-coherent" or
"pl310-io-coherent" instead.

Arguably, the cache nodes would be the more correct location to
describe coherency if we described the DMA buses properly, but we
don't.

>  - interrupts : 1 combined interrupt.
>  - cache-id-part: cache id part number to be used if it is not present
>    on hardware
> diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
> index 7abde2ce..cf0c037 100644
> --- a/arch/arm/mm/cache-l2x0.c
> +++ b/arch/arm/mm/cache-l2x0.c
> @@ -889,6 +889,26 @@ static const struct l2x0_of_data pl310_data = {
>         },
>  };
>
> +/*
> + * PL310 operations used on I/O coherent systems. Theoretically, no
> + * outer cache operations would be needed, except that for secondary
> + * processors bring up, a few cache maintenance operations are needed
> + * because secondary processors are not directly coherent with the L2
> + * cache when they start up.
> + */
> +static const struct l2x0_of_data pl310_coherent_data = {
> +       .setup = pl310_of_setup,
> +       .save  = pl310_save,
> +       .outer_cache = {
> +               .resume      = pl310_resume,
> +               .inv_range   = l2x0_inv_range,
> +               .clean_range = l2x0_clean_range,
> +               .flush_range = l2x0_flush_range,
> +               .flush_all   = l2x0_flush_all,
> +               .inv_all     = l2x0_inv_all,
> +       },
> +};

Why do you need a whole new struct. Can't you just null out the sync ptr?

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
@ 2014-05-15 13:35         ` Rob Herring
  0 siblings, 0 replies; 66+ messages in thread
From: Rob Herring @ 2014-05-15 13:35 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 4:18 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> When a PL310 cache is used on a system that provides hardware
> coherency, the outer cache sync operation is useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
>
> To avoid this, this commit introduces a new Device Tree property
> 'dma-coherent' for the L2 cache controller node, valid only for the
> PL310 cache. It identifies the usage of the PL310 cache in an I/O
> coherent configuration. Internally, it makes the driver use a
> different set of l2x0_of_data, in which the ->sync operation is NULL.
>
> Note that technically speaking, a fully coherent system wouldn't
> require any of the other .outer_cache operations. However, in
> practice, when booting secondary CPUs, these are not yet coherent, and
> therefore a set of cache maintenance operations are necessary at this
> point. This explains why we keep the other .outer_cache operations and
> only ->sync is disabled.
>
> While in theory any write to a PL310 register could cause the
> deadlock, in practice, disabling ->sync is sufficient to workaround
> the deadlock, since the other cache maintenance operations are only
> used in very specific situations.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  Documentation/devicetree/bindings/arm/l2cc.txt |  3 +++
>  arch/arm/mm/cache-l2x0.c                       | 24 ++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
> index b513cb8..077d837 100644
> --- a/Documentation/devicetree/bindings/arm/l2cc.txt
> +++ b/Documentation/devicetree/bindings/arm/l2cc.txt
> @@ -40,6 +40,9 @@ Optional properties:
>  - arm,filter-ranges : <start length> Starting address and length of window to
>    filter. Addresses in the filter window are directed to the M1 port. Other
>    addresses will go to the M0 port.
> +- dma-coherent : indicates that the system is operating in an hardware
> +  I/O coherent mode. Valid only when the arm,pl310-cache compatible
> +  string is used.

I don't like this because it creates 2 different meanings for
dma-coherent. dma-coherent is meant to be a property of DMA masters
and that is not really what the L2 is. Perhaps "arm,io-coherent" or
"pl310-io-coherent" instead.

Arguably, the cache nodes would be the more correct location to
describe coherency if we described the DMA buses properly, but we
don't.

>  - interrupts : 1 combined interrupt.
>  - cache-id-part: cache id part number to be used if it is not present
>    on hardware
> diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
> index 7abde2ce..cf0c037 100644
> --- a/arch/arm/mm/cache-l2x0.c
> +++ b/arch/arm/mm/cache-l2x0.c
> @@ -889,6 +889,26 @@ static const struct l2x0_of_data pl310_data = {
>         },
>  };
>
> +/*
> + * PL310 operations used on I/O coherent systems. Theoretically, no
> + * outer cache operations would be needed, except that for secondary
> + * processors bring up, a few cache maintenance operations are needed
> + * because secondary processors are not directly coherent with the L2
> + * cache when they start up.
> + */
> +static const struct l2x0_of_data pl310_coherent_data = {
> +       .setup = pl310_of_setup,
> +       .save  = pl310_save,
> +       .outer_cache = {
> +               .resume      = pl310_resume,
> +               .inv_range   = l2x0_inv_range,
> +               .clean_range = l2x0_clean_range,
> +               .flush_range = l2x0_flush_range,
> +               .flush_all   = l2x0_flush_all,
> +               .inv_all     = l2x0_inv_all,
> +       },
> +};

Why do you need a whole new struct. Can't you just null out the sync ptr?

Rob

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
  2014-05-15 13:35         ` Rob Herring
@ 2014-05-15 13:46             ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:46 UTC (permalink / raw
  To: Rob Herring
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Albin Tonnerre,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

Dear Rob Herring,

On Thu, 15 May 2014 08:35:18 -0500, Rob Herring wrote:

> > diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
> > index b513cb8..077d837 100644
> > --- a/Documentation/devicetree/bindings/arm/l2cc.txt
> > +++ b/Documentation/devicetree/bindings/arm/l2cc.txt
> > @@ -40,6 +40,9 @@ Optional properties:
> >  - arm,filter-ranges : <start length> Starting address and length of window to
> >    filter. Addresses in the filter window are directed to the M1 port. Other
> >    addresses will go to the M0 port.
> > +- dma-coherent : indicates that the system is operating in an hardware
> > +  I/O coherent mode. Valid only when the arm,pl310-cache compatible
> > +  string is used.
> 
> I don't like this because it creates 2 different meanings for
> dma-coherent. dma-coherent is meant to be a property of DMA masters
> and that is not really what the L2 is. Perhaps "arm,io-coherent" or
> "pl310-io-coherent" instead.

Yes, indeed, makes sense.

> > +/*
> > + * PL310 operations used on I/O coherent systems. Theoretically, no
> > + * outer cache operations would be needed, except that for secondary
> > + * processors bring up, a few cache maintenance operations are needed
> > + * because secondary processors are not directly coherent with the L2
> > + * cache when they start up.
> > + */
> > +static const struct l2x0_of_data pl310_coherent_data = {
> > +       .setup = pl310_of_setup,
> > +       .save  = pl310_save,
> > +       .outer_cache = {
> > +               .resume      = pl310_resume,
> > +               .inv_range   = l2x0_inv_range,
> > +               .clean_range = l2x0_clean_range,
> > +               .flush_range = l2x0_flush_range,
> > +               .flush_all   = l2x0_flush_all,
> > +               .inv_all     = l2x0_inv_all,
> > +       },
> > +};
> 
> Why do you need a whole new struct. Can't you just null out the sync ptr?

Because originally Catalin suggested a separate compatible string, and
therefore a separate set of operations. But you're right, with the move
to just an additional property, nullify-ing the sync pointer is much
simpler.

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310
@ 2014-05-15 13:46             ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:46 UTC (permalink / raw
  To: linux-arm-kernel

Dear Rob Herring,

On Thu, 15 May 2014 08:35:18 -0500, Rob Herring wrote:

> > diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
> > index b513cb8..077d837 100644
> > --- a/Documentation/devicetree/bindings/arm/l2cc.txt
> > +++ b/Documentation/devicetree/bindings/arm/l2cc.txt
> > @@ -40,6 +40,9 @@ Optional properties:
> >  - arm,filter-ranges : <start length> Starting address and length of window to
> >    filter. Addresses in the filter window are directed to the M1 port. Other
> >    addresses will go to the M0 port.
> > +- dma-coherent : indicates that the system is operating in an hardware
> > +  I/O coherent mode. Valid only when the arm,pl310-cache compatible
> > +  string is used.
> 
> I don't like this because it creates 2 different meanings for
> dma-coherent. dma-coherent is meant to be a property of DMA masters
> and that is not really what the L2 is. Perhaps "arm,io-coherent" or
> "pl310-io-coherent" instead.

Yes, indeed, makes sense.

> > +/*
> > + * PL310 operations used on I/O coherent systems. Theoretically, no
> > + * outer cache operations would be needed, except that for secondary
> > + * processors bring up, a few cache maintenance operations are needed
> > + * because secondary processors are not directly coherent with the L2
> > + * cache when they start up.
> > + */
> > +static const struct l2x0_of_data pl310_coherent_data = {
> > +       .setup = pl310_of_setup,
> > +       .save  = pl310_save,
> > +       .outer_cache = {
> > +               .resume      = pl310_resume,
> > +               .inv_range   = l2x0_inv_range,
> > +               .clean_range = l2x0_clean_range,
> > +               .flush_range = l2x0_flush_range,
> > +               .flush_all   = l2x0_flush_all,
> > +               .inv_all     = l2x0_inv_all,
> > +       },
> > +};
> 
> Why do you need a whole new struct. Can't you just null out the sync ptr?

Because originally Catalin suggested a separate compatible string, and
therefore a separate set of operations. But you're right, with the move
to just an additional property, nullify-ing the sync pointer is much
simpler.

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15 13:21         ` Jason Cooper
@ 2014-05-15 13:50             ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:50 UTC (permalink / raw
  To: Jason Cooper
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

Dear Jason Cooper,

On Thu, 15 May 2014 09:21:18 -0400, Jason Cooper wrote:

> > +
> > +	if (of_machine_is_compatible("marvell,armada375") ||
> > +	    of_machine_is_compatible("marvell,armada38x")) {
> > +		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> > +		pci_ioremap_set_mem_type(MT_UNCACHED);
> 
> iiuc, this patch depends on 1/3.  So how would you like to handle it?

Seems like my long cover letters are not always read in their
entirety :-)

>From my cover letter:

 * PATCH 3/3 uses both of the added infrastructures, as well as the
   existing infrastructure to customize the behavior of ioremap() on a
   per-platform basis, to implement the workaround for the Armada 375
   and 38x SOCs. This patch should go through the mvebu maintainers
   tree. However, it has a build dependency on PATCH 1/3 that needs to
   be taken into account.

The last two sentences are the most important ones here :-)

Joke aside, I'm really open to your suggestions as to what the
appropriate merging patch is. I know Russell prefers to have the ARM
core code flow through his tree, which obviously make sense.

However, routing PATCH 3/3 through Russell tree is going to be a mess
of conflicts when things will get merged by Linus, due to the numerous
other changes we've been doing in mach-mvebu/board-v7.c.

So the only course of action I see right now is to work with Russell to
have him pull patches 1 and 2, and then provide a stable branch/tag
that you can merge as a dependency in whatever branch you decide to
merge patch 3/3. Would that work?

Thanks a lot!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15 13:50             ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:50 UTC (permalink / raw
  To: linux-arm-kernel

Dear Jason Cooper,

On Thu, 15 May 2014 09:21:18 -0400, Jason Cooper wrote:

> > +
> > +	if (of_machine_is_compatible("marvell,armada375") ||
> > +	    of_machine_is_compatible("marvell,armada38x")) {
> > +		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> > +		pci_ioremap_set_mem_type(MT_UNCACHED);
> 
> iiuc, this patch depends on 1/3.  So how would you like to handle it?

Seems like my long cover letters are not always read in their
entirety :-)

>From my cover letter:

 * PATCH 3/3 uses both of the added infrastructures, as well as the
   existing infrastructure to customize the behavior of ioremap() on a
   per-platform basis, to implement the workaround for the Armada 375
   and 38x SOCs. This patch should go through the mvebu maintainers
   tree. However, it has a build dependency on PATCH 1/3 that needs to
   be taken into account.

The last two sentences are the most important ones here :-)

Joke aside, I'm really open to your suggestions as to what the
appropriate merging patch is. I know Russell prefers to have the ARM
core code flow through his tree, which obviously make sense.

However, routing PATCH 3/3 through Russell tree is going to be a mess
of conflicts when things will get merged by Linus, due to the numerous
other changes we've been doing in mach-mvebu/board-v7.c.

So the only course of action I see right now is to work with Russell to
have him pull patches 1 and 2, and then provide a stable branch/tag
that you can merge as a dependency in whatever branch you decide to
merge patch 3/3. Would that work?

Thanks a lot!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 13:21         ` Arnd Bergmann
@ 2014-05-15 13:51           ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:51 UTC (permalink / raw
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Russell King,
	Will Deacon, Catalin Marinas, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Grant Likely, Rob Herring, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Tawfik Bayouk, Nadav Haklai, Gregory Clement,
	Ezequiel Garcia, Albin Tonnerre, Sebastian Hesselbarth

Dear Arnd Bergmann,

On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> > @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
> >         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
> >                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
> >                                   phys_addr,
> > -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> > +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
> >  }
> >  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> > 
> 
> As discussed on IRC, I think we'd be better off making this a strong-ordered
> mapping for all platforms unconditionally. The PCI I/O space semantics
> require non-posted writes, which is the main difference between device
> and SO mappings, so the same fix is required both for mvebu as a workaround
> for the deadlock as well as for everyone else as a fix for an incorrect
> PCI behavior.

Ok, I'll take that into account when posting a v4.

Thanks for the reminder,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 13:51           ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:51 UTC (permalink / raw
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> > @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
> >         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
> >                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
> >                                   phys_addr,
> > -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> > +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
> >  }
> >  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> > 
> 
> As discussed on IRC, I think we'd be better off making this a strong-ordered
> mapping for all platforms unconditionally. The PCI I/O space semantics
> require non-posted writes, which is the main difference between device
> and SO mappings, so the same fix is required both for mvebu as a workaround
> for the deadlock as well as for everyone else as a fix for an incorrect
> PCI behavior.

Ok, I'll take that into account when posting a v4.

Thanks for the reminder,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15 13:26         ` Arnd Bergmann
@ 2014-05-15 14:22           ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 14:22 UTC (permalink / raw
  To: Arnd Bergmann
  Cc: Lior Amsalem, devicetree-u79uwXL29TY76Z2rM5mHXA, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Will Deacon, Grant Likely, Gregory Clement, Nadav Haklai,
	Rob Herring, Ezequiel Garcia, Albin Tonnerre,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Sebastian Hesselbarth

Dear Arnd Bergmann,

On Thu, 15 May 2014 15:26:29 +0200, Arnd Bergmann wrote:

> > +
> > +       if (of_machine_is_compatible("marvell,armada375") ||
> > +           of_machine_is_compatible("marvell,armada38x")) {
> > +               arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> > +               pci_ioremap_set_mem_type(MT_UNCACHED);
> > +               mvebu_l2x0_pl310_coherent();
> > +       }
> > +
> >         l2x0_of_init(0, ~0UL);
> 
> 
> I wonder whether this should just be move to the existing
> armada_375_380_coherency_init() function to avoid the
> of_machine_is_compatible() check. Other than that,

Indeed, makes a lot of sense to have this in coherency.c instead. Will
be part of v4.

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15 14:22           ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 14:22 UTC (permalink / raw
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Thu, 15 May 2014 15:26:29 +0200, Arnd Bergmann wrote:

> > +
> > +       if (of_machine_is_compatible("marvell,armada375") ||
> > +           of_machine_is_compatible("marvell,armada38x")) {
> > +               arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> > +               pci_ioremap_set_mem_type(MT_UNCACHED);
> > +               mvebu_l2x0_pl310_coherent();
> > +       }
> > +
> >         l2x0_of_init(0, ~0UL);
> 
> 
> I wonder whether this should just be move to the existing
> armada_375_380_coherency_init() function to avoid the
> of_machine_is_compatible() check. Other than that,

Indeed, makes a lot of sense to have this in coherency.c instead. Will
be part of v4.

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 13:51           ` Thomas Petazzoni
@ 2014-05-15 14:29               ` Will Deacon
  -1 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-15 14:29 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Arnd Bergmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Russell King, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Lior Amsalem, Andrew Lunn, Jason Cooper,
	Tawfik Bayouk, Nadav Haklai, Gregory Clement, Ezequiel Garcia,
	Albin Tonnerre, Sebastian Hesselbarth

On Thu, May 15, 2014 at 02:51:30PM +0100, Thomas Petazzoni wrote:
> On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> > On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> > > @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
> > >         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
> > >                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
> > >                                   phys_addr,
> > > -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> > > +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> > > 
> > 
> > As discussed on IRC, I think we'd be better off making this a strong-ordered
> > mapping for all platforms unconditionally. The PCI I/O space semantics
> > require non-posted writes, which is the main difference between device
> > and SO mappings, so the same fix is required both for mvebu as a workaround
> > for the deadlock as well as for everyone else as a fix for an incorrect
> > PCI behavior.
> 
> Ok, I'll take that into account when posting a v4.

Actually, I don't think this reasoning is correct. The memory type here
applies to accesses mastered by the CPU onto AXI -- it is the job of the
AXI/PCI bridge (i.e. the host controller) to ensure that writes are not
posted, so this is irrelevant.

Of course, the erratum in question requires SO memory, but that's an
entirely different problem. The default should remain as device memory.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 14:29               ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-15 14:29 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 02:51:30PM +0100, Thomas Petazzoni wrote:
> On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> > On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> > > @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
> > >         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
> > >                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
> > >                                   phys_addr,
> > > -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> > > +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> > > 
> > 
> > As discussed on IRC, I think we'd be better off making this a strong-ordered
> > mapping for all platforms unconditionally. The PCI I/O space semantics
> > require non-posted writes, which is the main difference between device
> > and SO mappings, so the same fix is required both for mvebu as a workaround
> > for the deadlock as well as for everyone else as a fix for an incorrect
> > PCI behavior.
> 
> Ok, I'll take that into account when posting a v4.

Actually, I don't think this reasoning is correct. The memory type here
applies to accesses mastered by the CPU onto AXI -- it is the job of the
AXI/PCI bridge (i.e. the host controller) to ensure that writes are not
posted, so this is irrelevant.

Of course, the erratum in question requires SO memory, but that's an
entirely different problem. The default should remain as device memory.

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 14:29               ` Will Deacon
@ 2014-05-15 14:32                   ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 14:32 UTC (permalink / raw
  To: Will Deacon
  Cc: Thomas Petazzoni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Russell King, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Lior Amsalem, Andrew Lunn, Jason Cooper,
	Tawfik Bayouk, Nadav Haklai, Gregory Clement, Ezequiel Garcia,
	Albin Tonnerre, Sebastian Hesselbarth

On Thursday 15 May 2014 15:29:24 Will Deacon wrote:
>  Thu, May 15, 2014 at 02:51:30PM +0100, Thomas Petazzoni wrote:
> > On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> > > On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> > > > @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
> > > >         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
> > > >                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
> > > >                                   phys_addr,
> > > > -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> > > > +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> > > > 
> > > 
> > > As discussed on IRC, I think we'd be better off making this a strong-ordered
> > > mapping for all platforms unconditionally. The PCI I/O space semantics
> > > require non-posted writes, which is the main difference between device
> > > and SO mappings, so the same fix is required both for mvebu as a workaround
> > > for the deadlock as well as for everyone else as a fix for an incorrect
> > > PCI behavior.
> > 
> > Ok, I'll take that into account when posting a v4.
> 
> Actually, I don't think this reasoning is correct. The memory type here
> applies to accesses mastered by the CPU onto AXI -- it is the job of the
> AXI/PCI bridge (i.e. the host controller) to ensure that writes are not
> posted, so this is irrelevant.
> 
> Of course, the erratum in question requires SO memory, but that's an
> entirely different problem. The default should remain as device memory.

How can a write be non-posted on the PCI bus if it's posted on AXI?
The way I understand it, the CPU would continue with the next instruction
as soon as the write has made it out to the AXI fabric, i.e. before
the PIO instruction is complete. If this is used to synchronize with a
DMA, there is no guarantee that the transaction from PCI will be visible
in memory by then.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 14:32                   ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 14:32 UTC (permalink / raw
  To: linux-arm-kernel

On Thursday 15 May 2014 15:29:24 Will Deacon wrote:
>  Thu, May 15, 2014 at 02:51:30PM +0100, Thomas Petazzoni wrote:
> > On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> > > On Thursday 15 May 2014 11:18:37 Thomas Petazzoni wrote:
> > > > @@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
> > > >         return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
> > > >                                   PCI_IO_VIRT_BASE + offset + SZ_64K,
> > > >                                   phys_addr,
> > > > -                                 __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
> > > > +                                 __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_ioremap_io);
> > > > 
> > > 
> > > As discussed on IRC, I think we'd be better off making this a strong-ordered
> > > mapping for all platforms unconditionally. The PCI I/O space semantics
> > > require non-posted writes, which is the main difference between device
> > > and SO mappings, so the same fix is required both for mvebu as a workaround
> > > for the deadlock as well as for everyone else as a fix for an incorrect
> > > PCI behavior.
> > 
> > Ok, I'll take that into account when posting a v4.
> 
> Actually, I don't think this reasoning is correct. The memory type here
> applies to accesses mastered by the CPU onto AXI -- it is the job of the
> AXI/PCI bridge (i.e. the host controller) to ensure that writes are not
> posted, so this is irrelevant.
> 
> Of course, the erratum in question requires SO memory, but that's an
> entirely different problem. The default should remain as device memory.

How can a write be non-posted on the PCI bus if it's posted on AXI?
The way I understand it, the CPU would continue with the next instruction
as soon as the write has made it out to the AXI fabric, i.e. before
the PIO instruction is complete. If this is used to synchronize with a
DMA, there is no guarantee that the transaction from PCI will be visible
in memory by then.

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15 13:50             ` Thomas Petazzoni
@ 2014-05-15 15:31                 ` Jason Cooper
  -1 siblings, 0 replies; 66+ messages in thread
From: Jason Cooper @ 2014-05-15 15:31 UTC (permalink / raw
  To: Thomas Petazzoni
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

On Thu, May 15, 2014 at 03:50:08PM +0200, Thomas Petazzoni wrote:
> Dear Jason Cooper,
> 
> On Thu, 15 May 2014 09:21:18 -0400, Jason Cooper wrote:
> 
> > > +
> > > +	if (of_machine_is_compatible("marvell,armada375") ||
> > > +	    of_machine_is_compatible("marvell,armada38x")) {
> > > +		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> > > +		pci_ioremap_set_mem_type(MT_UNCACHED);
> > 
> > iiuc, this patch depends on 1/3.  So how would you like to handle it?
> 
> Seems like my long cover letters are not always read in their
> entirety :-)

It was, the 'u' comes from 'u'nderstanding the coverletter _and_ the
code. ;-)

One a side note, I should mention I find your detailed coverletters
extremely helpful for getting up to speed quickly on what the intention
of a series is.  Even better, the Link tag that's added to the commit
message allows future devs to be three clicks away from that
information.  Thanks for setting that standard.

Which makes me think, in a lot of cases, it's better to start at the top
of the thread when researching a patch.  The Message-Id of the
coverletter is always the first address in the References header.
Perhaps I should link to that instead, or add both.

> From my cover letter:
> 
>  * PATCH 3/3 uses both of the added infrastructures, as well as the
>    existing infrastructure to customize the behavior of ioremap() on a
>    per-platform basis, to implement the workaround for the Armada 375
>    and 38x SOCs. This patch should go through the mvebu maintainers
>    tree. However, it has a build dependency on PATCH 1/3 that needs to
>    be taken into account.
> 
> The last two sentences are the most important ones here :-)
> 
> Joke aside, I'm really open to your suggestions as to what the
> appropriate merging patch is. I know Russell prefers to have the ARM
> core code flow through his tree, which obviously make sense.
> 
> However, routing PATCH 3/3 through Russell tree is going to be a mess
> of conflicts when things will get merged by Linus, due to the numerous
> other changes we've been doing in mach-mvebu/board-v7.c.

Yep.

> So the only course of action I see right now is to work with Russell to
> have him pull patches 1 and 2, and then provide a stable branch/tag
> that you can merge as a dependency in whatever branch you decide to
> merge patch 3/3. Would that work?

Dunno.  I have a 100% failure rate getting Russell to create a stable
topic branch to resolve dependencies like these.  Unfortunately, I think
the answer is going to be resubmit 3/3 once 3.16-rc1 lands with 1/3 and
2/3.

Not the answer I want to give, but unfortunately, it's the most
realistic.

thx,

Jason.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-15 15:31                 ` Jason Cooper
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Cooper @ 2014-05-15 15:31 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 03:50:08PM +0200, Thomas Petazzoni wrote:
> Dear Jason Cooper,
> 
> On Thu, 15 May 2014 09:21:18 -0400, Jason Cooper wrote:
> 
> > > +
> > > +	if (of_machine_is_compatible("marvell,armada375") ||
> > > +	    of_machine_is_compatible("marvell,armada38x")) {
> > > +		arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
> > > +		pci_ioremap_set_mem_type(MT_UNCACHED);
> > 
> > iiuc, this patch depends on 1/3.  So how would you like to handle it?
> 
> Seems like my long cover letters are not always read in their
> entirety :-)

It was, the 'u' comes from 'u'nderstanding the coverletter _and_ the
code. ;-)

One a side note, I should mention I find your detailed coverletters
extremely helpful for getting up to speed quickly on what the intention
of a series is.  Even better, the Link tag that's added to the commit
message allows future devs to be three clicks away from that
information.  Thanks for setting that standard.

Which makes me think, in a lot of cases, it's better to start at the top
of the thread when researching a patch.  The Message-Id of the
coverletter is always the first address in the References header.
Perhaps I should link to that instead, or add both.

> From my cover letter:
> 
>  * PATCH 3/3 uses both of the added infrastructures, as well as the
>    existing infrastructure to customize the behavior of ioremap() on a
>    per-platform basis, to implement the workaround for the Armada 375
>    and 38x SOCs. This patch should go through the mvebu maintainers
>    tree. However, it has a build dependency on PATCH 1/3 that needs to
>    be taken into account.
> 
> The last two sentences are the most important ones here :-)
> 
> Joke aside, I'm really open to your suggestions as to what the
> appropriate merging patch is. I know Russell prefers to have the ARM
> core code flow through his tree, which obviously make sense.
> 
> However, routing PATCH 3/3 through Russell tree is going to be a mess
> of conflicts when things will get merged by Linus, due to the numerous
> other changes we've been doing in mach-mvebu/board-v7.c.

Yep.

> So the only course of action I see right now is to work with Russell to
> have him pull patches 1 and 2, and then provide a stable branch/tag
> that you can merge as a dependency in whatever branch you decide to
> merge patch 3/3. Would that work?

Dunno.  I have a 100% failure rate getting Russell to create a stable
topic branch to resolve dependencies like these.  Unfortunately, I think
the answer is going to be resubmit 3/3 once 3.16-rc1 lands with 1/3 and
2/3.

Not the answer I want to give, but unfortunately, it's the most
realistic.

thx,

Jason.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 14:32                   ` Arnd Bergmann
@ 2014-05-15 15:34                     ` Will Deacon
  -1 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-15 15:34 UTC (permalink / raw
  To: Arnd Bergmann
  Cc: Thomas Petazzoni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Russell King, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Lior Amsalem, Andrew Lunn, Jason Cooper,
	Tawfik Bayouk, Nadav Haklai, Gregory Clement, Ezequiel Garcia,
	Albin Tonnerre, Sebastian Hesselbarth

Hi Arnd,

On Thu, May 15, 2014 at 03:32:43PM +0100, Arnd Bergmann wrote:
> On Thursday 15 May 2014 15:29:24 Will Deacon wrote:
> >  Thu, May 15, 2014 at 02:51:30PM +0100, Thomas Petazzoni wrote:
> > > On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> > > > As discussed on IRC, I think we'd be better off making this a strong-ordered
> > > > mapping for all platforms unconditionally. The PCI I/O space semantics
> > > > require non-posted writes, which is the main difference between device
> > > > and SO mappings, so the same fix is required both for mvebu as a workaround
> > > > for the deadlock as well as for everyone else as a fix for an incorrect
> > > > PCI behavior.
> > > 
> > > Ok, I'll take that into account when posting a v4.
> > 
> > Actually, I don't think this reasoning is correct. The memory type here
> > applies to accesses mastered by the CPU onto AXI -- it is the job of the
> > AXI/PCI bridge (i.e. the host controller) to ensure that writes are not
> > posted, so this is irrelevant.
> > 
> > Of course, the erratum in question requires SO memory, but that's an
> > entirely different problem. The default should remain as device memory.
> 
> How can a write be non-posted on the PCI bus if it's posted on AXI?

>From the point-of-view of the CPU it would be posted, but the PCI bus would
see an unposted write (so I imagine there would be write buffering at the
host controller). However, I worry that I'm missing your point :)

> The way I understand it, the CPU would continue with the next instruction
> as soon as the write has made it out to the AXI fabric, i.e. before
> the PIO instruction is complete.

The CPU can continue regardless -- you'd need a DSB if you want to hold up
the instruction stream based on completion of a memory access. With the
posted write (device type), the write may complete as soon as it reaches an
ordered bus.

Note that nGnRnE accesses in AArch64 (the equivalent to strongly-ordered)
*can* still get an early write response -- that is simply a hint to the
memory subsystem.

> If this is used to synchronize with a DMA, there is no guarantee that the
> transaction from PCI will be visible in memory by then.

Can you elaborate on this scenario please? When would we use an I/O space
write to synchronise with a DMA transfer from a PCI endpoint? You're
definitely referring to I/O space as opposed to Configuration Space, right?

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 15:34                     ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-15 15:34 UTC (permalink / raw
  To: linux-arm-kernel

Hi Arnd,

On Thu, May 15, 2014 at 03:32:43PM +0100, Arnd Bergmann wrote:
> On Thursday 15 May 2014 15:29:24 Will Deacon wrote:
> >  Thu, May 15, 2014 at 02:51:30PM +0100, Thomas Petazzoni wrote:
> > > On Thu, 15 May 2014 15:21:25 +0200, Arnd Bergmann wrote:
> > > > As discussed on IRC, I think we'd be better off making this a strong-ordered
> > > > mapping for all platforms unconditionally. The PCI I/O space semantics
> > > > require non-posted writes, which is the main difference between device
> > > > and SO mappings, so the same fix is required both for mvebu as a workaround
> > > > for the deadlock as well as for everyone else as a fix for an incorrect
> > > > PCI behavior.
> > > 
> > > Ok, I'll take that into account when posting a v4.
> > 
> > Actually, I don't think this reasoning is correct. The memory type here
> > applies to accesses mastered by the CPU onto AXI -- it is the job of the
> > AXI/PCI bridge (i.e. the host controller) to ensure that writes are not
> > posted, so this is irrelevant.
> > 
> > Of course, the erratum in question requires SO memory, but that's an
> > entirely different problem. The default should remain as device memory.
> 
> How can a write be non-posted on the PCI bus if it's posted on AXI?

>From the point-of-view of the CPU it would be posted, but the PCI bus would
see an unposted write (so I imagine there would be write buffering at the
host controller). However, I worry that I'm missing your point :)

> The way I understand it, the CPU would continue with the next instruction
> as soon as the write has made it out to the AXI fabric, i.e. before
> the PIO instruction is complete.

The CPU can continue regardless -- you'd need a DSB if you want to hold up
the instruction stream based on completion of a memory access. With the
posted write (device type), the write may complete as soon as it reaches an
ordered bus.

Note that nGnRnE accesses in AArch64 (the equivalent to strongly-ordered)
*can* still get an early write response -- that is simply a hint to the
memory subsystem.

> If this is used to synchronize with a DMA, there is no guarantee that the
> transaction from PCI will be visible in memory by then.

Can you elaborate on this scenario please? When would we use an I/O space
write to synchronise with a DMA transfer from a PCI endpoint? You're
definitely referring to I/O space as opposed to Configuration Space, right?

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 15:34                     ` Will Deacon
@ 2014-05-15 15:55                         ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 15:55 UTC (permalink / raw
  To: Will Deacon
  Cc: Thomas Petazzoni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Russell King, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Lior Amsalem, Andrew Lunn, Jason Cooper,
	Tawfik Bayouk, Nadav Haklai, Gregory Clement, Ezequiel Garcia,
	Albin Tonnerre, Sebastian Hesselbarth

On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > The way I understand it, the CPU would continue with the next instruction
> > as soon as the write has made it out to the AXI fabric, i.e. before
> > the PIO instruction is complete.
> 
> The CPU can continue regardless -- you'd need a DSB if you want to hold up
> the instruction stream based on completion of a memory access. With the
> posted write (device type), the write may complete as soon as it reaches an
> ordered bus.
> 
> Note that nGnRnE accesses in AArch64 (the equivalent to strongly-ordered)
> *can* still get an early write response -- that is simply a hint to the
> memory subsystem.
> 
> > If this is used to synchronize with a DMA, there is no guarantee that the
> > transaction from PCI will be visible in memory by then.
> 
> Can you elaborate on this scenario please? When would we use an I/O space
> write to synchronise with a DMA transfer from a PCI endpoint? You're
> definitely referring to I/O space as opposed to Configuration Space, right?

Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
and lets the CPU know about the data using a level (IntA as opposed to MSI)
interrupt. The CPU performs an outl() operation to an I/O port to let the
hardware know it has received the IRQ and the response of the outl() is
guaranteed to flush the DMA transaction: by the time the outl() completes
we know that the data in memory is valid because it is strongly ordered
relative to the DMA.

outl() actually does a dsb() internally, but unfortunately that is
before the store, not after, so I assume that a driver relying on the
behavior above would still be racy.

Note that there are very few drivers using any port I/O at all, so I
don't know if this is actually a real-world problem or not. They
might all be doing no DMA, or have an inb()/inw()/inl() after the
interrupt, which would always be sufficiently ordered with the DMA.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 15:55                         ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-15 15:55 UTC (permalink / raw
  To: linux-arm-kernel

On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > The way I understand it, the CPU would continue with the next instruction
> > as soon as the write has made it out to the AXI fabric, i.e. before
> > the PIO instruction is complete.
> 
> The CPU can continue regardless -- you'd need a DSB if you want to hold up
> the instruction stream based on completion of a memory access. With the
> posted write (device type), the write may complete as soon as it reaches an
> ordered bus.
> 
> Note that nGnRnE accesses in AArch64 (the equivalent to strongly-ordered)
> *can* still get an early write response -- that is simply a hint to the
> memory subsystem.
> 
> > If this is used to synchronize with a DMA, there is no guarantee that the
> > transaction from PCI will be visible in memory by then.
> 
> Can you elaborate on this scenario please? When would we use an I/O space
> write to synchronise with a DMA transfer from a PCI endpoint? You're
> definitely referring to I/O space as opposed to Configuration Space, right?

Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
and lets the CPU know about the data using a level (IntA as opposed to MSI)
interrupt. The CPU performs an outl() operation to an I/O port to let the
hardware know it has received the IRQ and the response of the outl() is
guaranteed to flush the DMA transaction: by the time the outl() completes
we know that the data in memory is valid because it is strongly ordered
relative to the DMA.

outl() actually does a dsb() internally, but unfortunately that is
before the store, not after, so I assume that a driver relying on the
behavior above would still be racy.

Note that there are very few drivers using any port I/O at all, so I
don't know if this is actually a real-world problem or not. They
might all be doing no DMA, or have an inb()/inw()/inl() after the
interrupt, which would always be sufficiently ordered with the DMA.

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 15:34                     ` Will Deacon
@ 2014-05-15 17:53                         ` Jason Gunthorpe
  -1 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2014-05-15 17:53 UTC (permalink / raw
  To: Will Deacon
  Cc: Arnd Bergmann, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Sebastian Hesselbarth

On Thu, May 15, 2014 at 04:34:30PM +0100, Will Deacon wrote:

> > How can a write be non-posted on the PCI bus if it's posted on AXI?
> 
> From the point-of-view of the CPU it would be posted, but the PCI bus would
> see an unposted write (so I imagine there would be write buffering at the
> host controller). However, I worry that I'm missing your point :)

It is worth being a bit careful with language here, from an AXI
perspective there is not really such thing as a posted write. 

All writes are explicitly ack'd upon 'completion', however the memory
type influences when that is allowed to happen.

For PCI IO writes the AXI memory type from the CPU must be 'Device
Non-bufferable' (AWCACHE = 0), which will require the AXI ACK to be
generated only once the PCI target returns an IOWr completion TLP.

For PCI Memory writes the AXI memory type from the CPU could be
'Device Non-bufferable' but it would be best if it is 'Device
Bufferable' (AWCACHE = 1).

The latter allows more performance by permitting any AXI bridge in the
path to ack the write early. This is as close as AXI gets to 'posted
writes'

It is very important that the page tables in the CPU properly select
the right AXI Memory Type for each space.

Somewhere there should be a table describing how the CPU page table
attributes map into AXI *CACHE/Memory Type signaling selectors.

Beyond that, as Will points out, a DSB as part of the outl might be
required to spin the cpu and prevent pipelining.

AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
it completes at the target, and the CPU must not pipeline outl/inl
operations: outl();  outl(); produces 1 IOWr TLP, waits for
completion, then produces another.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-15 17:53                         ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2014-05-15 17:53 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 04:34:30PM +0100, Will Deacon wrote:

> > How can a write be non-posted on the PCI bus if it's posted on AXI?
> 
> From the point-of-view of the CPU it would be posted, but the PCI bus would
> see an unposted write (so I imagine there would be write buffering at the
> host controller). However, I worry that I'm missing your point :)

It is worth being a bit careful with language here, from an AXI
perspective there is not really such thing as a posted write. 

All writes are explicitly ack'd upon 'completion', however the memory
type influences when that is allowed to happen.

For PCI IO writes the AXI memory type from the CPU must be 'Device
Non-bufferable' (AWCACHE = 0), which will require the AXI ACK to be
generated only once the PCI target returns an IOWr completion TLP.

For PCI Memory writes the AXI memory type from the CPU could be
'Device Non-bufferable' but it would be best if it is 'Device
Bufferable' (AWCACHE = 1).

The latter allows more performance by permitting any AXI bridge in the
path to ack the write early. This is as close as AXI gets to 'posted
writes'

It is very important that the page tables in the CPU properly select
the right AXI Memory Type for each space.

Somewhere there should be a table describing how the CPU page table
attributes map into AXI *CACHE/Memory Type signaling selectors.

Beyond that, as Will points out, a DSB as part of the outl might be
required to spin the cpu and prevent pipelining.

AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
it completes at the target, and the CPU must not pipeline outl/inl
operations: outl();  outl(); produces 1 IOWr TLP, waits for
completion, then produces another.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
  2014-05-15 15:31                 ` Jason Cooper
@ 2014-05-16  7:19                     ` Thomas Petazzoni
  -1 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-16  7:19 UTC (permalink / raw
  To: Jason Cooper
  Cc: Russell King, Will Deacon, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Grant Likely, Rob Herring,
	Albin Tonnerre, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Tawfik Bayouk, Nadav Haklai, Lior Amsalem, Ezequiel Garcia

Dear Jason Cooper,

On Thu, 15 May 2014 11:31:50 -0400, Jason Cooper wrote:

> > > iiuc, this patch depends on 1/3.  So how would you like to handle it?
> > 
> > Seems like my long cover letters are not always read in their
> > entirety :-)
> 
> It was, the 'u' comes from 'u'nderstanding the coverletter _and_ the
> code. ;-)

Hehe :-)

> One a side note, I should mention I find your detailed coverletters
> extremely helpful for getting up to speed quickly on what the intention
> of a series is.  Even better, the Link tag that's added to the commit
> message allows future devs to be three clicks away from that
> information.  Thanks for setting that standard.

Thanks, I appreciate. I intentionally do some efforts to write detailed
cover letters, especially for the most complicated topics, I believe
it's needed to "sell" the patches to the appropriate maintainers.

> Which makes me think, in a lot of cases, it's better to start at the top
> of the thread when researching a patch.  The Message-Id of the
> coverletter is always the first address in the References header.
> Perhaps I should link to that instead, or add both.

Maybe. In any case, the cover letter is never far away for the patch
e-mails, so it isn't that complicated to get back to the cover letter.

Sometimes, it's a bit difficult to decide how to balance the
information between the cover letter and the commit logs. The
explanation in the cover letter has the nice property that it covers
the entire patch series, so it's very convenient to explain the overall
goal of the patch series, the global architecture of the solution being
proposed. But on the other hand, the contents of the cover letter do
not become part of the Git history. On the other side, commit logs
become part of the Git history (so people are more likely to see them
in the future), but are more difficult to use to convey the overall
design, since they are only per-patch.

> > So the only course of action I see right now is to work with Russell to
> > have him pull patches 1 and 2, and then provide a stable branch/tag
> > that you can merge as a dependency in whatever branch you decide to
> > merge patch 3/3. Would that work?
> 
> Dunno.  I have a 100% failure rate getting Russell to create a stable
> topic branch to resolve dependencies like these.  Unfortunately, I think
> the answer is going to be resubmit 3/3 once 3.16-rc1 lands with 1/3 and
> 2/3.

Let's go for the solution you've chosen: merge PATCH 3/3 of the v4.
This one does not have build dependency on PATCH 1/3, and already
solves most of the problems: MT_UNCACHED mappings for PCI memory
mappings, and avoidance of outer cache sync operations when hardware
I/O coherency is used. The only thing that remains to solve is the PCI
I/O mappings, but they are a lot less frequently used so it is not as
important as the other aspects of the problem. And this aspect can
indeed be solved as a followup patch.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
@ 2014-05-16  7:19                     ` Thomas Petazzoni
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Petazzoni @ 2014-05-16  7:19 UTC (permalink / raw
  To: linux-arm-kernel

Dear Jason Cooper,

On Thu, 15 May 2014 11:31:50 -0400, Jason Cooper wrote:

> > > iiuc, this patch depends on 1/3.  So how would you like to handle it?
> > 
> > Seems like my long cover letters are not always read in their
> > entirety :-)
> 
> It was, the 'u' comes from 'u'nderstanding the coverletter _and_ the
> code. ;-)

Hehe :-)

> One a side note, I should mention I find your detailed coverletters
> extremely helpful for getting up to speed quickly on what the intention
> of a series is.  Even better, the Link tag that's added to the commit
> message allows future devs to be three clicks away from that
> information.  Thanks for setting that standard.

Thanks, I appreciate. I intentionally do some efforts to write detailed
cover letters, especially for the most complicated topics, I believe
it's needed to "sell" the patches to the appropriate maintainers.

> Which makes me think, in a lot of cases, it's better to start at the top
> of the thread when researching a patch.  The Message-Id of the
> coverletter is always the first address in the References header.
> Perhaps I should link to that instead, or add both.

Maybe. In any case, the cover letter is never far away for the patch
e-mails, so it isn't that complicated to get back to the cover letter.

Sometimes, it's a bit difficult to decide how to balance the
information between the cover letter and the commit logs. The
explanation in the cover letter has the nice property that it covers
the entire patch series, so it's very convenient to explain the overall
goal of the patch series, the global architecture of the solution being
proposed. But on the other hand, the contents of the cover letter do
not become part of the Git history. On the other side, commit logs
become part of the Git history (so people are more likely to see them
in the future), but are more difficult to use to convey the overall
design, since they are only per-patch.

> > So the only course of action I see right now is to work with Russell to
> > have him pull patches 1 and 2, and then provide a stable branch/tag
> > that you can merge as a dependency in whatever branch you decide to
> > merge patch 3/3. Would that work?
> 
> Dunno.  I have a 100% failure rate getting Russell to create a stable
> topic branch to resolve dependencies like these.  Unfortunately, I think
> the answer is going to be resubmit 3/3 once 3.16-rc1 lands with 1/3 and
> 2/3.

Let's go for the solution you've chosen: merge PATCH 3/3 of the v4.
This one does not have build dependency on PATCH 1/3, and already
solves most of the problems: MT_UNCACHED mappings for PCI memory
mappings, and avoidance of outer cache sync operations when hardware
I/O coherency is used. The only thing that remains to solve is the PCI
I/O mappings, but they are a lot less frequently used so it is not as
important as the other aspects of the problem. And this aspect can
indeed be solved as a followup patch.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 15:55                         ` Arnd Bergmann
@ 2014-05-16  9:53                           ` Will Deacon
  -1 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-16  9:53 UTC (permalink / raw
  To: Arnd Bergmann
  Cc: Thomas Petazzoni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Russell King, Catalin Marinas,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Grant Likely,
	Rob Herring, Lior Amsalem, Andrew Lunn, Jason Cooper,
	Tawfik Bayouk, Nadav Haklai, Gregory Clement, Ezequiel Garcia,
	Albin Tonnerre, Sebastian Hesselbarth

On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > The way I understand it, the CPU would continue with the next instruction
> > > as soon as the write has made it out to the AXI fabric, i.e. before
> > > the PIO instruction is complete.
> > 
> > The CPU can continue regardless -- you'd need a DSB if you want to hold up
> > the instruction stream based on completion of a memory access. With the
> > posted write (device type), the write may complete as soon as it reaches an
> > ordered bus.
> > 
> > Note that nGnRnE accesses in AArch64 (the equivalent to strongly-ordered)
> > *can* still get an early write response -- that is simply a hint to the
> > memory subsystem.
> > 
> > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > transaction from PCI will be visible in memory by then.
> > 
> > Can you elaborate on this scenario please? When would we use an I/O space
> > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > definitely referring to I/O space as opposed to Configuration Space, right?
> 
> Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> and lets the CPU know about the data using a level (IntA as opposed to MSI)
> interrupt. The CPU performs an outl() operation to an I/O port to let the
> hardware know it has received the IRQ and the response of the outl() is
> guaranteed to flush the DMA transaction: by the time the outl() completes
> we know that the data in memory is valid because it is strongly ordered
> relative to the DMA.

Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
requirement? If so, whether or not that DMA data is then visible to the CPU
is really specific to the host-controller implementation. It could easily be
buffered somewhere between the host controller and memory, for example.

> outl() actually does a dsb() internally, but unfortunately that is
> before the store, not after, so I assume that a driver relying on the
> behavior above would still be racy.

Yup, we'd need an additional dsb. I think we're confusing what the PCI
specification says about ordering and what the inb/outb instructions provide
on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
that's not going to come cheap and I don't think it's a decision we should
take lightly.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-16  9:53                           ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-16  9:53 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > The way I understand it, the CPU would continue with the next instruction
> > > as soon as the write has made it out to the AXI fabric, i.e. before
> > > the PIO instruction is complete.
> > 
> > The CPU can continue regardless -- you'd need a DSB if you want to hold up
> > the instruction stream based on completion of a memory access. With the
> > posted write (device type), the write may complete as soon as it reaches an
> > ordered bus.
> > 
> > Note that nGnRnE accesses in AArch64 (the equivalent to strongly-ordered)
> > *can* still get an early write response -- that is simply a hint to the
> > memory subsystem.
> > 
> > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > transaction from PCI will be visible in memory by then.
> > 
> > Can you elaborate on this scenario please? When would we use an I/O space
> > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > definitely referring to I/O space as opposed to Configuration Space, right?
> 
> Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> and lets the CPU know about the data using a level (IntA as opposed to MSI)
> interrupt. The CPU performs an outl() operation to an I/O port to let the
> hardware know it has received the IRQ and the response of the outl() is
> guaranteed to flush the DMA transaction: by the time the outl() completes
> we know that the data in memory is valid because it is strongly ordered
> relative to the DMA.

Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
requirement? If so, whether or not that DMA data is then visible to the CPU
is really specific to the host-controller implementation. It could easily be
buffered somewhere between the host controller and memory, for example.

> outl() actually does a dsb() internally, but unfortunately that is
> before the store, not after, so I assume that a driver relying on the
> behavior above would still be racy.

Yup, we'd need an additional dsb. I think we're confusing what the PCI
specification says about ordering and what the inb/outb instructions provide
on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
that's not going to come cheap and I don't think it's a decision we should
take lightly.

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-15 17:53                         ` Jason Gunthorpe
@ 2014-05-16  9:57                             ` Will Deacon
  -1 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-16  9:57 UTC (permalink / raw
  To: Jason Gunthorpe
  Cc: Arnd Bergmann, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Sebastian Hesselbarth

On Thu, May 15, 2014 at 06:53:07PM +0100, Jason Gunthorpe wrote:
> On Thu, May 15, 2014 at 04:34:30PM +0100, Will Deacon wrote:
> > > How can a write be non-posted on the PCI bus if it's posted on AXI?
> > 
> > From the point-of-view of the CPU it would be posted, but the PCI bus would
> > see an unposted write (so I imagine there would be write buffering at the
> > host controller). However, I worry that I'm missing your point :)
> 
> It is worth being a bit careful with language here, from an AXI
> perspective there is not really such thing as a posted write. 
> 
> All writes are explicitly ack'd upon 'completion', however the memory
> type influences when that is allowed to happen.

Correct. I was trying desperately to avoid delving into AXI signals as it
adds another source of confusion, despite the attempt at being precise.

> For PCI IO writes the AXI memory type from the CPU must be 'Device
> Non-bufferable' (AWCACHE = 0), which will require the AXI ACK to be
> generated only once the PCI target returns an IOWr completion TLP.

That sounds like `strongly-ordered memory' for ARMv7.

> For PCI Memory writes the AXI memory type from the CPU could be
> 'Device Non-bufferable' but it would be best if it is 'Device
> Bufferable' (AWCACHE = 1).

That sounds like `device memory' for ARMv7.

> The latter allows more performance by permitting any AXI bridge in the
> path to ack the write early. This is as close as AXI gets to 'posted
> writes'
> 
> It is very important that the page tables in the CPU properly select
> the right AXI Memory Type for each space.

But, as far as I know, this ordering/completion guarantee for I/O space
accesses is a property of the x86 architecture, not something mandated by
the PCI spec (after all, this is nothing to do with the PCI bus).

> AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
> it completes at the target, and the CPU must not pipeline outl/inl
> operations: outl();  outl(); produces 1 IOWr TLP, waits for
> completion, then produces another.

So that's the real question: Do we really need to duplicate x86 semantics
for IO space accesses? If we do, then we need both strongly-ordered memory
*and* a dsb in our accessors. That's not going to be much fun.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-16  9:57                             ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-16  9:57 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, May 15, 2014 at 06:53:07PM +0100, Jason Gunthorpe wrote:
> On Thu, May 15, 2014 at 04:34:30PM +0100, Will Deacon wrote:
> > > How can a write be non-posted on the PCI bus if it's posted on AXI?
> > 
> > From the point-of-view of the CPU it would be posted, but the PCI bus would
> > see an unposted write (so I imagine there would be write buffering at the
> > host controller). However, I worry that I'm missing your point :)
> 
> It is worth being a bit careful with language here, from an AXI
> perspective there is not really such thing as a posted write. 
> 
> All writes are explicitly ack'd upon 'completion', however the memory
> type influences when that is allowed to happen.

Correct. I was trying desperately to avoid delving into AXI signals as it
adds another source of confusion, despite the attempt at being precise.

> For PCI IO writes the AXI memory type from the CPU must be 'Device
> Non-bufferable' (AWCACHE = 0), which will require the AXI ACK to be
> generated only once the PCI target returns an IOWr completion TLP.

That sounds like `strongly-ordered memory' for ARMv7.

> For PCI Memory writes the AXI memory type from the CPU could be
> 'Device Non-bufferable' but it would be best if it is 'Device
> Bufferable' (AWCACHE = 1).

That sounds like `device memory' for ARMv7.

> The latter allows more performance by permitting any AXI bridge in the
> path to ack the write early. This is as close as AXI gets to 'posted
> writes'
> 
> It is very important that the page tables in the CPU properly select
> the right AXI Memory Type for each space.

But, as far as I know, this ordering/completion guarantee for I/O space
accesses is a property of the x86 architecture, not something mandated by
the PCI spec (after all, this is nothing to do with the PCI bus).

> AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
> it completes at the target, and the CPU must not pipeline outl/inl
> operations: outl();  outl(); produces 1 IOWr TLP, waits for
> completion, then produces another.

So that's the real question: Do we really need to duplicate x86 semantics
for IO space accesses? If we do, then we need both strongly-ordered memory
*and* a dsb in our accessors. That's not going to be much fun.

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-16  9:57                             ` Will Deacon
@ 2014-05-16 15:33                                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2014-05-16 15:33 UTC (permalink / raw
  To: Will Deacon
  Cc: Arnd Bergmann, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Sebastian Hesselbarth

On Fri, May 16, 2014 at 10:57:36AM +0100, Will Deacon wrote:
> > AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
> > it completes at the target, and the CPU must not pipeline outl/inl
> > operations: outl();  outl(); produces 1 IOWr TLP, waits for
> > completion, then produces another.
> 
> So that's the real question: Do we really need to duplicate x86 semantics
> for IO space accesses? If we do, then we need both strongly-ordered memory
> *and* a dsb in our accessors. That's not going to be much fun.

Yep, that is the real question.

It has been over 10 years since I last converted a driver from IO to
MMIO - but IIRC the completion timing became a software visible
difference.

The entire reasons that this funny non-posted outl exists in PCI is to
support software compatability. You could take an ISA device and stick
it on PCI and the driver timing would be completely unaltered.

Linux obviously works on systems where outl is posted, ARM currently,
for instance. I've also run on MIPS systems that completely lack the
ability to synchronize outl (and I recall having to convert all the
drivers to mmio on that system..)

Arnd is right though, I doubt anyone cares, using IO space has been
discouraged for a decade at least at this point.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-16 15:33                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2014-05-16 15:33 UTC (permalink / raw
  To: linux-arm-kernel

On Fri, May 16, 2014 at 10:57:36AM +0100, Will Deacon wrote:
> > AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
> > it completes at the target, and the CPU must not pipeline outl/inl
> > operations: outl();  outl(); produces 1 IOWr TLP, waits for
> > completion, then produces another.
> 
> So that's the real question: Do we really need to duplicate x86 semantics
> for IO space accesses? If we do, then we need both strongly-ordered memory
> *and* a dsb in our accessors. That's not going to be much fun.

Yep, that is the real question.

It has been over 10 years since I last converted a driver from IO to
MMIO - but IIRC the completion timing became a software visible
difference.

The entire reasons that this funny non-posted outl exists in PCI is to
support software compatability. You could take an ISA device and stick
it on PCI and the driver timing would be completely unaltered.

Linux obviously works on systems where outl is posted, ARM currently,
for instance. I've also run on MIPS systems that completely lack the
ability to synchronize outl (and I recall having to convert all the
drivers to mmio on that system..)

Arnd is right though, I doubt anyone cares, using IO space has been
discouraged for a decade at least at this point.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-16  9:53                           ` Will Deacon
@ 2014-05-19 13:19                               ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-19 13:19 UTC (permalink / raw
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Will Deacon, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	Sebastian Hesselbarth

On Friday 16 May 2014 10:53:33 Will Deacon wrote:
> On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> > On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > > transaction from PCI will be visible in memory by then.
> > > 
> > > Can you elaborate on this scenario please? When would we use an I/O space
> > > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > > definitely referring to I/O space as opposed to Configuration Space, right?
> > 
> > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > hardware know it has received the IRQ and the response of the outl() is
> > guaranteed to flush the DMA transaction: by the time the outl() completes
> > we know that the data in memory is valid because it is strongly ordered
> > relative to the DMA.
> 
> Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> requirement? If so, whether or not that DMA data is then visible to the CPU
> is really specific to the host-controller implementation. It could easily be
> buffered somewhere between the host controller and memory, for example.

It's something that drivers are supposed to rely on, and the PCI host
already has to make the same guarantee about ordering of MSI, MMIO-read
and DMA transactions, all of which we definitely rely on in the kernel.

> > outl() actually does a dsb() internally, but unfortunately that is
> > before the store, not after, so I assume that a driver relying on the
> > behavior above would still be racy.
> 
> Yup, we'd need an additional dsb. I think we're confusing what the PCI
> specification says about ordering and what the inb/outb instructions provide
> on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
> that's not going to come cheap and I don't think it's a decision we should
> take lightly.

outb() is expected to be an extremely heavyweight operation, that's why nobody
uses it. Why would you care about whether it's one or two microsecond latency
on a MIDI port or an ISDN adapter?

We can decide that we don't care about correctness here, but the performance
argument doesn't seem overly important.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-19 13:19                               ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-19 13:19 UTC (permalink / raw
  To: linux-arm-kernel

On Friday 16 May 2014 10:53:33 Will Deacon wrote:
> On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> > On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > > transaction from PCI will be visible in memory by then.
> > > 
> > > Can you elaborate on this scenario please? When would we use an I/O space
> > > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > > definitely referring to I/O space as opposed to Configuration Space, right?
> > 
> > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > hardware know it has received the IRQ and the response of the outl() is
> > guaranteed to flush the DMA transaction: by the time the outl() completes
> > we know that the data in memory is valid because it is strongly ordered
> > relative to the DMA.
> 
> Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> requirement? If so, whether or not that DMA data is then visible to the CPU
> is really specific to the host-controller implementation. It could easily be
> buffered somewhere between the host controller and memory, for example.

It's something that drivers are supposed to rely on, and the PCI host
already has to make the same guarantee about ordering of MSI, MMIO-read
and DMA transactions, all of which we definitely rely on in the kernel.

> > outl() actually does a dsb() internally, but unfortunately that is
> > before the store, not after, so I assume that a driver relying on the
> > behavior above would still be racy.
> 
> Yup, we'd need an additional dsb. I think we're confusing what the PCI
> specification says about ordering and what the inb/outb instructions provide
> on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
> that's not going to come cheap and I don't think it's a decision we should
> take lightly.

outb() is expected to be an extremely heavyweight operation, that's why nobody
uses it. Why would you care about whether it's one or two microsecond latency
on a MIDI port or an ISDN adapter?

We can decide that we don't care about correctness here, but the performance
argument doesn't seem overly important.

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-19 13:19                               ` Arnd Bergmann
@ 2014-05-19 14:23                                 ` Will Deacon
  -1 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-19 14:23 UTC (permalink / raw
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	Sebastian Hesselbarth

On Mon, May 19, 2014 at 02:19:58PM +0100, Arnd Bergmann wrote:
> On Friday 16 May 2014 10:53:33 Will Deacon wrote:
> > On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> > > On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > > > transaction from PCI will be visible in memory by then.
> > > > 
> > > > Can you elaborate on this scenario please? When would we use an I/O space
> > > > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > > > definitely referring to I/O space as opposed to Configuration Space, right?
> > > 
> > > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > > hardware know it has received the IRQ and the response of the outl() is
> > > guaranteed to flush the DMA transaction: by the time the outl() completes
> > > we know that the data in memory is valid because it is strongly ordered
> > > relative to the DMA.
> > 
> > Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> > requirement? If so, whether or not that DMA data is then visible to the CPU
> > is really specific to the host-controller implementation. It could easily be
> > buffered somewhere between the host controller and memory, for example.
> 
> It's something that drivers are supposed to rely on, and the PCI host
> already has to make the same guarantee about ordering of MSI, MMIO-read
> and DMA transactions, all of which we definitely rely on in the kernel.

Supposed to rely on for x86, sure. I don't think we can even give you this
guarantee for arm64, where nE is nothing more than a *hint* to the memory
subsystem.

For the MSI case, I thought we had to go and poke the SCU for the Marvell
SoC?

> > > outl() actually does a dsb() internally, but unfortunately that is
> > > before the store, not after, so I assume that a driver relying on the
> > > behavior above would still be racy.
> > 
> > Yup, we'd need an additional dsb. I think we're confusing what the PCI
> > specification says about ordering and what the inb/outb instructions provide
> > on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
> > that's not going to come cheap and I don't think it's a decision we should
> > take lightly.
> 
> outb() is expected to be an extremely heavyweight operation, that's why nobody
> uses it. Why would you care about whether it's one or two microsecond latency
> on a MIDI port or an ISDN adapter?

Fair enough -- I just don't think we should dress up an erratum workaround as
a bug fix, especially when it's adding a new user of strongly-ordered memory
to the kernel which we can't honour for arm64.

> We can decide that we don't care about correctness here, but the performance
> argument doesn't seem overly important.

In this case, I think that's true.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-19 14:23                                 ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-19 14:23 UTC (permalink / raw
  To: linux-arm-kernel

On Mon, May 19, 2014 at 02:19:58PM +0100, Arnd Bergmann wrote:
> On Friday 16 May 2014 10:53:33 Will Deacon wrote:
> > On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> > > On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > > > transaction from PCI will be visible in memory by then.
> > > > 
> > > > Can you elaborate on this scenario please? When would we use an I/O space
> > > > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > > > definitely referring to I/O space as opposed to Configuration Space, right?
> > > 
> > > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > > hardware know it has received the IRQ and the response of the outl() is
> > > guaranteed to flush the DMA transaction: by the time the outl() completes
> > > we know that the data in memory is valid because it is strongly ordered
> > > relative to the DMA.
> > 
> > Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> > requirement? If so, whether or not that DMA data is then visible to the CPU
> > is really specific to the host-controller implementation. It could easily be
> > buffered somewhere between the host controller and memory, for example.
> 
> It's something that drivers are supposed to rely on, and the PCI host
> already has to make the same guarantee about ordering of MSI, MMIO-read
> and DMA transactions, all of which we definitely rely on in the kernel.

Supposed to rely on for x86, sure. I don't think we can even give you this
guarantee for arm64, where nE is nothing more than a *hint* to the memory
subsystem.

For the MSI case, I thought we had to go and poke the SCU for the Marvell
SoC?

> > > outl() actually does a dsb() internally, but unfortunately that is
> > > before the store, not after, so I assume that a driver relying on the
> > > behavior above would still be racy.
> > 
> > Yup, we'd need an additional dsb. I think we're confusing what the PCI
> > specification says about ordering and what the inb/outb instructions provide
> > on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
> > that's not going to come cheap and I don't think it's a decision we should
> > take lightly.
> 
> outb() is expected to be an extremely heavyweight operation, that's why nobody
> uses it. Why would you care about whether it's one or two microsecond latency
> on a MIDI port or an ISDN adapter?

Fair enough -- I just don't think we should dress up an erratum workaround as
a bug fix, especially when it's adding a new user of strongly-ordered memory
to the kernel which we can't honour for arm64.

> We can decide that we don't care about correctness here, but the performance
> argument doesn't seem overly important.

In this case, I think that's true.

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-19 14:23                                 ` Will Deacon
@ 2014-05-19 16:40                                     ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-19 16:40 UTC (permalink / raw
  To: Will Deacon
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	Sebastian Hesselbarth

On Monday 19 May 2014 15:23:55 Will Deacon wrote:
> On Mon, May 19, 2014 at 02:19:58PM +0100, Arnd Bergmann wrote:
> > On Friday 16 May 2014 10:53:33 Will Deacon wrote:
> > > On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> > > > On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > > > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > > > > transaction from PCI will be visible in memory by then.
> > > > > 
> > > > > Can you elaborate on this scenario please? When would we use an I/O space
> > > > > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > > > > definitely referring to I/O space as opposed to Configuration Space, right?
> > > > 
> > > > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > > > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > > > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > > > hardware know it has received the IRQ and the response of the outl() is
> > > > guaranteed to flush the DMA transaction: by the time the outl() completes
> > > > we know that the data in memory is valid because it is strongly ordered
> > > > relative to the DMA.
> > > 
> > > Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> > > requirement? If so, whether or not that DMA data is then visible to the CPU
> > > is really specific to the host-controller implementation. It could easily be
> > > buffered somewhere between the host controller and memory, for example.
> > 
> > It's something that drivers are supposed to rely on, and the PCI host
> > already has to make the same guarantee about ordering of MSI, MMIO-read
> > and DMA transactions, all of which we definitely rely on in the kernel.
> 
> Supposed to rely on for x86, sure. I don't think we can even give you this
> guarantee for arm64, where nE is nothing more than a *hint* to the memory
> subsystem.

That might be why SBSA actually doesn't specify I/O space at all ;-)
 
> For the MSI case, I thought we had to go and poke the SCU for the Marvell
> SoC?

I'd consider that a bug workaround for a case where the hardware doesn't
do the reasonably thing.

> > > > outl() actually does a dsb() internally, but unfortunately that is
> > > > before the store, not after, so I assume that a driver relying on the
> > > > behavior above would still be racy.
> > > 
> > > Yup, we'd need an additional dsb. I think we're confusing what the PCI
> > > specification says about ordering and what the inb/outb instructions provide
> > > on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
> > > that's not going to come cheap and I don't think it's a decision we should
> > > take lightly.
> > 
> > outb() is expected to be an extremely heavyweight operation, that's why nobody
> > uses it. Why would you care about whether it's one or two microsecond latency
> > on a MIDI port or an ISDN adapter?
> 
> Fair enough -- I just don't think we should dress up an erratum workaround as
> a bug fix, especially when it's adding a new user of strongly-ordered memory
> to the kernel which we can't honour for arm64.

Makes sense. How about a patch then that just changes the memory type for
the I/O space and adds a comment explaining all we found out? From what
I can tell, strict ordering is required for Armada 3xx, may or may not
be required elsewhere but should never hurt noticeably. We may want to restrict
it to ARMv6/v7 so we don't have to pick the right domain.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-19 16:40                                     ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-19 16:40 UTC (permalink / raw
  To: linux-arm-kernel

On Monday 19 May 2014 15:23:55 Will Deacon wrote:
> On Mon, May 19, 2014 at 02:19:58PM +0100, Arnd Bergmann wrote:
> > On Friday 16 May 2014 10:53:33 Will Deacon wrote:
> > > On Thu, May 15, 2014 at 04:55:52PM +0100, Arnd Bergmann wrote:
> > > > On Thursday 15 May 2014 16:34:30 Will Deacon wrote:
> > > > > > If this is used to synchronize with a DMA, there is no guarantee that the
> > > > > > transaction from PCI will be visible in memory by then.
> > > > > 
> > > > > Can you elaborate on this scenario please? When would we use an I/O space
> > > > > write to synchronise with a DMA transfer from a PCI endpoint? You're
> > > > > definitely referring to I/O space as opposed to Configuration Space, right?
> > > > 
> > > > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > > > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > > > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > > > hardware know it has received the IRQ and the response of the outl() is
> > > > guaranteed to flush the DMA transaction: by the time the outl() completes
> > > > we know that the data in memory is valid because it is strongly ordered
> > > > relative to the DMA.
> > > 
> > > Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> > > requirement? If so, whether or not that DMA data is then visible to the CPU
> > > is really specific to the host-controller implementation. It could easily be
> > > buffered somewhere between the host controller and memory, for example.
> > 
> > It's something that drivers are supposed to rely on, and the PCI host
> > already has to make the same guarantee about ordering of MSI, MMIO-read
> > and DMA transactions, all of which we definitely rely on in the kernel.
> 
> Supposed to rely on for x86, sure. I don't think we can even give you this
> guarantee for arm64, where nE is nothing more than a *hint* to the memory
> subsystem.

That might be why SBSA actually doesn't specify I/O space at all ;-)
 
> For the MSI case, I thought we had to go and poke the SCU for the Marvell
> SoC?

I'd consider that a bug workaround for a case where the hardware doesn't
do the reasonably thing.

> > > > outl() actually does a dsb() internally, but unfortunately that is
> > > > before the store, not after, so I assume that a driver relying on the
> > > > behavior above would still be racy.
> > > 
> > > Yup, we'd need an additional dsb. I think we're confusing what the PCI
> > > specification says about ordering and what the inb/outb instructions provide
> > > on x86. It may well be that we want to emulate the x86 behaviour on ARM, but
> > > that's not going to come cheap and I don't think it's a decision we should
> > > take lightly.
> > 
> > outb() is expected to be an extremely heavyweight operation, that's why nobody
> > uses it. Why would you care about whether it's one or two microsecond latency
> > on a MIDI port or an ISDN adapter?
> 
> Fair enough -- I just don't think we should dress up an erratum workaround as
> a bug fix, especially when it's adding a new user of strongly-ordered memory
> to the kernel which we can't honour for arm64.

Makes sense. How about a patch then that just changes the memory type for
the I/O space and adds a comment explaining all we found out? From what
I can tell, strict ordering is required for Armada 3xx, may or may not
be required elsewhere but should never hurt noticeably. We may want to restrict
it to ARMv6/v7 so we don't have to pick the right domain.

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-19 16:40                                     ` Arnd Bergmann
@ 2014-05-19 16:50                                       ` Will Deacon
  -1 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-19 16:50 UTC (permalink / raw
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	Sebastian Hesselbarth

On Mon, May 19, 2014 at 05:40:51PM +0100, Arnd Bergmann wrote:
> On Monday 19 May 2014 15:23:55 Will Deacon wrote:
> > Fair enough -- I just don't think we should dress up an erratum workaround as
> > a bug fix, especially when it's adding a new user of strongly-ordered memory
> > to the kernel which we can't honour for arm64.
> 
> Makes sense. How about a patch then that just changes the memory type for
> the I/O space and adds a comment explaining all we found out? From what
> I can tell, strict ordering is required for Armada 3xx, may or may not
> be required elsewhere but should never hurt noticeably. We may want to restrict
> it to ARMv6/v7 so we don't have to pick the right domain.

Ok, and we don't bother with the extra dsb in outb? I'm alright with that if
the comment goes into enough detail.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-19 16:50                                       ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2014-05-19 16:50 UTC (permalink / raw
  To: linux-arm-kernel

On Mon, May 19, 2014 at 05:40:51PM +0100, Arnd Bergmann wrote:
> On Monday 19 May 2014 15:23:55 Will Deacon wrote:
> > Fair enough -- I just don't think we should dress up an erratum workaround as
> > a bug fix, especially when it's adding a new user of strongly-ordered memory
> > to the kernel which we can't honour for arm64.
> 
> Makes sense. How about a patch then that just changes the memory type for
> the I/O space and adds a comment explaining all we found out? From what
> I can tell, strict ordering is required for Armada 3xx, may or may not
> be required elsewhere but should never hurt noticeably. We may want to restrict
> it to ARMv6/v7 so we don't have to pick the right domain.

Ok, and we don't bother with the extra dsb in outb? I'm alright with that if
the comment goes into enough detail.

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-19 16:50                                       ` Will Deacon
@ 2014-05-19 17:04                                           ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-19 17:04 UTC (permalink / raw
  To: Will Deacon
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	Sebastian Hesselbarth

On Monday 19 May 2014 17:50:10 Will Deacon wrote:
> On Mon, May 19, 2014 at 05:40:51PM +0100, Arnd Bergmann wrote:
> > On Monday 19 May 2014 15:23:55 Will Deacon wrote:
> > > Fair enough -- I just don't think we should dress up an erratum workaround as
> > > a bug fix, especially when it's adding a new user of strongly-ordered memory
> > > to the kernel which we can't honour for arm64.
> > 
> > Makes sense. How about a patch then that just changes the memory type for
> > the I/O space and adds a comment explaining all we found out? From what
> > I can tell, strict ordering is required for Armada 3xx, may or may not
> > be required elsewhere but should never hurt noticeably. We may want to restrict
> > it to ARMv6/v7 so we don't have to pick the right domain.
> 
> Ok, and we don't bother with the extra dsb in outb? I'm alright with that if
> the comment goes into enough detail.

That was my idea. I'm fine with it either way, with the dsb or without it.
Let's see what Russell thinks about this, he probably has a preference.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-19 17:04                                           ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-19 17:04 UTC (permalink / raw
  To: linux-arm-kernel

On Monday 19 May 2014 17:50:10 Will Deacon wrote:
> On Mon, May 19, 2014 at 05:40:51PM +0100, Arnd Bergmann wrote:
> > On Monday 19 May 2014 15:23:55 Will Deacon wrote:
> > > Fair enough -- I just don't think we should dress up an erratum workaround as
> > > a bug fix, especially when it's adding a new user of strongly-ordered memory
> > > to the kernel which we can't honour for arm64.
> > 
> > Makes sense. How about a patch then that just changes the memory type for
> > the I/O space and adds a comment explaining all we found out? From what
> > I can tell, strict ordering is required for Armada 3xx, may or may not
> > be required elsewhere but should never hurt noticeably. We may want to restrict
> > it to ARMv6/v7 so we don't have to pick the right domain.
> 
> Ok, and we don't bother with the extra dsb in outb? I'm alright with that if
> the comment goes into enough detail.

That was my idea. I'm fine with it either way, with the dsb or without it.
Let's see what Russell thinks about this, he probably has a preference.

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-16  9:53                           ` Will Deacon
@ 2014-05-21  5:20                               ` Jason Gunthorpe
  -1 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2014-05-21  5:20 UTC (permalink / raw
  To: Will Deacon
  Cc: Arnd Bergmann, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Sebastian Hesselbarth

On Fri, May 16, 2014 at 10:53:33AM +0100, Will Deacon wrote:

> > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > hardware know it has received the IRQ and the response of the outl() is
> > guaranteed to flush the DMA transaction: by the time the outl() completes
> > we know that the data in memory is valid because it is strongly ordered
> > relative to the DMA.

Keep in mind that the IntA message itself is going to flush the DMA,
no sane host bridge implementation should process the IntA until all
prior DMA writes are completed, just like MSI.

Also, legacy non-MSI interrupts are always sharable, so the ISR must
always start with a read of a device specific status reguster, which
will also flush any DMA writes.

The simplest common scenario to show synchronous outl is this:

void pci_isr()
{
   if (inl(status_reg) & INT_PENDING)
      outl(ACK_INT,status_reg);
}

Where the outl is not expected to complete at the CPU until the device
has lowered the level triggered interrupt line.

If outl is not synchronous then a spurious interrupt will be caused.

When converting a driver to MMIO you'd often have to do this:

void pci_isr()
{
   if (readl(status_reg) & INT_PENDING) {
      writel(ACK_INT,status_reg);
      readl(status_reg); // Synchronizing read, flushes write.
   }
}

Which is one of the software visible impacts of io vs mmio.

> Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> requirement? If so, whether or not that DMA data is then visible to the CPU
> is really specific to the host-controller implementation. It could easily be
> buffered somewhere between the host controller and memory, for example.

PCI has the producer/consumer ordering model as part of the
driving concept in the spec. Basically it wants to see the ordering
model preserved right to the driver code itself.

Realistically, way back, archs that couldn't do the synchronous IO
(like my old MIPS design) had to convert their drivers to MMIO and run
that way. It never worked 100% properly, or made sense to try an use an
async outl, even though some systems provided it :)

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-21  5:20                               ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2014-05-21  5:20 UTC (permalink / raw
  To: linux-arm-kernel

On Fri, May 16, 2014 at 10:53:33AM +0100, Will Deacon wrote:

> > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > hardware know it has received the IRQ and the response of the outl() is
> > guaranteed to flush the DMA transaction: by the time the outl() completes
> > we know that the data in memory is valid because it is strongly ordered
> > relative to the DMA.

Keep in mind that the IntA message itself is going to flush the DMA,
no sane host bridge implementation should process the IntA until all
prior DMA writes are completed, just like MSI.

Also, legacy non-MSI interrupts are always sharable, so the ISR must
always start with a read of a device specific status reguster, which
will also flush any DMA writes.

The simplest common scenario to show synchronous outl is this:

void pci_isr()
{
   if (inl(status_reg) & INT_PENDING)
      outl(ACK_INT,status_reg);
}

Where the outl is not expected to complete at the CPU until the device
has lowered the level triggered interrupt line.

If outl is not synchronous then a spurious interrupt will be caused.

When converting a driver to MMIO you'd often have to do this:

void pci_isr()
{
   if (readl(status_reg) & INT_PENDING) {
      writel(ACK_INT,status_reg);
      readl(status_reg); // Synchronizing read, flushes write.
   }
}

Which is one of the software visible impacts of io vs mmio.

> Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> requirement? If so, whether or not that DMA data is then visible to the CPU
> is really specific to the host-controller implementation. It could easily be
> buffered somewhere between the host controller and memory, for example.

PCI has the producer/consumer ordering model as part of the
driving concept in the spec. Basically it wants to see the ordering
model preserved right to the driver code itself.

Realistically, way back, archs that couldn't do the synchronous IO
(like my old MIPS design) had to convert their drivers to MMIO and run
that way. It never worked 100% properly, or made sense to try an use an
async outl, even though some systems provided it :)

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
  2014-05-21  5:20                               ` Jason Gunthorpe
@ 2014-05-21  8:20                                   ` Arnd Bergmann
  -1 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-21  8:20 UTC (permalink / raw
  To: Jason Gunthorpe
  Cc: Will Deacon, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, Catalin Marinas,
	Grant Likely, Albin Tonnerre, Lior Amsalem, Rob Herring,
	Ezequiel Garcia, Gregory Clement, Nadav Haklai,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Sebastian Hesselbarth

On Tuesday 20 May 2014 23:20:07 Jason Gunthorpe wrote:
> On Fri, May 16, 2014 at 10:53:33AM +0100, Will Deacon wrote:
> 
> > > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > > hardware know it has received the IRQ and the response of the outl() is
> > > guaranteed to flush the DMA transaction: by the time the outl() completes
> > > we know that the data in memory is valid because it is strongly ordered
> > > relative to the DMA.
> 
> Keep in mind that the IntA message itself is going to flush the DMA,
> no sane host bridge implementation should process the IntA until all
> prior DMA writes are completed, just like MSI.

I was thinking of PCI, not PCIe here, where the interrupt can be directly
wired to the irqchip.

> Also, legacy non-MSI interrupts are always sharable, so the ISR must
> always start with a read of a device specific status reguster, which
> will also flush any DMA writes.

Right, good point.

> The simplest common scenario to show synchronous outl is this:
> 
> void pci_isr()
> {
>    if (inl(status_reg) & INT_PENDING)
>       outl(ACK_INT,status_reg);
> }
> 
> Where the outl is not expected to complete at the CPU until the device
> has lowered the level triggered interrupt line.
> 
> If outl is not synchronous then a spurious interrupt will be caused.
> 
> When converting a driver to MMIO you'd often have to do this:
> 
> void pci_isr()
> {
>    if (readl(status_reg) & INT_PENDING) {
>       writel(ACK_INT,status_reg);
>       readl(status_reg); // Synchronizing read, flushes write.
>    }
> }
> 
> Which is one of the software visible impacts of io vs mmio.
> 
> > Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> > requirement? If so, whether or not that DMA data is then visible to the CPU
> > is really specific to the host-controller implementation. It could easily be
> > buffered somewhere between the host controller and memory, for example.
> 
> PCI has the producer/consumer ordering model as part of the
> driving concept in the spec. Basically it wants to see the ordering
> model preserved right to the driver code itself.
> 
> Realistically, way back, archs that couldn't do the synchronous IO
> (like my old MIPS design) had to convert their drivers to MMIO and run
> that way. It never worked 100% properly, or made sense to try an use an
> async outl, even though some systems provided it 

Thanks for the extra background information!

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
@ 2014-05-21  8:20                                   ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2014-05-21  8:20 UTC (permalink / raw
  To: linux-arm-kernel

On Tuesday 20 May 2014 23:20:07 Jason Gunthorpe wrote:
> On Fri, May 16, 2014 at 10:53:33AM +0100, Will Deacon wrote:
> 
> > > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > > hardware know it has received the IRQ and the response of the outl() is
> > > guaranteed to flush the DMA transaction: by the time the outl() completes
> > > we know that the data in memory is valid because it is strongly ordered
> > > relative to the DMA.
> 
> Keep in mind that the IntA message itself is going to flush the DMA,
> no sane host bridge implementation should process the IntA until all
> prior DMA writes are completed, just like MSI.

I was thinking of PCI, not PCIe here, where the interrupt can be directly
wired to the irqchip.

> Also, legacy non-MSI interrupts are always sharable, so the ISR must
> always start with a read of a device specific status reguster, which
> will also flush any DMA writes.

Right, good point.

> The simplest common scenario to show synchronous outl is this:
> 
> void pci_isr()
> {
>    if (inl(status_reg) & INT_PENDING)
>       outl(ACK_INT,status_reg);
> }
> 
> Where the outl is not expected to complete at the CPU until the device
> has lowered the level triggered interrupt line.
> 
> If outl is not synchronous then a spurious interrupt will be caused.
> 
> When converting a driver to MMIO you'd often have to do this:
> 
> void pci_isr()
> {
>    if (readl(status_reg) & INT_PENDING) {
>       writel(ACK_INT,status_reg);
>       readl(status_reg); // Synchronizing read, flushes write.
>    }
> }
> 
> Which is one of the software visible impacts of io vs mmio.
> 
> > Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> > requirement? If so, whether or not that DMA data is then visible to the CPU
> > is really specific to the host-controller implementation. It could easily be
> > buffered somewhere between the host controller and memory, for example.
> 
> PCI has the producer/consumer ordering model as part of the
> driving concept in the spec. Basically it wants to see the ordering
> model preserved right to the driver code itself.
> 
> Realistically, way back, archs that couldn't do the synchronous IO
> (like my old MIPS design) had to convert their drivers to MMIO and run
> that way. It never worked 100% properly, or made sense to try an use an
> async outl, even though some systems provided it 

Thanks for the extra background information!

	Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2014-05-21  8:20 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-15  9:18 [PATCHv3 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
2014-05-15  9:18 ` Thomas Petazzoni
     [not found] ` <1400145519-28530-1-git-send-email-thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-05-15  9:18   ` [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type Thomas Petazzoni
2014-05-15  9:18     ` Thomas Petazzoni
     [not found]     ` <1400145519-28530-2-git-send-email-thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-05-15 13:21       ` Arnd Bergmann
2014-05-15 13:21         ` Arnd Bergmann
2014-05-15 13:51         ` Thomas Petazzoni
2014-05-15 13:51           ` Thomas Petazzoni
     [not found]           ` <20140515155130.193c3181-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-05-15 14:29             ` Will Deacon
2014-05-15 14:29               ` Will Deacon
     [not found]               ` <20140515142924.GI27594-5wv7dgnIgG8@public.gmane.org>
2014-05-15 14:32                 ` Arnd Bergmann
2014-05-15 14:32                   ` Arnd Bergmann
2014-05-15 15:34                   ` Will Deacon
2014-05-15 15:34                     ` Will Deacon
     [not found]                     ` <20140515153430.GM27594-5wv7dgnIgG8@public.gmane.org>
2014-05-15 15:55                       ` Arnd Bergmann
2014-05-15 15:55                         ` Arnd Bergmann
2014-05-16  9:53                         ` Will Deacon
2014-05-16  9:53                           ` Will Deacon
     [not found]                           ` <20140516095333.GE12341-5wv7dgnIgG8@public.gmane.org>
2014-05-19 13:19                             ` Arnd Bergmann
2014-05-19 13:19                               ` Arnd Bergmann
2014-05-19 14:23                               ` Will Deacon
2014-05-19 14:23                                 ` Will Deacon
     [not found]                                 ` <20140519142355.GD15130-5wv7dgnIgG8@public.gmane.org>
2014-05-19 16:40                                   ` Arnd Bergmann
2014-05-19 16:40                                     ` Arnd Bergmann
2014-05-19 16:50                                     ` Will Deacon
2014-05-19 16:50                                       ` Will Deacon
     [not found]                                       ` <20140519165010.GQ15130-5wv7dgnIgG8@public.gmane.org>
2014-05-19 17:04                                         ` Arnd Bergmann
2014-05-19 17:04                                           ` Arnd Bergmann
2014-05-21  5:20                             ` Jason Gunthorpe
2014-05-21  5:20                               ` Jason Gunthorpe
     [not found]                               ` <20140521052007.GA14888-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2014-05-21  8:20                                 ` Arnd Bergmann
2014-05-21  8:20                                   ` Arnd Bergmann
2014-05-15 17:53                       ` Jason Gunthorpe
2014-05-15 17:53                         ` Jason Gunthorpe
     [not found]                         ` <20140515175307.GA12259-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2014-05-16  9:57                           ` Will Deacon
2014-05-16  9:57                             ` Will Deacon
     [not found]                             ` <20140516095736.GF12341-5wv7dgnIgG8@public.gmane.org>
2014-05-16 15:33                               ` Jason Gunthorpe
2014-05-16 15:33                                 ` Jason Gunthorpe
2014-05-15  9:18   ` [PATCHv3 2/3] ARM: mm: add support for HW coherent systems in PL310 Thomas Petazzoni
2014-05-15  9:18     ` Thomas Petazzoni
     [not found]     ` <1400145519-28530-3-git-send-email-thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-05-15  9:36       ` Catalin Marinas
2014-05-15  9:36         ` Catalin Marinas
2014-05-15 11:39         ` Thomas Petazzoni
2014-05-15 11:39           ` Thomas Petazzoni
2014-05-15 13:23       ` Arnd Bergmann
2014-05-15 13:23         ` Arnd Bergmann
2014-05-15 13:35       ` Rob Herring
2014-05-15 13:35         ` Rob Herring
     [not found]         ` <CAL_JsqJZDhi8qtzSbDAdkN3BQqdZQZtRcC-yha+QsreLFpaNRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-05-15 13:46           ` Thomas Petazzoni
2014-05-15 13:46             ` Thomas Petazzoni
2014-05-15  9:18   ` [PATCHv3 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround Thomas Petazzoni
2014-05-15  9:18     ` Thomas Petazzoni
2014-05-15  9:36     ` Catalin Marinas
2014-05-15  9:36       ` Catalin Marinas
     [not found]     ` <1400145519-28530-4-git-send-email-thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-05-15 13:21       ` Jason Cooper
2014-05-15 13:21         ` Jason Cooper
     [not found]         ` <20140515132118.GK27822-u4khhh1J0LxI1Ri9qeTfzeTW4wlIGRCZ@public.gmane.org>
2014-05-15 13:50           ` Thomas Petazzoni
2014-05-15 13:50             ` Thomas Petazzoni
     [not found]             ` <20140515155008.1dcca042-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-05-15 15:31               ` Jason Cooper
2014-05-15 15:31                 ` Jason Cooper
     [not found]                 ` <20140515153150.GP27822-u4khhh1J0LxI1Ri9qeTfzeTW4wlIGRCZ@public.gmane.org>
2014-05-16  7:19                   ` Thomas Petazzoni
2014-05-16  7:19                     ` Thomas Petazzoni
2014-05-15 13:26       ` Arnd Bergmann
2014-05-15 13:26         ` Arnd Bergmann
2014-05-15 14:22         ` Thomas Petazzoni
2014-05-15 14:22           ` Thomas Petazzoni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.