* [PATCH v3 0/8] PCI: Solve two bridge window sizing issues
@ 2024-05-07 10:25 Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 1/8] PCI: Fix resource double counting on remove & rescan Ilpo Järvinen
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron
  Cc: linux-kernel, Ilpo Järvinen

Hi all,

Here's a series that contains two fixes to the PCI bridge window sizing
algorithm. Together, they should enable a remove & rescan cycle to work
for a PCI bus that has PCI devices with optional resources and/or
disparity in BAR sizes.

For the second fix, I chose to expose find_resource_space() from
kernel/resource.c because it should increase the accuracy of the
cannot-fit decision (currently that function is called find_resource()).
To do that sensibly, a few improvements seemed in order to make the
function's interface and name sane before exposing it. Hence the few
extra patches on the resource side.

v3:
- Removed "slot" wording
        - Renamed find_empty_resource_slot() -> find_resource_space()
- find_resource_space() returns bool instead of int
- Added patch to convert literal 20 related to bridge win minimum
  alignment to __ffs(SZ_1M)
- Fixed kerneldoc missing "struct"
- Tweaked prints (one dbg -> info, added new dbg one for success case)
- Changelog tweaks
        - Take account largest >> 1 (in alignment calc)
        - Adjust to minor changes made into calculate_memsize()
        - Take logs from more recent kernel to get rid of reg 0xXX

v2:
- Add "typedef" to kerneldoc to get correct formatting
- Use RESOURCE_SIZE_MAX instead of literal
- Remove unnecessary checks for io{port/mem}_resource
- Apply a few style tweaks from Andy

Ilpo Järvinen (8):
  PCI: Fix resource double counting on remove & rescan
  resource: Rename find_resource() to find_resource_space()
  resource: Document find_resource_space() and resource_constraint
  resource: Use typedef for alignf callback
  resource: Handle simple alignment inside __find_resource_space()
  resource: Export find_resource_space()
  PCI: Make minimum bridge window alignment reference more obvious
  PCI: Relax bridge window tail sizing rules

 drivers/pci/bus.c       | 10 +----
 drivers/pci/setup-bus.c | 91 +++++++++++++++++++++++++++++++++++++----
 include/linux/ioport.h  | 44 ++++++++++++++++++--
 include/linux/pci.h     |  5 +--
 kernel/resource.c       | 68 ++++++++++++++----------------
 5 files changed, 157 insertions(+), 61 deletions(-)

-- 
2.39.2



* [PATCH v3 1/8] PCI: Fix resource double counting on remove & rescan
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 2/8] resource: Rename find_resource() to find_resource_space() Ilpo Järvinen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron, Yinghai Lu,
	Jesse Barnes, linux-kernel
  Cc: Ilpo Järvinen, Lidong Wang

pbus_size_mem() keeps the size of the optional resources in
children_add_size. When calculating the PCI bridge window size,
calculate_memsize() raises size to at least old_size before adding
children_add_size and performing the window size alignment. This
double counts the resources in children_add_size because old_size may
be based on the previous size of the bridge window, which already
included children_add_size (that is, size1 in pbus_size_mem() from an
earlier invocation of that function).

As a result, on repeated remove & rescan cycles of the bus, the resource
size keeps increasing when children_add_size is non-zero, as can be seen
from this extract:

iomem0:  23fffd00000-23fffdfffff : PCI Bus 0000:03
iomem1:  20000000000-200001fffff : PCI Bus 0000:03
iomem2:  20000000000-200002fffff : PCI Bus 0000:03
iomem3:  20000000000-200003fffff : PCI Bus 0000:03
iomem4:  20000000000-200004fffff : PCI Bus 0000:03

Solve the double counting by moving the old_size check later in
calculate_memsize() so that children_add_size is already accounted for.

After the patch, the bridge window retains its size as expected:

iomem0:  23fffd00000-23fffdfffff : PCI Bus 0000:03
iomem1:  20000000000-200000fffff : PCI Bus 0000:03
iomem2:  20000000000-200000fffff : PCI Bus 0000:03
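
The effect of moving the old_size check can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the two
functions mirror the before/after bodies of calculate_memsize() from the
diff in this patch, while resource_size_t as uint64_t, align_up(),
max_sz(), simulate_rescans(), the 4KiB sizes, and feeding each result
back as the next old_size are illustrative stand-ins, not kernel code.

```c
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

#define SZ_4K	0x1000ULL
#define SZ_1M	0x100000ULL

/* Round x up to a multiple of a; a must be a power of two. */
static resource_size_t align_up(resource_size_t x, resource_size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static resource_size_t max_sz(resource_size_t a, resource_size_t b)
{
	return a > b ? a : b;
}

/* Ordering before the patch: old_size is applied before children_add_size. */
static resource_size_t memsize_old(resource_size_t size,
				   resource_size_t min_size,
				   resource_size_t add_size,
				   resource_size_t children_add_size,
				   resource_size_t old_size,
				   resource_size_t align)
{
	if (size < min_size)
		size = min_size;
	if (old_size == 1)
		old_size = 0;
	if (size < old_size)
		size = old_size;

	return align_up(max_sz(size, add_size) + children_add_size, align);
}

/* Ordering after the patch: old_size is applied after children_add_size. */
static resource_size_t memsize_new(resource_size_t size,
				   resource_size_t min_size,
				   resource_size_t add_size,
				   resource_size_t children_add_size,
				   resource_size_t old_size,
				   resource_size_t align)
{
	if (size < min_size)
		size = min_size;
	if (old_size == 1)
		old_size = 0;

	size = max_sz(size, add_size) + children_add_size;
	return align_up(max_sz(size, old_size), align);
}

/*
 * Hypothetical driver: start from a 1MiB window and feed each computed
 * size back in as old_size, as a simplified model of remove & rescan.
 */
static resource_size_t simulate_rescans(int use_new, int cycles)
{
	resource_size_t window = SZ_1M;
	int i;

	for (i = 0; i < cycles; i++) {
		if (use_new)
			window = memsize_new(SZ_4K, 0, 0, SZ_4K, window, SZ_1M);
		else
			window = memsize_old(SZ_4K, 0, 0, SZ_4K, window, SZ_1M);
	}
	return window;
}
```

In this model, four cycles with the old ordering grow the window by 1MiB
per cycle (to 5MiB), while the new ordering keeps it at 1MiB, which is
the same pattern as in the iomem extracts above.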

Fixes: a4ac9fea016f ("PCI : Calculate right add_size")
Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/setup-bus.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 909e6a7c3cc3..141d6b31959b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -829,11 +829,9 @@ static resource_size_t calculate_memsize(resource_size_t size,
 		size = min_size;
 	if (old_size == 1)
 		old_size = 0;
-	if (size < old_size)
-		size = old_size;
 
-	size = ALIGN(max(size, add_size) + children_add_size, align);
-	return size;
+	size = max(size, add_size) + children_add_size;
+	return ALIGN(max(size, old_size), align);
 }
 
 resource_size_t __weak pcibios_window_alignment(struct pci_bus *bus,
-- 
2.39.2



* [PATCH v3 2/8] resource: Rename find_resource() to find_resource_space()
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 1/8] PCI: Fix resource double counting on remove & rescan Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 3/8] resource: Document find_resource_space() and resource_constraint Ilpo Järvinen
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen, Lidong Wang, Andy Shevchenko

Rename find_resource() to find_resource_space() to better describe what
the function does. This is preparation for exposing it beyond
resource.c, which is needed by the PCI core. Also rename the __ variant
to match.

Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 kernel/resource.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index fcbca39dbc45..e163e0a8f2f8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -628,13 +628,12 @@ static void resource_clip(struct resource *res, resource_size_t min,
 }
 
 /*
- * Find empty slot in the resource tree with the given range and
+ * Find empty space in the resource tree with the given range and
  * alignment constraints
  */
-static int __find_resource(struct resource *root, struct resource *old,
-			 struct resource *new,
-			 resource_size_t  size,
-			 struct resource_constraint *constraint)
+static int __find_resource_space(struct resource *root, struct resource *old,
+				 struct resource *new, resource_size_t size,
+				 struct resource_constraint *constraint)
 {
 	struct resource *this = root->child;
 	struct resource tmp = *new, avail, alloc;
@@ -688,13 +687,13 @@ next:		if (!this || this->end == root->end)
 }
 
 /*
- * Find empty slot in the resource tree given range and alignment.
+ * Find empty space in the resource tree given range and alignment.
  */
-static int find_resource(struct resource *root, struct resource *new,
-			resource_size_t size,
-			struct resource_constraint  *constraint)
+static int find_resource_space(struct resource *root, struct resource *new,
+			       resource_size_t size,
+			       struct resource_constraint *constraint)
 {
-	return  __find_resource(root, NULL, new, size, constraint);
+	return  __find_resource_space(root, NULL, new, size, constraint);
 }
 
 /**
@@ -717,7 +716,7 @@ static int reallocate_resource(struct resource *root, struct resource *old,
 
 	write_lock(&resource_lock);
 
-	if ((err = __find_resource(root, old, &new, newsize, constraint)))
+	if ((err = __find_resource_space(root, old, &new, newsize, constraint)))
 		goto out;
 
 	if (resource_contains(&new, old)) {
@@ -786,7 +785,7 @@ int allocate_resource(struct resource *root, struct resource *new,
 	}
 
 	write_lock(&resource_lock);
-	err = find_resource(root, new, size, &constraint);
+	err = find_resource_space(root, new, size, &constraint);
 	if (err >= 0 && __request_resource(root, new))
 		err = -EBUSY;
 	write_unlock(&resource_lock);
-- 
2.39.2



* [PATCH v3 3/8] resource: Document find_resource_space() and resource_constraint
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 1/8] PCI: Fix resource double counting on remove & rescan Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 2/8] resource: Rename find_resource() to find_resource_space() Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 4/8] resource: Use typedef for alignf callback Ilpo Järvinen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen, Lidong Wang, Andy Shevchenko

Document find_resource_space() and the struct resource_constraint as
they are going to be exposed outside of resource.c.

Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 kernel/resource.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index e163e0a8f2f8..3f15a32d9c42 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -48,7 +48,19 @@ struct resource iomem_resource = {
 };
 EXPORT_SYMBOL(iomem_resource);
 
-/* constraints to be met while allocating resources */
+/**
+ * struct resource_constraint - constraints to be met while searching empty
+ *				resource space
+ * @min:		The minimum address for the memory range
+ * @max:		The maximum address for the memory range
+ * @align:		Alignment for the start address of the empty space
+ * @alignf:		Additional alignment constraints callback
+ * @alignf_data:	Data provided for @alignf callback
+ *
+ * Contains the range and alignment constraints that have to be met during
+ * find_resource_space(). @alignf can be NULL indicating no alignment beyond
+ * @align is necessary.
+ */
 struct resource_constraint {
 	resource_size_t min, max, align;
 	resource_size_t (*alignf)(void *, const struct resource *,
@@ -686,8 +698,19 @@ next:		if (!this || this->end == root->end)
 	return -EBUSY;
 }
 
-/*
- * Find empty space in the resource tree given range and alignment.
+/**
+ * find_resource_space - Find empty space in the resource tree
+ * @root:	Root resource descriptor
+ * @new:	Resource descriptor awaiting an empty resource space
+ * @size:	The minimum size of the empty space
+ * @constraint:	The range and alignment constraints to be met
+ *
+ * Finds an empty space under @root in the resource tree satisfying range and
+ * alignment @constraints.
+ *
+ * Return:
+ * * %0		- if successful, @new members start, end, and flags are altered.
+ * * %-EBUSY	- if no empty space was found.
  */
 static int find_resource_space(struct resource *root, struct resource *new,
 			       resource_size_t size,
-- 
2.39.2



* [PATCH v3 4/8] resource: Use typedef for alignf callback
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (2 preceding siblings ...)
  2024-05-07 10:25 ` [PATCH v3 3/8] resource: Document find_resource_space() and resource_constraint Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 5/8] resource: Handle simple alignment inside __find_resource_space() Ilpo Järvinen
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen, Andy Shevchenko, Lidong Wang

To make it simpler to declare resource constraint alignf callbacks, add
a typedef for the callback and document it.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/bus.c      | 10 ++--------
 include/linux/ioport.h | 22 ++++++++++++++++++----
 include/linux/pci.h    |  5 +----
 kernel/resource.c      |  8 ++------
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 826b5016a101..dfc99b3cb958 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -176,10 +176,7 @@ static void pci_clip_resource_to_region(struct pci_bus *bus,
 static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 		resource_size_t size, resource_size_t align,
 		resource_size_t min, unsigned long type_mask,
-		resource_size_t (*alignf)(void *,
-					  const struct resource *,
-					  resource_size_t,
-					  resource_size_t),
+		resource_alignf alignf,
 		void *alignf_data,
 		struct pci_bus_region *region)
 {
@@ -250,10 +247,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 int pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 		resource_size_t size, resource_size_t align,
 		resource_size_t min, unsigned long type_mask,
-		resource_size_t (*alignf)(void *,
-					  const struct resource *,
-					  resource_size_t,
-					  resource_size_t),
+		resource_alignf alignf,
 		void *alignf_data)
 {
 #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index db7fe25f3370..28266426e5bf 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -188,6 +188,23 @@ enum {
 #define DEFINE_RES_DMA(_dma)						\
 	DEFINE_RES_DMA_NAMED((_dma), NULL)
 
+/**
+ * typedef resource_alignf - Resource alignment callback
+ * @data:	Private data used by the callback
+ * @res:	Resource candidate range (an empty resource space)
+ * @size:	The minimum size of the empty space
+ * @align:	Alignment from the constraints
+ *
+ * Callback allows calculating resource placement and alignment beyond min,
+ * max, and align fields in the struct resource_constraint.
+ *
+ * Return: Start address for the resource.
+ */
+typedef resource_size_t (*resource_alignf)(void *data,
+					   const struct resource *res,
+					   resource_size_t size,
+					   resource_size_t align);
+
 /* PC/ISA/whatever - the normal PC address spaces: IO and memory */
 extern struct resource ioport_resource;
 extern struct resource iomem_resource;
@@ -207,10 +224,7 @@ extern void arch_remove_reservations(struct resource *avail);
 extern int allocate_resource(struct resource *root, struct resource *new,
 			     resource_size_t size, resource_size_t min,
 			     resource_size_t max, resource_size_t align,
-			     resource_size_t (*alignf)(void *,
-						       const struct resource *,
-						       resource_size_t,
-						       resource_size_t),
+			     resource_alignf alignf,
 			     void *alignf_data);
 struct resource *lookup_resource(struct resource *root, resource_size_t start);
 int adjust_resource(struct resource *res, resource_size_t start,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 16493426a04f..dde87db6b982 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1552,10 +1552,7 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
 			struct resource *res, resource_size_t size,
 			resource_size_t align, resource_size_t min,
 			unsigned long type_mask,
-			resource_size_t (*alignf)(void *,
-						  const struct resource *,
-						  resource_size_t,
-						  resource_size_t),
+			resource_alignf alignf,
 			void *alignf_data);
 
 
diff --git a/kernel/resource.c b/kernel/resource.c
index 3f15a32d9c42..26ad6f223652 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -63,8 +63,7 @@ EXPORT_SYMBOL(iomem_resource);
  */
 struct resource_constraint {
 	resource_size_t min, max, align;
-	resource_size_t (*alignf)(void *, const struct resource *,
-			resource_size_t, resource_size_t);
+	resource_alignf alignf;
 	void *alignf_data;
 };
 
@@ -783,10 +782,7 @@ static int reallocate_resource(struct resource *root, struct resource *old,
 int allocate_resource(struct resource *root, struct resource *new,
 		      resource_size_t size, resource_size_t min,
 		      resource_size_t max, resource_size_t align,
-		      resource_size_t (*alignf)(void *,
-						const struct resource *,
-						resource_size_t,
-						resource_size_t),
+		      resource_alignf alignf,
 		      void *alignf_data)
 {
 	int err;
-- 
2.39.2



* [PATCH v3 5/8] resource: Handle simple alignment inside __find_resource_space()
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (3 preceding siblings ...)
  2024-05-07 10:25 ` [PATCH v3 4/8] resource: Use typedef for alignf callback Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 6/8] resource: Export find_resource_space() Ilpo Järvinen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen, Lidong Wang, Andy Shevchenko

allocate_resource() accepts an ->alignf() callback to perform custom
alignment beyond constraint->align. If alignf is NULL,
simple_align_resource() is used, which only returns avail->start (no
change).

Using avail->start directly is natural and can be done with a
conditional in __find_resource_space() instead, which avoids an
unnecessary callback. It makes the code inside __find_resource_space()
more obvious and removes the need for the caller to provide
constraint->alignf when no custom alignment is wanted.

This is preparation for exporting find_resource_space().
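
The conditional described above can be sketched in userspace as follows.
This is a simplified model, not the kernel code: struct range,
alignf_fn, pick_start() and top_down_alignf() are hypothetical names
chosen for illustration (the series itself uses struct resource and the
resource_alignf typedef), and the top-down placement policy is just an
example of what a callback might do.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

/* Hypothetical stand-in for the start/end members of struct resource. */
struct range {
	resource_size_t start, end;	/* inclusive bounds */
};

/* Same shape as the alignf callback, using the stand-in types. */
typedef resource_size_t (*alignf_fn)(void *data, const struct range *avail,
				     resource_size_t size,
				     resource_size_t align);

/*
 * Pick the start address for an allocation inside avail. A NULL
 * callback means "no alignment beyond align", so the start of the
 * available space is used directly.
 */
static resource_size_t pick_start(const struct range *avail,
				  resource_size_t size, resource_size_t align,
				  alignf_fn alignf, void *alignf_data)
{
	if (alignf)
		return alignf(alignf_data, avail, size, align);

	return avail->start;
}

/*
 * Example callback: place the allocation at the highest aligned
 * address that leaves room for size bytes before avail->end. The
 * caller is still expected to verify that the result lies inside
 * avail. align must be a power of two.
 */
static resource_size_t top_down_alignf(void *data, const struct range *avail,
				       resource_size_t size,
				       resource_size_t align)
{
	(void)data;

	if (avail->end - avail->start + 1 < size)
		return avail->start;

	return (avail->end - size + 1) & ~(align - 1);
}
```

For an available range of 0x1000-0x1fff and a 0x100-byte request,
pick_start() with a NULL callback returns 0x1000, while passing
top_down_alignf returns 0x1f00.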

Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 kernel/resource.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 26ad6f223652..35c44c23b037 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -621,14 +621,6 @@ void __weak arch_remove_reservations(struct resource *avail)
 {
 }
 
-static resource_size_t simple_align_resource(void *data,
-					     const struct resource *avail,
-					     resource_size_t size,
-					     resource_size_t align)
-{
-	return avail->start;
-}
-
 static void resource_clip(struct resource *res, resource_size_t min,
 			  resource_size_t max)
 {
@@ -648,6 +640,7 @@ static int __find_resource_space(struct resource *root, struct resource *old,
 {
 	struct resource *this = root->child;
 	struct resource tmp = *new, avail, alloc;
+	resource_alignf alignf = constraint->alignf;
 
 	tmp.start = root->start;
 	/*
@@ -676,8 +669,12 @@ static int __find_resource_space(struct resource *root, struct resource *old,
 		avail.flags = new->flags & ~IORESOURCE_UNSET;
 		if (avail.start >= tmp.start) {
 			alloc.flags = avail.flags;
-			alloc.start = constraint->alignf(constraint->alignf_data, &avail,
-					size, constraint->align);
+			if (alignf) {
+				alloc.start = alignf(constraint->alignf_data,
+						     &avail, size, constraint->align);
+			} else {
+				alloc.start = avail.start;
+			}
 			alloc.end = alloc.start + size - 1;
 			if (alloc.start <= alloc.end &&
 			    resource_contains(&avail, &alloc)) {
@@ -788,9 +785,6 @@ int allocate_resource(struct resource *root, struct resource *new,
 	int err;
 	struct resource_constraint constraint;
 
-	if (!alignf)
-		alignf = simple_align_resource;
-
 	constraint.min = min;
 	constraint.max = max;
 	constraint.align = align;
-- 
2.39.2



* [PATCH v3 6/8] resource: Export find_resource_space()
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (4 preceding siblings ...)
  2024-05-07 10:25 ` [PATCH v3 5/8] resource: Handle simple alignment inside __find_resource_space() Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:25 ` [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious Ilpo Järvinen
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen, Lidong Wang, Andy Shevchenko

The PCI bridge window logic needs to find out, in advance of the actual
allocation, whether there is an empty space big enough to fit the window.

Export find_resource_space() for that purpose. Also move the struct
resource_constraint into a generic header to be able to use the new
interface.

Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 include/linux/ioport.h | 22 ++++++++++++++++++++++
 kernel/resource.c      | 26 ++++----------------------
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 28266426e5bf..6e9fb667a1c5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -205,6 +205,25 @@ typedef resource_size_t (*resource_alignf)(void *data,
 					   resource_size_t size,
 					   resource_size_t align);
 
+/**
+ * struct resource_constraint - constraints to be met while searching empty
+ *				resource space
+ * @min:		The minimum address for the memory range
+ * @max:		The maximum address for the memory range
+ * @align:		Alignment for the start address of the empty space
+ * @alignf:		Additional alignment constraints callback
+ * @alignf_data:	Data provided for @alignf callback
+ *
+ * Contains the range and alignment constraints that have to be met during
+ * find_resource_space(). @alignf can be NULL indicating no alignment beyond
+ * @align is necessary.
+ */
+struct resource_constraint {
+	resource_size_t min, max, align;
+	resource_alignf alignf;
+	void *alignf_data;
+};
+
 /* PC/ISA/whatever - the normal PC address spaces: IO and memory */
 extern struct resource ioport_resource;
 extern struct resource iomem_resource;
@@ -278,6 +297,9 @@ static inline bool resource_union(const struct resource *r1, const struct resour
 	return true;
 }
 
+int find_resource_space(struct resource *root, struct resource *new,
+			resource_size_t size, struct resource_constraint *constraint);
+
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
 #define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
diff --git a/kernel/resource.c b/kernel/resource.c
index 35c44c23b037..14777afb0a99 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -48,25 +48,6 @@ struct resource iomem_resource = {
 };
 EXPORT_SYMBOL(iomem_resource);
 
-/**
- * struct resource_constraint - constraints to be met while searching empty
- *				resource space
- * @min:		The minimum address for the memory range
- * @max:		The maximum address for the memory range
- * @align:		Alignment for the start address of the empty space
- * @alignf:		Additional alignment constraints callback
- * @alignf_data:	Data provided for @alignf callback
- *
- * Contains the range and alignment constraints that have to be met during
- * find_resource_space(). @alignf can be NULL indicating no alignment beyond
- * @align is necessary.
- */
-struct resource_constraint {
-	resource_size_t min, max, align;
-	resource_alignf alignf;
-	void *alignf_data;
-};
-
 static DEFINE_RWLOCK(resource_lock);
 
 static struct resource *next_resource(struct resource *p, bool skip_children)
@@ -708,12 +689,13 @@ next:		if (!this || this->end == root->end)
  * * %0		- if successful, @new members start, end, and flags are altered.
  * * %-EBUSY	- if no empty space was found.
  */
-static int find_resource_space(struct resource *root, struct resource *new,
-			       resource_size_t size,
-			       struct resource_constraint *constraint)
+int find_resource_space(struct resource *root, struct resource *new,
+			resource_size_t size,
+			struct resource_constraint *constraint)
 {
 	return  __find_resource_space(root, NULL, new, size, constraint);
 }
+EXPORT_SYMBOL_GPL(find_resource_space);
 
 /**
  * reallocate_resource - allocate a slot in the resource tree given range & alignment.
-- 
2.39.2



* [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (5 preceding siblings ...)
  2024-05-07 10:25 ` [PATCH v3 6/8] resource: Export find_resource_space() Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 10:36   ` Mika Westerberg
  2024-05-07 14:01   ` Andy Shevchenko
  2024-05-07 10:25 ` [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules Ilpo Järvinen
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen

Calculations related to bridge window size contain the literal 20, which
is the minimum alignment order for a bridge window (1MiB). Make the code
more obvious by converting the literal 20 to __ffs(SZ_1M).
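
That the two spellings are equivalent can be checked with a small
userspace sketch. It assumes a GCC/Clang toolchain, since
__builtin_ctzll() is used as a stand-in for the kernel's __ffs();
lowest_set_bit() and window_order() are hypothetical helper names, and
window_order() only mirrors the clamped order calculation visible in the
pbus_size_mem() hunk of this patch.

```c
#define SZ_1M	0x00100000ULL

/*
 * Stand-in for the kernel's __ffs(): index of the least significant
 * set bit. The result is undefined for word == 0.
 */
static unsigned int lowest_set_bit(unsigned long long word)
{
	return (unsigned int)__builtin_ctzll(word);
}

/*
 * Order of an alignment relative to the 1MiB minimum bridge window
 * alignment, clamped to 0 for alignments below 1MiB.
 */
static int window_order(unsigned long long align)
{
	int order = (int)lowest_set_bit(align) - (int)lowest_set_bit(SZ_1M);

	return order < 0 ? 0 : order;
}
```

Here lowest_set_bit(SZ_1M) evaluates to 20, so a 16MiB alignment maps to
order 4 and anything at or below 1MiB maps to order 0.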

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/setup-bus.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 141d6b31959b..bca1df46f19c 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -14,6 +14,7 @@
  *	     tighter packing. Prefetchable range support.
  */
 
+#include <linux/bitops.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -21,6 +22,7 @@
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/cache.h>
+#include <linux/sizes.h>
 #include <linux/slab.h>
 #include <linux/acpi.h>
 #include "pci.h"
@@ -957,7 +959,7 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
 	for (order = 0; order <= max_order; order++) {
 		resource_size_t align1 = 1;
 
-		align1 <<= (order + 20);
+		align1 <<= (order + __ffs(SZ_1M));
 
 		if (!align)
 			min_align = align1;
@@ -1047,7 +1049,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			 * resources.
 			 */
 			align = pci_resource_alignment(dev, r);
-			order = __ffs(align) - 20;
+			order = __ffs(align) - __ffs(SZ_1M);
 			if (order < 0)
 				order = 0;
 			if (order >= ARRAY_SIZE(aligns)) {
-- 
2.39.2



* [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (6 preceding siblings ...)
  2024-05-07 10:25 ` [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious Ilpo Järvinen
@ 2024-05-07 10:25 ` Ilpo Järvinen
  2024-05-07 13:49   ` Andy Shevchenko
  2024-05-28 13:10 ` [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
  2024-06-11 23:12 ` Bjorn Helgaas
  9 siblings, 1 reply; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:25 UTC
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel
  Cc: Ilpo Järvinen, Lidong Wang

During a remove & rescan cycle, the PCI subsystem will recalculate and
adjust the bridge window sizing that was initially done by "BIOS". The
size calculation is based on the required alignment of the largest
resource among the downstream resources as per pbus_size_mem()
(unimportant or zero parameters marked with "..."):

	min_align = calculate_mem_align(aligns, max_order);
	size0 = calculate_memsize(size, ..., min_align);

inside calculate_mem_align(), for the largest alignment:
	min_align = align1 >> 1;
	...
	return min_align;

and then in calculate_memsize():
	return ALIGN(max(size, ...), align);

If the original bridge window sizing tried to conserve space, this will
lead to a massive increase in the required bridge window size when the
downstream has a large disparity in BAR sizes. E.g., with 16MiB and
16GiB BARs this results in a 24GiB bridge window size even though the
16MiB BAR does not require gigabytes of space to fit.
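
The 24GiB figure can be reproduced with a short userspace calculation.
This is a sketch under simplifying assumptions: only the
"min_align = align1 >> 1" step for the largest alignment and the final
ALIGN() are modelled, and resource_size_t as uint64_t, align_up(),
window_size_tail_aligned() and window_size_packed() are hypothetical
names, not kernel functions.

```c
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

#define SZ_16M	(16ULL << 20)
#define SZ_16G	(16ULL << 30)

/* Round x up to a multiple of a; a must be a power of two. */
static resource_size_t align_up(resource_size_t x, resource_size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Window size when the tail is aligned to half of the largest BAR. */
static resource_size_t window_size_tail_aligned(resource_size_t largest_bar,
						resource_size_t other_bars)
{
	resource_size_t min_align = largest_bar >> 1;

	return align_up(largest_bar + other_bars, min_align);
}

/* Space used when the smaller BAR is placed right after the larger one. */
static resource_size_t window_size_packed(resource_size_t largest_bar,
					  resource_size_t other_bars)
{
	return largest_bar + other_bars;
}
```

With a 16GiB and a 16MiB BAR, the tail-aligned size is 0x600000000
(24GiB), whereas the packed layout needs only 0x401000000 bytes.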

When doing remove & rescan for a bus that contains such a PCI device, a
larger bridge window is suddenly required on rescan. However, when there
is a bridge window upstream that is already assigned based on the
original size, it cannot be enlarged to the new requirement. This causes
the allocation of the bridge window to fail (0x600000000 > 0x400ffffff):

pci 0000:02:01.0: PCI bridge to [bus 03]
pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
pci 0000:01:00.0: PCI bridge to [bus 02-04]
pci 0000:01:00.0:   bridge window [mem 0x40400000-0x406fffff]
pci 0000:01:00.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]

pci 0000:03:00.0: device released
pci 0000:02:01.0: device released
pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0
pci 0000:02:01.0: PCI bridge to [bus 03]
pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0
pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]
pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]
pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]

pci 0000:02:01.0: PCI bridge to [bus 03]
pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space
pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign
pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
pci 0000:02:01.0: PCI bridge to [bus 03]
pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]

This is a major surprise for users who are suddenly left with a
non-functional PCIe device that was working fine with the original
bridge window sizing.

Even if the already assigned bridge window could be enlarged by
reallocation in some cases (something the current code does not attempt
to do), it is not possible in the general case, and the large amount of
wasted space at the tail of the bridge window may lead to other
resource exhaustion problems at the Root Complex level (think of multiple
PCIe cards with VFs and BAR size disparity in a single system).

PCI specifications only expect natural alignment for BARs (PCI Express
Base Specification, rev. 6.1 sect. 7.5.1.2.1) and minimum of 1MiB
alignment for the bridge window (PCI Express Base Specification,
rev 6.1 sect. 7.5.1.3). The current bridge window tail alignment rule
was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI
allocation code (alpha, arm, parisc) [2/2]") that only states:
"pbus_size_mem: core stuff; tested with randomly generated sets of
resources". It does not explain the motivation for the extra tail space
allocated that is not truly needed by the downstream resources. As
such, it is far from clear if it ever has been required by any HW.

To prevent PCIe cards with BAR size disparity from becoming unusable
after remove & rescan cycle, attempt to do a truly minimal allocation
for memory resources if needed. First check if the normally calculated
bridge window will not fit into an already assigned upstream resource.
In such case, try with relaxed bridge window tail sizing rules instead
where no extra tail space is requested beyond what the downstream
resources require. Only enforce the alignment requirement of the bridge
window itself (normally 1MiB).

With this patch, the resources are successfully allocated:

pci 0000:02:01.0: PCI bridge to [bus 03]
pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03]
pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules
pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff]
pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned
pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned
pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned
pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
pci 0000:02:01.0: PCI bridge to [bus 03]
pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]

This patch draws inspiration from the initial investigations and work
by Mika Westerberg.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216795
Link: https://lore.kernel.org/linux-pci/20190812144144.2646-1-mika.westerberg@linux.intel.com/
Fixes: 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]")
Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/setup-bus.c | 79 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index bca1df46f19c..11ee60b9ca71 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -22,6 +22,7 @@
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/cache.h>
+#include <linux/limits.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
 #include <linux/acpi.h>
@@ -971,6 +972,67 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
 	return min_align;
 }
 
+/**
+ * pbus_upstream_space_available - Check no upstream resource limits allocation
+ * @bus:	The bus
+ * @mask:	Mask the resource flag, then compare it with type
+ * @type:	The type of resource from bridge
+ * @size:	The size required from the bridge window
+ * @align:	Required alignment for the resource
+ *
+ * Checks that @size can fit inside the upstream bridge resources that are
+ * already assigned.
+ *
+ * Return: %true if enough space is available on all assigned upstream
+ * resources.
+ */
+static bool pbus_upstream_space_available(struct pci_bus *bus, unsigned long mask,
+					  unsigned long type, resource_size_t size,
+					  resource_size_t align)
+{
+	struct resource_constraint constraint = {
+		.max = RESOURCE_SIZE_MAX,
+		.align = align,
+	};
+	struct pci_bus *downstream = bus;
+	struct resource *r;
+
+	while ((bus = bus->parent)) {
+		if (pci_is_root_bus(bus))
+			break;
+
+		pci_bus_for_each_resource(bus, r) {
+			if (!r || !r->parent || (r->flags & mask) != type)
+				continue;
+
+			if (resource_size(r) >= size) {
+				struct resource gap = {};
+
+				if (find_resource_space(r, &gap, size, &constraint) == 0) {
+					gap.flags = type;
+					pci_dbg(bus->self,
+						"Assigned bridge window %pR to %pR free space at %pR\n",
+						r, &bus->busn_res, &gap);
+					return true;
+				}
+			}
+
+			if (bus->self) {
+				pci_info(bus->self,
+					 "Assigned bridge window %pR to %pR cannot fit 0x%llx required for %s bridging to %pR\n",
+					 r, &bus->busn_res,
+					 (unsigned long long)size,
+					 pci_name(downstream->self),
+					 &downstream->busn_res);
+			}
+
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /**
  * pbus_size_mem() - Size the memory window of a given bus
  *
@@ -997,7 +1059,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			 struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
-	resource_size_t min_align, align, size, size0, size1;
+	resource_size_t min_align, win_align, align, size, size0, size1;
 	resource_size_t aligns[24]; /* Alignments from 1MB to 8TB */
 	int order, max_order;
 	struct resource *b_res = find_bus_resource_of_type(bus,
@@ -1076,10 +1138,23 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		}
 	}
 
+	win_align = window_alignment(bus, b_res->flags);
 	min_align = calculate_mem_align(aligns, max_order);
-	min_align = max(min_align, window_alignment(bus, b_res->flags));
+	min_align = max(min_align, win_align);
 	size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), min_align);
 	add_align = max(min_align, add_align);
+
+	if (bus->self && size0 &&
+	    !pbus_upstream_space_available(bus, mask | IORESOURCE_PREFETCH, type,
+					   size0, add_align)) {
+		min_align = 1ULL << (max_order + __ffs(SZ_1M));
+		min_align = max(min_align, win_align);
+		size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), win_align);
+		add_align = win_align;
+		pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
+			 b_res, &bus->busn_res);
+	}
+
 	size1 = (!realloc_head || (realloc_head && !add_size && !children_add_size)) ? size0 :
 		calculate_memsize(size, min_size, add_size, children_add_size,
 				resource_size(b_res), add_align);
-- 
2.39.2


* Re: [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious
  2024-05-07 10:25 ` [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious Ilpo Järvinen
@ 2024-05-07 10:36   ` Mika Westerberg
  2024-05-07 10:50     ` Ilpo Järvinen
  2024-05-07 14:01   ` Andy Shevchenko
  1 sibling, 1 reply; 16+ messages in thread
From: Mika Westerberg @ 2024-05-07 10:36 UTC
  To: Ilpo Järvinen
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Andy Shevchenko,
	Rafael J . Wysocki, Jonathan Cameron, linux-kernel

On Tue, May 07, 2024 at 01:25:22PM +0300, Ilpo Järvinen wrote:
> Calculations related to bridge window size contain literal 20 that is
> the minimum alignment for a bridge window. Make the code more obvious
> by converting the literal 20 to __ffs(SZ_1MB).

I think that's SZ_1M not SZ_1MB :)

> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Looks good, may be even add a #define for this but either way,

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

* Re: [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious
  2024-05-07 10:36   ` Mika Westerberg
@ 2024-05-07 10:50     ` Ilpo Järvinen
  0 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-07 10:50 UTC
  To: Mika Westerberg
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Andy Shevchenko,
	Rafael J . Wysocki, Jonathan Cameron, LKML

On Tue, 7 May 2024, Mika Westerberg wrote:

> On Tue, May 07, 2024 at 01:25:22PM +0300, Ilpo Järvinen wrote:
> > Calculations related to bridge window size contain literal 20 that is
> > the minimum alignment for a bridge window. Make the code more obvious
> > by converting the literal 20 to __ffs(SZ_1MB).
> 
> I think that's SZ_1M not SZ_1MB :)

Of course, that's the only place which is not checked by the compiler so 
it's the place where I type it wrong and forget to use backspace. :-)

> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> Looks good, may be even add a #define for this but either way,

I considered that but I could not find a good name for the whole construct 
(with the __ffs() I mean). Perhaps PCI_BRIDGE_WINDOW_LSB could be an 
option but that feels somewhat clumsy to me.

-- 
 i.

* Re: [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules
  2024-05-07 10:25 ` [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules Ilpo Järvinen
@ 2024-05-07 13:49   ` Andy Shevchenko
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Shevchenko @ 2024-05-07 13:49 UTC
  To: Ilpo Järvinen
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Rafael J . Wysocki, Jonathan Cameron, linux-kernel, Lidong Wang

On Tue, May 07, 2024 at 01:25:23PM +0300, Ilpo Järvinen wrote:
> During remove & rescan cycle, PCI subsystem will recalculate and adjust
> the bridge window sizing that was initially done by "BIOS". The size
> calculation is based on the required alignment of the largest resource
> among the downstream resources as per pbus_size_mem() (unimportant or
> zero parameters marked with "..."):
> 
> 	min_align = calculate_mem_align(aligns, max_order);
> 	size0 = calculate_memsize(size, ..., min_align);
> 
> inside calculate_mem_align(), for the largest alignment:
> 	min_align = align1 >> 1;
> 	...
> 	return min_align;
> 
> and then in calculate_memsize():
> 	return ALIGN(max(size, ...), align);
> 
> If the original bridge window sizing tried to conserve space, this will
> lead to massive increase of the required bridge window size when the
> downstream has a large disparity in BAR sizes. E.g., with 16MiB and
> 16GiB BARs this results in 24GiB bridge window size even if 16MiB BAR
> does not require gigabytes of space to fit.
> 
> When doing remove & rescan for a bus that contains such a PCI device, a
> larger bridge window is suddenly required on rescan but when there is a
> bridge window upstream that is already assigned based on the original
> size, it cannot be enlarged to the new requirement. This causes the
> allocation of the bridge window to fail (0x600000000 > 0x400ffffff):
> 
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:01:00.0: PCI bridge to [bus 02-04]
> pci 0000:01:00.0:   bridge window [mem 0x40400000-0x406fffff]
> pci 0000:01:00.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> 
> pci 0000:03:00.0: device released
> pci 0000:02:01.0: device released
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0
> pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]
> pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]
> 
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
> pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space
> pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
> pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
> pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
> pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
> pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> 
> This is a major surprise for users who are suddenly left with a PCIe
> device that was working fine with the original bridge window sizing.
> 
> Even if the already assigned bridge window could be enlarged by
> reallocation in some cases (something the current code does not attempt
> to do), it is not possible in general case and the large amount of
> wasted space at the tail of the bridge window may lead to other
> resource exhaustion problems on Root Complex level (think of multiple
> PCIe cards with VFs and BAR size disparity in a single system).
> 
> PCI specifications only expect natural alignment for BARs (PCI Express
> Base Specification, rev. 6.1 sect. 7.5.1.2.1) and minimum of 1MiB
> alignment for the bridge window (PCI Express Base Specification,
> rev 6.1 sect. 7.5.1.3). The current bridge window tail alignment rule
> was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI
> allocation code (alpha, arm, parisc) [2/2]") that only states:
> "pbus_size_mem: core stuff; tested with randomly generated sets of
> resources". It does not explain the motivation for the extra tail space
> allocated that is not truly needed by the downstream resources. As
> such, it is far from clear if it ever has been required by any HW.
> 
> To prevent PCIe cards with BAR size disparity from becoming unusable
> after remove & rescan cycle, attempt to do a truly minimal allocation
> for memory resources if needed. First check if the normally calculated
> bridge window will not fit into an already assigned upstream resource.
> In such case, try with relaxed bridge window tail sizing rules instead
> where no extra tail space is requested beyond what the downstream
> resources require. Only enforce the alignment requirement of the bridge
> window itself (normally 1MiB).
> 
> With this patch, the resources are successfully allocated:
> 
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
> pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules
> pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
> pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned
> pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> 
> This patch draws inspiration from the initial investigations and work
> by Mika Westerberg.

...

> +		min_align = 1ULL << (max_order + __ffs(SZ_1M));

In case of a new version of the series, this can utilise BIT_ULL().

-- 
With Best Regards,
Andy Shevchenko



* Re: [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious
  2024-05-07 10:25 ` [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious Ilpo Järvinen
  2024-05-07 10:36   ` Mika Westerberg
@ 2024-05-07 14:01   ` Andy Shevchenko
  1 sibling, 0 replies; 16+ messages in thread
From: Andy Shevchenko @ 2024-05-07 14:01 UTC
  To: Ilpo Järvinen, Yury Norov
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Rafael J . Wysocki, Jonathan Cameron, linux-kernel

On Tue, May 07, 2024 at 01:25:22PM +0300, Ilpo Järvinen wrote:
> Calculations related to bridge window size contain literal 20 that is
> the minimum alignment for a bridge window. Make the code more obvious
> by converting the literal 20 to __ffs(SZ_1MB).

...

> -		align1 <<= (order + 20);
> +		align1 <<= (order + __ffs(SZ_1M));

No need for outer parentheses.

...

> +			order = __ffs(align) - __ffs(SZ_1M);

Yeah, would be nice to have something like

#define bit_distance(a, b)	(__ffs(a) - __ffs(b))

in bitops.h as we have a few users and I have heard about one more coming,
but this is a topic for another discussion. (Yury, just FYI.)

-- 
With Best Regards,
Andy Shevchenko



* Re: [PATCH v3 0/8] PCI: Solve two bridge window sizing issues
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (7 preceding siblings ...)
  2024-05-07 10:25 ` [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules Ilpo Järvinen
@ 2024-05-28 13:10 ` Ilpo Järvinen
  2024-06-11 23:12   ` Bjorn Helgaas
  2024-06-11 23:12 ` Bjorn Helgaas
  9 siblings, 1 reply; 16+ messages in thread
From: Ilpo Järvinen @ 2024-05-28 13:10 UTC
  To: Bjorn Helgaas
  Cc: linux-pci, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron, LKML

On Tue, 7 May 2024, Ilpo Järvinen wrote:

> Hi all,
> 
> Here's a series that contains two fixes to PCI bridge window sizing
> algorithm. Together, they should enable remove & rescan cycle to work
> for a PCI bus that has PCI devices with optional resources and/or
> disparity in BAR sizes.
> 
> For the second fix, I chose to expose find_resource_space() from
> kernel/resource.c because it should increase accuracy of the cannot-fit
> decision (currently that function is called find_resource()). In order
> to do that sensibly, a few improvements seemed in order to make its
> interface and name of the function sane before exposing it. Thus, the
> few extra patches on resource side.
> 
> v3:

Hi Bjorn,

It's a bit unclear to me what your view is on the status of this
series. I see you placed these first into some wip branches and then into
the resource branch, but the patches are still marked as "New" in
patchwork and you have not sent any notice that they have been "applied".

I'm asking because I'd like to know whether I should send a v4 with
those minor comments from Andy addressed. I could also send those
minor things as separate patches on top of the series if that's
easier/better for you.

-- 
 i.

> - Removed "slot" wording
>         - Renamed find_empty_resource_slot() -> find_resource_space()
> - find_resource_space() returns bool instead of int
> - Added patch to convert literal 20 related to bridge win minimum
>   alignment to __ffs(SZ_1M)
> - Fixed kerneldoc missing "struct"
> - Tweaked prints (one dbg -> info, added new dbg one for success case)
> - Changelog tweaks
>         - Take account largest >> 1 (in alignment calc)
>         - Adjust to minor changes made into calculate_memsize()
>         - Take logs from more recent kernel to get rid of reg 0xXX
> 
> v2:
> - Add "typedef" to kerneldoc to get correct formatting
> - Use RESOURCE_SIZE_MAX instead of literal
> - Remove unnecessary checks for io{port/mem}_resource
> - Apply a few style tweaks from Andy
> 
> Ilpo Järvinen (8):
>   PCI: Fix resource double counting on remove & rescan
>   resource: Rename find_resource() to find_resource_space()
>   resource: Document find_resource_space() and resource_constraint
>   resource: Use typedef for alignf callback
>   resource: Handle simple alignment inside __find_resource_space()
>   resource: Export find_resource_space()
>   PCI: Make minimum bridge window alignment reference more obvious
>   PCI: Relax bridge window tail sizing rules
> 
>  drivers/pci/bus.c       | 10 +----
>  drivers/pci/setup-bus.c | 91 +++++++++++++++++++++++++++++++++++++----
>  include/linux/ioport.h  | 44 ++++++++++++++++++--
>  include/linux/pci.h     |  5 +--
>  kernel/resource.c       | 68 ++++++++++++++----------------
>  5 files changed, 157 insertions(+), 61 deletions(-)
> 
> 

* Re: [PATCH v3 0/8] PCI: Solve two bridge window sizing issues
  2024-05-28 13:10 ` [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
@ 2024-06-11 23:12   ` Bjorn Helgaas
  0 siblings, 0 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2024-06-11 23:12 UTC
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, linux-pci, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron, LKML

On Tue, May 28, 2024 at 04:10:48PM +0300, Ilpo Järvinen wrote:
> On Tue, 7 May 2024, Ilpo Järvinen wrote:
> 
> > Hi all,
> > 
> > Here's a series that contains two fixes to PCI bridge window sizing
> > algorithm. Together, they should enable remove & rescan cycle to work
> > for a PCI bus that has PCI devices with optional resources and/or
> > disparity in BAR sizes.
> > 
> > For the second fix, I chose to expose find_resource_space() from
> > kernel/resource.c because it should increase accuracy of the cannot-fit
> > decision (currently that function is called find_resource()). In order
> > to do that sensibly, a few improvements seemed in order to make its
> > interface and name of the function sane before exposing it. Thus, the
> > few extra patches on resource side.
> > 
> > v3:
> 
> Hi Bjorn,
> 
> It's a bit unclear to me what is your view about the status of this 
> series? I see you placed these first into some wip branches and then into 
> resource branch but the state of the patches in patchwork is still marked 
> as "New" nor have you sent any notice they'd have been "applied".
> 
> I'm thinking this from the perspective of whether I should send v4 with 
> those minor comments from Andy addressed or not? I could also send those
> minor things as separate patches on top of the series if that's 
> easier/better for you.

Sorry, I dropped the ball in the middle here.  The pci/resource branch
has been in linux-next since May 29, but I forgot to send a note.  If
you want to tweak for Andy's comments, send an incremental patch and
I'll be happy to squash it/them in.

* Re: [PATCH v3 0/8] PCI: Solve two bridge window sizing issues
  2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
                   ` (8 preceding siblings ...)
  2024-05-28 13:10 ` [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
@ 2024-06-11 23:12 ` Bjorn Helgaas
  9 siblings, 0 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2024-06-11 23:12 UTC
  To: Ilpo Järvinen
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Igor Mammedov, Mika Westerberg,
	Andy Shevchenko, Rafael J . Wysocki, Jonathan Cameron,
	linux-kernel

On Tue, May 07, 2024 at 01:25:15PM +0300, Ilpo Järvinen wrote:
> Hi all,
> 
> Here's a series that contains two fixes to PCI bridge window sizing
> algorithm. Together, they should enable remove & rescan cycle to work
> for a PCI bus that has PCI devices with optional resources and/or
> disparity in BAR sizes.
> 
> For the second fix, I chose to expose find_resource_space() from
> kernel/resource.c because it should increase accuracy of the cannot-fit
> decision (currently that function is called find_resource()). In order
> to do that sensibly, a few improvements seemed in order to make its
> interface and name of the function sane before exposing it. Thus, the
> few extra patches on resource side.
> 
> v3:
> - Removed "slot" wording
>         - Renamed find_empty_resource_slot() -> find_resource_space()
> - find_resource_space() returns bool instead of int
> - Added patch to convert literal 20 related to bridge win minimum
>   alignment to __ffs(SZ_1M)
> - Fixed kerneldoc missing "struct"
> - Tweaked prints (one dbg -> info, added new dbg one for success case)
> - Changelog tweaks
>         - Take account largest >> 1 (in alignment calc)
>         - Adjust to minor changes made into calculate_memsize()
>         - Take logs from more recent kernel to get rid of reg 0xXX
> 
> v2:
> - Add "typedef" to kerneldoc to get correct formatting
> - Use RESOURCE_SIZE_MAX instead of literal
> - Remove unnecessary checks for io{port/mem}_resource
> - Apply a few style tweaks from Andy
> 
> Ilpo Järvinen (8):
>   PCI: Fix resource double counting on remove & rescan
>   resource: Rename find_resource() to find_resource_space()
>   resource: Document find_resource_space() and resource_constraint
>   resource: Use typedef for alignf callback
>   resource: Handle simple alignment inside __find_resource_space()
>   resource: Export find_resource_space()
>   PCI: Make minimum bridge window alignment reference more obvious
>   PCI: Relax bridge window tail sizing rules
> 
>  drivers/pci/bus.c       | 10 +----
>  drivers/pci/setup-bus.c | 91 +++++++++++++++++++++++++++++++++++++----
>  include/linux/ioport.h  | 44 ++++++++++++++++++--
>  include/linux/pci.h     |  5 +--
>  kernel/resource.c       | 68 ++++++++++++++----------------
>  5 files changed, 157 insertions(+), 61 deletions(-)

Applied to pci/resource for v6.11, thanks!
