LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] perf: Avoid placing cpumask on the stack
@ 2024-04-03 15:59 Dawei Li
  2024-04-03 15:59 ` [PATCH v3 01/10] cpumask: add cpumask_any_and_but() Dawei Li
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

Hi all,

This is v3 of previous series on elimination of placing cpumask var on
stack.

Change since v2:
- Update commit message for patch[2-10]. [Mark] 
- Add Reviewed-by from Mark for patch[2-10].

Change since v1:
- Change from dynamic allocation to a temporary var free helper:
  cpumask_any_and_but().	[Mark]

- Some minor coding style improvements, reverse chrismas tree e.g.

- For cpumask_any_and_but() itself:
  - Moved to cpumask.h, just like other helpers.
  - Return value converted to unsigned int.
  - Remove EXPORT_SYMBOL, for obvious reason.

v1:
https://lore.kernel.org/lkml/20240402105610.1695644-1-dawei.li@shingroup.cn/
https://lore.kernel.org/lkml/1486381132-5610-1-git-send-email-mark.rutland@arm.com/

v2:
https://lore.kernel.org/lkml/20240403125109.2054881-1-dawei.li@shingroup.cn/

Dawei Li (9):
  perf/alibaba_uncore_drw: Avoid placing cpumask on the stack
  perf/arm-cmn: Avoid placing cpumask on the stack
  perf/arm_cspmu: Avoid placing cpumask on the stack
  perf/arm_dsu: Avoid placing cpumask on the stack
  perf/dwc_pcie: Avoid placing cpumask on the stack
  perf/hisi_pcie: Avoid placing cpumask on the stack
  perf/hisi_uncore: Avoid placing cpumask on the stack
  perf/qcom_l2: Avoid placing cpumask on the stack
  perf/thunderx2: Avoid placing cpumask on the stack

Mark Rutland (1):
  cpumask: add cpumask_any_and_but()

 drivers/perf/alibaba_uncore_drw_pmu.c    | 10 +++-------
 drivers/perf/arm-cmn.c                   | 10 +++++-----
 drivers/perf/arm_cspmu/arm_cspmu.c       |  8 +++-----
 drivers/perf/arm_dsu_pmu.c               | 19 ++++++-------------
 drivers/perf/dwc_pcie_pmu.c              | 10 ++++------
 drivers/perf/hisilicon/hisi_pcie_pmu.c   |  9 ++++-----
 drivers/perf/hisilicon/hisi_uncore_pmu.c |  6 ++----
 drivers/perf/qcom_l2_pmu.c               |  8 +++-----
 drivers/perf/thunderx2_pmu.c             | 10 +++-------
 include/linux/cpumask.h                  | 23 +++++++++++++++++++++++
 10 files changed, 56 insertions(+), 57 deletions(-)

Thanks,

    Dawei

-- 
2.27.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 01/10] cpumask: add cpumask_any_and_but()
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-03 16:31   ` Yury Norov
  2024-04-03 15:59 ` [PATCH v3 02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack Dawei Li
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Thomas Gleixner, Andrew Morton, Peter Zijlstra, Rusty Russell,
	Dawei Li

From: Mark Rutland <mark.rutland@arm.com>

In some cases, it's useful to be able to select a random cpu from the
intersection of two masks, excluding a particular CPU.

For example, in some systems an uncore PMU is shared by a subset of
CPUs, and management of this PMU is assigned to some arbitrary CPU in
this set. Whenever the management CPU is hotplugged out, we wish to
migrate responsibility to another arbitrary CPU which is both in this
set and online.

Today we can use cpumask_any_and() to select an arbitrary CPU in the
intersection of two masks. We can also use cpumask_any_but() to select
any arbitrary cpu in a mask excluding, a particular CPU.

To do both, we either need to use a temporary cpumask, which is
wasteful, or use some lower-level cpumask helpers, which can be unclear.

This patch adds a new cpumask_any_and_but() to cater for these cases.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 include/linux/cpumask.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 1c29947db848..121f3ac757ff 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -388,6 +388,29 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
 	return i;
 }
 
+/**
+ * cpumask_any_and_but - pick a "random" cpu from *mask1 & *mask2, but not this one.
+ * @mask1: the first input cpumask
+ * @mask2: the second input cpumask
+ * @cpu: the cpu to ignore
+ *
+ * Returns >= nr_cpu_ids if no cpus set.
+ */
+static inline
+unsigned int cpumask_any_and_but(const struct cpumask *mask1,
+				 const struct cpumask *mask2,
+				 unsigned int cpu)
+{
+	unsigned int i;
+
+	cpumask_check(cpu);
+	i = cpumask_first_and(mask1, mask2);
+	if (i != cpu)
+		return i;
+
+	return cpumask_next_and(cpu, mask1, mask2);
+}
+
 /**
  * cpumask_nth - get the Nth cpu in a cpumask
  * @srcp: the cpumask pointer
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
  2024-04-03 15:59 ` [PATCH v3 01/10] cpumask: add cpumask_any_and_but() Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-08  0:28   ` Shuai Xue
  2024-04-03 15:59 ` [PATCH v3 03/10] perf/arm-cmn: " Dawei Li
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/alibaba_uncore_drw_pmu.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index a9277dcf90ce..d4d14b65c4a5 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -746,18 +746,14 @@ static int ali_drw_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	struct ali_drw_pmu_irq *irq;
 	struct ali_drw_pmu *drw_pmu;
 	unsigned int target;
-	int ret;
-	cpumask_t node_online_cpus;
 
 	irq = hlist_entry_safe(node, struct ali_drw_pmu_irq, node);
 	if (cpu != irq->cpu)
 		return 0;
 
-	ret = cpumask_and(&node_online_cpus,
-			  cpumask_of_node(cpu_to_node(cpu)), cpu_online_mask);
-	if (ret)
-		target = cpumask_any_but(&node_online_cpus, cpu);
-	else
+	target = cpumask_any_and_but(cpumask_of_node(cpu_to_node(cpu)),
+				     cpu_online_mask, cpu);
+	if (target >= nr_cpu_ids)
 		target = cpumask_any_but(cpu_online_mask, cpu);
 
 	if (target >= nr_cpu_ids)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 03/10] perf/arm-cmn: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
  2024-04-03 15:59 ` [PATCH v3 01/10] cpumask: add cpumask_any_and_but() Dawei Li
  2024-04-03 15:59 ` [PATCH v3 02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-03 15:59 ` [PATCH v3 04/10] perf/arm_cspmu: " Dawei Li
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/arm-cmn.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 7ef9c7e4836b..6bfb0c4a1287 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1950,20 +1950,20 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	struct arm_cmn *cmn;
 	unsigned int target;
 	int node;
-	cpumask_t mask;
 
 	cmn = hlist_entry_safe(cpuhp_node, struct arm_cmn, cpuhp_node);
 	if (cpu != cmn->cpu)
 		return 0;
 
 	node = dev_to_node(cmn->dev);
-	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
-	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
-	else
+
+	target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu);
+	if (target >= nr_cpu_ids)
 		target = cpumask_any_but(cpu_online_mask, cpu);
+
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
+
 	return 0;
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 04/10] perf/arm_cspmu: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (2 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 03/10] perf/arm-cmn: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-03 15:59 ` [PATCH v3 05/10] perf/arm_dsu: " Dawei Li
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/arm_cspmu/arm_cspmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index b9a252272f1e..fd1004251665 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -1322,8 +1322,7 @@ static int arm_cspmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 
 static int arm_cspmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 {
-	int dst;
-	struct cpumask online_supported;
+	unsigned int dst;
 
 	struct arm_cspmu *cspmu =
 		hlist_entry_safe(node, struct arm_cspmu, cpuhp_node);
@@ -1333,9 +1332,8 @@ static int arm_cspmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	/* Choose a new CPU to migrate ownership of the PMU to */
-	cpumask_and(&online_supported, &cspmu->associated_cpus,
-		    cpu_online_mask);
-	dst = cpumask_any_but(&online_supported, cpu);
+	dst = cpumask_any_and_but(&cspmu->associated_cpus,
+				  cpu_online_mask, cpu);
 	if (dst >= nr_cpu_ids)
 		return 0;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 05/10] perf/arm_dsu: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (3 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 04/10] perf/arm_cspmu: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-03 15:59 ` [PATCH v3 06/10] perf/dwc_pcie: " Dawei Li
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/arm_dsu_pmu.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index bae3ca37f846..adc0bbb5fafe 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -230,15 +230,6 @@ static const struct attribute_group *dsu_pmu_attr_groups[] = {
 	NULL,
 };
 
-static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
-{
-	struct cpumask online_supported;
-
-	cpumask_and(&online_supported,
-			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
-}
-
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
 {
 	return (idx < dsu_pmu->num_counters) ||
@@ -827,14 +818,16 @@ static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 
 static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 {
-	int dst;
-	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
-						   cpuhp_node);
+	struct dsu_pmu *dsu_pmu;
+	unsigned int dst;
+
+	dsu_pmu = hlist_entry_safe(node, struct dsu_pmu, cpuhp_node);
 
 	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
 		return 0;
 
-	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
+	dst = cpumask_any_and_but(&dsu_pmu->associated_cpus,
+				  cpu_online_mask, cpu);
 	/* If there are no active CPUs in the DSU, leave IRQ disabled */
 	if (dst >= nr_cpu_ids)
 		return 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 06/10] perf/dwc_pcie: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (4 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 05/10] perf/arm_dsu: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-08  0:30   ` Shuai Xue
  2024-04-03 15:59 ` [PATCH v3 07/10] perf/hisi_pcie: " Dawei Li
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/dwc_pcie_pmu.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c
index 957058ad0099..c5e328f23841 100644
--- a/drivers/perf/dwc_pcie_pmu.c
+++ b/drivers/perf/dwc_pcie_pmu.c
@@ -690,9 +690,8 @@ static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_n
 {
 	struct dwc_pcie_pmu *pcie_pmu;
 	struct pci_dev *pdev;
-	int node;
-	cpumask_t mask;
 	unsigned int target;
+	int node;
 
 	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
 	/* Nothing to do if this CPU doesn't own the PMU */
@@ -702,10 +701,9 @@ static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_n
 	pcie_pmu->on_cpu = -1;
 	pdev = pcie_pmu->pdev;
 	node = dev_to_node(&pdev->dev);
-	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
-	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
-	else
+
+	target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu);
+	if (target >= nr_cpu_ids)
 		target = cpumask_any_but(cpu_online_mask, cpu);
 
 	if (target >= nr_cpu_ids) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 07/10] perf/hisi_pcie: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (5 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 06/10] perf/dwc_pcie: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-05  9:54   ` Jonathan Cameron
  2024-04-03 15:59 ` [PATCH v3 08/10] perf/hisi_uncore: " Dawei Li
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/hisilicon/hisi_pcie_pmu.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index 5d1f0e9fdb08..06b192cc31d5 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -673,7 +673,6 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 {
 	struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node, struct hisi_pcie_pmu, node);
 	unsigned int target;
-	cpumask_t mask;
 	int numa_node;
 
 	/* Nothing to do if this CPU doesn't own the PMU */
@@ -684,10 +683,10 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	/* Choose a local CPU from all online cpus. */
 	numa_node = dev_to_node(&pcie_pmu->pdev->dev);
-	if (cpumask_and(&mask, cpumask_of_node(numa_node), cpu_online_mask) &&
-	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
-	else
+
+	target = cpumask_any_and_but(cpumask_of_node(numa_node),
+				     cpu_online_mask, cpu);
+	if (target >= nr_cpu_ids)
 		target = cpumask_any_but(cpu_online_mask, cpu);
 
 	if (target >= nr_cpu_ids) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 08/10] perf/hisi_uncore: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (6 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 07/10] perf/hisi_pcie: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-05  9:54   ` Jonathan Cameron
  2024-04-03 15:59 ` [PATCH v3 09/10] perf/qcom_l2: " Dawei Li
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index 04031450d5fe..ccc9191ad1b6 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -504,7 +504,6 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 {
 	struct hisi_pmu *hisi_pmu = hlist_entry_safe(node, struct hisi_pmu,
 						     node);
-	cpumask_t pmu_online_cpus;
 	unsigned int target;
 
 	if (!cpumask_test_and_clear_cpu(cpu, &hisi_pmu->associated_cpus))
@@ -518,9 +517,8 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	hisi_pmu->on_cpu = -1;
 
 	/* Choose a new CPU to migrate ownership of the PMU to */
-	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
-		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_any_and_but(&hisi_pmu->associated_cpus,
+				     cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 09/10] perf/qcom_l2: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (7 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 08/10] perf/hisi_uncore: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-03 15:59 ` [PATCH v3 10/10] perf/thunderx2: " Dawei Li
  2024-04-09 16:59 ` [PATCH v3 00/10] perf: " Will Deacon
  10 siblings, 0 replies; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/qcom_l2_pmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 148df5ae8ef8..b5a44dc1dc3a 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -801,9 +801,8 @@ static int l2cache_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
 
 static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 {
-	struct cluster_pmu *cluster;
 	struct l2cache_pmu *l2cache_pmu;
-	cpumask_t cluster_online_cpus;
+	struct cluster_pmu *cluster;
 	unsigned int target;
 
 	l2cache_pmu = hlist_entry_safe(node, struct l2cache_pmu, node);
@@ -820,9 +819,8 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	cluster->on_cpu = -1;
 
 	/* Any other CPU for this cluster which is still online */
-	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
-		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_any_and_but(&cluster->cluster_cpus,
+				     cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 10/10] perf/thunderx2: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (8 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 09/10] perf/qcom_l2: " Dawei Li
@ 2024-04-03 15:59 ` Dawei Li
  2024-04-09 16:59 ` [PATCH v3 00/10] perf: " Will Deacon
  10 siblings, 0 replies; 17+ messages in thread
From: Dawei Li @ 2024-04-03 15:59 UTC (permalink / raw
  To: will, mark.rutland, yury.norov, linux
  Cc: xueshuai, renyu.zj, yangyicong, jonathan.cameron, andersson,
	konrad.dybcio, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Dawei Li

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/perf/thunderx2_pmu.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index e16d10c763de..b3377b662ffc 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -932,9 +932,8 @@ static int tx2_uncore_pmu_online_cpu(unsigned int cpu,
 static int tx2_uncore_pmu_offline_cpu(unsigned int cpu,
 		struct hlist_node *hpnode)
 {
-	int new_cpu;
 	struct tx2_uncore_pmu *tx2_pmu;
-	struct cpumask cpu_online_mask_temp;
+	unsigned int new_cpu;
 
 	tx2_pmu = hlist_entry_safe(hpnode,
 			struct tx2_uncore_pmu, hpnode);
@@ -945,11 +944,8 @@ static int tx2_uncore_pmu_offline_cpu(unsigned int cpu,
 	if (tx2_pmu->hrtimer_callback)
 		hrtimer_cancel(&tx2_pmu->hrtimer);
 
-	cpumask_copy(&cpu_online_mask_temp, cpu_online_mask);
-	cpumask_clear_cpu(cpu, &cpu_online_mask_temp);
-	new_cpu = cpumask_any_and(
-			cpumask_of_node(tx2_pmu->node),
-			&cpu_online_mask_temp);
+	new_cpu = cpumask_any_and_but(cpumask_of_node(tx2_pmu->node),
+				      cpu_online_mask, cpu);
 
 	tx2_pmu->cpu = new_cpu;
 	if (new_cpu >= nr_cpu_ids)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 01/10] cpumask: add cpumask_any_and_but()
  2024-04-03 15:59 ` [PATCH v3 01/10] cpumask: add cpumask_any_and_but() Dawei Li
@ 2024-04-03 16:31   ` Yury Norov
  0 siblings, 0 replies; 17+ messages in thread
From: Yury Norov @ 2024-04-03 16:31 UTC (permalink / raw
  To: Dawei Li
  Cc: will, mark.rutland, linux, xueshuai, renyu.zj, yangyicong,
	jonathan.cameron, andersson, konrad.dybcio, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Thomas Gleixner, Andrew Morton,
	Peter Zijlstra, Rusty Russell

On Wed, Apr 03, 2024 at 11:59:41PM +0800, Dawei Li wrote:
> From: Mark Rutland <mark.rutland@arm.com>
> 
> In some cases, it's useful to be able to select a random cpu from the
> intersection of two masks, excluding a particular CPU.
> 
> For example, in some systems an uncore PMU is shared by a subset of
> CPUs, and management of this PMU is assigned to some arbitrary CPU in
> this set. Whenever the management CPU is hotplugged out, we wish to
> migrate responsibility to another arbitrary CPU which is both in this
> set and online.
> 
> Today we can use cpumask_any_and() to select an arbitrary CPU in the
> intersection of two masks. We can also use cpumask_any_but() to select
> any arbitrary cpu in a mask excluding, a particular CPU.
> 
> To do both, we either need to use a temporary cpumask, which is
> wasteful, or use some lower-level cpumask helpers, which can be unclear.
> 
> This patch adds a new cpumask_any_and_but() to cater for these cases.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Dawei Li <dawei.li@shingroup.cn>

Thank you,

Acked-by: Yury Norov <yury.norov@gmail.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 07/10] perf/hisi_pcie: Avoid placing cpumask on the stack
  2024-04-03 15:59 ` [PATCH v3 07/10] perf/hisi_pcie: " Dawei Li
@ 2024-04-05  9:54   ` Jonathan Cameron
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron @ 2024-04-05  9:54 UTC (permalink / raw
  To: Dawei Li
  Cc: will, mark.rutland, yury.norov, linux, xueshuai, renyu.zj,
	yangyicong, andersson, konrad.dybcio, linux-arm-kernel,
	linux-kernel, linux-arm-msm

On Wed, 3 Apr 2024 23:59:47 +0800
Dawei Li <dawei.li@shingroup.cn> wrote:

> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
> 
> Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
> the stack.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 08/10] perf/hisi_uncore: Avoid placing cpumask on the stack
  2024-04-03 15:59 ` [PATCH v3 08/10] perf/hisi_uncore: " Dawei Li
@ 2024-04-05  9:54   ` Jonathan Cameron
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron @ 2024-04-05  9:54 UTC (permalink / raw
  To: Dawei Li
  Cc: will, mark.rutland, yury.norov, linux, xueshuai, renyu.zj,
	yangyicong, andersson, konrad.dybcio, linux-arm-kernel,
	linux-kernel, linux-arm-msm

On Wed, 3 Apr 2024 23:59:48 +0800
Dawei Li <dawei.li@shingroup.cn> wrote:

> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
> 
> Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
> the stack.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack
  2024-04-03 15:59 ` [PATCH v3 02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack Dawei Li
@ 2024-04-08  0:28   ` Shuai Xue
  0 siblings, 0 replies; 17+ messages in thread
From: Shuai Xue @ 2024-04-08  0:28 UTC (permalink / raw
  To: Dawei Li, will, mark.rutland, yury.norov, linux
  Cc: renyu.zj, yangyicong, jonathan.cameron, andersson, konrad.dybcio,
	linux-arm-kernel, linux-kernel, linux-arm-msm



On 2024/4/3 23:59, Dawei Li wrote:
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
> 
> Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
> the stack.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
> ---

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 06/10] perf/dwc_pcie: Avoid placing cpumask on the stack
  2024-04-03 15:59 ` [PATCH v3 06/10] perf/dwc_pcie: " Dawei Li
@ 2024-04-08  0:30   ` Shuai Xue
  0 siblings, 0 replies; 17+ messages in thread
From: Shuai Xue @ 2024-04-08  0:30 UTC (permalink / raw
  To: Dawei Li, will, mark.rutland, yury.norov, linux
  Cc: renyu.zj, yangyicong, jonathan.cameron, andersson, konrad.dybcio,
	linux-arm-kernel, linux-kernel, linux-arm-msm



On 2024/4/3 23:59, Dawei Li wrote:
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
> 
> Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
> the stack.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Dawei Li <dawei.li@shingroup.cn>

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thank you.

Shuai

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 00/10] perf: Avoid placing cpumask on the stack
  2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
                   ` (9 preceding siblings ...)
  2024-04-03 15:59 ` [PATCH v3 10/10] perf/thunderx2: " Dawei Li
@ 2024-04-09 16:59 ` Will Deacon
  10 siblings, 0 replies; 17+ messages in thread
From: Will Deacon @ 2024-04-09 16:59 UTC (permalink / raw
  To: mark.rutland, yury.norov, linux, Dawei Li
  Cc: catalin.marinas, kernel-team, Will Deacon, xueshuai, renyu.zj,
	yangyicong, jonathan.cameron, andersson, konrad.dybcio,
	linux-arm-kernel, linux-kernel, linux-arm-msm

On Wed, 03 Apr 2024 23:59:40 +0800, Dawei Li wrote:
> This is v3 of previous series on elimination of placing cpumask var on
> stack.
> 
> Change since v2:
> - Update commit message for patch[2-10]. [Mark]
> - Add Reviewed-by from Mark for patch[2-10].
> 
> [...]

Applied to will (for-next/perf), thanks!

[01/10] cpumask: add cpumask_any_and_but()
        https://git.kernel.org/will/c/897fa2c38c07
[02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/2f6589df124e
[03/10] perf/arm-cmn: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/60c73240f304
[04/10] perf/arm_cspmu: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/b5310fa1fe8e
[05/10] perf/arm_dsu: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/bea2a13b207e
[06/10] perf/dwc_pcie: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/cf276ee46bc4
[07/10] perf/hisi_pcie: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/d7df79e6af29
[08/10] perf/hisi_uncore: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/b78d0fa25462
[09/10] perf/qcom_l2: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/fc85cee97029
[10/10] perf/thunderx2: Avoid placing cpumask on the stack
        https://git.kernel.org/will/c/595275ca4984

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-04-09 16:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-03 15:59 [PATCH v3 00/10] perf: Avoid placing cpumask on the stack Dawei Li
2024-04-03 15:59 ` [PATCH v3 01/10] cpumask: add cpumask_any_and_but() Dawei Li
2024-04-03 16:31   ` Yury Norov
2024-04-03 15:59 ` [PATCH v3 02/10] perf/alibaba_uncore_drw: Avoid placing cpumask on the stack Dawei Li
2024-04-08  0:28   ` Shuai Xue
2024-04-03 15:59 ` [PATCH v3 03/10] perf/arm-cmn: " Dawei Li
2024-04-03 15:59 ` [PATCH v3 04/10] perf/arm_cspmu: " Dawei Li
2024-04-03 15:59 ` [PATCH v3 05/10] perf/arm_dsu: " Dawei Li
2024-04-03 15:59 ` [PATCH v3 06/10] perf/dwc_pcie: " Dawei Li
2024-04-08  0:30   ` Shuai Xue
2024-04-03 15:59 ` [PATCH v3 07/10] perf/hisi_pcie: " Dawei Li
2024-04-05  9:54   ` Jonathan Cameron
2024-04-03 15:59 ` [PATCH v3 08/10] perf/hisi_uncore: " Dawei Li
2024-04-05  9:54   ` Jonathan Cameron
2024-04-03 15:59 ` [PATCH v3 09/10] perf/qcom_l2: " Dawei Li
2024-04-03 15:59 ` [PATCH v3 10/10] perf/thunderx2: " Dawei Li
2024-04-09 16:59 ` [PATCH v3 00/10] perf: " Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).