LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] CPUs capacity information for heterogeneous systems
@ 2016-02-03 11:59 Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 1/6] ARM: initialize cpu_scale to its default Juri Lelli
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie

Hi all,

this is take 3 of "CPUs capacity information for heterogeneous systems"
patchset [1]; some context follows.

ARM systems may be configured to have CPUs with different power/performance
characteristics within the same chip. In this case, additional information has
to be made available to the kernel (the scheduler in particular) for it to be
aware of such differences and take decisions accordingly. This RFC stems from
the ongoing discussion about introducing a simple platform energy cost model to
guide scheduling decisions (a.k.a Energy Aware Scheduling [3]), but also aims
to be an independent track aimed to standardise the way we make the scheduler
aware of heterogenous CPU systems. With these patches and in addition patches
from [3] (that make the scheduler wakeup paths aware of heterogenous CPU
systems) we enable the scheduler to have good default performance on such
systems. In addition, we get a clearly defined way of providing the scheduler
with needed information about CPU capacity on such systems.

CPU capacity is defined in this context as a number that provides the scheduler
information about CPUs heterogeneity. Such heterogeneity can come from
micro-architectural differences (e.g., ARM big.LITTLE systems) or maximum
frequency at which CPUs can run (e.g., SMP systems with multiple frequency
domains and different max frequencies). Heterogeneity in this context is about
differing performance characteristics; in practice, the binding that we propose
in this RFC tries to capture a first-order approximation of the relative
performance of CPUs.

Several approaches for providing CPUs capacity information have been already
discussed on the list:

 v1: DT + sysfs [1]

 v2: Dynamic profiling at boot [2]

Third version of this patchset proposes what seems to be the solution we agreed
upon (see [2] for reference) to the problem of how do we init CPUs original
capacity: we run a bogus benchmark (stealing int_sqrt from lib/ we run that in
a loop to perform some integer computation, better benchmarks are welcome)
on the first cpu of each frequency domain (assuming no u-arch differences
inside domains), measure time to complete a fixed number of iterations and then
normalize results to SCHED_CAPACITY_SCALE (1024). This time around we also
added a boot time parameter to disable profiling at boot (as it can be time
consuming) and sysfs attributes with which default values can be overwritten.
The proposed solution is basically putting together bits of v1 and v2 that are
considered valuable and acceptable for mainline.

What follows gives you and idea of the kind of results you can expect comparing
the dynamic approach to profiling in userspace:

                          LITTLE      big

 TC2-userspace_profile     430        1024
 TC2-dynamic_profile      ~490        1024

 JUNO-userspace_profile    446        1024
 JUNO-dynamic_profile     ~424        1024

This time around we also decided to remove the RFC tag; even if patches might
still need some degree of improvement and discussion, there seems to be general
consensus about the idea behind the current solution (i.e., nobody NAKed it yet 
:)).

Patches high level description:

 o 01/06 cleans up how cpu_scale is initialized in arm (already landed on
   Russell's patch system)
 o 02/06 introduces dynamic profiling of CPUs capacity at boot
 o [03-04]/06 enable dynamic profiling for arm and arm64.
 o [05-06]/06 introduce sysfs attribute for arm and arm64.

The patchset is based on top of mainline as of today (4.5-rc2).

In case you would like to test this out, I pushed a branch here:

 git://linux-arm.org/linux-jl.git upstream/default_caps_v3

This branch contains additional patches, useful to better understand how CPU
capacity information is actually used by the scheduler. Discussion regarding
these additional patches will be started with a different posting in the
future. We just didn't want to make discussion too broad, as we realize that
this set can be controversial already on its own.

Comments, concerns and rants are more than welcome!

Best,

- Juri

[1] v1 - https://lkml.org/lkml/2015/11/23/391
[2] v2 - https://lkml.org/lkml/2016/1/8/417
[3] https://lkml.org/lkml/2015/7/7/754

Juri Lelli (6):
  ARM: initialize cpu_scale to its default
  drivers/cpufreq: implement init_cpu_capacity_default()
  arm: Enable dynamic CPU capacity initialization
  arm64: Enable dynamic CPU capacity initialization
  arm: add sysfs cpu_capacity attribute
  arm64: add sysfs cpu_capacity attribute

 Documentation/kernel-parameters.txt |   4 +
 arch/arm/kernel/topology.c          |  79 +++++++++++++++-
 arch/arm64/kernel/topology.c        |  85 ++++++++++++++++++
 drivers/cpufreq/Makefile            |   2 +-
 drivers/cpufreq/cpufreq.c           |   1 +
 drivers/cpufreq/cpufreq_capacity.c  | 174 ++++++++++++++++++++++++++++++++++++
 include/linux/cpufreq.h             |   2 +
 7 files changed, 342 insertions(+), 5 deletions(-)
 create mode 100644 drivers/cpufreq/cpufreq_capacity.c

-- 
2.7.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/6] ARM: initialize cpu_scale to its default
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
@ 2016-02-03 11:59 ` Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default() Juri Lelli
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie

Instead of looping through all cpus calling set_capacity_scale, we can
initialise cpu_scale per-cpu variables to SCHED_CAPACITY_SCALE with their
definition.

Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 arch/arm/kernel/topology.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 08b7847..ec279d1 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -40,7 +40,7 @@
  * to run the rebalance_domains for all idle cores and the cpu_capacity can be
  * updated during this sequence.
  */
-static DEFINE_PER_CPU(unsigned long, cpu_scale);
+static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
 
 unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 {
@@ -306,8 +306,6 @@ void __init init_cpu_topology(void)
 		cpu_topo->socket_id = -1;
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_clear(&cpu_topo->thread_sibling);
-
-		set_capacity_scale(cpu, SCHED_CAPACITY_SCALE);
 	}
 	smp_wmb();
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 1/6] ARM: initialize cpu_scale to its default Juri Lelli
@ 2016-02-03 11:59 ` Juri Lelli
  2016-02-03 21:04   ` Vincent Guittot
  2016-02-03 11:59 ` [PATCH v3 3/6] arm: Enable dynamic CPU capacity initialization Juri Lelli
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie, Rafael J. Wysocki, Viresh Kumar

To get default values for CPUs capacity we profile a simple (bogus)
integer benchmark on such CPUs; then we normalize results to 1024
(highest capacity in the system).

Architectures that want this during boot have to define a weak function
(arch_wants_init_cpu_capacity) to return true.

Also, kernel has to boot with init_cpu_capacity parameter if profiling
is needed, as it can be expensive and might add ~1 sec to boot time.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 Changes since v1:
 - add kernel command line parameter to enable profiling
 - add define for max trials

 Documentation/kernel-parameters.txt |   4 +
 arch/arm/kernel/topology.c          |   2 +-
 arch/arm64/kernel/topology.c        |  12 +++
 drivers/cpufreq/Makefile            |   2 +-
 drivers/cpufreq/cpufreq.c           |   1 +
 drivers/cpufreq/cpufreq_capacity.c  | 174 ++++++++++++++++++++++++++++++++++++
 include/linux/cpufreq.h             |   2 +
 7 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 drivers/cpufreq/cpufreq_capacity.c

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 87d40a7..fad2b89 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1570,6 +1570,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	initrd=		[BOOT] Specify the location of the initial ramdisk
 
+	init_cpu_capacity
+			[KNL,ARM] Enables dynamic CPUs capacity benchmarking
+			at boot.
+
 	inport.irq=	[HW] Inport (ATI XL and Microsoft) busmouse driver
 			Format: <irq>
 
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index ec279d1..c9c87a5 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -47,7 +47,7 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 	return per_cpu(cpu_scale, cpu);
 }
 
-static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
+void set_capacity_scale(unsigned int cpu, unsigned long capacity)
 {
 	per_cpu(cpu_scale, cpu) = capacity;
 }
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 694f6de..3b75d63 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -23,6 +23,18 @@
 #include <asm/cputype.h>
 #include <asm/topology.h>
 
+static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
+
+unsigned long arm_arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
+{
+	return per_cpu(cpu_scale, cpu);
+}
+
+void set_capacity_scale(unsigned int cpu, unsigned long capacity)
+{
+	per_cpu(cpu_scale, cpu) = capacity;
+}
+
 static int __init get_cpu_for_node(struct device_node *node)
 {
 	struct device_node *cpu_node;
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 9e63fb1..c4025fd 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -1,5 +1,5 @@
 # CPUfreq core
-obj-$(CONFIG_CPU_FREQ)			+= cpufreq.o freq_table.o
+obj-$(CONFIG_CPU_FREQ)			+= cpufreq.o freq_table.o cpufreq_capacity.o
 
 # CPUfreq stats
 obj-$(CONFIG_CPU_FREQ_STAT)             += cpufreq_stats.o
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e979ec7..b22afe8 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2440,6 +2440,7 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
 	}
 
 	register_hotcpu_notifier(&cpufreq_cpu_notifier);
+	cpufreq_init_cpu_capacity();
 	pr_debug("driver %s up and running\n", driver_data->name);
 
 out:
diff --git a/drivers/cpufreq/cpufreq_capacity.c b/drivers/cpufreq/cpufreq_capacity.c
new file mode 100644
index 0000000..e54310b
--- /dev/null
+++ b/drivers/cpufreq/cpufreq_capacity.c
@@ -0,0 +1,174 @@
+/*
+ * Default CPU capacity calculation for u-arch invariance
+ *
+ * Copyright (C) 2015 ARM Ltd.
+ * Juri Lelli <juri.lelli@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+#include <linux/cpufreq.h>
+#include <linux/sched.h>
+
+#define MAX_TRIALS 10 /* how many times benchmark is executed */
+static unsigned long long elapsed[NR_CPUS];
+
+/*
+ * Don't let compiler optimize following two functions; we want to avoid any
+ * microarchitecture specific optimization that compiler would do and favour
+ * one CPU vs. another. Also, my_int_sqrt is cut-and-paste from
+ * lib/int_sqrt.c.
+ */
+static unsigned long __attribute__((optimize("O0")))
+my_int_sqrt(unsigned long x)
+{
+	unsigned long b, m, y = 0;
+
+	if (x <= 1)
+		return x;
+
+	m = 1UL << (BITS_PER_LONG - 2);
+	while (m != 0) {
+		b = y + m;
+		y >>= 1;
+
+		if (x >= b) {
+			x -= b;
+			y += m;
+		}
+		m >>= 2;
+	}
+
+	return y;
+}
+
+static unsigned long __attribute__((optimize("O0")))
+bogus_bench(void)
+{
+	unsigned long i, res;
+
+	for (i = 0; i < 100000; i++)
+		res = my_int_sqrt(i);
+
+	return res;
+}
+
+static int run_bogus_benchmark(int cpu)
+{
+	int ret, trials = MAX_TRIALS;
+	u64 begin, end, sample, mean = 0, count = 0;
+	unsigned long res;
+
+	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
+	if (ret) {
+		pr_warn("%s: failed to set allowed ptr\n", __func__);
+		return -EINVAL;
+	}
+
+	while (trials--) {
+		begin = local_clock();
+		res = bogus_bench();
+		end = local_clock();
+		sample = end - begin;
+
+		mean = mean * count + sample;
+		mean = div64_u64(mean, ++count);
+		pr_debug("%s: cpu=%d begin=%llu end=%llu"
+			 " sample=%llu mean=%llu count=%llu res=%lu\n",
+			__func__, cpu, begin, end, sample,
+			mean, count, res);
+	}
+	elapsed[cpu] = mean;
+
+	ret = set_cpus_allowed_ptr(current, cpu_active_mask);
+	if (ret) {
+		pr_warn("%s: failed to set allowed ptr\n", __func__);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+bool __weak arch_wants_init_cpu_capacity(void)
+{
+	return false;
+}
+
+void __weak set_capacity_scale(int cpu, unsigned long capacity) { }
+
+static __read_mostly bool init_cpu_capacity_enabled;
+
+static int __init init_cpu_capacity_setup(char *str)
+{
+	init_cpu_capacity_enabled = true;
+
+	return 0;
+}
+early_param("init_cpu_capacity", init_cpu_capacity_setup);
+
+void cpufreq_init_cpu_capacity(void)
+{
+	int cpu, fcpu;
+	unsigned long long elapsed_min = ULLONG_MAX;
+	unsigned int curr_min, curr_max;
+	struct cpufreq_policy *policy;
+
+	if (!arch_wants_init_cpu_capacity() || !init_cpu_capacity_enabled)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		policy = cpufreq_cpu_get(cpu);
+		if (IS_ERR_OR_NULL(policy))
+			return;
+
+		/*
+		 * We profile only first CPU of each frequency domain;
+		 * and use that value as capacity of every CPU in the domain.
+		 */
+		fcpu = cpumask_first(policy->related_cpus);
+		if (cpu != fcpu) {
+			elapsed[cpu] = elapsed[fcpu];
+			cpufreq_cpu_put(policy);
+			continue;
+		}
+
+		down_write(&policy->rwsem);
+		curr_min = policy->user_policy.min;
+		curr_max = policy->user_policy.max;
+		policy->user_policy.min = policy->cpuinfo.max_freq;
+		policy->user_policy.max = policy->cpuinfo.max_freq;
+		up_write(&policy->rwsem);
+		cpufreq_cpu_put(policy);
+		cpufreq_update_policy(cpu);
+
+		run_bogus_benchmark(cpu);
+		if (elapsed[cpu] < elapsed_min)
+			elapsed_min = elapsed[cpu];
+		pr_debug("%s: cpu=%d elapsed=%llu (min=%llu)\n",
+				__func__, cpu, elapsed[cpu], elapsed_min);
+
+		policy = cpufreq_cpu_get(cpu);
+		down_write(&policy->rwsem);
+		policy->user_policy.min = curr_min;
+		policy->user_policy.max = curr_max;
+		up_write(&policy->rwsem);
+		cpufreq_cpu_put(policy);
+		cpufreq_update_policy(cpu);
+	}
+
+	for_each_possible_cpu(cpu) {
+		unsigned long capacity;
+
+		capacity = div64_u64((elapsed_min << 10), elapsed[cpu]);
+		pr_debug("%s: CPU%d capacity=%lu\n", __func__, cpu, capacity);
+		set_capacity_scale(cpu, capacity);
+	}
+
+	pr_info("dynamic CPUs capacity installed\n");
+}
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 88a4215..9924351 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -419,6 +419,8 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div,
 #endif
 }
 
+void cpufreq_init_cpu_capacity(void);
+
 /*********************************************************************
  *                          CPUFREQ GOVERNORS                        *
  *********************************************************************/
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 3/6] arm: Enable dynamic CPU capacity initialization
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 1/6] ARM: initialize cpu_scale to its default Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default() Juri Lelli
@ 2016-02-03 11:59 ` Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 4/6] arm64: " Juri Lelli
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie

Define arch_wants_init_cpu_capacity() to return true; so that
cpufreq_init_cpu_capacity() can go ahead and profile CPU capacities
at boot time.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 arch/arm/kernel/topology.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index c9c87a5..7d7fc2c 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -52,6 +52,11 @@ void set_capacity_scale(unsigned int cpu, unsigned long capacity)
 	per_cpu(cpu_scale, cpu) = capacity;
 }
 
+bool arch_wants_init_cpu_capacity(void)
+{
+	return true;
+}
+
 #ifdef CONFIG_OF
 struct cpu_efficiency {
 	const char *compatible;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 4/6] arm64: Enable dynamic CPU capacity initialization
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
                   ` (2 preceding siblings ...)
  2016-02-03 11:59 ` [PATCH v3 3/6] arm: Enable dynamic CPU capacity initialization Juri Lelli
@ 2016-02-03 11:59 ` Juri Lelli
  2016-02-08 12:28   ` Dietmar Eggemann
  2016-02-03 11:59 ` [PATCH v3 5/6] arm: add sysfs cpu_capacity attribute Juri Lelli
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie

Define arch_wants_init_cpu_capacity() to return true; so that
cpufreq_init_cpu_capacity() can go ahead and profile CPU capacities
at boot time.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 arch/arm64/kernel/topology.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 3b75d63..f2513a6 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -35,6 +35,11 @@ void set_capacity_scale(unsigned int cpu, unsigned long capacity)
 	per_cpu(cpu_scale, cpu) = capacity;
 }
 
+bool arch_wants_init_cpu_capacity(void)
+{
+	return true;
+}
+
 static int __init get_cpu_for_node(struct device_node *node)
 {
 	struct device_node *cpu_node;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 5/6] arm: add sysfs cpu_capacity attribute
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
                   ` (3 preceding siblings ...)
  2016-02-03 11:59 ` [PATCH v3 4/6] arm64: " Juri Lelli
@ 2016-02-03 11:59 ` Juri Lelli
  2016-02-03 11:59 ` [PATCH v3 6/6] arm64: " Juri Lelli
  2016-02-08 23:59 ` [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Steve Muckle
  6 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie

Add a sysfs cpu_capacity attribute with which it is possible to read and
write (thus over-writing default values) CPUs capacity. This might be
useful in situation where there is no way to get proper default values
at boot time.

The new attribute shows up as:

 /sys/devices/system/cpu/cpu*/cpu_capacity

Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 arch/arm/kernel/topology.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 7d7fc2c..9ffba65 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -57,6 +57,74 @@ bool arch_wants_init_cpu_capacity(void)
 	return true;
 }
 
+#ifdef CONFIG_PROC_SYSCTL
+#include <asm/cpu.h>
+#include <linux/string.h>
+static ssize_t show_cpu_capacity(struct device *dev,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	ssize_t rc;
+	int cpunum = cpu->dev.id;
+	unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);
+
+	rc = sprintf(buf, "%lu\n", capacity);
+
+	return rc;
+}
+
+static ssize_t store_cpu_capacity(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf,
+				  size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	int this_cpu = cpu->dev.id, i;
+	unsigned long new_capacity;
+	ssize_t ret;
+
+	if (count) {
+		char *p = (char *) buf;
+
+		ret = kstrtoul(p, 0, &new_capacity);
+		if (ret)
+			return ret;
+		if (new_capacity > SCHED_CAPACITY_SCALE)
+			return -EINVAL;
+
+		for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
+			set_capacity_scale(i, new_capacity);
+	}
+
+	return count;
+}
+
+static DEVICE_ATTR(cpu_capacity,
+		   0644,
+		   show_cpu_capacity,
+		   store_cpu_capacity);
+
+static int register_cpu_capacity_sysctl(void)
+{
+	int i;
+	struct device *cpu;
+
+	for_each_possible_cpu(i) {
+		cpu = get_cpu_device(i);
+		if (!cpu) {
+			pr_err("%s: too early to get CPU%d device!\n",
+			       __func__, i);
+			continue;
+		}
+		device_create_file(cpu, &dev_attr_cpu_capacity);
+	}
+
+	return 0;
+}
+late_initcall(register_cpu_capacity_sysctl);
+#endif
+
 #ifdef CONFIG_OF
 struct cpu_efficiency {
 	const char *compatible;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 6/6] arm64: add sysfs cpu_capacity attribute
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
                   ` (4 preceding siblings ...)
  2016-02-03 11:59 ` [PATCH v3 5/6] arm: add sysfs cpu_capacity attribute Juri Lelli
@ 2016-02-03 11:59 ` Juri Lelli
  2016-02-05 17:19   ` Dietmar Eggemann
  2016-02-08 23:59 ` [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Steve Muckle
  6 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-03 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	juri.lelli, broonie, Mark Brown

Add a sysfs cpu_capacity attribute with which it is possible to read and
write (thus over-writing default values) CPUs capacity. This might be
useful in situation where there is no way to get proper default values
at boot time.

The new attribute shows up as:

 /sys/devices/system/cpu/cpu*/cpu_capacity

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Brown <broonie@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 arch/arm64/kernel/topology.c | 68 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index f2513a6..f05cc07 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -40,6 +40,74 @@ bool arch_wants_init_cpu_capacity(void)
 	return true;
 }
 
+#ifdef CONFIG_PROC_SYSCTL
+#include <asm/cpu.h>
+#include <linux/string.h>
+static ssize_t show_cpu_capacity(struct device *dev,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	ssize_t rc;
+	int cpunum = cpu->dev.id;
+	unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);
+
+	rc = sprintf(buf, "%lu\n", capacity);
+
+	return rc;
+}
+
+static ssize_t store_cpu_capacity(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf,
+				  size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	int this_cpu = cpu->dev.id, i;
+	unsigned long new_capacity;
+	ssize_t ret;
+
+	if (count) {
+		char *p = (char *) buf;
+
+		ret = kstrtoul(p, 0, &new_capacity);
+		if (ret)
+			return ret;
+		if (new_capacity > SCHED_CAPACITY_SCALE)
+			return -EINVAL;
+
+		for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
+			set_capacity_scale(i, new_capacity);
+	}
+
+	return count;
+}
+
+static DEVICE_ATTR(cpu_capacity,
+		   0644,
+		   show_cpu_capacity,
+		   store_cpu_capacity);
+
+static int register_cpu_capacity_sysctl(void)
+{
+	int i;
+	struct device *cpu;
+
+	for_each_possible_cpu(i) {
+		cpu = get_cpu_device(i);
+		if (!cpu) {
+			pr_err("%s: too early to get CPU%d device!\n",
+			       __func__, i);
+			continue;
+		}
+		device_create_file(cpu, &dev_attr_cpu_capacity);
+	}
+
+	return 0;
+}
+late_initcall(register_cpu_capacity_sysctl);
+#endif
+
 static int __init get_cpu_for_node(struct device_node *node)
 {
 	struct device_node *cpu_node;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-03 11:59 ` [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default() Juri Lelli
@ 2016-02-03 21:04   ` Vincent Guittot
  2016-02-04  9:36     ` Morten Rasmussen
  0 siblings, 1 reply; 27+ messages in thread
From: Vincent Guittot @ 2016-02-03 21:04 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Morten Rasmussen,
	Dietmar Eggemann, Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
> To get default values for CPUs capacity we profile a simple (bogus)
> integer benchmark on such CPUs; then we normalize results to 1024
> (highest capacity in the system).
>
> Architectures that want this during boot have to define a weak function
> (arch_wants_init_cpu_capacity) to return true.
>
> Also, kernel has to boot with init_cpu_capacity parameter if profiling
> is needed, as it can be expensive and might add ~1 sec to boot time.
>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> ---
>  Changes since v1:
>  - add kernel command line parameter to enable profiling
>  - add define for max trials
>
>  Documentation/kernel-parameters.txt |   4 +
>  arch/arm/kernel/topology.c          |   2 +-
>  arch/arm64/kernel/topology.c        |  12 +++
>  drivers/cpufreq/Makefile            |   2 +-
>  drivers/cpufreq/cpufreq.c           |   1 +
>  drivers/cpufreq/cpufreq_capacity.c  | 174 ++++++++++++++++++++++++++++++++++++
>  include/linux/cpufreq.h             |   2 +
>  7 files changed, 195 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/cpufreq/cpufreq_capacity.c
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 87d40a7..fad2b89 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1570,6 +1570,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>
>         initrd=         [BOOT] Specify the location of the initial ramdisk
>
> +       init_cpu_capacity
> +                       [KNL,ARM] Enables dynamic CPUs capacity benchmarking
> +                       at boot.
> +
>         inport.irq=     [HW] Inport (ATI XL and Microsoft) busmouse driver
>                         Format: <irq>
>
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index ec279d1..c9c87a5 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -47,7 +47,7 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
>         return per_cpu(cpu_scale, cpu);
>  }
>
> -static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
> +void set_capacity_scale(unsigned int cpu, unsigned long capacity)
>  {
>         per_cpu(cpu_scale, cpu) = capacity;
>  }
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 694f6de..3b75d63 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -23,6 +23,18 @@
>  #include <asm/cputype.h>
>  #include <asm/topology.h>
>
> +static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
> +
> +unsigned long arm_arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
> +{
> +       return per_cpu(cpu_scale, cpu);
> +}
> +
> +void set_capacity_scale(unsigned int cpu, unsigned long capacity)
> +{
> +       per_cpu(cpu_scale, cpu) = capacity;
> +}
> +
>  static int __init get_cpu_for_node(struct device_node *node)
>  {
>         struct device_node *cpu_node;
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 9e63fb1..c4025fd 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -1,5 +1,5 @@
>  # CPUfreq core
> -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
> +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o

Do you really want to have the calibration of capacity dependent of
cpufreq ? It means that we can't use it without a cpufreq driver.
IMHO, this creates a unnecessary dependency. I understand that you
must ensure that core runs at max fequency if a driver is present but
you should be able to calibrate the capacity if cpufreq is not
available but you have different capacity because micro architecture

>
>  # CPUfreq stats
>  obj-$(CONFIG_CPU_FREQ_STAT)             += cpufreq_stats.o
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index e979ec7..b22afe8 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2440,6 +2440,7 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
>         }
>
>         register_hotcpu_notifier(&cpufreq_cpu_notifier);
> +       cpufreq_init_cpu_capacity();
>         pr_debug("driver %s up and running\n", driver_data->name);
>
>  out:
> diff --git a/drivers/cpufreq/cpufreq_capacity.c b/drivers/cpufreq/cpufreq_capacity.c
> new file mode 100644
> index 0000000..e54310b
> --- /dev/null
> +++ b/drivers/cpufreq/cpufreq_capacity.c
> @@ -0,0 +1,174 @@
> +/*
> + * Default CPU capacity calculation for u-arch invariance
> + *
> + * Copyright (C) 2015 ARM Ltd.
> + * Juri Lelli <juri.lelli@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +#include <linux/cpufreq.h>
> +#include <linux/sched.h>
> +
> +#define MAX_TRIALS 10 /* how many times benchmark is executed */
> +static unsigned long long elapsed[NR_CPUS];
> +
> +/*
> + * Don't let compiler optimize following two functions; we want to avoid any
> + * microarchitecture specific optimization that compiler would do and favour
> + * one CPU vs. another. Also, my_int_sqrt is cut-and-paste from
> + * lib/int_sqrt.c.
> + */
> +static unsigned long __attribute__((optimize("O0")))
> +my_int_sqrt(unsigned long x)
> +{
> +       unsigned long b, m, y = 0;
> +
> +       if (x <= 1)
> +               return x;
> +
> +       m = 1UL << (BITS_PER_LONG - 2);
> +       while (m != 0) {
> +               b = y + m;
> +               y >>= 1;
> +
> +               if (x >= b) {
> +                       x -= b;
> +                       y += m;
> +               }
> +               m >>= 2;
> +       }
> +
> +       return y;
> +}
> +
> +static unsigned long __attribute__((optimize("O0")))
> +bogus_bench(void)
> +{
> +       unsigned long i, res;
> +
> +       for (i = 0; i < 100000; i++)
> +               res = my_int_sqrt(i);
> +
> +       return res;
> +}
> +
> +static int run_bogus_benchmark(int cpu)
> +{
> +       int ret, trials = MAX_TRIALS;
> +       u64 begin, end, sample, mean = 0, count = 0;
> +       unsigned long res;
> +
> +       ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +       if (ret) {
> +               pr_warn("%s: failed to set allowed ptr\n", __func__);
> +               return -EINVAL;
> +       }
> +
> +       while (trials--) {
> +               begin = local_clock();
> +               res = bogus_bench();
> +               end = local_clock();
> +               sample = end - begin;
> +
> +               mean = mean * count + sample;
> +               mean = div64_u64(mean, ++count);
> +               pr_debug("%s: cpu=%d begin=%llu end=%llu"
> +                        " sample=%llu mean=%llu count=%llu res=%lu\n",
> +                       __func__, cpu, begin, end, sample,
> +                       mean, count, res);
> +       }
> +       elapsed[cpu] = mean;
> +
> +       ret = set_cpus_allowed_ptr(current, cpu_active_mask);
> +       if (ret) {
> +               pr_warn("%s: failed to set allowed ptr\n", __func__);
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +bool __weak arch_wants_init_cpu_capacity(void)
> +{
> +       return false;
> +}
> +
> +void __weak set_capacity_scale(int cpu, unsigned long capacity) { }
> +
> +static __read_mostly bool init_cpu_capacity_enabled;
> +
> +static int __init init_cpu_capacity_setup(char *str)
> +{
> +       init_cpu_capacity_enabled = true;
> +
> +       return 0;
> +}
> +early_param("init_cpu_capacity", init_cpu_capacity_setup);
> +
> +void cpufreq_init_cpu_capacity(void)
> +{
> +       int cpu, fcpu;
> +       unsigned long long elapsed_min = ULLONG_MAX;
> +       unsigned int curr_min, curr_max;
> +       struct cpufreq_policy *policy;
> +
> +       if (!arch_wants_init_cpu_capacity() || !init_cpu_capacity_enabled)
> +               return;
> +
> +       for_each_possible_cpu(cpu) {
> +               policy = cpufreq_cpu_get(cpu);
> +               if (IS_ERR_OR_NULL(policy))
> +                       return;
> +
> +               /*
> +                * We profile only first CPU of each frequency domain;
> +                * and use that value as capacity of every CPU in the domain.
> +                */
> +               fcpu = cpumask_first(policy->related_cpus);
> +               if (cpu != fcpu) {
> +                       elapsed[cpu] = elapsed[fcpu];
> +                       cpufreq_cpu_put(policy);
> +                       continue;
> +               }
> +
> +               down_write(&policy->rwsem);
> +               curr_min = policy->user_policy.min;
> +               curr_max = policy->user_policy.max;
> +               policy->user_policy.min = policy->cpuinfo.max_freq;
> +               policy->user_policy.max = policy->cpuinfo.max_freq;
> +               up_write(&policy->rwsem);
> +               cpufreq_cpu_put(policy);
> +               cpufreq_update_policy(cpu);
> +
> +               run_bogus_benchmark(cpu);
> +               if (elapsed[cpu] < elapsed_min)
> +                       elapsed_min = elapsed[cpu];
> +               pr_debug("%s: cpu=%d elapsed=%llu (min=%llu)\n",
> +                               __func__, cpu, elapsed[cpu], elapsed_min);
> +
> +               policy = cpufreq_cpu_get(cpu);
> +               down_write(&policy->rwsem);
> +               policy->user_policy.min = curr_min;
> +               policy->user_policy.max = curr_max;
> +               up_write(&policy->rwsem);
> +               cpufreq_cpu_put(policy);
> +               cpufreq_update_policy(cpu);
> +       }
> +
> +       for_each_possible_cpu(cpu) {
> +               unsigned long capacity;
> +
> +               capacity = div64_u64((elapsed_min << 10), elapsed[cpu]);
> +               pr_debug("%s: CPU%d capacity=%lu\n", __func__, cpu, capacity);
> +               set_capacity_scale(cpu, capacity);
> +       }
> +
> +       pr_info("dynamic CPUs capacity installed\n");
> +}
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 88a4215..9924351 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -419,6 +419,8 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div,
>  #endif
>  }
>
> +void cpufreq_init_cpu_capacity(void);
> +
>  /*********************************************************************
>   *                          CPUFREQ GOVERNORS                        *
>   *********************************************************************/
> --
> 2.7.0
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-03 21:04   ` Vincent Guittot
@ 2016-02-04  9:36     ` Morten Rasmussen
  2016-02-04 12:03       ` Vincent Guittot
  0 siblings, 1 reply; 27+ messages in thread
From: Morten Rasmussen @ 2016-02-04  9:36 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Juri Lelli, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
> > To get default values for CPUs capacity we profile a simple (bogus)
> > integer benchmark on such CPUs; then we normalize results to 1024
> > (highest capacity in the system).
> >
> > Architectures that want this during boot have to define a weak function
> > (arch_wants_init_cpu_capacity) to return true.
> >
> > Also, kernel has to boot with init_cpu_capacity parameter if profiling
> > is needed, as it can be expensive and might add ~1 sec to boot time.
> >
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> > ---
> >  Changes since v1:
> >  - add kernel command line parameter to enable profiling
> >  - add define for max trials
> >
> >  Documentation/kernel-parameters.txt |   4 +
> >  arch/arm/kernel/topology.c          |   2 +-
> >  arch/arm64/kernel/topology.c        |  12 +++
> >  drivers/cpufreq/Makefile            |   2 +-
> >  drivers/cpufreq/cpufreq.c           |   1 +
> >  drivers/cpufreq/cpufreq_capacity.c  | 174 ++++++++++++++++++++++++++++++++++++
> >  include/linux/cpufreq.h             |   2 +
> >  7 files changed, 195 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/cpufreq/cpufreq_capacity.c
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 87d40a7..fad2b89 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -1570,6 +1570,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >
> >         initrd=         [BOOT] Specify the location of the initial ramdisk
> >
> > +       init_cpu_capacity
> > +                       [KNL,ARM] Enables dynamic CPUs capacity benchmarking
> > +                       at boot.
> > +
> >         inport.irq=     [HW] Inport (ATI XL and Microsoft) busmouse driver
> >                         Format: <irq>
> >
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> > index ec279d1..c9c87a5 100644
> > --- a/arch/arm/kernel/topology.c
> > +++ b/arch/arm/kernel/topology.c
> > @@ -47,7 +47,7 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
> >         return per_cpu(cpu_scale, cpu);
> >  }
> >
> > -static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
> > +void set_capacity_scale(unsigned int cpu, unsigned long capacity)
> >  {
> >         per_cpu(cpu_scale, cpu) = capacity;
> >  }
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 694f6de..3b75d63 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -23,6 +23,18 @@
> >  #include <asm/cputype.h>
> >  #include <asm/topology.h>
> >
> > +static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
> > +
> > +unsigned long arm_arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
> > +{
> > +       return per_cpu(cpu_scale, cpu);
> > +}
> > +
> > +void set_capacity_scale(unsigned int cpu, unsigned long capacity)
> > +{
> > +       per_cpu(cpu_scale, cpu) = capacity;
> > +}
> > +
> >  static int __init get_cpu_for_node(struct device_node *node)
> >  {
> >         struct device_node *cpu_node;
> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> > index 9e63fb1..c4025fd 100644
> > --- a/drivers/cpufreq/Makefile
> > +++ b/drivers/cpufreq/Makefile
> > @@ -1,5 +1,5 @@
> >  # CPUfreq core
> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
> 
> Do you really want to have the calibration of capacity dependent of
> cpufreq ? It means that we can't use it without a cpufreq driver.
> IMHO, this creates a unnecessary dependency. I understand that you
> must ensure that core runs at max fequency if a driver is present but
> you should be able to calibrate the capacity if cpufreq is not
> available but you have different capacity because micro architecture

We could remove the dependency on cpufreq, but it would make things more
complicated for systems which do have frequency scaling as we would have
to either:

1) Run the calibration again once cpufreq has been initialized.

2) Know the frequency of all cpus at calibration time so we can
determine the correct micro-archicture difference. This option would
however mean that we have to provide a frequency scaling factor even for
systems that don't have cpufreq.

3) Find a way to do the calibration anyway if cpufreq doesn't
initialize. I'm not sure how that would work though.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04  9:36     ` Morten Rasmussen
@ 2016-02-04 12:03       ` Vincent Guittot
  2016-02-04 12:16         ` Juri Lelli
  0 siblings, 1 reply; 27+ messages in thread
From: Vincent Guittot @ 2016-02-04 12:03 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Juri Lelli, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
>> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:

[snip]

>> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>> > index 9e63fb1..c4025fd 100644
>> > --- a/drivers/cpufreq/Makefile
>> > +++ b/drivers/cpufreq/Makefile
>> > @@ -1,5 +1,5 @@
>> >  # CPUfreq core
>> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
>>
>> Do you really want to have the calibration of capacity dependent of
>> cpufreq ? It means that we can't use it without a cpufreq driver.
>> IMHO, this creates a unnecessary dependency. I understand that you
>> must ensure that core runs at max fequency if a driver is present but
>> you should be able to calibrate the capacity if cpufreq is not
>> available but you have different capacity because micro architecture
>
> We could remove the dependency on cpufreq, but it would make things more
> complicated for systems which do have frequency scaling as we would have
> to either:
>
> 1) Run the calibration again once cpufreq has been initialized.

or wait and let time for a driver to initialize and trig the
calibration. If calibration has not been done at the end of the boot,
you can force a calibration. If the cpufeq driver is a module and is
loaded far later for any good or bad reason, we will have to run the
calibration once again but at least the capacity will reflect he
current capacity of the CPUs.
I'm mainly worried that the compilation of the calibration is
dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
sequence

>
> 2) Know the frequency of all cpus at calibration time so we can
> determine the correct micro-archicture difference. This option would
> however mean that we have to provide a frequency scaling factor even for
> systems that don't have cpufreq.
>
> 3) Find a way to do the calibration anyway if cpufreq doesn't
> initialize. I'm not sure how that would work though.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04 12:03       ` Vincent Guittot
@ 2016-02-04 12:16         ` Juri Lelli
  2016-02-04 12:35           ` Vincent Guittot
  0 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-04 12:16 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

Hi Vincent,

On 04/02/16 13:03, Vincent Guittot wrote:
> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> > On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
> >> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
> 
> [snip]
> 
> >> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> >> > index 9e63fb1..c4025fd 100644
> >> > --- a/drivers/cpufreq/Makefile
> >> > +++ b/drivers/cpufreq/Makefile
> >> > @@ -1,5 +1,5 @@
> >> >  # CPUfreq core
> >> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
> >> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
> >>
> >> Do you really want to have the calibration of capacity dependent of
> >> cpufreq ? It means that we can't use it without a cpufreq driver.
> >> IMHO, this creates a unnecessary dependency. I understand that you
> >> must ensure that core runs at max fequency if a driver is present but
> >> you should be able to calibrate the capacity if cpufreq is not
> >> available but you have different capacity because micro architecture
> >
> > We could remove the dependency on cpufreq, but it would make things more
> > complicated for systems which do have frequency scaling as we would have
> > to either:
> >
> > 1) Run the calibration again once cpufreq has been initialized.
> 
> or wait and let time for a driver to initialize and trig the
> calibration. If calibration has not been done at the end of the boot,
> you can force a calibration. If the cpufeq driver is a module and is
> loaded far later for any good or bad reason, we will have to run the
> calibration once again but at least the capacity will reflect he
> current capacity of the CPUs.
> I'm mainly worried that the compilation of the calibration is
> dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
> sequence
> 

Yes, I guess we can make this work in some way. Out of curiosity,
though, are out there heterogenous platforms that don't use cpufreq?
I mean, I wouldn't add code and complexity from start, if there are not
good reasons to do so.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04 12:16         ` Juri Lelli
@ 2016-02-04 12:35           ` Vincent Guittot
  2016-02-04 14:13             ` Juri Lelli
  0 siblings, 1 reply; 27+ messages in thread
From: Vincent Guittot @ 2016-02-04 12:35 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
> Hi Vincent,
>
> On 04/02/16 13:03, Vincent Guittot wrote:
>> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> > On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
>> >> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
>>
>> [snip]
>>
>> >> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>> >> > index 9e63fb1..c4025fd 100644
>> >> > --- a/drivers/cpufreq/Makefile
>> >> > +++ b/drivers/cpufreq/Makefile
>> >> > @@ -1,5 +1,5 @@
>> >> >  # CPUfreq core
>> >> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>> >> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
>> >>
>> >> Do you really want to have the calibration of capacity dependent of
>> >> cpufreq ? It means that we can't use it without a cpufreq driver.
>> >> IMHO, this creates a unnecessary dependency. I understand that you
>> >> must ensure that core runs at max fequency if a driver is present but
>> >> you should be able to calibrate the capacity if cpufreq is not
>> >> available but you have different capacity because micro architecture
>> >
>> > We could remove the dependency on cpufreq, but it would make things more
>> > complicated for systems which do have frequency scaling as we would have
>> > to either:
>> >
>> > 1) Run the calibration again once cpufreq has been initialized.
>>
>> or wait and let time for a driver to initialize and trig the
>> calibration. If calibration has not been done at the end of the boot,
>> you can force a calibration. If the cpufeq driver is a module and is
>> loaded far later for any good or bad reason, we will have to run the
>> calibration once again but at least the capacity will reflect he
>> current capacity of the CPUs.
>> I'm mainly worried that the compilation of the calibration is
>> dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
>> sequence
>>
>
> Yes, I guess we can make this work in some way. Out of curiosity,
> though, are out there heterogenous platforms that don't use cpufreq?

At least, you can find several heterogeneous platforms without OPP
table for CPUs in the kernel. That's probably a temporary situation
but which can become a permanent one. It means that we can't calibrate
the CPUs for these platforms.

Thanks,
Vincent

> I mean, I wouldn't add code and complexity from start, if there are not
> good reasons to do so.
>
> Thanks,
>
> - Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04 12:35           ` Vincent Guittot
@ 2016-02-04 14:13             ` Juri Lelli
  2016-02-04 15:44               ` Vincent Guittot
  0 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-04 14:13 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 04/02/16 13:35, Vincent Guittot wrote:
> On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
> > Hi Vincent,
> >
> > On 04/02/16 13:03, Vincent Guittot wrote:
> >> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> >> > On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
> >> >> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
> >>
> >> [snip]
> >>
> >> >> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> >> >> > index 9e63fb1..c4025fd 100644
> >> >> > --- a/drivers/cpufreq/Makefile
> >> >> > +++ b/drivers/cpufreq/Makefile
> >> >> > @@ -1,5 +1,5 @@
> >> >> >  # CPUfreq core
> >> >> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
> >> >> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
> >> >>
> >> >> Do you really want to have the calibration of capacity dependent of
> >> >> cpufreq ? It means that we can't use it without a cpufreq driver.
> >> >> IMHO, this creates a unnecessary dependency. I understand that you
> >> >> must ensure that core runs at max fequency if a driver is present but
> >> >> you should be able to calibrate the capacity if cpufreq is not
> >> >> available but you have different capacity because micro architecture
> >> >
> >> > We could remove the dependency on cpufreq, but it would make things more
> >> > complicated for systems which do have frequency scaling as we would have
> >> > to either:
> >> >
> >> > 1) Run the calibration again once cpufreq has been initialized.
> >>
> >> or wait and let time for a driver to initialize and trig the
> >> calibration. If calibration has not been done at the end of the boot,
> >> you can force a calibration. If the cpufeq driver is a module and is
> >> loaded far later for any good or bad reason, we will have to run the
> >> calibration once again but at least the capacity will reflect he
> >> current capacity of the CPUs.
> >> I'm mainly worried that the compilation of the calibration is
> >> dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
> >> sequence
> >>
> >
> > Yes, I guess we can make this work in some way. Out of curiosity,
> > though, are out there heterogenous platforms that don't use cpufreq?
> 
> At least, you can find several heterogeneous platforms without OPP
> table for CPUs in the kernel. That's probably a temporary situation
> but which can become a permanent one. It means that we can't calibrate
> the CPUs for these platforms.
> 

Sorry, can you make some examples so that I'm sure I understand what you
are referring to?

Anyway, don't these platform still make use of cpufreq (even if without
an OPP table) so that we can still control policy->max and min?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04 14:13             ` Juri Lelli
@ 2016-02-04 15:44               ` Vincent Guittot
  2016-02-04 15:46                 ` Vincent Guittot
  0 siblings, 1 reply; 27+ messages in thread
From: Vincent Guittot @ 2016-02-04 15:44 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 4 February 2016 at 15:13, Juri Lelli <juri.lelli@arm.com> wrote:
> On 04/02/16 13:35, Vincent Guittot wrote:
>> On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
>> > Hi Vincent,
>> >
>> > On 04/02/16 13:03, Vincent Guittot wrote:
>> >> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> >> > On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
>> >> >> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
>> >>
>> >> [snip]
>> >>
>> >> >> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>> >> >> > index 9e63fb1..c4025fd 100644
>> >> >> > --- a/drivers/cpufreq/Makefile
>> >> >> > +++ b/drivers/cpufreq/Makefile
>> >> >> > @@ -1,5 +1,5 @@
>> >> >> >  # CPUfreq core
>> >> >> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>> >> >> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
>> >> >>
>> >> >> Do you really want to have the calibration of capacity dependent of
>> >> >> cpufreq ? It means that we can't use it without a cpufreq driver.
>> >> >> IMHO, this creates a unnecessary dependency. I understand that you
>> >> >> must ensure that core runs at max fequency if a driver is present but
>> >> >> you should be able to calibrate the capacity if cpufreq is not
>> >> >> available but you have different capacity because micro architecture
>> >> >
>> >> > We could remove the dependency on cpufreq, but it would make things more
>> >> > complicated for systems which do have frequency scaling as we would have
>> >> > to either:
>> >> >
>> >> > 1) Run the calibration again once cpufreq has been initialized.
>> >>
>> >> or wait and let time for a driver to initialize and trig the
>> >> calibration. If calibration has not been done at the end of the boot,
>> >> you can force a calibration. If the cpufeq driver is a module and is
>> >> loaded far later for any good or bad reason, we will have to run the
>> >> calibration once again but at least the capacity will reflect he
>> >> current capacity of the CPUs.
>> >> I'm mainly worried that the compilation of the calibration is
>> >> dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
>> >> sequence
>> >>
>> >
>> > Yes, I guess we can make this work in some way. Out of curiosity,
>> > though, are out there heterogenous platforms that don't use cpufreq?
>>
>> At least, you can find several heterogeneous platforms without OPP
>> table for CPUs in the kernel. That's probably a temporary situation
>> but which can become a permanent one. It means that we can't calibrate
>> the CPUs for these platforms.
>>
>
> Sorry, can you make some examples so that I'm sure I understand what you
> are referring to?

As an example, the uniphier arm64 Soc doesn't have a cpufreq driver so far

>
> Anyway, don't these platform still make use of cpufreq (even if without
> an OPP table) so that we can still control policy->max and min?

AFAICT, They don't have a dedicated cpufreq driver.

More generally speaking, it can take time before having

>
> Thanks,
>
> - Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04 15:44               ` Vincent Guittot
@ 2016-02-04 15:46                 ` Vincent Guittot
  2016-02-05  9:30                   ` Juri Lelli
  0 siblings, 1 reply; 27+ messages in thread
From: Vincent Guittot @ 2016-02-04 15:46 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 4 February 2016 at 16:44, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 4 February 2016 at 15:13, Juri Lelli <juri.lelli@arm.com> wrote:
>> On 04/02/16 13:35, Vincent Guittot wrote:
>>> On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
>>> > Hi Vincent,
>>> >
>>> > On 04/02/16 13:03, Vincent Guittot wrote:
>>> >> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>>> >> > On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
>>> >> >> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
>>> >>
>>> >> [snip]
>>> >>
>>> >> >> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>> >> >> > index 9e63fb1..c4025fd 100644
>>> >> >> > --- a/drivers/cpufreq/Makefile
>>> >> >> > +++ b/drivers/cpufreq/Makefile
>>> >> >> > @@ -1,5 +1,5 @@
>>> >> >> >  # CPUfreq core
>>> >> >> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>>> >> >> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
>>> >> >>
>>> >> >> Do you really want to have the calibration of capacity dependent of
>>> >> >> cpufreq ? It means that we can't use it without a cpufreq driver.
>>> >> >> IMHO, this creates a unnecessary dependency. I understand that you
>>> >> >> must ensure that core runs at max fequency if a driver is present but
>>> >> >> you should be able to calibrate the capacity if cpufreq is not
>>> >> >> available but you have different capacity because micro architecture
>>> >> >
>>> >> > We could remove the dependency on cpufreq, but it would make things more
>>> >> > complicated for systems which do have frequency scaling as we would have
>>> >> > to either:
>>> >> >
>>> >> > 1) Run the calibration again once cpufreq has been initialized.
>>> >>
>>> >> or wait and let time for a driver to initialize and trig the
>>> >> calibration. If calibration has not been done at the end of the boot,
>>> >> you can force a calibration. If the cpufeq driver is a module and is
>>> >> loaded far later for any good or bad reason, we will have to run the
>>> >> calibration once again but at least the capacity will reflect he
>>> >> current capacity of the CPUs.
>>> >> I'm mainly worried that the compilation of the calibration is
>>> >> dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
>>> >> sequence
>>> >>
>>> >
>>> > Yes, I guess we can make this work in some way. Out of curiosity,
>>> > though, are out there heterogenous platforms that don't use cpufreq?
>>>
>>> At least, you can find several heterogeneous platforms without OPP
>>> table for CPUs in the kernel. That's probably a temporary situation
>>> but which can become a permanent one. It means that we can't calibrate
>>> the CPUs for these platforms.
>>>
>>
>> Sorry, can you make some examples so that I'm sure I understand what you
>> are referring to?
>
> As an example, the uniphier arm64 Soc doesn't have a cpufreq driver so far
>
>>
>> Anyway, don't these platform still make use of cpufreq (even if without
>> an OPP table) so that we can still control policy->max and min?
>
> AFAICT, They don't have a dedicated cpufreq driver.
>
> More generally speaking, it can take time before having

email sent before the ne d of the sentence ...

More generally speaking, it can take time before having a cpufreq
driver whereas we want to run and test scheduler behavior of these
heterogenous platform

Thanks,
Vincent
>
>>
>> Thanks,
>>
>> - Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-04 15:46                 ` Vincent Guittot
@ 2016-02-05  9:30                   ` Juri Lelli
  2016-02-09 15:54                     ` Dietmar Eggemann
  0 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-05  9:30 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Dietmar Eggemann,
	Mark Brown, Rafael J. Wysocki, Viresh Kumar

On 04/02/16 16:46, Vincent Guittot wrote:
> On 4 February 2016 at 16:44, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> > On 4 February 2016 at 15:13, Juri Lelli <juri.lelli@arm.com> wrote:
> >> On 04/02/16 13:35, Vincent Guittot wrote:
> >>> On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
> >>> > Hi Vincent,
> >>> >
> >>> > On 04/02/16 13:03, Vincent Guittot wrote:
> >>> >> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> >>> >> > On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
> >>> >> >> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
> >>> >>
> >>> >> [snip]
> >>> >>
> >>> >> >> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> >>> >> >> > index 9e63fb1..c4025fd 100644
> >>> >> >> > --- a/drivers/cpufreq/Makefile
> >>> >> >> > +++ b/drivers/cpufreq/Makefile
> >>> >> >> > @@ -1,5 +1,5 @@
> >>> >> >> >  # CPUfreq core
> >>> >> >> > -obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
> >>> >> >> > +obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o cpufreq_capacity.o
> >>> >> >>
> >>> >> >> Do you really want to have the calibration of capacity dependent of
> >>> >> >> cpufreq ? It means that we can't use it without a cpufreq driver.
> >>> >> >> IMHO, this creates a unnecessary dependency. I understand that you
> >>> >> >> must ensure that core runs at max fequency if a driver is present but
> >>> >> >> you should be able to calibrate the capacity if cpufreq is not
> >>> >> >> available but you have different capacity because micro architecture
> >>> >> >
> >>> >> > We could remove the dependency on cpufreq, but it would make things more
> >>> >> > complicated for systems which do have frequency scaling as we would have
> >>> >> > to either:
> >>> >> >
> >>> >> > 1) Run the calibration again once cpufreq has been initialized.
> >>> >>
> >>> >> or wait and let time for a driver to initialize and trig the
> >>> >> calibration. If calibration has not been done at the end of the boot,
> >>> >> you can force a calibration. If the cpufeq driver is a module and is
> >>> >> loaded far later for any good or bad reason, we will have to run the
> >>> >> calibration once again but at least the capacity will reflect he
> >>> >> current capacity of the CPUs.
> >>> >> I'm mainly worried that the compilation of the calibration is
> >>> >> dependent of CONFIG_CPU_FREQ not that cpufreq can trig the calibration
> >>> >> sequence
> >>> >>
> >>> >
> >>> > Yes, I guess we can make this work in some way. Out of curiosity,
> >>> > though, are out there heterogenous platforms that don't use cpufreq?
> >>>
> >>> At least, you can find several heterogeneous platforms without OPP
> >>> table for CPUs in the kernel. That's probably a temporary situation
> >>> but which can become a permanent one. It means that we can't calibrate
> >>> the CPUs for these platforms.
> >>>
> >>
> >> Sorry, can you make some examples so that I'm sure I understand what you
> >> are referring to?
> >
> > As an example, the uniphier arm64 Soc doesn't have a cpufreq driver so far
> >
> >>
> >> Anyway, don't these platform still make use of cpufreq (even if without
> >> an OPP table) so that we can still control policy->max and min?
> >
> > AFAICT, They don't have a dedicated cpufreq driver.
> >
> > More generally speaking, it can take time before having
> 
> email sent before the ne d of the sentence ...
> 
> More generally speaking, it can take time before having a cpufreq
> driver whereas we want to run and test scheduler behavior of these
> heterogenous platform
> 

I'm not familiar with this platform, but from what you are saying and
what I could find online, it looks like full Linux support is not
finished yet. Can we consider that as still in development? And if we
can do that, maybe is fair enough that we use the sysfs interface to
play with that platform until support is complete.

Do others have any opinion on this point?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/6] arm64: add sysfs cpu_capacity attribute
  2016-02-03 11:59 ` [PATCH v3 6/6] arm64: " Juri Lelli
@ 2016-02-05 17:19   ` Dietmar Eggemann
  2016-02-05 17:49     ` Juri Lelli
  0 siblings, 1 reply; 27+ messages in thread
From: Dietmar Eggemann @ 2016-02-05 17:19 UTC (permalink / raw)
  To: Juri Lelli, linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, broonie,
	Mark Brown

Hi Juri,

On 03/02/16 11:59, Juri Lelli wrote:
> Add a sysfs cpu_capacity attribute with which it is possible to read and
> write (thus over-writing default values) CPUs capacity. This might be
> useful in situation where there is no way to get proper default values
> at boot time.
> 
> The new attribute shows up as:
> 
>  /sys/devices/system/cpu/cpu*/cpu_capacity
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Brown <broonie@linaro.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> ---
>  arch/arm64/kernel/topology.c | 68 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index f2513a6..f05cc07 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -40,6 +40,74 @@ bool arch_wants_init_cpu_capacity(void)
>  	return true;
>  }
>  
> +#ifdef CONFIG_PROC_SYSCTL
> +#include <asm/cpu.h>
> +#include <linux/string.h>
> +static ssize_t show_cpu_capacity(struct device *dev,
> +				 struct device_attribute *attr,
> +				 char *buf)
> +{
> +	struct cpu *cpu = container_of(dev, struct cpu, dev);
> +	ssize_t rc;
> +	int cpunum = cpu->dev.id;
> +	unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);

Gives me an implicit declaration of function ‘arch_scale_cpu_capacity’
error [without the commit fbc899610e1a ("arm64: Update
arch_scale_cpu_capacity() to reflect change to define") on your
git://linux-arm.org/linux-jl.git upstream/default_caps_v3 branch].

Why don't you just return cpu_scale

@@ -49,10 +49,8 @@ static ssize_t show_cpu_capacity(struct device *dev,
 {
        struct cpu *cpu = container_of(dev, struct cpu, dev);
        ssize_t rc;
-       int cpunum = cpu->dev.id;
-       unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);

-       rc = sprintf(buf, "%lu\n", capacity);
+       rc = sprintf(buf, "%lu\n", per_cpu(cpu_scale, cpu->dev.id));

        return rc;
 }

to get rid of this dependency?

-- Dietmar

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/6] arm64: add sysfs cpu_capacity attribute
  2016-02-05 17:19   ` Dietmar Eggemann
@ 2016-02-05 17:49     ` Juri Lelli
  0 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-05 17:49 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, linux-pm, linux-arm-kernel, devicetree, peterz,
	vincent.guittot, robh+dt, mark.rutland, linux, sudeep.holla,
	lorenzo.pieralisi, catalin.marinas, will.deacon, morten.rasmussen,
	broonie, Mark Brown

Hi,

On 05/02/16 17:19, Dietmar Eggemann wrote:
> Hi Juri,
> 
> On 03/02/16 11:59, Juri Lelli wrote:
> > Add a sysfs cpu_capacity attribute with which it is possible to read and
> > write (thus over-writing default values) CPUs capacity. This might be
> > useful in situation where there is no way to get proper default values
> > at boot time.
> > 
> > The new attribute shows up as:
> > 
> >  /sys/devices/system/cpu/cpu*/cpu_capacity
> > 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Mark Brown <broonie@linaro.org>
> > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> > ---
> >  arch/arm64/kernel/topology.c | 68 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 68 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index f2513a6..f05cc07 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -40,6 +40,74 @@ bool arch_wants_init_cpu_capacity(void)
> >  	return true;
> >  }
> >  
> > +#ifdef CONFIG_PROC_SYSCTL
> > +#include <asm/cpu.h>
> > +#include <linux/string.h>
> > +static ssize_t show_cpu_capacity(struct device *dev,
> > +				 struct device_attribute *attr,
> > +				 char *buf)
> > +{
> > +	struct cpu *cpu = container_of(dev, struct cpu, dev);
> > +	ssize_t rc;
> > +	int cpunum = cpu->dev.id;
> > +	unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);
> 
> Gives me an implicit declaration of function ‘arch_scale_cpu_capacity’
> error [without the commit fbc899610e1a ("arm64: Update
> arch_scale_cpu_capacity() to reflect change to define") on your
> git://linux-arm.org/linux-jl.git upstream/default_caps_v3 branch].
> 
> Why don't you just return cpu_scale
> 
> @@ -49,10 +49,8 @@ static ssize_t show_cpu_capacity(struct device *dev,
>  {
>         struct cpu *cpu = container_of(dev, struct cpu, dev);
>         ssize_t rc;
> -       int cpunum = cpu->dev.id;
> -       unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);
> 
> -       rc = sprintf(buf, "%lu\n", capacity);
> +       rc = sprintf(buf, "%lu\n", per_cpu(cpu_scale, cpu->dev.id));
> 
>         return rc;
>  }
> 
> to get rid of this dependency?
> 

Right! I'll fix this in the next version.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] arm64: Enable dynamic CPU capacity initialization
  2016-02-03 11:59 ` [PATCH v3 4/6] arm64: " Juri Lelli
@ 2016-02-08 12:28   ` Dietmar Eggemann
  2016-02-08 13:13     ` Mark Brown
  0 siblings, 1 reply; 27+ messages in thread
From: Dietmar Eggemann @ 2016-02-08 12:28 UTC (permalink / raw)
  To: Juri Lelli, linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, broonie

On 03/02/16 11:59, Juri Lelli wrote:
> Define arch_wants_init_cpu_capacity() to return true; so that
> cpufreq_init_cpu_capacity() can go ahead and profile CPU capacities
> at boot time.

[...]

>  
> +bool arch_wants_init_cpu_capacity(void)
> +{
> +	return true;

Isn't this a little bit too simple? Not every ARM/ARM64 platform is a
heterogeneous one.

You could add code to compare the cpu node 'compatible' properties
(required) and only return true if they differ, which would let you
detect uarch based heterogeneity.

In case of max. frequency based heterogeneity (clusters consisting of
same cpu types but running at different max. frequency), you're at the
mercy of cpu node 'clock-frequency' properties (optional).
We might argue that for these platforms, providing cpu node
'clock-frequency' properties is necessary.
The 'struct cpu_efficiency table_efficiency[]' based approach in ARM
already faces this problem.

> +}
> +
>  static int __init get_cpu_for_node(struct device_node *node)
>  {
>  	struct device_node *cpu_node;
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] arm64: Enable dynamic CPU capacity initialization
  2016-02-08 12:28   ` Dietmar Eggemann
@ 2016-02-08 13:13     ` Mark Brown
  2016-02-08 13:41       ` Dietmar Eggemann
  0 siblings, 1 reply; 27+ messages in thread
From: Mark Brown @ 2016-02-08 13:13 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Juri Lelli, linux-kernel, linux-pm, linux-arm-kernel, devicetree,
	peterz, vincent.guittot, robh+dt, mark.rutland, linux,
	sudeep.holla, lorenzo.pieralisi, catalin.marinas, will.deacon,
	morten.rasmussen

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

On Mon, Feb 08, 2016 at 12:28:39PM +0000, Dietmar Eggemann wrote:
> On 03/02/16 11:59, Juri Lelli wrote:

> > +bool arch_wants_init_cpu_capacity(void)
> > +{
> > +	return true;

> Isn't this a little bit too simple? Not every ARM/ARM64 platform is a
> heterogeneous one.

Does it matter?  Is there any problem with doing the callibration and
having it say that all the CPUs performs very similarly?  My
understanding was that this was simply saying it was worth checking to
see if there was some asymmetry.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] arm64: Enable dynamic CPU capacity initialization
  2016-02-08 13:13     ` Mark Brown
@ 2016-02-08 13:41       ` Dietmar Eggemann
  0 siblings, 0 replies; 27+ messages in thread
From: Dietmar Eggemann @ 2016-02-08 13:41 UTC (permalink / raw)
  To: Mark Brown
  Cc: Juri Lelli, linux-kernel, linux-pm, linux-arm-kernel, devicetree,
	peterz, vincent.guittot, robh+dt, mark.rutland, linux,
	sudeep.holla, lorenzo.pieralisi, catalin.marinas, will.deacon,
	morten.rasmussen

On 08/02/16 13:13, Mark Brown wrote:
> On Mon, Feb 08, 2016 at 12:28:39PM +0000, Dietmar Eggemann wrote:
>> On 03/02/16 11:59, Juri Lelli wrote:
> 
>>> +bool arch_wants_init_cpu_capacity(void)
>>> +{
>>> +	return true;
> 
>> Isn't this a little bit too simple? Not every ARM/ARM64 platform is a
>> heterogeneous one.
> 
> Does it matter?  Is there any problem with doing the callibration and
> having it say that all the CPUs performs very similarly?  My
> understanding was that this was simply saying it was worth checking to
> see if there was some asymmetry.
> 

No, the calibration would work on any platform. I can see your point,
you want to have this feature not depend on dt.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] CPUs capacity information for heterogeneous systems
  2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
                   ` (5 preceding siblings ...)
  2016-02-03 11:59 ` [PATCH v3 6/6] arm64: " Juri Lelli
@ 2016-02-08 23:59 ` Steve Muckle
  2016-02-09 10:37   ` Juri Lelli
  6 siblings, 1 reply; 27+ messages in thread
From: Steve Muckle @ 2016-02-08 23:59 UTC (permalink / raw)
  To: Juri Lelli, linux-kernel
  Cc: linux-pm, linux-arm-kernel, devicetree, peterz, vincent.guittot,
	robh+dt, mark.rutland, linux, sudeep.holla, lorenzo.pieralisi,
	catalin.marinas, will.deacon, morten.rasmussen, dietmar.eggemann,
	broonie

Hi Juri,

On 02/03/2016 03:59 AM, Juri Lelli wrote:
>  v1: DT + sysfs [1]
> 
>  v2: Dynamic profiling at boot [2]
> 
> Third version of this patchset proposes what seems to be the solution we agreed
> upon (see [2] for reference) to the problem of how do we init CPUs original
> capacity: we run a bogus benchmark (stealing int_sqrt from lib/ we run that in
> a loop to perform some integer computation, better benchmarks are welcome)
> on the first cpu of each frequency domain (assuming no u-arch differences
> inside domains), measure time to complete a fixed number of iterations and then
> normalize results to SCHED_CAPACITY_SCALE (1024). This time around we also
> added a boot time parameter to disable profiling at boot (as it can be time
> consuming) and sysfs attributes with which default values can be overwritten.
> The proposed solution is basically putting together bits of v1 and v2 that are
> considered valuable and acceptable for mainline.

I'm still concerned that there's no way to obtain optimal boot time on a
heterogeneous system. Either the dynamic benchmarking is enabled, adding
1 sec, or the benchmarking is skipped, and task distribution on the
heterogeneous CPUs is determined by the platform's CPU numbering and
chance, potentially impacting performance nondeterministically until
userspace sets the correct capacity values via sysfs.

I believe you tested the impact on boot time of using equal capacity
values and saw little difference. I'm wondering though, what was the CPU
numbering on that target?

thanks,
Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] CPUs capacity information for heterogeneous systems
  2016-02-08 23:59 ` [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Steve Muckle
@ 2016-02-09 10:37   ` Juri Lelli
  2016-02-09 17:30     ` Steve Muckle
  0 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2016-02-09 10:37 UTC (permalink / raw)
  To: Steve Muckle
  Cc: linux-kernel, linux-pm, linux-arm-kernel, devicetree, peterz,
	vincent.guittot, robh+dt, mark.rutland, linux, sudeep.holla,
	lorenzo.pieralisi, catalin.marinas, will.deacon, morten.rasmussen,
	dietmar.eggemann, broonie

Hi Steve,

On 08/02/16 15:59, Steve Muckle wrote:
> Hi Juri,
> 
> On 02/03/2016 03:59 AM, Juri Lelli wrote:
> >  v1: DT + sysfs [1]
> > 
> >  v2: Dynamic profiling at boot [2]
> > 
> > Third version of this patchset proposes what seems to be the solution we agreed
> > upon (see [2] for reference) to the problem of how do we init CPUs original
> > capacity: we run a bogus benchmark (stealing int_sqrt from lib/ we run that in
> > a loop to perform some integer computation, better benchmarks are welcome)
> > on the first cpu of each frequency domain (assuming no u-arch differences
> > inside domains), measure time to complete a fixed number of iterations and then
> > normalize results to SCHED_CAPACITY_SCALE (1024). This time around we also
> > added a boot time parameter to disable profiling at boot (as it can be time
> > consuming) and sysfs attributes with which default values can be overwritten.
> > The proposed solution is basically putting together bits of v1 and v2 that are
> > considered valuable and acceptable for mainline.
> 
> I'm still concerned that there's no way to obtain optimal boot time on a
> heterogeneous system. Either the dynamic benchmarking is enabled, adding
> 1 sec, or the benchmarking is skipped, and task distribution on the
> heterogeneous CPUs is determined by the platform's CPU numbering and
> chance, potentially impacting performance nondeterministically until
> userspace sets the correct capacity values via sysfs.
> 
> I believe you tested the impact on boot time of using equal capacity
> values and saw little difference. I'm wondering though, what was the CPU
> numbering on that target?
> 

My targets (Juno and TC2) had big cluster on 1,2 and little on the
remaining cpus. Why do you think this might matter?

Anyway, IMHO boot time performance is not what we are targeting here, so
I wouldn't be too worried about this particular point.

Best,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-05  9:30                   ` Juri Lelli
@ 2016-02-09 15:54                     ` Dietmar Eggemann
  2016-02-10 14:25                       ` Juri Lelli
  0 siblings, 1 reply; 27+ messages in thread
From: Dietmar Eggemann @ 2016-02-09 15:54 UTC (permalink / raw)
  To: Juri Lelli, Vincent Guittot
  Cc: Morten Rasmussen, linux-kernel, linux-pm@vger.kernel.org, LAK,
	devicetree@vger.kernel.org, Peter Zijlstra, Rob Herring,
	Mark Rutland, Russell King - ARM Linux, Sudeep Holla,
	Lorenzo Pieralisi, Catalin Marinas, Will Deacon, Mark Brown,
	Rafael J. Wysocki, Viresh Kumar

On 05/02/16 09:30, Juri Lelli wrote:
> On 04/02/16 16:46, Vincent Guittot wrote:
>> On 4 February 2016 at 16:44, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>> On 4 February 2016 at 15:13, Juri Lelli <juri.lelli@arm.com> wrote:
>>>> On 04/02/16 13:35, Vincent Guittot wrote:
>>>>> On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
>>>>>> Hi Vincent,
>>>>>>
>>>>>> On 04/02/16 13:03, Vincent Guittot wrote:
>>>>>>> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>>>>>>>> On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
>>>>>>>>> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:

[...]

>>>
>>> AFAICT, They don't have a dedicated cpufreq driver.
>>>
>>> More generally speaking, it can take time before having
>>
>> email sent before the ne d of the sentence ...
>>
>> More generally speaking, it can take time before having a cpufreq
>> driver whereas we want to run and test scheduler behavior of these
>> heterogenous platform
>>
> 
> I'm not familiar with this platform, but from what you are saying and
> what I could find online, it looks like full Linux support is not
> finished yet. Can we consider that as still in development? And if we
> can do that, maybe is fair enough that we use the sysfs interface to
> play with that platform until support is complete.
> 
> Do others have any opinion on this point?

IMHO, the solution should work for all of heterogeneous systems, (a) w/
cpufreq and driver, (b) w/ cpufreq and no driver loaded (yet) or (c) w/o
cpufreq.

That means that you can't put the benchmarking only into
cpufreq_register_driver() and rely on cpufreq policy topology.

Maybe you could do this for (b) and (c) inside an initcall and use
topology_core_cpumask() to figure out which cpu to profile?

This would then happen w/ the cpu frequency set by the fw.

But this then has to be synchronized somehow with the benchmarking
approach in cpufreq_register_driver().

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] CPUs capacity information for heterogeneous systems
  2016-02-09 10:37   ` Juri Lelli
@ 2016-02-09 17:30     ` Steve Muckle
  2016-02-09 17:40       ` Juri Lelli
  0 siblings, 1 reply; 27+ messages in thread
From: Steve Muckle @ 2016-02-09 17:30 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, linux-arm-kernel, devicetree, peterz,
	vincent.guittot, robh+dt, mark.rutland, linux, sudeep.holla,
	lorenzo.pieralisi, catalin.marinas, will.deacon, morten.rasmussen,
	dietmar.eggemann, broonie

On 02/09/2016 02:37 AM, Juri Lelli wrote:
>> I'm still concerned that there's no way to obtain optimal boot time on a
>> > heterogeneous system. Either the dynamic benchmarking is enabled, adding
>> > 1 sec, or the benchmarking is skipped, and task distribution on the
>> > heterogeneous CPUs is determined by the platform's CPU numbering and
>> > chance, potentially impacting performance nondeterministically until
>> > userspace sets the correct capacity values via sysfs.
>> > 
>> > I believe you tested the impact on boot time of using equal capacity
>> > values and saw little difference. I'm wondering though, what was the CPU
>> > numbering on that target?
>> > 
>
> My targets (Juno and TC2) had big cluster on 1,2 and little on the
> remaining cpus. Why do you think this might matter?

There's a natural bias in the scheduler AFAIK towards lower-numbered
CPUs since they are typically scanned in numerically ascending order. So
when all capacities are initially defaulted to be the same I think
you'll be more likely to use the lower numbered CPUs.

I'd be curious what the performance penalty is on a b.L system where the
lowest numbered CPUs are small. I don't have such a target but maybe
it's possible to compare booting just with bigs vs just with littles, at
least until userspace intializes and a script can bring up the others,
which is the same point at which capacities could be properly set. That
would give something of an upper bound.

> Anyway, IMHO boot time performance is not what we are targeting here, so
> I wouldn't be too worried about this particular point.

It may not be the most important thing but it is a factor worth
considering - as mentioned earlier there are applications where boot
time is critical such as automotive. It seems unfortunate that actual
performance may be left on the table due to (IMO anyway) a tenuous
concern over DT semantics. But it looks like that may just be my
position :/ .

thanks,
Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] CPUs capacity information for heterogeneous systems
  2016-02-09 17:30     ` Steve Muckle
@ 2016-02-09 17:40       ` Juri Lelli
  0 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-09 17:40 UTC (permalink / raw)
  To: Steve Muckle
  Cc: linux-kernel, linux-pm, linux-arm-kernel, devicetree, peterz,
	vincent.guittot, robh+dt, mark.rutland, linux, sudeep.holla,
	lorenzo.pieralisi, catalin.marinas, will.deacon, morten.rasmussen,
	dietmar.eggemann, broonie

On 09/02/16 09:30, Steve Muckle wrote:
> On 02/09/2016 02:37 AM, Juri Lelli wrote:
> >> I'm still concerned that there's no way to obtain optimal boot time on a
> >> > heterogeneous system. Either the dynamic benchmarking is enabled, adding
> >> > 1 sec, or the benchmarking is skipped, and task distribution on the
> >> > heterogeneous CPUs is determined by the platform's CPU numbering and
> >> > chance, potentially impacting performance nondeterministically until
> >> > userspace sets the correct capacity values via sysfs.
> >> > 
> >> > I believe you tested the impact on boot time of using equal capacity
> >> > values and saw little difference. I'm wondering though, what was the CPU
> >> > numbering on that target?
> >> > 
> >
> > My targets (Juno and TC2) had big cluster on 1,2 and little on the
> > remaining cpus. Why do you think this might matter?
> 
> There's a natural bias in the scheduler AFAIK towards lower-numbered
> CPUs since they are typically scanned in numerically ascending order. So
> when all capacities are initially defaulted to be the same I think
> you'll be more likely to use the lower numbered CPUs.
> 
> I'd be curious what the performance penalty is on a b.L system where the
> lowest numbered CPUs are small. I don't have such a target but maybe
> it's possible to compare booting just with bigs vs just with littles, at
> least until userspace intializes and a script can bring up the others,
> which is the same point at which capacities could be properly set. That
> would give something of an upper bound.
> 

Yeah. I could run some tests along this line. It should give us a rough
idea about how much we are leaving on the table.

> > Anyway, IMHO boot time performance is not what we are targeting here, so
> > I wouldn't be too worried about this particular point.
> 
> It may not be the most important thing but it is a factor worth
> considering - as mentioned earlier there are applications where boot
> time is critical such as automotive. It seems unfortunate that actual
> performance may be left on the table due to (IMO anyway) a tenuous
> concern over DT semantics. But it looks like that may just be my
> position :/ .
> 

Don't get me wrong Steve. I agree with you and I tried to defend the DT
approach as much as I could. I still think that it is the best solution
(much more cleaner and simpler), but it seems that there is no way we
can make it happen. Or has this discussion we are having changed things
in the meantime? :)

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default()
  2016-02-09 15:54                     ` Dietmar Eggemann
@ 2016-02-10 14:25                       ` Juri Lelli
  0 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2016-02-10 14:25 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Vincent Guittot, Morten Rasmussen, linux-kernel,
	linux-pm@vger.kernel.org, LAK, devicetree@vger.kernel.org,
	Peter Zijlstra, Rob Herring, Mark Rutland,
	Russell King - ARM Linux, Sudeep Holla, Lorenzo Pieralisi,
	Catalin Marinas, Will Deacon, Mark Brown, Rafael J. Wysocki,
	Viresh Kumar

On 09/02/16 15:54, Dietmar Eggemann wrote:
> On 05/02/16 09:30, Juri Lelli wrote:
> > On 04/02/16 16:46, Vincent Guittot wrote:
> >> On 4 February 2016 at 16:44, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >>> On 4 February 2016 at 15:13, Juri Lelli <juri.lelli@arm.com> wrote:
> >>>> On 04/02/16 13:35, Vincent Guittot wrote:
> >>>>> On 4 February 2016 at 13:16, Juri Lelli <juri.lelli@arm.com> wrote:
> >>>>>> Hi Vincent,
> >>>>>>
> >>>>>> On 04/02/16 13:03, Vincent Guittot wrote:
> >>>>>>> On 4 February 2016 at 10:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> >>>>>>>> On Wed, Feb 03, 2016 at 10:04:37PM +0100, Vincent Guittot wrote:
> >>>>>>>>> On 3 February 2016 at 12:59, Juri Lelli <juri.lelli@arm.com> wrote:
> 
> [...]
> 
> >>>
> >>> AFAICT, They don't have a dedicated cpufreq driver.
> >>>
> >>> More generally speaking, it can take time before having
> >>
> >> email sent before the ne d of the sentence ...
> >>
> >> More generally speaking, it can take time before having a cpufreq
> >> driver whereas we want to run and test scheduler behavior of these
> >> heterogenous platform
> >>
> > 
> > I'm not familiar with this platform, but from what you are saying and
> > what I could find online, it looks like full Linux support is not
> > finished yet. Can we consider that as still in development? And if we
> > can do that, maybe is fair enough that we use the sysfs interface to
> > play with that platform until support is complete.
> > 
> > Do others have any opinion on this point?
> 
> IMHO, the solution should work for all of heterogeneous systems, (a) w/
> cpufreq and driver, (b) w/ cpufreq and no driver loaded (yet) or (c) w/o
> cpufreq.
> 
> That means that you can't put the benchmarking only into
> cpufreq_register_driver() and rely on cpufreq policy topology.
> 
> Maybe you could do this for (b) and (c) inside an initcall and use
> topology_core_cpumask() to figure out which cpu to profile?
> 
> This would then happen w/ the cpu frequency set by the fw.
> 
> But this then has to be synchronized somehow with the benchmarking
> approach in cpufreq_register_driver().
> 

Yes, I guess the tricky situation is the mixed one, when we might have
the benchmarking executing in a late_initcall when cpufreq kicks in. Or
we can also end up doing the benchmarking twice if cpufreq finishes
initialization after the late_initcall has been executed.

I'm not sure how this can be solved in a clean way. Ideally we would
start benchmarking in the late_initcall (at the frequency set by fw) and
then bailout if cpufreq kicks in. That should work also for the cpufreq
driver as a module case. But, we could do the same thing twice if the
late_initcall is faster or gets executed before cpufreq starts to be
initialized.

Any suggestion anyone? :)

Best,

- Juri

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-02-10 14:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 11:59 [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Juri Lelli
2016-02-03 11:59 ` [PATCH v3 1/6] ARM: initialize cpu_scale to its default Juri Lelli
2016-02-03 11:59 ` [PATCH v3 2/6] drivers/cpufreq: implement init_cpu_capacity_default() Juri Lelli
2016-02-03 21:04   ` Vincent Guittot
2016-02-04  9:36     ` Morten Rasmussen
2016-02-04 12:03       ` Vincent Guittot
2016-02-04 12:16         ` Juri Lelli
2016-02-04 12:35           ` Vincent Guittot
2016-02-04 14:13             ` Juri Lelli
2016-02-04 15:44               ` Vincent Guittot
2016-02-04 15:46                 ` Vincent Guittot
2016-02-05  9:30                   ` Juri Lelli
2016-02-09 15:54                     ` Dietmar Eggemann
2016-02-10 14:25                       ` Juri Lelli
2016-02-03 11:59 ` [PATCH v3 3/6] arm: Enable dynamic CPU capacity initialization Juri Lelli
2016-02-03 11:59 ` [PATCH v3 4/6] arm64: " Juri Lelli
2016-02-08 12:28   ` Dietmar Eggemann
2016-02-08 13:13     ` Mark Brown
2016-02-08 13:41       ` Dietmar Eggemann
2016-02-03 11:59 ` [PATCH v3 5/6] arm: add sysfs cpu_capacity attribute Juri Lelli
2016-02-03 11:59 ` [PATCH v3 6/6] arm64: " Juri Lelli
2016-02-05 17:19   ` Dietmar Eggemann
2016-02-05 17:49     ` Juri Lelli
2016-02-08 23:59 ` [PATCH v3 0/6] CPUs capacity information for heterogeneous systems Steve Muckle
2016-02-09 10:37   ` Juri Lelli
2016-02-09 17:30     ` Steve Muckle
2016-02-09 17:40       ` Juri Lelli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).