LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] intel: thermal: hfi: Add debugfs files for tuning
@ 2024-04-29 23:41 Ricardo Neri
  2024-04-29 23:41 ` [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL Ricardo Neri
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Ricardo Neri @ 2024-04-29 23:41 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Zhang Rui, Srinivas Pandruvada, Len Brown, Stanislaw Gruszka,
	linux-pm, linux-kernel, Ricardo Neri

Hi,

HFI uses thermal netlink to relay updated CPU capabilities to user space.
The delay between an HFI interrupt and its corresponding thermal netlink
event as well as the number of capabilities updated per event (its payload)
have been so far hard-coded in intel_hfi.c.

These hard-coded values may not suit all hardware configurations or
listeners. If the delay is too long, much of the information from
consecutive hardware updates will be lost. If the delay is too short,
listeners may be overwhelmed with notifications.

The payload size may cause unnecessary overhead if it is too small, as
single HFI update is broken up into several thermal netlink events.

Listeners are better placed to decide the value of these parameters. They
know how quickly they can act on notifications and know better how to
handle them.

Add a debugfs interface to let listeners experiment with and tune these
parameters.

These patches apply cleanly on top of the testing branch of Rafael's
linux-pm.

Thanks and BR,
Ricardo

Ricardo Neri (4):
  thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
  thermal: intel: hfi: Tune the HFI thermal netlink event delay via
    debugfs
  thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
  thermal: intel: hfi: Tune the number of CPU capabilities per netlink
    event

 drivers/thermal/intel/intel_hfi.c | 117 +++++++++++++++++++++++++++---
 1 file changed, 108 insertions(+), 9 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
  2024-04-29 23:41 [PATCH 0/4] intel: thermal: hfi: Add debugfs files for tuning Ricardo Neri
@ 2024-04-29 23:41 ` Ricardo Neri
  2024-04-30  4:50   ` Zhang, Rui
  2024-04-29 23:41 ` [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs Ricardo Neri
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Ricardo Neri @ 2024-04-29 23:41 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Zhang Rui, Srinivas Pandruvada, Len Brown, Stanislaw Gruszka,
	linux-pm, linux-kernel, Ricardo Neri

The name of the constant HFI_UPDATE_INTERVAL is misleading. It is not a
periodic interval at which HFI updates are processed. It is the delay in
the processing of an HFI update after the arrival of an HFI interrupt.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Len Brown <len.brown@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/thermal/intel/intel_hfi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index fbc7f0cd83d7..e2b82d71ab6b 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -166,7 +166,7 @@ static struct hfi_features hfi_features;
 static DEFINE_MUTEX(hfi_instance_lock);
 
 static struct workqueue_struct *hfi_updates_wq;
-#define HFI_UPDATE_INTERVAL		HZ
+#define HFI_UPDATE_DELAY		HZ
 #define HFI_MAX_THERM_NOTIFY_COUNT	16
 
 static void get_hfi_caps(struct hfi_instance *hfi_instance,
@@ -322,7 +322,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
 	raw_spin_unlock(&hfi_instance->event_lock);
 
 	queue_delayed_work(hfi_updates_wq, &hfi_instance->update_work,
-			   HFI_UPDATE_INTERVAL);
+			   HFI_UPDATE_DELAY);
 }
 
 static void init_hfi_cpu_index(struct hfi_cpu_info *info)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs
  2024-04-29 23:41 [PATCH 0/4] intel: thermal: hfi: Add debugfs files for tuning Ricardo Neri
  2024-04-29 23:41 ` [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL Ricardo Neri
@ 2024-04-29 23:41 ` Ricardo Neri
  2024-04-30  4:52   ` Zhang, Rui
  2024-04-29 23:41 ` [PATCH 3/4] thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT Ricardo Neri
  2024-04-29 23:41 ` [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event Ricardo Neri
  3 siblings, 1 reply; 10+ messages in thread
From: Ricardo Neri @ 2024-04-29 23:41 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Zhang Rui, Srinivas Pandruvada, Len Brown, Stanislaw Gruszka,
	linux-pm, linux-kernel, Ricardo Neri

THe delay between an HFI interrupt and its corresponding thermal netlink
event has so far been hard-coded to CONFIG_HZ jiffies. This may not suit
the needs of all hardware configurations or listeners of events.

If the update delay is too long, much of the information of consecutive
hardware updates will be lost as the HFI delayed workqueue posts a new
thermal netlink event only when there are no previous pending events. If
the delay is too short, multiple, events may overwhelm listeners.

Listeners are better placed to determine the delay of events. They know how
quickly they can act on them effectively. They may also want to experiment
with different values.

Start a debugfs interface to tune the notification delay.

Keep things simple and do not add extra locking or memory barriers. This
may result in the HFI interrupt ocassionally queueing work using stale
delay values, if at all. This should not be a problem: the debugfs file
will update the delay value atomically, we do not expect users to update
the delay value frequently, and the delayed workqueue operates in jiffies
resolution.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Len Brown <len.brown@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/thermal/intel/intel_hfi.c | 77 ++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index e2b82d71ab6b..24d708268c68 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -43,6 +43,10 @@
 
 #include <asm/msr.h>
 
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#endif
+
 #include "intel_hfi.h"
 #include "thermal_interrupt.h"
 
@@ -169,6 +173,70 @@ static struct workqueue_struct *hfi_updates_wq;
 #define HFI_UPDATE_DELAY		HZ
 #define HFI_MAX_THERM_NOTIFY_COUNT	16
 
+/* Keep this variable 8-byte aligned to get atomic accesses. */
+static unsigned long hfi_update_delay = HFI_UPDATE_DELAY;
+
+#ifdef CONFIG_DEBUG_FS
+static int hfi_update_delay_get(void *data, u64 *val)
+{
+	mutex_lock(&hfi_instance_lock);
+	*val = jiffies_to_msecs(hfi_update_delay);
+	mutex_unlock(&hfi_instance_lock);
+	return 0;
+}
+
+static int hfi_update_delay_set(void *data, u64 val)
+{
+	/*
+	 * The mutex only serializes access to the debugfs file.
+	 *
+	 * hfi_process_event() loads @hfi_update_delay from interrupt context.
+	 * We could have serialized accesses with a spinlock or a memory barrier.
+	 * But this is a debug feature, the store of @hfi_update_delay is
+	 * atomic, and will seldom change. A few loads of @hfi_update_delay may
+	 * see stale values but the updated value will be seen eventually.
+	 */
+	mutex_lock(&hfi_instance_lock);
+	hfi_update_delay = msecs_to_jiffies(val);
+	mutex_unlock(&hfi_instance_lock);
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(hfi_update_delay_fops, hfi_update_delay_get,
+			 hfi_update_delay_set, "%llu\n");
+
+static struct dentry *hfi_debugfs_dir;
+
+static void hfi_debugfs_unregister(void)
+{
+	debugfs_remove_recursive(hfi_debugfs_dir);
+	hfi_debugfs_dir = NULL;
+}
+
+static void hfi_debugfs_register(void)
+{
+	struct dentry *f;
+
+	hfi_debugfs_dir = debugfs_create_dir("intel_hfi", NULL);
+	if (!hfi_debugfs_dir)
+		return;
+
+	f = debugfs_create_file("update_delay_ms", 0644, hfi_debugfs_dir,
+				NULL, &hfi_update_delay_fops);
+	if (!f)
+		goto err;
+
+	return;
+err:
+	hfi_debugfs_unregister();
+}
+
+#else
+static void hfi_debugfs_register(void)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
+
 static void get_hfi_caps(struct hfi_instance *hfi_instance,
 			 struct thermal_genl_cpu_caps *cpu_caps)
 {
@@ -321,8 +389,13 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
 	raw_spin_unlock(&hfi_instance->table_lock);
 	raw_spin_unlock(&hfi_instance->event_lock);
 
+	/*
+	 * debugfs may atomically store @hfi_update_delay without locking. The
+	 * updated value may not be immediately observed. See note in
+	 * hfi_update_delay_set().
+	 */
 	queue_delayed_work(hfi_updates_wq, &hfi_instance->update_work,
-			   HFI_UPDATE_DELAY);
+			   hfi_update_delay);
 }
 
 static void init_hfi_cpu_index(struct hfi_cpu_info *info)
@@ -708,6 +781,8 @@ void __init intel_hfi_init(void)
 
 	register_syscore_ops(&hfi_pm_ops);
 
+	hfi_debugfs_register();
+
 	return;
 
 err_nl_notif:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
  2024-04-29 23:41 [PATCH 0/4] intel: thermal: hfi: Add debugfs files for tuning Ricardo Neri
  2024-04-29 23:41 ` [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL Ricardo Neri
  2024-04-29 23:41 ` [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs Ricardo Neri
@ 2024-04-29 23:41 ` Ricardo Neri
  2024-04-29 23:41 ` [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event Ricardo Neri
  3 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2024-04-29 23:41 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Zhang Rui, Srinivas Pandruvada, Len Brown, Stanislaw Gruszka,
	linux-pm, linux-kernel, Ricardo Neri

When processing a hardware update, HFI generates as many thermal netlink
events as needed to relay all the updated CPU capabilities to user space.
The constant HFI_MAX_THERM_NOTIFY_COUNT is the number of CPU capabilities
updated per each of those events.

Give this constant a more descriptive name.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Len Brown <len.brown@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/thermal/intel/intel_hfi.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 24d708268c68..d6d3544509fc 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -171,7 +171,7 @@ static DEFINE_MUTEX(hfi_instance_lock);
 
 static struct workqueue_struct *hfi_updates_wq;
 #define HFI_UPDATE_DELAY		HZ
-#define HFI_MAX_THERM_NOTIFY_COUNT	16
+#define HFI_THERMNL_CAPS_PER_EVENT	16
 
 /* Keep this variable 8-byte aligned to get atomic accesses. */
 static unsigned long hfi_update_delay = HFI_UPDATE_DELAY;
@@ -286,14 +286,14 @@ static void update_capabilities(struct hfi_instance *hfi_instance)
 
 	get_hfi_caps(hfi_instance, cpu_caps);
 
-	if (cpu_count < HFI_MAX_THERM_NOTIFY_COUNT)
+	if (cpu_count < HFI_THERMNL_CAPS_PER_EVENT)
 		goto last_cmd;
 
-	/* Process complete chunks of HFI_MAX_THERM_NOTIFY_COUNT capabilities. */
+	/* Process complete chunks of HFI_THERMNL_CAPS_PER_EVENT capabilities. */
 	for (i = 0;
-	     (i + HFI_MAX_THERM_NOTIFY_COUNT) <= cpu_count;
-	     i += HFI_MAX_THERM_NOTIFY_COUNT)
-		thermal_genl_cpu_capability_event(HFI_MAX_THERM_NOTIFY_COUNT,
+	     (i + HFI_THERMNL_CAPS_PER_EVENT) <= cpu_count;
+	     i += HFI_THERMNL_CAPS_PER_EVENT)
+		thermal_genl_cpu_capability_event(HFI_THERMNL_CAPS_PER_EVENT,
 						  &cpu_caps[i]);
 
 	cpu_count = cpu_count - i;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event
  2024-04-29 23:41 [PATCH 0/4] intel: thermal: hfi: Add debugfs files for tuning Ricardo Neri
                   ` (2 preceding siblings ...)
  2024-04-29 23:41 ` [PATCH 3/4] thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT Ricardo Neri
@ 2024-04-29 23:41 ` Ricardo Neri
  2024-04-30  5:07   ` Zhang, Rui
  3 siblings, 1 reply; 10+ messages in thread
From: Ricardo Neri @ 2024-04-29 23:41 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Zhang Rui, Srinivas Pandruvada, Len Brown, Stanislaw Gruszka,
	linux-pm, linux-kernel, Ricardo Neri

The number of updated CPU capabilities per netlink event is hard-coded to
16. On systems with more than 16 it takes more than one thermal netlink
event to relay all the new capabilities when processing an HFI interrupt.
This adds unnecessary overhead.

Make the number of updated capabilities per event tuneable via debugfs.
Users can then experiment with different values.

We already take the hfi_instance_lock when submitting thermal netlink
updates. Use it to serialize debugfs accesses to hfi_therm_notify_count.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Len Brown <len.brown@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/thermal/intel/intel_hfi.c | 34 ++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index d6d3544509fc..d5163b9766c0 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -175,6 +175,7 @@ static struct workqueue_struct *hfi_updates_wq;
 
 /* Keep this variable 8-byte aligned to get atomic accesses. */
 static unsigned long hfi_update_delay = HFI_UPDATE_DELAY;
+static int hfi_thermnl_caps_per_event = HFI_THERMNL_CAPS_PER_EVENT;
 
 #ifdef CONFIG_DEBUG_FS
 static int hfi_update_delay_get(void *data, u64 *val)
@@ -205,6 +206,25 @@ static int hfi_update_delay_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(hfi_update_delay_fops, hfi_update_delay_get,
 			 hfi_update_delay_set, "%llu\n");
 
+static int hfi_thermnl_caps_per_event_get(void *data, u64 *val)
+{
+	mutex_lock(&hfi_instance_lock);
+	*val = hfi_thermnl_caps_per_event;
+	mutex_unlock(&hfi_instance_lock);
+	return 0;
+}
+
+static int hfi_thermnl_caps_per_event_set(void *data, u64 val)
+{
+	mutex_lock(&hfi_instance_lock);
+	hfi_thermnl_caps_per_event = val;
+	mutex_unlock(&hfi_instance_lock);
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(hfi_thermnl_caps_per_event_fops,
+			 hfi_thermnl_caps_per_event_get,
+			 hfi_thermnl_caps_per_event_set, "%llu\n");
 static struct dentry *hfi_debugfs_dir;
 
 static void hfi_debugfs_unregister(void)
@@ -226,6 +246,11 @@ static void hfi_debugfs_register(void)
 	if (!f)
 		goto err;
 
+	f = debugfs_create_file("thermnl_caps_per_event", 0644, hfi_debugfs_dir,
+				NULL, &hfi_thermnl_caps_per_event_fops);
+	if (!f)
+		goto err;
+
 	return;
 err:
 	hfi_debugfs_unregister();
@@ -286,16 +311,15 @@ static void update_capabilities(struct hfi_instance *hfi_instance)
 
 	get_hfi_caps(hfi_instance, cpu_caps);
 
-	if (cpu_count < HFI_THERMNL_CAPS_PER_EVENT)
+	if (cpu_count < hfi_thermnl_caps_per_event)
 		goto last_cmd;
 
 	/* Process complete chunks of HFI_THERMNL_CAPS_PER_EVENT capabilities. */
 	for (i = 0;
-	     (i + HFI_THERMNL_CAPS_PER_EVENT) <= cpu_count;
-	     i += HFI_THERMNL_CAPS_PER_EVENT)
-		thermal_genl_cpu_capability_event(HFI_THERMNL_CAPS_PER_EVENT,
+	     (i + hfi_thermnl_caps_per_event) <= cpu_count;
+	     i += hfi_thermnl_caps_per_event)
+		thermal_genl_cpu_capability_event(hfi_thermnl_caps_per_event,
 						  &cpu_caps[i]);
-
 	cpu_count = cpu_count - i;
 
 last_cmd:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
  2024-04-29 23:41 ` [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL Ricardo Neri
@ 2024-04-30  4:50   ` Zhang, Rui
  0 siblings, 0 replies; 10+ messages in thread
From: Zhang, Rui @ 2024-04-30  4:50 UTC (permalink / raw
  To: ricardo.neri-calderon@linux.intel.com, Wysocki, Rafael J
  Cc: srinivas.pandruvada@linux.intel.com, Brown, Len,
	stanislaw.gruszka@linux.intel.com, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Neri, Ricardo

On Mon, 2024-04-29 at 16:41 -0700, Ricardo Neri wrote:
> The name of the constant HFI_UPDATE_INTERVAL is misleading. It is not
> a
> periodic interval at which HFI updates are processed. It is the delay
> in
> the processing of an HFI update after the arrival of an HFI
> interrupt.
> 
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Acked-by: Zhang Rui <rui.zhang@intel.com>

-rui
> ---
> Cc: Len Brown <len.brown@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/thermal/intel/intel_hfi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/intel/intel_hfi.c
> b/drivers/thermal/intel/intel_hfi.c
> index fbc7f0cd83d7..e2b82d71ab6b 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -166,7 +166,7 @@ static struct hfi_features hfi_features;
>  static DEFINE_MUTEX(hfi_instance_lock);
>  
>  static struct workqueue_struct *hfi_updates_wq;
> -#define HFI_UPDATE_INTERVAL            HZ
> +#define HFI_UPDATE_DELAY               HZ
>  #define HFI_MAX_THERM_NOTIFY_COUNT     16
>  
>  static void get_hfi_caps(struct hfi_instance *hfi_instance,
> @@ -322,7 +322,7 @@ void intel_hfi_process_event(__u64
> pkg_therm_status_msr_val)
>         raw_spin_unlock(&hfi_instance->event_lock);
>  
>         queue_delayed_work(hfi_updates_wq, &hfi_instance-
> >update_work,
> -                          HFI_UPDATE_INTERVAL);
> +                          HFI_UPDATE_DELAY);
>  }
>  
>  static void init_hfi_cpu_index(struct hfi_cpu_info *info)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs
  2024-04-29 23:41 ` [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs Ricardo Neri
@ 2024-04-30  4:52   ` Zhang, Rui
  2024-05-01  0:59     ` Ricardo Neri
  0 siblings, 1 reply; 10+ messages in thread
From: Zhang, Rui @ 2024-04-30  4:52 UTC (permalink / raw
  To: ricardo.neri-calderon@linux.intel.com, Wysocki, Rafael J
  Cc: srinivas.pandruvada@linux.intel.com, Brown, Len,
	stanislaw.gruszka@linux.intel.com, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Neri, Ricardo

On Mon, 2024-04-29 at 16:41 -0700, Ricardo Neri wrote:
> THe delay between an HFI interrupt and its corresponding thermal

s/THe/The

> netlink
> event has so far been hard-coded to CONFIG_HZ jiffies. This may not
> suit
> the needs of all hardware configurations or listeners of events.
> 
> If the update delay is too long, much of the information of
> consecutive
> hardware updates will be lost as the HFI delayed workqueue posts a
> new
> thermal netlink event only when there are no previous pending events.
> If
> the delay is too short, multiple, events may overwhelm listeners.
> 
> Listeners are better placed to determine the delay of events. They
> know how
> quickly they can act on them effectively. They may also want to
> experiment
> with different values.
> 
> Start a debugfs interface to tune the notification delay.

Why this is implemented as debugfs rather than a module param?

thanks,
rui
> 
> Keep things simple and do not add extra locking or memory barriers.
> This
> may result in the HFI interrupt ocassionally queueing work using
> stale
> delay values, if at all. This should not be a problem: the debugfs
> file
> will update the delay value atomically, we do not expect users to
> update
> the delay value frequently, and the delayed workqueue operates in
> jiffies
> resolution.
> 
> Suggested-by: Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Cc: Len Brown <len.brown@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/thermal/intel/intel_hfi.c | 77
> ++++++++++++++++++++++++++++++-
>  1 file changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/intel/intel_hfi.c
> b/drivers/thermal/intel/intel_hfi.c
> index e2b82d71ab6b..24d708268c68 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -43,6 +43,10 @@
>  
>  #include <asm/msr.h>
>  
> +#ifdef CONFIG_DEBUG_FS
> +#include <linux/debugfs.h>
> +#endif
> +
>  #include "intel_hfi.h"
>  #include "thermal_interrupt.h"
>  
> @@ -169,6 +173,70 @@ static struct workqueue_struct *hfi_updates_wq;
>  #define HFI_UPDATE_DELAY               HZ
>  #define HFI_MAX_THERM_NOTIFY_COUNT     16
>  
> +/* Keep this variable 8-byte aligned to get atomic accesses. */
> +static unsigned long hfi_update_delay = HFI_UPDATE_DELAY;
> +
> +#ifdef CONFIG_DEBUG_FS
> +static int hfi_update_delay_get(void *data, u64 *val)
> +{
> +       mutex_lock(&hfi_instance_lock);
> +       *val = jiffies_to_msecs(hfi_update_delay);
> +       mutex_unlock(&hfi_instance_lock);
> +       return 0;
> +}
> +
> +static int hfi_update_delay_set(void *data, u64 val)
> +{
> +       /*
> +        * The mutex only serializes access to the debugfs file.
> +        *
> +        * hfi_process_event() loads @hfi_update_delay from interrupt
> context.
> +        * We could have serialized accesses with a spinlock or a
> memory barrier.
> +        * But this is a debug feature, the store of
> @hfi_update_delay is
> +        * atomic, and will seldom change. A few loads of
> @hfi_update_delay may
> +        * see stale values but the updated value will be seen
> eventually.
> +        */
> +       mutex_lock(&hfi_instance_lock);
> +       hfi_update_delay = msecs_to_jiffies(val);
> +       mutex_unlock(&hfi_instance_lock);
> +       return 0;
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(hfi_update_delay_fops,
> hfi_update_delay_get,
> +                        hfi_update_delay_set, "%llu\n");
> +
> +static struct dentry *hfi_debugfs_dir;
> +
> +static void hfi_debugfs_unregister(void)
> +{
> +       debugfs_remove_recursive(hfi_debugfs_dir);
> +       hfi_debugfs_dir = NULL;
> +}
> +
> +static void hfi_debugfs_register(void)
> +{
> +       struct dentry *f;
> +
> +       hfi_debugfs_dir = debugfs_create_dir("intel_hfi", NULL);
> +       if (!hfi_debugfs_dir)
> +               return;
> +
> +       f = debugfs_create_file("update_delay_ms", 0644,
> hfi_debugfs_dir,
> +                               NULL, &hfi_update_delay_fops);
> +       if (!f)
> +               goto err;
> +
> +       return;
> +err:
> +       hfi_debugfs_unregister();
> +}
> +
> +#else
> +static void hfi_debugfs_register(void)
> +{
> +}
> +#endif /* CONFIG_DEBUG_FS */
> +
>  static void get_hfi_caps(struct hfi_instance *hfi_instance,
>                          struct thermal_genl_cpu_caps *cpu_caps)
>  {
> @@ -321,8 +389,13 @@ void intel_hfi_process_event(__u64
> pkg_therm_status_msr_val)
>         raw_spin_unlock(&hfi_instance->table_lock);
>         raw_spin_unlock(&hfi_instance->event_lock);
>  
> +       /*
> +        * debugfs may atomically store @hfi_update_delay without
> locking. The
> +        * updated value may not be immediately observed. See note in
> +        * hfi_update_delay_set().
> +        */
>         queue_delayed_work(hfi_updates_wq, &hfi_instance-
> >update_work,
> -                          HFI_UPDATE_DELAY);
> +                          hfi_update_delay);
>  }
>  
>  static void init_hfi_cpu_index(struct hfi_cpu_info *info)
> @@ -708,6 +781,8 @@ void __init intel_hfi_init(void)
>  
>         register_syscore_ops(&hfi_pm_ops);
>  
> +       hfi_debugfs_register();
> +
>         return;
>  
>  err_nl_notif:


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event
  2024-04-29 23:41 ` [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event Ricardo Neri
@ 2024-04-30  5:07   ` Zhang, Rui
  2024-05-01  1:16     ` Ricardo Neri
  0 siblings, 1 reply; 10+ messages in thread
From: Zhang, Rui @ 2024-04-30  5:07 UTC (permalink / raw
  To: ricardo.neri-calderon@linux.intel.com, Wysocki, Rafael J
  Cc: srinivas.pandruvada@linux.intel.com, Brown, Len,
	stanislaw.gruszka@linux.intel.com, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Neri, Ricardo

On Mon, 2024-04-29 at 16:41 -0700, Ricardo Neri wrote:
> The number of updated CPU capabilities per netlink event is hard-
> coded to
> 16. On systems with more than 16 it takes more than one thermal
> netlink
> event to relay all the new capabilities when processing an HFI
> interrupt.
> This adds unnecessary overhead.
> 
> Make the number of updated capabilities per event tuneable via
> debugfs.
> Users can then experiment with different values.
> 
Is there a limitation about the number of CPUs supported in one netlink
event?

IMO, we still have to use a fixed number here because debugfs can be
changed by someone else, and userspace application like intel-lpmd
cannot make assumption that the netlink message follows what it set.

or can we append one magic item in the end of one update?
userspace can just check the magic item no matter the number of CPU per
netlink event.

thanks,
rui

> We already take the hfi_instance_lock when submitting thermal netlink
> updates. Use it to serialize debugfs accesses to
> hfi_therm_notify_count.
> 
> Suggested-by: Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Cc: Len Brown <len.brown@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/thermal/intel/intel_hfi.c | 34 ++++++++++++++++++++++++++---
> --
>  1 file changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/thermal/intel/intel_hfi.c
> b/drivers/thermal/intel/intel_hfi.c
> index d6d3544509fc..d5163b9766c0 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -175,6 +175,7 @@ static struct workqueue_struct *hfi_updates_wq;
>  
>  /* Keep this variable 8-byte aligned to get atomic accesses. */
>  static unsigned long hfi_update_delay = HFI_UPDATE_DELAY;
> +static int hfi_thermnl_caps_per_event = HFI_THERMNL_CAPS_PER_EVENT;
>  
>  #ifdef CONFIG_DEBUG_FS
>  static int hfi_update_delay_get(void *data, u64 *val)
> @@ -205,6 +206,25 @@ static int hfi_update_delay_set(void *data, u64
> val)
>  DEFINE_DEBUGFS_ATTRIBUTE(hfi_update_delay_fops,
> hfi_update_delay_get,
>                          hfi_update_delay_set, "%llu\n");
>  
> +static int hfi_thermnl_caps_per_event_get(void *data, u64 *val)
> +{
> +       mutex_lock(&hfi_instance_lock);
> +       *val = hfi_thermnl_caps_per_event;
> +       mutex_unlock(&hfi_instance_lock);
> +       return 0;
> +}
> +
> +static int hfi_thermnl_caps_per_event_set(void *data, u64 val)
> +{
> +       mutex_lock(&hfi_instance_lock);
> +       hfi_thermnl_caps_per_event = val;
> +       mutex_unlock(&hfi_instance_lock);
> +       return 0;
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(hfi_thermnl_caps_per_event_fops,
> +                        hfi_thermnl_caps_per_event_get,
> +                        hfi_thermnl_caps_per_event_set, "%llu\n");
>  static struct dentry *hfi_debugfs_dir;
>  
>  static void hfi_debugfs_unregister(void)
> @@ -226,6 +246,11 @@ static void hfi_debugfs_register(void)
>         if (!f)
>                 goto err;
>  
> +       f = debugfs_create_file("thermnl_caps_per_event", 0644,
> hfi_debugfs_dir,
> +                               NULL,
> &hfi_thermnl_caps_per_event_fops);
> +       if (!f)
> +               goto err;
> +
>         return;
>  err:
>         hfi_debugfs_unregister();
> @@ -286,16 +311,15 @@ static void update_capabilities(struct
> hfi_instance *hfi_instance)
>  
>         get_hfi_caps(hfi_instance, cpu_caps);
>  
> -       if (cpu_count < HFI_THERMNL_CAPS_PER_EVENT)
> +       if (cpu_count < hfi_thermnl_caps_per_event)
>                 goto last_cmd;
>  
>         /* Process complete chunks of HFI_THERMNL_CAPS_PER_EVENT
> capabilities. */
>         for (i = 0;
> -            (i + HFI_THERMNL_CAPS_PER_EVENT) <= cpu_count;
> -            i += HFI_THERMNL_CAPS_PER_EVENT)
> -
>                thermal_genl_cpu_capability_event(HFI_THERMNL_CAPS_PER_
> EVENT,
> +            (i + hfi_thermnl_caps_per_event) <= cpu_count;
> +            i += hfi_thermnl_caps_per_event)
> +               thermal_genl_cpu_capability_event(hfi_thermnl_caps_pe
> r_event,
>                                                   &cpu_caps[i]);
> -
>         cpu_count = cpu_count - i;
>  
>  last_cmd:


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs
  2024-04-30  4:52   ` Zhang, Rui
@ 2024-05-01  0:59     ` Ricardo Neri
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2024-05-01  0:59 UTC (permalink / raw
  To: Zhang, Rui
  Cc: Wysocki, Rafael J, srinivas.pandruvada@linux.intel.com,
	Brown, Len, stanislaw.gruszka@linux.intel.com,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Neri, Ricardo

On Tue, Apr 30, 2024 at 04:52:52AM +0000, Zhang, Rui wrote:
> On Mon, 2024-04-29 at 16:41 -0700, Ricardo Neri wrote:
> > THe delay between an HFI interrupt and its corresponding thermal
> 
> s/THe/The

Ah, I read the changelog 1000 times and still I misse this. Thanks!

> 
> > netlink
> > event has so far been hard-coded to CONFIG_HZ jiffies. This may not
> > suit
> > the needs of all hardware configurations or listeners of events.
> > 
> > If the update delay is too long, much of the information of
> > consecutive
> > hardware updates will be lost as the HFI delayed workqueue posts a
> > new
> > thermal netlink event only when there are no previous pending events.
> > If
> > the delay is too short, multiple, events may overwhelm listeners.
> > 
> > Listeners are better placed to determine the delay of events. They
> > know how
> > quickly they can act on them effectively. They may also want to
> > experiment
> > with different values.
> > 
> > Start a debugfs interface to tune the notification delay.
> 
> Why this is implemented as debugfs rather than a module param?

Implementing as module param would require more work. In its current form,
intel_hfi is not a module. Its entry points are device_initcall()
(via therm_throt.c) and CPU hotplug.

I can look into implementing as a module param if folks think its a
better solution.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event
  2024-04-30  5:07   ` Zhang, Rui
@ 2024-05-01  1:16     ` Ricardo Neri
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2024-05-01  1:16 UTC (permalink / raw
  To: Zhang, Rui
  Cc: Wysocki, Rafael J, srinivas.pandruvada@linux.intel.com,
	Brown, Len, stanislaw.gruszka@linux.intel.com,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Neri, Ricardo

On Tue, Apr 30, 2024 at 05:07:54AM +0000, Zhang, Rui wrote:
> On Mon, 2024-04-29 at 16:41 -0700, Ricardo Neri wrote:
> > The number of updated CPU capabilities per netlink event is hard-
> > coded to
> > 16. On systems with more than 16 it takes more than one thermal
> > netlink
> > event to relay all the new capabilities when processing an HFI
> > interrupt.
> > This adds unnecessary overhead.
> > 
> > Make the number of updated capabilities per event tuneable via
> > debugfs.
> > Users can then experiment with different values.
> > 
> Is there a limitation about the number of CPUs supported in one netlink
> event?

IIUC, the only limit is the size of the buffer for the message. intel_hfi
allocates the message based on the number of CPUs of the HFI instance.

> 
> IMO, we still have to use a fixed number here because debugfs can be
> changed by someone else, and userspace application like intel-lpmd
> cannot make assumption that the netlink message follows what it set.

but you don't know how many messages with 16-CPUs payload you will receive
for a single update, no? Yes, you can infer it from the number of online
CPUs, but still.

But yes, now lpmd would receive an unknown number of messages with payloads
of unknown size.

> 
> or can we append one magic item in the end of one update?
> userspace can just check the magic item no matter the number of CPU per
> netlink event.

AFAIK, only HFI and lmpd use the CPU capabilities thermal netlink message.
I guess it could be done.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-05-01  1:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-29 23:41 [PATCH 0/4] intel: thermal: hfi: Add debugfs files for tuning Ricardo Neri
2024-04-29 23:41 ` [PATCH 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL Ricardo Neri
2024-04-30  4:50   ` Zhang, Rui
2024-04-29 23:41 ` [PATCH 2/4] thermal: intel: hfi: Tune the HFI thermal netlink event delay via debugfs Ricardo Neri
2024-04-30  4:52   ` Zhang, Rui
2024-05-01  0:59     ` Ricardo Neri
2024-04-29 23:41 ` [PATCH 3/4] thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT Ricardo Neri
2024-04-29 23:41 ` [PATCH 4/4] thermal: intel: hfi: Tune the number of CPU capabilities per netlink event Ricardo Neri
2024-04-30  5:07   ` Zhang, Rui
2024-05-01  1:16     ` Ricardo Neri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).