Linux-PM Archive mirror
* [PATCH v1 0/2] thermal/debugfs: Fix and clean up trip statistics collection
@ 2024-04-15 18:59 Rafael J. Wysocki
  2024-04-15 19:02 ` [PATCH v1 1/2] thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up() Rafael J. Wysocki
  2024-04-15 19:03 ` [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates Rafael J. Wysocki
  0 siblings, 2 replies; 4+ messages in thread
From: Rafael J. Wysocki @ 2024-04-15 18:59 UTC
  To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

Hi Everyone,

This series fixes a possible kernel crash in thermal_debug_tz_trip_up()
and reduces some code duplication between this function and
thermal_debug_update_temp().

The plan is to push the fix (patch [1/2]) for 6.9-rc and apply the cleanup
for 6.10 when the fix reaches the mainline.

Thanks!





* [PATCH v1 1/2] thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up()
  2024-04-15 18:59 [PATCH v1 0/2] thermal/debugfs: Fix and clean up trip statistics collection Rafael J. Wysocki
@ 2024-04-15 19:02 ` Rafael J. Wysocki
  2024-04-15 19:03 ` [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates Rafael J. Wysocki
  1 sibling, 0 replies; 4+ messages in thread
From: Rafael J. Wysocki @ 2024-04-15 19:02 UTC
  To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The count field in struct trip_stats, representing the number of times
the zone temperature was above the trip point, needs to be incremented
in thermal_debug_tz_trip_up(), for two reasons.

First, if a trip point is crossed on the way up for the first time,
thermal_debug_update_temp() called from update_temperature() does
not see it because it has not been added to the trips_crossed[] array
in the thermal zone's struct tz_debugfs object yet.  Therefore, when
thermal_debug_tz_trip_up() is called after that, the trip point's
count value is 0, and the attempt to divide by it during the average
temperature computation leads to a divide error which causes the kernel
to crash.  Incrementing the count before the division sets it to 1 and
fixes this problem.
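
For illustration only, the running-average update discussed above can be
sketched in plain C outside the kernel.  This is a minimal sketch, not the
kernel code: struct trip_stats_sketch, sketch_init() and sketch_add_sample()
are hypothetical names, and the INT_MIN/INT_MAX initial values are an
assumption about how a fresh entry looks.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's struct trip_stats (no timestamps). */
struct trip_stats_sketch {
	int count;	/* number of samples recorded for the trip */
	int max;
	int min;
	int avg;
};

/* Assumed initial state of a trip that has not been crossed yet. */
static void sketch_init(struct trip_stats_sketch *s)
{
	s->count = 0;
	s->max = INT_MIN;
	s->min = INT_MAX;
	s->avg = 0;
}

/* Record one temperature sample using an incremental integer mean. */
static void sketch_add_sample(struct trip_stats_sketch *s, int temperature)
{
	if (temperature > s->max)
		s->max = temperature;
	if (temperature < s->min)
		s->min = temperature;

	/* Increment first: for the first sample this makes the divisor 1, not 0. */
	s->count++;
	s->avg += (temperature - s->avg) / s->count;
}
```

With samples of 50000 and 52000 fed to a freshly initialized entry, the
count ends up at 2 and the average at 51000.  Dropping the increment would
make the very first division use a divisor of 0.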

Second, if a trip point is crossed on the way up, but it has already
been crossed on the way up before, its count value needs to be
incremented to make a record of the fact that the zone temperature is
above the trip now.  Without doing that, if the mitigations applied
after crossing the trip cause the zone temperature to drop below its
threshold, the count will not be updated for this episode at all and
the average temperature in the trip statistics record will be somewhat
smaller than it should be.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_debugfs.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -616,6 +616,7 @@ void thermal_debug_tz_trip_up(struct the
 	tze->trip_stats[trip_id].timestamp = now;
 	tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
 	tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
+	tze->trip_stats[trip_id].count++;
 	tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
 		(temperature - tze->trip_stats[trip_id].avg) /
 		tze->trip_stats[trip_id].count;





* [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates
  2024-04-15 18:59 [PATCH v1 0/2] thermal/debugfs: Fix and clean up trip statistics collection Rafael J. Wysocki
  2024-04-15 19:02 ` [PATCH v1 1/2] thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up() Rafael J. Wysocki
@ 2024-04-15 19:03 ` Rafael J. Wysocki
  2024-04-17  9:35   ` Rafael J. Wysocki
  1 sibling, 1 reply; 4+ messages in thread
From: Rafael J. Wysocki @ 2024-04-15 19:03 UTC
  To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The code updating a trip_stats entry in thermal_debug_tz_trip_up()
and thermal_debug_update_temp() is almost entirely duplicate, so move
it to a new helper function that will be called from both these places.

While at it, drop a redundant tz_dbg->nr_trips check and a label related
to it from thermal_debug_update_temp().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_debugfs.c |   42 +++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -539,6 +539,19 @@ static struct tz_episode *thermal_debugf
 	return tze;
 }
 
+static struct trip_stats *update_tz_episode(struct tz_debugfs *tz_dbg,
+					    int trip_id, int temperature)
+{
+	struct tz_episode *tze = list_first_entry(&tz_dbg->tz_episodes,
+						  struct tz_episode, node);
+	struct trip_stats *trip_stats = &tze->trip_stats[trip_id];
+
+	trip_stats->max = max(trip_stats->max, temperature);
+	trip_stats->min = min(trip_stats->min, temperature);
+	trip_stats->avg += (temperature - trip_stats->avg) / ++trip_stats->count;
+	return trip_stats;
+}
+
 void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
 			      const struct thermal_trip *trip)
 {
@@ -547,6 +560,7 @@ void thermal_debug_tz_trip_up(struct the
 	struct thermal_debugfs *thermal_dbg = tz->debugfs;
 	int temperature = tz->temperature;
 	int trip_id = thermal_zone_trip_id(tz, trip);
+	struct trip_stats *trip_stats;
 	ktime_t now = ktime_get();
 
 	if (!thermal_dbg)
@@ -612,14 +626,8 @@ void thermal_debug_tz_trip_up(struct the
 	 */
 	tz_dbg->trips_crossed[tz_dbg->nr_trips++] = trip_id;
 
-	tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
-	tze->trip_stats[trip_id].timestamp = now;
-	tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
-	tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
-	tze->trip_stats[trip_id].count++;
-	tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
-		(temperature - tze->trip_stats[trip_id].avg) /
-		tze->trip_stats[trip_id].count;
+	trip_stats = update_tz_episode(tz_dbg, trip_id, temperature);
+	trip_stats->timestamp = now;
 
 unlock:
 	mutex_unlock(&thermal_dbg->lock);
@@ -686,9 +694,8 @@ out:
 void thermal_debug_update_temp(struct thermal_zone_device *tz)
 {
 	struct thermal_debugfs *thermal_dbg = tz->debugfs;
-	struct tz_episode *tze;
 	struct tz_debugfs *tz_dbg;
-	int trip_id, i;
+	int i;
 
 	if (!thermal_dbg)
 		return;
@@ -697,20 +704,9 @@ void thermal_debug_update_temp(struct th
 
 	tz_dbg = &thermal_dbg->tz_dbg;
 
-	if (!tz_dbg->nr_trips)
-		goto out;
+	for (i = 0; i < tz_dbg->nr_trips; i++)
+		update_tz_episode(tz_dbg, tz_dbg->trips_crossed[i], tz->temperature);
 
-	for (i = 0; i < tz_dbg->nr_trips; i++) {
-		trip_id = tz_dbg->trips_crossed[i];
-		tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
-		tze->trip_stats[trip_id].count++;
-		tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, tz->temperature);
-		tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, tz->temperature);
-		tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
-			(tz->temperature - tze->trip_stats[trip_id].avg) /
-			tze->trip_stats[trip_id].count;
-	}
-out:
 	mutex_unlock(&thermal_dbg->lock);
 }
 





* Re: [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates
  2024-04-15 19:03 ` [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates Rafael J. Wysocki
@ 2024-04-17  9:35   ` Rafael J. Wysocki
  0 siblings, 0 replies; 4+ messages in thread
From: Rafael J. Wysocki @ 2024-04-17  9:35 UTC
  To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Lukasz Luba, Daniel Lezcano

On Mon, Apr 15, 2024 at 9:03 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The code updating a trip_stats entry in thermal_debug_tz_trip_up()
> and thermal_debug_update_temp() is almost entirely duplicate, so move
> it to a new helper function that will be called from both these places.
>
> While at it, drop a redundant tz_dbg->nr_trips check and a label related
> to it from thermal_debug_update_temp().
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

In the meantime I realized that thermal_debug_update_temp() made
false-positive updates to the statistics of trips involved in the
current mitigation episode in cases when they were crossed on the way
down.

Namely, in that case the zone temperature is already below the low
temperature of the trip, but that is only recorded by
thermal_debug_tz_trip_down(), which is called after
thermal_debug_update_temp().
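
The ordering problem can be illustrated with a simplified, hypothetical
model.  None of the names below exist in the kernel: struct zone_sketch
tracks a single trip, and sketch_update_temp() and sketch_trip_down() merely
stand in for the two functions named above.  It only shows how a sample gets
attributed to a trip that the temperature has already dropped below, not how
the replacement patches address it.

```c
#include <stdbool.h>

/* Hypothetical single-trip model; not the kernel's data structures. */
struct zone_sketch {
	int temperature;	/* current zone temperature */
	int trip_low;		/* low temperature of the trip */
	bool trip_active;	/* trip still listed as crossed on the way up */
	int samples;		/* samples attributed to the trip */
	int false_samples;	/* samples taken while already below trip_low */
};

/* Stand-in for thermal_debug_update_temp(): updates the trip if still listed. */
static void sketch_update_temp(struct zone_sketch *z)
{
	if (!z->trip_active)
		return;

	z->samples++;
	if (z->temperature < z->trip_low)
		z->false_samples++;
}

/* Stand-in for thermal_debug_tz_trip_down(): records the crossing on the way down. */
static void sketch_trip_down(struct zone_sketch *z)
{
	if (z->temperature < z->trip_low)
		z->trip_active = false;
}

/* One temperature update in the call order described above. */
static void sketch_zone_update(struct zone_sketch *z, int new_temperature)
{
	z->temperature = new_temperature;
	sketch_update_temp(z);	/* runs first, still sees the trip as active */
	sketch_trip_down(z);	/* only now is the crossing recorded */
}
```

In this model, an update that takes the temperature below trip_low still
adds one sample to the trip before the crossing is recorded, which is the
false-positive update in question.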

For this reason, I'm withdrawing this patch and I will send
replacement patches later today.

Thanks!

