Linux-perf-users Archive mirror
* [PATCH v2] perf/core: Fix small negative period being ignored
@ 2024-03-12 19:38 Luo Gengkun
  2024-03-31  6:54 ` Luo Gengkun
  0 siblings, 1 reply; 3+ messages in thread
From: Luo Gengkun @ 2024-03-12 19:38 UTC
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, linux-perf-users, linux-kernel,
	luogengkun2

In perf_adjust_period() we first calculate the period and then use it
to calculate delta. However, deltas less than 0 are handled differently
from deltas greater than or equal to 0: C integer division truncates
toward zero, so for any delta in the range [-14,-1], delta + 7 falls in
[-7,6] and the final value of (delta + 7) / 8 is 0. Small negative
adjustments such as -1 or -2 are therefore silently dropped. This is
unacceptable when the target period is very short, because we lose a
lot of samples.
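
A standalone user-space sketch makes the asymmetry easy to see
(illustration only, not the kernel code itself):

#include <stdio.h>
#include <stdint.h>

/* Old filter: C division truncates toward zero, so every delta in
 * [-14, -1] collapses to 0 and the adjustment is lost. */
static int64_t lpf_old(int64_t delta)
{
        return (delta + 7) / 8;
}

/* New filter: bias negative deltas away from zero, mirroring the
 * rounding that non-negative deltas already get. */
static int64_t lpf_new(int64_t delta)
{
        if (delta >= 0)
                delta += 7;
        else
                delta -= 7;
        return delta / 8;
}

int main(void)
{
        for (int64_t d = -14; d <= 14; d += 2)
                printf("delta=%4lld old=%3lld new=%3lld\n",
                       (long long)d, (long long)lpf_old(d),
                       (long long)lpf_new(d));
        return 0;
}

For every delta in [-14,-1] the old filter prints 0, while the new one
adjusts the period by at least -1.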

Here are some tests and analyses:
before:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

  # perf script
  ...
  a.out     396   257.956048:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.957891:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.959730:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.961545:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.963355:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.965163:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.966973:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.968785:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.970593:         23 cs:  ffffffff81f4eeec schedul>
  ...

after:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

  # perf script
  ...
  a.out     395    59.338813:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.339707:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.340682:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.341751:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.342799:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.343765:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.344651:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.345539:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.346502:         13 cs:  ffffffff81f4eeec schedul>
  ...

test.c

#include <unistd.h>

int main(void) {
        for (int i = 0; i < 20000; i++)
                usleep(10);

        return 0;
}

  # time ./a.out
  real    0m1.583s
  user    0m0.040s
  sys     0m0.298s

The above results were collected on an x86-64 QEMU guest with KVM
enabled, using test.c as the test program. Ideally we should get about
1500 samples, but the old algorithm produced only about 500, whereas
the modified algorithm produces about 1400. Furthermore, the new
version takes roughly one sample per 0.001s, while the old one takes
one per 0.002s. This indicates that the new algorithm is more sensitive
to small negative values than the old one.

Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Link: https://lore.kernel.org/all/7919005d-ca26-4cae-8c1c-4adea63704ce@huawei.com/
Signed-off-by: Luo Gengkun <luogengkun2@huawei.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

---
v1 -> v2: Fix incorrect time diff in tick adjust period and address review

It turns out that timer interrupts do not arrive every TICK_NSEC when
the system is idle, so the period we calculate ends up larger than
expected because an incorrect nsec is passed to perf_adjust_period().
As shown below, the unexpected period comes from
perf_adjust_freq_unthr_context(). Moreover, we cannot re-adjust the
period in __perf_event_account_interrupt() because the time between
overflows is larger than 2 * TICK_NSEC. To fix this, calculate the
actual interval between timer interrupts using perf_clock().
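
To make the effect concrete (illustrative numbers, assuming HZ=1000 and
therefore TICK_NSEC = 1 ms): perf_calculate_period() computes, roughly,

  period ~= count * NSEC_PER_SEC / (sample_freq * nsec)

If the CPU handling the tick was idle and the next tick arrives about
10 ms after the previous one, the old code still passes nsec =
TICK_NSEC = 1 ms while count covers the full ~10 ms, so the computed
period comes out about 10x too large. That matches the jump from ~15 to
~154 in the cs counts in the perf script output below. Passing the
perf_clock()-based interval instead keeps count and nsec consistent.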

  # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]

  # perf script
  ...
  test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
  test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
  ...
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 22 ++++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index afb028c54f33..2708f1d0692c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -265,6 +265,7 @@ struct hw_perf_event {
 	 * State for freq target events, see __perf_event_overflow() and
 	 * perf_adjust_freq_unthr_context().
 	 */
+	u64				freq_tick_stamp;
 	u64				freq_time_stamp;
 	u64				freq_count_stamp;
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 683dc086ef10..3f58d3803237 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4078,7 +4078,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
 	period = perf_calculate_period(event, nsec, count);
 
 	delta = (s64)(period - hwc->sample_period);
-	delta = (delta + 7) / 8; /* low pass filter */
+	if (delta >= 0)
+		delta += 7;
+	else
+		delta -= 7;
+	delta /= 8; /* low pass filter */
 
 	sample_period = hwc->sample_period + delta;
 
@@ -4108,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 {
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
-	u64 now, period = TICK_NSEC;
+	u64 now, period, tick_stamp;
 	s64 delta;
 
 	/*
@@ -4147,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 		 */
 		event->pmu->stop(event, PERF_EF_UPDATE);
 
+		tick_stamp = perf_clock();
+		period = tick_stamp - hwc->freq_tick_stamp;
+		hwc->freq_tick_stamp = tick_stamp;
+
 		now = local64_read(&event->count);
 		delta = now - hwc->freq_count_stamp;
 		hwc->freq_count_stamp = now;
@@ -4158,8 +4166,14 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 		 * to perf_adjust_period() to avoid stopping it
 		 * twice.
 		 */
-		if (delta > 0)
-			perf_adjust_period(event, period, delta, false);
+		if (delta > 0) {
+			/*
+			 * Skip the first tick and over-long tick intervals.
+			 */
+			if (likely(period != tick_stamp && period < 2*TICK_NSEC)) {
+				perf_adjust_period(event, period, delta, false);
+			}
+		}
 
 		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
 	next:
-- 
2.34.1



* Re: [PATCH v2] perf/core: Fix small negative period being ignored
  2024-03-12 19:38 [PATCH v2] perf/core: Fix small negative period being ignored Luo Gengkun
@ 2024-03-31  6:54 ` Luo Gengkun
  2024-04-10  9:35   ` Luo Gengkun
  0 siblings, 1 reply; 3+ messages in thread
From: Luo Gengkun @ 2024-03-31  6:54 UTC
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, linux-perf-users, linux-kernel

Ping.

On 2024/3/13 3:38, Luo Gengkun wrote:
> [...]

* Re: [PATCH v2] perf/core: Fix small negative period being ignored
  2024-03-31  6:54 ` Luo Gengkun
@ 2024-04-10  9:35   ` Luo Gengkun
  0 siblings, 0 replies; 3+ messages in thread
From: Luo Gengkun @ 2024-04-10  9:35 UTC
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, linux-perf-users, linux-kernel,
	luogengkun2

Hi Adrian, can you tell me what to do next? Is there anything that needs 
to be changed?

Looking forward to your reply.

On 2024/3/31 14:54, Luo Gengkun wrote:
> Ping.
>
> On 2024/3/13 3:38, Luo Gengkun wrote:
>> [...]
