All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Atish Patra <atishp@rivosinc.com>
To: 雷博涵 <garthlei@pku.edu.cn>, "Atish Patra" <atishp@atishpatra.org>
Cc: linux-riscv@lists.infradead.org
Subject: Re: RISC-V PMU driver issue
Date: Fri, 26 Apr 2024 17:07:07 -0700	[thread overview]
Message-ID: <03af8c62-affa-4ef0-b3a8-ce3d0aca432b@rivosinc.com> (raw)
In-Reply-To: <03e5da66-d2ee-482c-9fdd-333f18ca90a1@rivosinc.com>

On 3/18/24 15:18, Atish Patra wrote:
> On 3/12/24 21:29, 雷博涵 wrote:
>> Sorry for the duplicated message. The last one was an HTML e-mail so I 
>> resend this.
>>
>>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
>>>
>>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I am having problems with the RISC-V PMU driver. The overflow 
>>>> handler of my custom perf event kernel counter seems to read an 
>>>> incorrect value from the event.
>>>
>>> If I understand you correctly, you have a custom kernel counter for
>>> the perf event ? Can you explain the use case for that ?
>>
>> Actually it was the perf-event backend of an open-source tool called 
>> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to 
>> port to RISC-V. It creates a kernel counter using 
>> `perf_event_create_kernel_counter`, registering an overflow handler. 
>> The registered overflow handler calls `perf_event_read_value` to read 
>> the counter value.
>>
> 
> Thanks. It makes sense now.
> 
>>>
>>>> It seems that the issue lies in the function 
>>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value 
>>>> but does not modify the underlying hardware counter. When 
>>>> `perf_event_read_value` gets called later in the user-defined 
>>>> overflow handler, `riscv_pmu_event_update` will update the `count` 
>>>> field again based on the unmodified hardware counter value and the 
>>>> modified `prev_count` field, which causes an incorrect reading.
>>>
>>> What do you mean by user defined overflow handler ? The overflow
>>> handler registered via the custom perf event kernel counter which gets
>>> invoked from perf_event_overflow ?
>>
>> Yes.
>>
>>>
>>>> I noticed that other PMU drivers, such as the ARM one, write to the 
>>>> underlying counter in their set_period functions, which prevents the 
>>>> problem. However, the RISC-V SBI specification does not have such an 
>>>> API to write to a counter without starting it. Using 
>>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard 
>>>> condition to place `riscv_pmu_event_set_period(event)` after it 
>>>> seems to work, but I am not sure whether it can cause other issues.
>>>
>>> Not sure the exact code path you are referring to. You don't want to
>>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
>>
>> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the 
>> perf event counter value (`event->count`) is updated. It calls 
>> `riscv_pmu_event_set_period` later, which calls 
>> `local64_set(&hwc->prev_count, (u64)-left)`. When 
>> `perf_event_read_value` gets invoked in the registered overflow 
>> handler, `riscv_pmu_event_update` will be invoked eventually. Because 
>> `prev_count` has been set to `(u64)-left` and the underlying hardware 
>> counter value (the value read via `rvpmu->ctr_read(event)`) has not 
>> been changed, `event->count` will be updated again, and the value this 
>> time will be incorrect.
> 
> Got it. This is a genuine issue. I can reproduce it as well with kvm pmu 
> support as it creates a kernel perf event as well.
> 
>>
>> I have found that the ARM PMU does not have the problem. The function 
>> `armpmu_event_set_period` in `arm_pmu.c` calls 
>> `armpmu->write_counter(event, (u64)(-left) & max_period)` after 
>> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying 
>> hardware counter is also set to a new value. Thus, when 
>> `armpmu_event_update` gets called later in the registered handler, it 
>> will find that the underlying counter has the same value as 
>> `prev_count`, so `event->count` can keep the correct value.
>>
>> However, as I mentioned before, the RISC-V SBI specification does not 
>> have such an API as “write_counter,” so the ARM solution could not be 
>> adapted easily. I was wondering if we could postpone 
>> `riscv_pmu_event_set_period` after the if-block, like this:
>>
>> ```
>> if (local64_read(&hw_evt->period_left) <= 0) {
>>    perf_event_overflow(event, &data, regs);
>> }
>> riscv_pmu_event_set_period(event);
>> ```
>>
>> This way `perf_event_read_value` can return the correct counter value. 
>> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. 
>> However, I am not sure if it can cause other problems.
>>
> 
> That seems like a hack. I think we can avoid updating again based on 
> perf state which can indicate that event->count is already updated. 
> Thus, no need to update it again. I will toy around with this idea and 
> get back to you.
> 

Can you try this diff on latest upstream ?


diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index b4efdddb2ad9..e08529c45c50 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
         unsigned long cmask;
         u64 oldval, delta;

-       if (!rvpmu->ctr_read)
+       if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
                 return 0;

         cmask = riscv_pmu_ctr_get_width_mask(event);
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8cbe6e5f9c39..6ba825a38619 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, 
void *dev)
                  */
                 overflowed_ctrs |= BIT(lidx);
                 hw_evt = &event->hw;
+               /* Update the event states here so that we know the 
state while reading */
+               hw_evt->state |= PERF_HES_STOPPED;
                 riscv_pmu_event_update(event);
+               hw_evt->state |= PERF_HES_UPTODATE;
                 perf_sample_data_init(&data, 0, hw_evt->last_period);
                 if (riscv_pmu_event_set_period(event)) {
                         /*
@@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void 
*dev)
                          */
                         perf_event_overflow(event, &data, regs);
                 }
+               /* Reset the state as we are going to start the counter 
after the loop */
+               hw_evt->state = 0;
         }



All the changes in drivers/perf/riscv_pmu_sbi.c are already part of 
ongoing series with snapshot support[1].

I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if 
this works for you.

[1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/

>>>
>>>>
>>>> Thank you,
>>>> Bohan Lei
>>>
>>>
>>> -- 
>>> Regards,
>>> Atish
>>
>> -- 
>> Regards,
>> Bohan Lei
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2024-04-27  0:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-28  3:49 RISC-V PMU driver issue 雷博涵
2024-03-13  0:02 ` Atish Patra
2024-03-13  4:29   ` 雷博涵
2024-03-18 22:18     ` Atish Patra
2024-04-27  0:07       ` Atish Patra [this message]
     [not found]         ` <2451aea.567f.18f3c8baa06.Coremail.garthlei@pku.edu.cn>
     [not found]           ` <CAHBxVyEotFmCfc+Ym-p7f=WAocAXtNA_z4MvOr7Ky8fXxPtVXA@mail.gmail.com>
2024-05-08  3:01             ` Re: " 雷博涵
2024-05-08  6:57               ` Atish Kumar Patra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03af8c62-affa-4ef0-b3a8-ce3d0aca432b@rivosinc.com \
    --to=atishp@rivosinc.com \
    --cc=atishp@atishpatra.org \
    --cc=garthlei@pku.edu.cn \
    --cc=linux-riscv@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.