All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Mingwei Zhang <mizhang@google.com>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>,
	Sean Christopherson <seanjc@google.com>,
	 maobibo <maobibo@loongson.cn>,
	Xiong Zhang <xiong.y.zhang@linux.intel.com>,
	pbonzini@redhat.com,  peterz@infradead.org, kan.liang@intel.com,
	zhenyuw@linux.intel.com,  jmattson@google.com,
	kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
	 linux-kernel@vger.kernel.org, zhiyuan.lv@intel.com,
	eranian@google.com,  irogers@google.com, samantha.alt@intel.com,
	like.xu.linux@gmail.com,  chao.gao@intel.com
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU
Date: Fri, 26 Apr 2024 11:41:28 -0700	[thread overview]
Message-ID: <CAL715WJKL5__8RU0xxUf0HifNVQBDRODE54O2bwOx45w67TQTQ@mail.gmail.com> (raw)
In-Reply-To: <ff4a4229-04ac-4cbf-8aea-c84ccfa96e0b@linux.intel.com>

On Fri, Apr 26, 2024 at 7:10 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-04-25 11:12 p.m., Mingwei Zhang wrote:
> >>>> the perf_guest_enter(). The stale information is saved in the KVM. Perf
> >>>> will schedule the event in the next perf_guest_exit(). KVM will not know it.
> >>> Ya, the creation of an event on a CPU that currently has guest PMU state loaded
> >>> is what I had in mind when I suggested a callback in my sketch:
> >>>
> >>>   :  D. Add a perf callback that is invoked from IRQ context when perf wants to
> >>>   :     configure a new PMU-based events, *before* actually programming the MSRs,
> >>>   :     and have KVM's callback put the guest PMU state
> >>
> >> when host creates a perf event with exclude_guest attribute which is
> >> used to profile KVM/VMM user space, the vCPU process could work at three
> >> places.
> >>
> >> 1. in guest state (non-root mode)
> >>
> >> 2. inside vcpu-loop
> >>
> >> 3. outside vcpu-loop
> >>
> >> Since the PMU state has already been switched to host state, we don't
> >> need to consider the case 3 and only care about cases 1 and 2.
> >>
> >> when host creates a perf event with exclude_guest attribute to profile
> >> KVM/VMM user space,  an IPI is triggered to enable the perf event
> >> eventually like the following code shows.
> >>
> >> event_function_call(event, __perf_event_enable, NULL);
> >>
> >> For case 1,  a vm-exit is triggered and KVM starts to process the
> >> vm-exit and then run IPI irq handler, exactly speaking
> >> __perf_event_enable() to enable the perf event.
> >>
> >> For case 2, the IPI irq handler would preempt the vcpu-loop and call
> >> __perf_event_enable() to enable the perf event.
> >>
> >> So IMO KVM just needs to provide a callback to switch guest/host PMU
> >> state, and __perf_event_enable() calls this callback before really
> >> touching PMU MSRs.
> > ok, in this case, do we still need KVM to query perf if there are
> > active exclude_guest events? yes? Because there is an ordering issue.
> > The above suggests that the host-level perf profiling comes when a VM
> > is already running, there is an IPI that can invoke the callback and
> > trigger preemption. In this case, KVM should switch the context from
> > guest to host. What if it is the other way around, ie., host-level
> > profiling runs first and then VM runs?
> >
> > In this case, just before entering the vcpu loop, kvm should check
> > whether there is an active host event and save that into a pmu data
> > structure.
>
> KVM doesn't need to save/restore the host state. Host perf has the
> information and will reload the values whenever the host events are
> rescheduled. But I think KVM should clear the registers used by the host
> to prevent the value leaks to the guest.

Right, KVM needs to know about the host state to optimize its own PMU
context switch. If the host is using a counter of index, say 1, then
KVM may not need to zap the value of counter #1, since perf side will
override it.

>
> > If none, do the context switch early (so that KVM saves a
> > huge amount of unnecessary PMU context switches in the future).
> > Otherwise, keep the host PMU context until vm-enter. At the time of
> > vm-exit, do the check again using the data stored in pmu structure. If
> > there is an active event do the context switch to the host PMU,
> > otherwise defer that until exiting the vcpu loop. Of course, in the
> > meantime, if there is any perf profiling started causing the IPI, the
> > irq handler calls the callback, preempting the guest PMU context. If
> > that happens, at the time of exiting the vcpu boundary, PMU context
> > switch is skipped since it is already done. Of course, note that the
> > irq could come at any time, so the PMU context switch in all 4
> > locations need to check the state flag (and skip the context switch if
> > needed).
> >
> > So this requires vcpu->pmu has two pieces of state information: 1) the
> > flag similar to TIF_NEED_FPU_LOAD; 2) host perf context info (phase #1
> > just a boolean; phase #2, bitmap of occupied counters).
> >
> > This is a non-trivial optimization on the PMU context switch. I am
> > thinking about splitting them into the following phases:
> >
> > 1) lazy PMU context switch, i.e., wait until the guest touches PMU MSR
> > for the 1st time.
> > 2) fast PMU context switch on KVM side, i.e., KVM checking event
> > selector value (enable/disable) and selectively switch PMU state
> > (reducing rd/wr msrs)
> > 3) dynamic PMU context boundary, ie., KVM can dynamically choose PMU
> > context switch boundary depending on existing active host-level
> > events.
> > 3.1) more accurate dynamic PMU context switch, ie., KVM checking
> > host-level counter position and further reduces the number of msr
> > accesses.
> > 4) guest PMU context preemption, i.e., any new host-level perf
> > profiling can immediately preempt the guest PMU in the vcpu loop
> > (instead of waiting for the next PMU context switch in KVM).
>
> I'm not quit sure about the 4.
> The new host-level perf must be an exclude_guest event. It should not be
> scheduled when a guest is using the PMU. Why do we want to preempt the
> guest PMU? The current implementation in perf doesn't schedule any
> exclude_guest events when a guest is running.

right. The grey area is the code within the KVM_RUN loop, but
_outside_ of the guest. This part of the code is on the "host" side.
However, for efficiency reasons, KVM defers the PMU context switch by
retaining the guest PMU MSR values within the loop. Optimization 4
allows the host side to immediately profiling this part instead of
waiting for vcpu to reach to PMU context switch locations. Doing so
will generate more accurate results.

Do we want to preempt that? I think it depends. For regular cloud
usage, we don't. But for any other usages where we want to prioritize
KVM/VMM profiling over guest vPMU, it is useful.

My current opinion is that optimization 4 is something nice to have.
But we should allow people to turn it off just like we could choose to
disable preempt kernel.

Thanks.
-Mingwei
>
> Thanks,
> Kan

  reply	other threads:[~2024-04-26 18:42 UTC|newest]

Thread overview: 182+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-26  8:54 [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 01/41] perf: x86/intel: Support PERF_PMU_CAP_VPMU_PASSTHROUGH Xiong Zhang
2024-04-11 17:04   ` Sean Christopherson
2024-04-11 17:21     ` Liang, Kan
2024-04-11 17:24       ` Jim Mattson
2024-04-11 17:46         ` Sean Christopherson
2024-04-11 19:13           ` Liang, Kan
2024-04-11 20:43             ` Sean Christopherson
2024-04-11 21:04               ` Liang, Kan
2024-04-11 19:32           ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 02/41] perf: Support guest enter/exit interfaces Xiong Zhang
2024-03-20 16:40   ` Raghavendra Rao Ananta
2024-03-20 17:12     ` Liang, Kan
2024-04-11 18:06   ` Sean Christopherson
2024-04-11 19:53     ` Liang, Kan
2024-04-12 19:17       ` Sean Christopherson
2024-04-12 20:56         ` Liang, Kan
2024-04-15 16:03           ` Liang, Kan
2024-04-16  5:34             ` Zhang, Xiong Y
2024-04-16 12:48               ` Liang, Kan
2024-04-17  9:42                 ` Zhang, Xiong Y
2024-04-18 16:11                   ` Sean Christopherson
2024-04-19  1:37                     ` Zhang, Xiong Y
2024-04-26  4:09       ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 03/41] perf: Set exclude_guest onto nmi_watchdog Xiong Zhang
2024-04-11 18:56   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 04/41] perf: core/x86: Add support to register a new vector for PMI handling Xiong Zhang
2024-04-11 17:10   ` Sean Christopherson
2024-04-11 19:05     ` Sean Christopherson
2024-04-12  3:56     ` Zhang, Xiong Y
2024-04-13  1:17       ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 05/41] KVM: x86/pmu: Register PMI handler for passthrough PMU Xiong Zhang
2024-04-11 19:07   ` Sean Christopherson
2024-04-12  5:44     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 06/41] perf: x86: Add function to switch PMI handler Xiong Zhang
2024-04-11 19:17   ` Sean Christopherson
2024-04-11 19:34     ` Sean Christopherson
2024-04-12  6:03       ` Zhang, Xiong Y
2024-04-12  5:57     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 07/41] perf/x86: Add interface to reflect virtual LVTPC_MASK bit onto HW Xiong Zhang
2024-04-11 19:21   ` Sean Christopherson
2024-04-12  6:17     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 08/41] KVM: x86/pmu: Add get virtual LVTPC_MASK bit function Xiong Zhang
2024-04-11 19:22   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 09/41] perf: core/x86: Forbid PMI handler when guest own PMU Xiong Zhang
2024-04-11 19:26   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 10/41] perf: core/x86: Plumb passthrough PMU capability from x86_pmu to x86_pmu_cap Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 11/41] KVM: x86/pmu: Introduce enable_passthrough_pmu module parameter and propage to KVM instance Xiong Zhang
2024-04-11 20:54   ` Sean Christopherson
2024-04-11 21:03   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 12/41] KVM: x86/pmu: Plumb through passthrough PMU to vcpu for Intel CPUs Xiong Zhang
2024-04-11 20:57   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 13/41] KVM: x86/pmu: Add a helper to check if passthrough PMU is enabled Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 14/41] KVM: x86/pmu: Allow RDPMC pass through Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 15/41] KVM: x86/pmu: Manage MSR interception for IA32_PERF_GLOBAL_CTRL Xiong Zhang
2024-01-29  5:53   ` kernel test robot
2024-04-11 21:21   ` Sean Christopherson
2024-04-11 22:30     ` Jim Mattson
2024-04-11 23:27       ` Sean Christopherson
2024-04-13  2:10       ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 16/41] KVM: x86/pmu: Create a function prototype to disable MSR interception Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 17/41] KVM: x86/pmu: Implement pmu function for Intel CPU " Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 18/41] KVM: x86/pmu: Intercept full-width GP counter MSRs by checking with perf capabilities Xiong Zhang
2024-04-11 21:23   ` Sean Christopherson
2024-04-11 21:50     ` Jim Mattson
2024-04-12 16:01       ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 19/41] KVM: x86/pmu: Whitelist PMU MSRs for passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 20/41] KVM: x86/pmu: Introduce PMU operation prototypes for save/restore PMU context Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 21/41] KVM: x86/pmu: Introduce function prototype for Intel CPU to " Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 22/41] x86: Introduce MSR_CORE_PERF_GLOBAL_STATUS_SET for passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU Xiong Zhang
2024-04-11 21:26   ` Sean Christopherson
2024-04-13  2:29     ` Mi, Dapeng
2024-04-11 21:44   ` Sean Christopherson
2024-04-11 22:19     ` Jim Mattson
2024-04-11 23:31       ` Sean Christopherson
2024-04-13  3:19         ` Mi, Dapeng
2024-04-13  3:03     ` Mi, Dapeng
2024-04-13  3:34       ` Mingwei Zhang
2024-04-13  4:25         ` Mi, Dapeng
2024-04-15  6:06           ` Mingwei Zhang
2024-04-15 10:04             ` Mi, Dapeng
2024-04-15 16:44               ` Mingwei Zhang
2024-04-15 17:38                 ` Sean Christopherson
2024-04-15 17:54                   ` Mingwei Zhang
2024-04-15 22:45                     ` Sean Christopherson
2024-04-22  2:14                       ` maobibo
2024-04-22 17:01                         ` Sean Christopherson
2024-04-23  1:01                           ` maobibo
2024-04-23  2:44                             ` Mi, Dapeng
2024-04-23  2:53                               ` maobibo
2024-04-23  3:13                                 ` Mi, Dapeng
2024-04-23  3:26                                   ` maobibo
2024-04-23  3:59                                     ` Mi, Dapeng
2024-04-23  3:55                                   ` maobibo
2024-04-23  4:23                                     ` Mingwei Zhang
2024-04-23  6:08                                       ` maobibo
2024-04-23  6:45                                         ` Mi, Dapeng
2024-04-23  7:10                                           ` Mingwei Zhang
2024-04-23  8:24                                             ` Mi, Dapeng
2024-04-23  8:51                                               ` maobibo
2024-04-23 16:50                                               ` Mingwei Zhang
2024-04-23 12:12                                       ` maobibo
2024-04-23 17:02                                         ` Mingwei Zhang
2024-04-24  1:07                                           ` maobibo
2024-04-24  8:18                                           ` Mi, Dapeng
2024-04-24 15:00                                             ` Sean Christopherson
2024-04-25  3:55                                               ` Mi, Dapeng
2024-04-25  4:24                                                 ` Mingwei Zhang
2024-04-25 16:13                                                   ` Liang, Kan
2024-04-25 20:16                                                     ` Mingwei Zhang
2024-04-25 20:43                                                       ` Liang, Kan
2024-04-25 21:46                                                         ` Sean Christopherson
2024-04-26  1:46                                                           ` Mi, Dapeng
2024-04-26  3:12                                                             ` Mingwei Zhang
2024-04-26  4:02                                                               ` Mi, Dapeng
2024-04-26  4:46                                                                 ` Mingwei Zhang
2024-04-26 14:09                                                               ` Liang, Kan
2024-04-26 18:41                                                                 ` Mingwei Zhang [this message]
2024-04-26 19:06                                                                   ` Liang, Kan
2024-04-26 19:46                                                                     ` Sean Christopherson
2024-04-27  3:04                                                                       ` Mingwei Zhang
2024-04-28  0:58                                                                         ` Mi, Dapeng
2024-04-28  6:01                                                                           ` Mingwei Zhang
2024-04-29 17:44                                                                             ` Sean Christopherson
2024-05-01 17:43                                                                               ` Mingwei Zhang
2024-05-01 18:00                                                                                 ` Liang, Kan
2024-05-01 20:36                                                                                 ` Sean Christopherson
2024-04-29 13:08                                                                         ` Liang, Kan
2024-04-26 13:53                                                           ` Liang, Kan
2024-04-26  1:50                                                         ` Mi, Dapeng
2024-04-18 21:21                   ` Mingwei Zhang
2024-04-18 21:41                     ` Mingwei Zhang
2024-04-19  1:02                     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 24/41] KVM: x86/pmu: Zero out unexposed Counters/Selectors to avoid information leakage Xiong Zhang
2024-04-11 21:36   ` Sean Christopherson
2024-04-11 21:56     ` Jim Mattson
2024-01-26  8:54 ` [RFC PATCH 25/41] KVM: x86/pmu: Introduce macro PMU_CAP_PERF_METRICS Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 26/41] KVM: x86/pmu: Add host_perf_cap field in kvm_caps to record host PMU capability Xiong Zhang
2024-04-11 21:49   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 27/41] KVM: x86/pmu: Clear PERF_METRICS MSR for guest Xiong Zhang
2024-04-11 21:50   ` Sean Christopherson
2024-04-13  3:29     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 28/41] KVM: x86/pmu: Switch IA32_PERF_GLOBAL_CTRL at VM boundary Xiong Zhang
2024-04-11 21:54   ` Sean Christopherson
2024-04-11 22:10     ` Jim Mattson
2024-04-11 22:54       ` Sean Christopherson
2024-04-11 23:08         ` Jim Mattson
2024-01-26  8:54 ` [RFC PATCH 29/41] KVM: x86/pmu: Exclude existing vLBR logic from the passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 30/41] KVM: x86/pmu: Switch PMI handler at KVM context switch boundary Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 31/41] KVM: x86/pmu: Call perf_guest_enter() at PMU context switch Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 32/41] KVM: x86/pmu: Add support for PMU context switch at VM-exit/enter Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 33/41] KVM: x86/pmu: Make check_pmu_event_filter() an exported function Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 34/41] KVM: x86/pmu: Intercept EVENT_SELECT MSR Xiong Zhang
2024-04-11 21:55   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 35/41] KVM: x86/pmu: Allow writing to event selector for GP counters if event is allowed Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 36/41] KVM: x86/pmu: Intercept FIXED_CTR_CTRL MSR Xiong Zhang
2024-04-11 21:56   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 37/41] KVM: x86/pmu: Allow writing to fixed counter selector if counter is exposed Xiong Zhang
2024-04-11 22:03   ` Sean Christopherson
2024-04-13  4:12     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 38/41] KVM: x86/pmu: Introduce PMU helper to increment counter Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 39/41] KVM: x86/pmu: Implement emulated counter increment for passthrough PMU Xiong Zhang
2024-04-11 23:12   ` Sean Christopherson
2024-04-11 23:17     ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 40/41] KVM: x86/pmu: Separate passthrough PMU logic in set/get_msr() from non-passthrough vPMU Xiong Zhang
2024-04-11 23:18   ` Sean Christopherson
2024-04-18 21:54     ` Mingwei Zhang
2024-01-26  8:54 ` [RFC PATCH 41/41] KVM: nVMX: Add nested virtualization support for passthrough PMU Xiong Zhang
2024-04-11 23:21   ` Sean Christopherson
2024-04-11 17:03 ` [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM Sean Christopherson
2024-04-12  2:19   ` Zhang, Xiong Y
2024-04-12 18:32     ` Sean Christopherson
2024-04-15  1:06       ` Zhang, Xiong Y
2024-04-15 15:05         ` Sean Christopherson
2024-04-16  5:11           ` Zhang, Xiong Y
2024-04-18 20:46   ` Mingwei Zhang
2024-04-18 21:52     ` Mingwei Zhang
2024-04-19 19:14     ` Sean Christopherson
2024-04-19 22:02       ` Mingwei Zhang
2024-04-11 23:25 ` Sean Christopherson
2024-04-11 23:56   ` Mingwei Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL715WJKL5__8RU0xxUf0HifNVQBDRODE54O2bwOx45w67TQTQ@mail.gmail.com \
    --to=mizhang@google.com \
    --cc=chao.gao@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jmattson@google.com \
    --cc=kan.liang@intel.com \
    --cc=kan.liang@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=maobibo@loongson.cn \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=samantha.alt@intel.com \
    --cc=seanjc@google.com \
    --cc=xiong.y.zhang@linux.intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.