LKML Archive mirror
 help / color / mirror / Atom feed
From: Mingwei Zhang <mizhang@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Xiong Zhang <xiong.y.zhang@linux.intel.com>,
	pbonzini@redhat.com, peterz@infradead.org, kan.liang@intel.com,
	zhenyuw@linux.intel.com, dapeng1.mi@linux.intel.com,
	jmattson@google.com, kvm@vger.kernel.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	zhiyuan.lv@intel.com, eranian@google.com, irogers@google.com,
	samantha.alt@intel.com, like.xu.linux@gmail.com,
	chao.gao@intel.com
Subject: Re: [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM
Date: Thu, 18 Apr 2024 20:46:00 +0000	[thread overview]
Message-ID: <ZiGGiOspm6N-vIta@google.com> (raw)
In-Reply-To: <ZhgX6BStTh05OfEd@google.com>

On Thu, Apr 11, 2024, Sean Christopherson wrote:
> <bikeshed>
> 
> I think we should call this a mediated PMU, not a passthrough PMU.  KVM still
> emulates the control plane (controls and event selectors), while the data is
> fully passed through (counters).
> 
> </bikeshed>
Sean,

I feel "mediated PMU" seems to be a little bit off the ..., no? In
KVM, almost all of features are mediated. In our specific case, the
legacy PMU is mediated by KVM and perf subsystem on the host. In new
design, it is mediated by KVM only.

We intercept the control plan in current design, but the only thing
we do is the event filtering. No fancy code change to emulate the control
registers. So, it is still a passthrough logic.

In some (rare) business cases, I think maybe we could fully passthrough
the control plan as well. For instance, sole-tenant machine, or
full-machine VM + full offload. In case if there is a cpu errata, KVM
can force vmexit and dynamically intercept the selectors on all vcpus
with filters checked. It is not supported in current RFC, but maybe
doable in later versions.

With the above, I wonder if we can still use passthrough PMU for
simplicity? But no strong opinion if you really want to keep this name.
I would have to take some time to convince myself.

Thanks.
-Mingwei
> 
> On Fri, Jan 26, 2024, Xiong Zhang wrote:
> 
> > 1. host system wide / QEMU events handling during VM running
> >    At VM-entry, all the host perf events which use host x86 PMU will be
> >    stopped. These events with attr.exclude_guest = 1 will be stopped here
> >    and re-started after vm-exit. These events without attr.exclude_guest=1
> >    will be in error state, and they cannot recovery into active state even
> >    if the guest stops running. This impacts host perf a lot and request
> >    host system wide perf events have attr.exclude_guest=1.
> > 
> >    This requests QEMU Process's perf event with attr.exclude_guest=1 also.
> > 
> >    During VM running, perf event creation for system wide and QEMU
> >    process without attr.exclude_guest=1 fail with -EBUSY. 
> > 
> > 2. NMI watchdog
> >    the perf event for NMI watchdog is a system wide cpu pinned event, it
> >    will be stopped also during vm running, but it doesn't have
> >    attr.exclude_guest=1, we add it in this RFC. But this still means NMI
> >    watchdog loses function during VM running.
> > 
> >    Two candidates exist for replacing perf event of NMI watchdog:
> >    a. Buddy hardlock detector[3] may be not reliable to replace perf event.
> >    b. HPET-based hardlock detector [4] isn't in the upstream kernel.
> 
> I think the simplest solution is to allow mediated PMU usage if and only if
> the NMI watchdog is disabled.  Then whether or not the host replaces the NMI
> watchdog with something else becomes an orthogonal discussion, i.e. not KVM's
> problem to solve.
> 
> > 3. Dedicated kvm_pmi_vector
> >    In emulated vPMU, host PMI handler notify KVM to inject a virtual
> >    PMI into guest when physical PMI belongs to guest counter. If the
> >    same mechanism is used in passthrough vPMU and PMI skid exists
> >    which cause physical PMI belonging to guest happens after VM-exit,
> >    then the host PMI handler couldn't identify this PMI belongs to
> >    host or guest.
> >    So this RFC uses a dedicated kvm_pmi_vector, PMI belonging to guest
> >    has this vector only. The PMI belonging to host still has an NMI
> >    vector.
> > 
> >    Without considering PMI skid especially for AMD, the host NMI vector
> >    could be used for guest PMI also, this method is simpler and doesn't
> 
> I don't see how multiplexing NMIs between guest and host is simpler.  At best,
> the complexity is a wash, just in different locations, and I highly doubt it's
> a wash.  AFAIK, there is no way to precisely know that an NMI came in via the
> LVTPC.
> 
> E.g. if an IPI NMI arrives before the host's PMU is loaded, confusion may ensue.
> SVM has the luxury of running with GIF=0, but that simply isn't an option on VMX.
> 
> >    need x86 subsystem to reserve the dedicated kvm_pmi_vector, and we
> >    didn't meet the skid PMI issue on modern Intel processors.
> > 
> > 4. per-VM passthrough mode configuration
> >    Current RFC uses a KVM module enable_passthrough_pmu RO parameter,
> >    it decides vPMU is passthrough mode or emulated mode at kvm module
> >    load time.
> >    Do we need the capability of per-VM passthrough mode configuration?
> >    So an admin can launch some non-passthrough VM and profile these
> >    non-passthrough VMs in host, but admin still cannot profile all
> >    the VMs once passthrough VM existence. This means passthrough vPMU
> >    and emulated vPMU mix on one platform, it has challenges to implement.
> >    As the commit message in commit 0011, the main challenge is 
> >    passthrough vPMU and emulated vPMU have different vPMU features, this
> >    ends up with two different values for kvm_cap.supported_perf_cap, which
> >    is initialized at module load time. To support it, more refactor is
> >    needed.
> 
> I have no objection to an all-or-nothing setup.  I'd honestly love to rip out the
> existing vPMU support entirely, but that's probably not be realistic, at least not
> in the near future.
> 
> > Remain Works
> > ===
> > 1. To reduce passthrough vPMU overhead, optimize the PMU context switch.
> 
> Before this gets out of its "RFC" phase, I would at least like line of sight to
> a more optimized switch.  I 100% agree that starting with a conservative
> implementation is the way to go, and the kernel absolutely needs to be able to
> profile KVM itself (and everything KVM calls into), i.e. _always_ keeping the
> guest PMU loaded for the entirety of KVM_RUN isn't a viable option.
> 
> But I also don't want to get into a situation where can't figure out a clean,
> robust way to do the optimized context switch without needing (another) massive
> rewrite.

  parent reply	other threads:[~2024-04-18 20:46 UTC|newest]

Thread overview: 181+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-26  8:54 [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 01/41] perf: x86/intel: Support PERF_PMU_CAP_VPMU_PASSTHROUGH Xiong Zhang
2024-04-11 17:04   ` Sean Christopherson
2024-04-11 17:21     ` Liang, Kan
2024-04-11 17:24       ` Jim Mattson
2024-04-11 17:46         ` Sean Christopherson
2024-04-11 19:13           ` Liang, Kan
2024-04-11 20:43             ` Sean Christopherson
2024-04-11 21:04               ` Liang, Kan
2024-04-11 19:32           ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 02/41] perf: Support guest enter/exit interfaces Xiong Zhang
2024-03-20 16:40   ` Raghavendra Rao Ananta
2024-03-20 17:12     ` Liang, Kan
2024-04-11 18:06   ` Sean Christopherson
2024-04-11 19:53     ` Liang, Kan
2024-04-12 19:17       ` Sean Christopherson
2024-04-12 20:56         ` Liang, Kan
2024-04-15 16:03           ` Liang, Kan
2024-04-16  5:34             ` Zhang, Xiong Y
2024-04-16 12:48               ` Liang, Kan
2024-04-17  9:42                 ` Zhang, Xiong Y
2024-04-18 16:11                   ` Sean Christopherson
2024-04-19  1:37                     ` Zhang, Xiong Y
2024-04-26  4:09       ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 03/41] perf: Set exclude_guest onto nmi_watchdog Xiong Zhang
2024-04-11 18:56   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 04/41] perf: core/x86: Add support to register a new vector for PMI handling Xiong Zhang
2024-04-11 17:10   ` Sean Christopherson
2024-04-11 19:05     ` Sean Christopherson
2024-04-12  3:56     ` Zhang, Xiong Y
2024-04-13  1:17       ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 05/41] KVM: x86/pmu: Register PMI handler for passthrough PMU Xiong Zhang
2024-04-11 19:07   ` Sean Christopherson
2024-04-12  5:44     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 06/41] perf: x86: Add function to switch PMI handler Xiong Zhang
2024-04-11 19:17   ` Sean Christopherson
2024-04-11 19:34     ` Sean Christopherson
2024-04-12  6:03       ` Zhang, Xiong Y
2024-04-12  5:57     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 07/41] perf/x86: Add interface to reflect virtual LVTPC_MASK bit onto HW Xiong Zhang
2024-04-11 19:21   ` Sean Christopherson
2024-04-12  6:17     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 08/41] KVM: x86/pmu: Add get virtual LVTPC_MASK bit function Xiong Zhang
2024-04-11 19:22   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 09/41] perf: core/x86: Forbid PMI handler when guest own PMU Xiong Zhang
2024-04-11 19:26   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 10/41] perf: core/x86: Plumb passthrough PMU capability from x86_pmu to x86_pmu_cap Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 11/41] KVM: x86/pmu: Introduce enable_passthrough_pmu module parameter and propage to KVM instance Xiong Zhang
2024-04-11 20:54   ` Sean Christopherson
2024-04-11 21:03   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 12/41] KVM: x86/pmu: Plumb through passthrough PMU to vcpu for Intel CPUs Xiong Zhang
2024-04-11 20:57   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 13/41] KVM: x86/pmu: Add a helper to check if passthrough PMU is enabled Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 14/41] KVM: x86/pmu: Allow RDPMC pass through Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 15/41] KVM: x86/pmu: Manage MSR interception for IA32_PERF_GLOBAL_CTRL Xiong Zhang
2024-04-11 21:21   ` Sean Christopherson
2024-04-11 22:30     ` Jim Mattson
2024-04-11 23:27       ` Sean Christopherson
2024-04-13  2:10       ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 16/41] KVM: x86/pmu: Create a function prototype to disable MSR interception Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 17/41] KVM: x86/pmu: Implement pmu function for Intel CPU " Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 18/41] KVM: x86/pmu: Intercept full-width GP counter MSRs by checking with perf capabilities Xiong Zhang
2024-04-11 21:23   ` Sean Christopherson
2024-04-11 21:50     ` Jim Mattson
2024-04-12 16:01       ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 19/41] KVM: x86/pmu: Whitelist PMU MSRs for passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 20/41] KVM: x86/pmu: Introduce PMU operation prototypes for save/restore PMU context Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 21/41] KVM: x86/pmu: Introduce function prototype for Intel CPU to " Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 22/41] x86: Introduce MSR_CORE_PERF_GLOBAL_STATUS_SET for passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU Xiong Zhang
2024-04-11 21:26   ` Sean Christopherson
2024-04-13  2:29     ` Mi, Dapeng
2024-04-11 21:44   ` Sean Christopherson
2024-04-11 22:19     ` Jim Mattson
2024-04-11 23:31       ` Sean Christopherson
2024-04-13  3:19         ` Mi, Dapeng
2024-04-13  3:03     ` Mi, Dapeng
2024-04-13  3:34       ` Mingwei Zhang
2024-04-13  4:25         ` Mi, Dapeng
2024-04-15  6:06           ` Mingwei Zhang
2024-04-15 10:04             ` Mi, Dapeng
2024-04-15 16:44               ` Mingwei Zhang
2024-04-15 17:38                 ` Sean Christopherson
2024-04-15 17:54                   ` Mingwei Zhang
2024-04-15 22:45                     ` Sean Christopherson
2024-04-22  2:14                       ` maobibo
2024-04-22 17:01                         ` Sean Christopherson
2024-04-23  1:01                           ` maobibo
2024-04-23  2:44                             ` Mi, Dapeng
2024-04-23  2:53                               ` maobibo
2024-04-23  3:13                                 ` Mi, Dapeng
2024-04-23  3:26                                   ` maobibo
2024-04-23  3:59                                     ` Mi, Dapeng
2024-04-23  3:55                                   ` maobibo
2024-04-23  4:23                                     ` Mingwei Zhang
2024-04-23  6:08                                       ` maobibo
2024-04-23  6:45                                         ` Mi, Dapeng
2024-04-23  7:10                                           ` Mingwei Zhang
2024-04-23  8:24                                             ` Mi, Dapeng
2024-04-23  8:51                                               ` maobibo
2024-04-23 16:50                                               ` Mingwei Zhang
2024-04-23 12:12                                       ` maobibo
2024-04-23 17:02                                         ` Mingwei Zhang
2024-04-24  1:07                                           ` maobibo
2024-04-24  8:18                                           ` Mi, Dapeng
2024-04-24 15:00                                             ` Sean Christopherson
2024-04-25  3:55                                               ` Mi, Dapeng
2024-04-25  4:24                                                 ` Mingwei Zhang
2024-04-25 16:13                                                   ` Liang, Kan
2024-04-25 20:16                                                     ` Mingwei Zhang
2024-04-25 20:43                                                       ` Liang, Kan
2024-04-25 21:46                                                         ` Sean Christopherson
2024-04-26  1:46                                                           ` Mi, Dapeng
2024-04-26  3:12                                                             ` Mingwei Zhang
2024-04-26  4:02                                                               ` Mi, Dapeng
2024-04-26  4:46                                                                 ` Mingwei Zhang
2024-04-26 14:09                                                               ` Liang, Kan
2024-04-26 18:41                                                                 ` Mingwei Zhang
2024-04-26 19:06                                                                   ` Liang, Kan
2024-04-26 19:46                                                                     ` Sean Christopherson
2024-04-27  3:04                                                                       ` Mingwei Zhang
2024-04-28  0:58                                                                         ` Mi, Dapeng
2024-04-28  6:01                                                                           ` Mingwei Zhang
2024-04-29 17:44                                                                             ` Sean Christopherson
2024-05-01 17:43                                                                               ` Mingwei Zhang
2024-05-01 18:00                                                                                 ` Liang, Kan
2024-05-01 20:36                                                                                 ` Sean Christopherson
2024-04-29 13:08                                                                         ` Liang, Kan
2024-04-26 13:53                                                           ` Liang, Kan
2024-04-26  1:50                                                         ` Mi, Dapeng
2024-04-18 21:21                   ` Mingwei Zhang
2024-04-18 21:41                     ` Mingwei Zhang
2024-04-19  1:02                     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 24/41] KVM: x86/pmu: Zero out unexposed Counters/Selectors to avoid information leakage Xiong Zhang
2024-04-11 21:36   ` Sean Christopherson
2024-04-11 21:56     ` Jim Mattson
2024-01-26  8:54 ` [RFC PATCH 25/41] KVM: x86/pmu: Introduce macro PMU_CAP_PERF_METRICS Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 26/41] KVM: x86/pmu: Add host_perf_cap field in kvm_caps to record host PMU capability Xiong Zhang
2024-04-11 21:49   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 27/41] KVM: x86/pmu: Clear PERF_METRICS MSR for guest Xiong Zhang
2024-04-11 21:50   ` Sean Christopherson
2024-04-13  3:29     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 28/41] KVM: x86/pmu: Switch IA32_PERF_GLOBAL_CTRL at VM boundary Xiong Zhang
2024-04-11 21:54   ` Sean Christopherson
2024-04-11 22:10     ` Jim Mattson
2024-04-11 22:54       ` Sean Christopherson
2024-04-11 23:08         ` Jim Mattson
2024-01-26  8:54 ` [RFC PATCH 29/41] KVM: x86/pmu: Exclude existing vLBR logic from the passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 30/41] KVM: x86/pmu: Switch PMI handler at KVM context switch boundary Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 31/41] KVM: x86/pmu: Call perf_guest_enter() at PMU context switch Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 32/41] KVM: x86/pmu: Add support for PMU context switch at VM-exit/enter Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 33/41] KVM: x86/pmu: Make check_pmu_event_filter() an exported function Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 34/41] KVM: x86/pmu: Intercept EVENT_SELECT MSR Xiong Zhang
2024-04-11 21:55   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 35/41] KVM: x86/pmu: Allow writing to event selector for GP counters if event is allowed Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 36/41] KVM: x86/pmu: Intercept FIXED_CTR_CTRL MSR Xiong Zhang
2024-04-11 21:56   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 37/41] KVM: x86/pmu: Allow writing to fixed counter selector if counter is exposed Xiong Zhang
2024-04-11 22:03   ` Sean Christopherson
2024-04-13  4:12     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 38/41] KVM: x86/pmu: Introduce PMU helper to increment counter Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 39/41] KVM: x86/pmu: Implement emulated counter increment for passthrough PMU Xiong Zhang
2024-04-11 23:12   ` Sean Christopherson
2024-04-11 23:17     ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 40/41] KVM: x86/pmu: Separate passthrough PMU logic in set/get_msr() from non-passthrough vPMU Xiong Zhang
2024-04-11 23:18   ` Sean Christopherson
2024-04-18 21:54     ` Mingwei Zhang
2024-01-26  8:54 ` [RFC PATCH 41/41] KVM: nVMX: Add nested virtualization support for passthrough PMU Xiong Zhang
2024-04-11 23:21   ` Sean Christopherson
2024-04-11 17:03 ` [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM Sean Christopherson
2024-04-12  2:19   ` Zhang, Xiong Y
2024-04-12 18:32     ` Sean Christopherson
2024-04-15  1:06       ` Zhang, Xiong Y
2024-04-15 15:05         ` Sean Christopherson
2024-04-16  5:11           ` Zhang, Xiong Y
2024-04-18 20:46   ` Mingwei Zhang [this message]
2024-04-18 21:52     ` Mingwei Zhang
2024-04-19 19:14     ` Sean Christopherson
2024-04-19 22:02       ` Mingwei Zhang
2024-04-11 23:25 ` Sean Christopherson
2024-04-11 23:56   ` Mingwei Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiGGiOspm6N-vIta@google.com \
    --to=mizhang@google.com \
    --cc=chao.gao@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jmattson@google.com \
    --cc=kan.liang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=samantha.alt@intel.com \
    --cc=seanjc@google.com \
    --cc=xiong.y.zhang@linux.intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).