LKML Archive mirror
 help / color / mirror / Atom feed
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Xiong Zhang <xiong.y.zhang@linux.intel.com>,
	pbonzini@redhat.com, peterz@infradead.org, mizhang@google.com,
	kan.liang@intel.com, zhenyuw@linux.intel.com,
	dapeng1.mi@linux.intel.com, jmattson@google.com,
	kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, zhiyuan.lv@intel.com,
	eranian@google.com, irogers@google.com, samantha.alt@intel.com,
	like.xu.linux@gmail.com, chao.gao@intel.com
Subject: Re: [RFC PATCH 02/41] perf: Support guest enter/exit interfaces
Date: Fri, 12 Apr 2024 16:56:26 -0400	[thread overview]
Message-ID: <2342a4e2-2834-48e2-8403-f0050481e59e@linux.intel.com> (raw)
In-Reply-To: <ZhmIrQQVgblrhCZs@google.com>



On 2024-04-12 3:17 p.m., Sean Christopherson wrote:
> On Thu, Apr 11, 2024, Kan Liang wrote:
>>>> +/*
>>>> + * When a guest enters, force all active events of the PMU, which supports
>>>> + * the VPMU_PASSTHROUGH feature, to be scheduled out. The events of other
>>>> + * PMUs, such as uncore PMU, should not be impacted. The guest can
>>>> + * temporarily own all counters of the PMU.
>>>> + * During the period, all the creation of the new event of the PMU with
>>>> + * !exclude_guest are error out.
>>>> + */
>>>> +void perf_guest_enter(void)
>>>> +{
>>>> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
>>>> +
>>>> +	lockdep_assert_irqs_disabled();
>>>> +
>>>> +	if (__this_cpu_read(__perf_force_exclude_guest))
>>>
>>> This should be a WARN_ON_ONCE, no?
>>
>> To debug the improper behavior of KVM?
> 
> Not so much "debug" as ensure that the platform owner noticies that KVM is buggy.
> 
>>>> +static inline int perf_force_exclude_guest_check(struct perf_event *event,
>>>> +						 int cpu, struct task_struct *task)
>>>> +{
>>>> +	bool *force_exclude_guest = NULL;
>>>> +
>>>> +	if (!has_vpmu_passthrough_cap(event->pmu))
>>>> +		return 0;
>>>> +
>>>> +	if (event->attr.exclude_guest)
>>>> +		return 0;
>>>> +
>>>> +	if (cpu != -1) {
>>>> +		force_exclude_guest = per_cpu_ptr(&__perf_force_exclude_guest, cpu);
>>>> +	} else if (task && (task->flags & PF_VCPU)) {
>>>> +		/*
>>>> +		 * Just need to check the running CPU in the event creation. If the
>>>> +		 * task is moved to another CPU which supports the force_exclude_guest.
>>>> +		 * The event will filtered out and be moved to the error stage. See
>>>> +		 * merge_sched_in().
>>>> +		 */
>>>> +		force_exclude_guest = per_cpu_ptr(&__perf_force_exclude_guest, task_cpu(task));
>>>> +	}
>>>
>>> These checks are extremely racy, I don't see how this can possibly do the
>>> right thing.  PF_VCPU isn't a "this is a vCPU task", it's a "this task is about
>>> to do VM-Enter, or just took a VM-Exit" (the "I'm a virtual CPU" comment in
>>> include/linux/sched.h is wildly misleading, as it's _only_ valid when accounting
>>> time slices).
>>>
>>
>> This is to reject an !exclude_guest event creation for a running
>> "passthrough" guest from host perf tool.
>> Could you please suggest a way to detect it via the struct task_struct?
>>
>>> Digging deeper, I think __perf_force_exclude_guest has similar problems, e.g.
>>> perf_event_create_kernel_counter() calls perf_event_alloc() before acquiring the
>>> per-CPU context mutex.
>>
>> Do you mean that the perf_guest_enter() check could be happened right
>> after the perf_force_exclude_guest_check()?
>> It's possible. For this case, the event can still be created. It will be
>> treated as an existing event and handled in merge_sched_in(). It will
>> never be scheduled when a guest is running.
>>
>> The perf_force_exclude_guest_check() is to make sure most of the cases
>> can be rejected at the creation place. For the corner cases, they will
>> be rejected in the schedule stage.
> 
> Ah, the "rejected in the schedule stage" is what I'm missing.  But that creates
> a gross ABI, because IIUC, event creation will "randomly" succeed based on whether
> or not a CPU happens to be running in a KVM guest.  I.e. it's not just the kernel
> code that has races, the entire event creation is one big race.
> 
> What if perf had a global knob to enable/disable mediate PMU support?  Then when
> KVM is loaded with enable_mediated_true, call into perf to (a) check that there
> are no existing !exclude_guest events (this part could be optional), and (b) set
> the global knob to reject all new !exclude_guest events (for the core PMU?).
> 
> Hmm, or probably better, do it at VM creation.  That has the advantage of playing
> nice with CONFIG_KVM=y (perf could reject the enabling without completely breaking
> KVM), and not causing problems if KVM is auto-probed but the user doesn't actually
> want to run VMs.

I think it should be doable, and may simplify the perf implementation.
(The check in the schedule stage should not be necessary anymore.)

With this, something like NMI watchdog should fail the VM creation. The
user should either disable the NMI watchdog or use a replacement.

Thanks,
Kan
> 
> E.g. (very roughly)
> 
> int x86_perf_get_mediated_pmu(void)
> {
> 	if (refcount_inc_not_zero(...))
> 		return 0;
> 
> 	if (<system wide events>)
> 		return -EBUSY;
> 
> 	<slow path with locking>
> }
> 
> void x86_perf_put_mediated_pmu(void)
> {
> 	if (!refcount_dec_and_test(...))
> 		return;
> 
> 	<slow path with locking>
> }
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1bbf312cbd73..f2994377ef44 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12467,6 +12467,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>         if (type)
>                 return -EINVAL;
>  
> +       if (enable_mediated_pmu)
> +               ret = x86_perf_get_mediated_pmu();
> +               if (ret)
> +                       return ret;
> +       }
> +
>         ret = kvm_page_track_init(kvm);
>         if (ret)
>                 goto out;
> @@ -12518,6 +12524,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>         kvm_mmu_uninit_vm(kvm);
>         kvm_page_track_cleanup(kvm);
>  out:
> +       x86_perf_put_mediated_pmu();
>         return ret;
>  }
>  
> @@ -12659,6 +12666,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>         kvm_page_track_cleanup(kvm);
>         kvm_xen_destroy_vm(kvm);
>         kvm_hv_destroy_vm(kvm);
> +       x86_perf_put_mediated_pmu();
>  }
>  
>  static void memslot_rmap_free(struct kvm_memory_slot *slot)
> 
> 

  reply	other threads:[~2024-04-12 20:56 UTC|newest]

Thread overview: 181+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-26  8:54 [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 01/41] perf: x86/intel: Support PERF_PMU_CAP_VPMU_PASSTHROUGH Xiong Zhang
2024-04-11 17:04   ` Sean Christopherson
2024-04-11 17:21     ` Liang, Kan
2024-04-11 17:24       ` Jim Mattson
2024-04-11 17:46         ` Sean Christopherson
2024-04-11 19:13           ` Liang, Kan
2024-04-11 20:43             ` Sean Christopherson
2024-04-11 21:04               ` Liang, Kan
2024-04-11 19:32           ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 02/41] perf: Support guest enter/exit interfaces Xiong Zhang
2024-03-20 16:40   ` Raghavendra Rao Ananta
2024-03-20 17:12     ` Liang, Kan
2024-04-11 18:06   ` Sean Christopherson
2024-04-11 19:53     ` Liang, Kan
2024-04-12 19:17       ` Sean Christopherson
2024-04-12 20:56         ` Liang, Kan [this message]
2024-04-15 16:03           ` Liang, Kan
2024-04-16  5:34             ` Zhang, Xiong Y
2024-04-16 12:48               ` Liang, Kan
2024-04-17  9:42                 ` Zhang, Xiong Y
2024-04-18 16:11                   ` Sean Christopherson
2024-04-19  1:37                     ` Zhang, Xiong Y
2024-04-26  4:09       ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 03/41] perf: Set exclude_guest onto nmi_watchdog Xiong Zhang
2024-04-11 18:56   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 04/41] perf: core/x86: Add support to register a new vector for PMI handling Xiong Zhang
2024-04-11 17:10   ` Sean Christopherson
2024-04-11 19:05     ` Sean Christopherson
2024-04-12  3:56     ` Zhang, Xiong Y
2024-04-13  1:17       ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 05/41] KVM: x86/pmu: Register PMI handler for passthrough PMU Xiong Zhang
2024-04-11 19:07   ` Sean Christopherson
2024-04-12  5:44     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 06/41] perf: x86: Add function to switch PMI handler Xiong Zhang
2024-04-11 19:17   ` Sean Christopherson
2024-04-11 19:34     ` Sean Christopherson
2024-04-12  6:03       ` Zhang, Xiong Y
2024-04-12  5:57     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 07/41] perf/x86: Add interface to reflect virtual LVTPC_MASK bit onto HW Xiong Zhang
2024-04-11 19:21   ` Sean Christopherson
2024-04-12  6:17     ` Zhang, Xiong Y
2024-01-26  8:54 ` [RFC PATCH 08/41] KVM: x86/pmu: Add get virtual LVTPC_MASK bit function Xiong Zhang
2024-04-11 19:22   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 09/41] perf: core/x86: Forbid PMI handler when guest own PMU Xiong Zhang
2024-04-11 19:26   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 10/41] perf: core/x86: Plumb passthrough PMU capability from x86_pmu to x86_pmu_cap Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 11/41] KVM: x86/pmu: Introduce enable_passthrough_pmu module parameter and propage to KVM instance Xiong Zhang
2024-04-11 20:54   ` Sean Christopherson
2024-04-11 21:03   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 12/41] KVM: x86/pmu: Plumb through passthrough PMU to vcpu for Intel CPUs Xiong Zhang
2024-04-11 20:57   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 13/41] KVM: x86/pmu: Add a helper to check if passthrough PMU is enabled Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 14/41] KVM: x86/pmu: Allow RDPMC pass through Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 15/41] KVM: x86/pmu: Manage MSR interception for IA32_PERF_GLOBAL_CTRL Xiong Zhang
2024-04-11 21:21   ` Sean Christopherson
2024-04-11 22:30     ` Jim Mattson
2024-04-11 23:27       ` Sean Christopherson
2024-04-13  2:10       ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 16/41] KVM: x86/pmu: Create a function prototype to disable MSR interception Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 17/41] KVM: x86/pmu: Implement pmu function for Intel CPU " Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 18/41] KVM: x86/pmu: Intercept full-width GP counter MSRs by checking with perf capabilities Xiong Zhang
2024-04-11 21:23   ` Sean Christopherson
2024-04-11 21:50     ` Jim Mattson
2024-04-12 16:01       ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 19/41] KVM: x86/pmu: Whitelist PMU MSRs for passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 20/41] KVM: x86/pmu: Introduce PMU operation prototypes for save/restore PMU context Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 21/41] KVM: x86/pmu: Introduce function prototype for Intel CPU to " Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 22/41] x86: Introduce MSR_CORE_PERF_GLOBAL_STATUS_SET for passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU Xiong Zhang
2024-04-11 21:26   ` Sean Christopherson
2024-04-13  2:29     ` Mi, Dapeng
2024-04-11 21:44   ` Sean Christopherson
2024-04-11 22:19     ` Jim Mattson
2024-04-11 23:31       ` Sean Christopherson
2024-04-13  3:19         ` Mi, Dapeng
2024-04-13  3:03     ` Mi, Dapeng
2024-04-13  3:34       ` Mingwei Zhang
2024-04-13  4:25         ` Mi, Dapeng
2024-04-15  6:06           ` Mingwei Zhang
2024-04-15 10:04             ` Mi, Dapeng
2024-04-15 16:44               ` Mingwei Zhang
2024-04-15 17:38                 ` Sean Christopherson
2024-04-15 17:54                   ` Mingwei Zhang
2024-04-15 22:45                     ` Sean Christopherson
2024-04-22  2:14                       ` maobibo
2024-04-22 17:01                         ` Sean Christopherson
2024-04-23  1:01                           ` maobibo
2024-04-23  2:44                             ` Mi, Dapeng
2024-04-23  2:53                               ` maobibo
2024-04-23  3:13                                 ` Mi, Dapeng
2024-04-23  3:26                                   ` maobibo
2024-04-23  3:59                                     ` Mi, Dapeng
2024-04-23  3:55                                   ` maobibo
2024-04-23  4:23                                     ` Mingwei Zhang
2024-04-23  6:08                                       ` maobibo
2024-04-23  6:45                                         ` Mi, Dapeng
2024-04-23  7:10                                           ` Mingwei Zhang
2024-04-23  8:24                                             ` Mi, Dapeng
2024-04-23  8:51                                               ` maobibo
2024-04-23 16:50                                               ` Mingwei Zhang
2024-04-23 12:12                                       ` maobibo
2024-04-23 17:02                                         ` Mingwei Zhang
2024-04-24  1:07                                           ` maobibo
2024-04-24  8:18                                           ` Mi, Dapeng
2024-04-24 15:00                                             ` Sean Christopherson
2024-04-25  3:55                                               ` Mi, Dapeng
2024-04-25  4:24                                                 ` Mingwei Zhang
2024-04-25 16:13                                                   ` Liang, Kan
2024-04-25 20:16                                                     ` Mingwei Zhang
2024-04-25 20:43                                                       ` Liang, Kan
2024-04-25 21:46                                                         ` Sean Christopherson
2024-04-26  1:46                                                           ` Mi, Dapeng
2024-04-26  3:12                                                             ` Mingwei Zhang
2024-04-26  4:02                                                               ` Mi, Dapeng
2024-04-26  4:46                                                                 ` Mingwei Zhang
2024-04-26 14:09                                                               ` Liang, Kan
2024-04-26 18:41                                                                 ` Mingwei Zhang
2024-04-26 19:06                                                                   ` Liang, Kan
2024-04-26 19:46                                                                     ` Sean Christopherson
2024-04-27  3:04                                                                       ` Mingwei Zhang
2024-04-28  0:58                                                                         ` Mi, Dapeng
2024-04-28  6:01                                                                           ` Mingwei Zhang
2024-04-29 17:44                                                                             ` Sean Christopherson
2024-05-01 17:43                                                                               ` Mingwei Zhang
2024-05-01 18:00                                                                                 ` Liang, Kan
2024-05-01 20:36                                                                                 ` Sean Christopherson
2024-04-29 13:08                                                                         ` Liang, Kan
2024-04-26 13:53                                                           ` Liang, Kan
2024-04-26  1:50                                                         ` Mi, Dapeng
2024-04-18 21:21                   ` Mingwei Zhang
2024-04-18 21:41                     ` Mingwei Zhang
2024-04-19  1:02                     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 24/41] KVM: x86/pmu: Zero out unexposed Counters/Selectors to avoid information leakage Xiong Zhang
2024-04-11 21:36   ` Sean Christopherson
2024-04-11 21:56     ` Jim Mattson
2024-01-26  8:54 ` [RFC PATCH 25/41] KVM: x86/pmu: Introduce macro PMU_CAP_PERF_METRICS Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 26/41] KVM: x86/pmu: Add host_perf_cap field in kvm_caps to record host PMU capability Xiong Zhang
2024-04-11 21:49   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 27/41] KVM: x86/pmu: Clear PERF_METRICS MSR for guest Xiong Zhang
2024-04-11 21:50   ` Sean Christopherson
2024-04-13  3:29     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 28/41] KVM: x86/pmu: Switch IA32_PERF_GLOBAL_CTRL at VM boundary Xiong Zhang
2024-04-11 21:54   ` Sean Christopherson
2024-04-11 22:10     ` Jim Mattson
2024-04-11 22:54       ` Sean Christopherson
2024-04-11 23:08         ` Jim Mattson
2024-01-26  8:54 ` [RFC PATCH 29/41] KVM: x86/pmu: Exclude existing vLBR logic from the passthrough PMU Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 30/41] KVM: x86/pmu: Switch PMI handler at KVM context switch boundary Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 31/41] KVM: x86/pmu: Call perf_guest_enter() at PMU context switch Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 32/41] KVM: x86/pmu: Add support for PMU context switch at VM-exit/enter Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 33/41] KVM: x86/pmu: Make check_pmu_event_filter() an exported function Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 34/41] KVM: x86/pmu: Intercept EVENT_SELECT MSR Xiong Zhang
2024-04-11 21:55   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 35/41] KVM: x86/pmu: Allow writing to event selector for GP counters if event is allowed Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 36/41] KVM: x86/pmu: Intercept FIXED_CTR_CTRL MSR Xiong Zhang
2024-04-11 21:56   ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 37/41] KVM: x86/pmu: Allow writing to fixed counter selector if counter is exposed Xiong Zhang
2024-04-11 22:03   ` Sean Christopherson
2024-04-13  4:12     ` Mi, Dapeng
2024-01-26  8:54 ` [RFC PATCH 38/41] KVM: x86/pmu: Introduce PMU helper to increment counter Xiong Zhang
2024-01-26  8:54 ` [RFC PATCH 39/41] KVM: x86/pmu: Implement emulated counter increment for passthrough PMU Xiong Zhang
2024-04-11 23:12   ` Sean Christopherson
2024-04-11 23:17     ` Sean Christopherson
2024-01-26  8:54 ` [RFC PATCH 40/41] KVM: x86/pmu: Separate passthrough PMU logic in set/get_msr() from non-passthrough vPMU Xiong Zhang
2024-04-11 23:18   ` Sean Christopherson
2024-04-18 21:54     ` Mingwei Zhang
2024-01-26  8:54 ` [RFC PATCH 41/41] KVM: nVMX: Add nested virtualization support for passthrough PMU Xiong Zhang
2024-04-11 23:21   ` Sean Christopherson
2024-04-11 17:03 ` [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM Sean Christopherson
2024-04-12  2:19   ` Zhang, Xiong Y
2024-04-12 18:32     ` Sean Christopherson
2024-04-15  1:06       ` Zhang, Xiong Y
2024-04-15 15:05         ` Sean Christopherson
2024-04-16  5:11           ` Zhang, Xiong Y
2024-04-18 20:46   ` Mingwei Zhang
2024-04-18 21:52     ` Mingwei Zhang
2024-04-19 19:14     ` Sean Christopherson
2024-04-19 22:02       ` Mingwei Zhang
2024-04-11 23:25 ` Sean Christopherson
2024-04-11 23:56   ` Mingwei Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2342a4e2-2834-48e2-8403-f0050481e59e@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=chao.gao@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jmattson@google.com \
    --cc=kan.liang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=samantha.alt@intel.com \
    --cc=seanjc@google.com \
    --cc=xiong.y.zhang@linux.intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).