References: <20240126085444.324918-1-xiong.y.zhang@linux.intel.com>
 <20240126085444.324918-24-xiong.y.zhang@linux.intel.com>
From: Jim Mattson
Date: Thu, 11 Apr 2024 15:19:41 -0700
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU
To: Sean Christopherson
Cc: Xiong Zhang, pbonzini@redhat.com, peterz@infradead.org,
 mizhang@google.com, kan.liang@intel.com, zhenyuw@linux.intel.com,
 dapeng1.mi@linux.intel.com, kvm@vger.kernel.org,
 linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
 zhiyuan.lv@intel.com, eranian@google.com, irogers@google.com,
 samantha.alt@intel.com, like.xu.linux@gmail.com, chao.gao@intel.com

On Thu, Apr 11, 2024 at 2:44 PM Sean Christopherson wrote:
>
> On Fri, Jan 26, 2024, Xiong Zhang wrote:
> > From: Dapeng Mi
> >
> > Implement the save/restore of PMU state for the passthrough PMU on Intel.
> > In passthrough mode, KVM exclusively owns the PMU hardware when control
> > flow enters the scope of the passthrough PMU. Thus, KVM needs to save the
> > host PMU state and take full ownership of the PMU hardware. Conversely,
> > the host regains ownership of the PMU hardware from KVM when control flow
> > leaves the scope of the passthrough PMU.
> >
> > Implement PMU context switches for Intel CPUs and opportunistically use
> > rdpmcl() instead of rdmsrl() when reading counters, since the former has
> > lower latency on Intel CPUs.
> >
> > Co-developed-by: Mingwei Zhang
> > Signed-off-by: Mingwei Zhang
> > Signed-off-by: Dapeng Mi
> > ---
> >  arch/x86/kvm/vmx/pmu_intel.c | 73 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 73 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > index 0d58fe7d243e..f79bebe7093d 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -823,10 +823,83 @@ void intel_passthrough_pmu_msrs(struct kvm_vcpu *vcpu)
> >
> >  static void intel_save_pmu_context(struct kvm_vcpu *vcpu)
>
> I would prefer there be a "guest" in there somewhere, e.g. intel_save_guest_pmu_context().
>
> >  {
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +	struct kvm_pmc *pmc;
> > +	u32 i;
> > +
> > +	if (pmu->version != 2) {
> > +		pr_warn("only PerfMon v2 is supported for passthrough PMU");
> > +		return;
> > +	}
> > +
> > +	/* Global ctrl register is already saved at VM-exit. */
> > +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, pmu->global_status);
> > +	/* Clear hardware MSR_CORE_PERF_GLOBAL_STATUS MSR, if non-zero. */
> > +	if (pmu->global_status)
> > +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, pmu->global_status);
> > +
> > +	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > +		pmc = &pmu->gp_counters[i];
> > +		rdpmcl(i, pmc->counter);
> > +		rdmsrl(i + MSR_ARCH_PERFMON_EVENTSEL0, pmc->eventsel);
> > +		/*
> > +		 * Clear hardware PERFMON_EVENTSELx and its counter to avoid
> > +		 * leakage and also avoid this guest GP counter getting
> > +		 * accidentally enabled during host running when the host
> > +		 * enables global ctrl.
> > +		 */
> > +		if (pmc->eventsel)
> > +			wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0);
> > +		if (pmc->counter)
> > +			wrmsrl(MSR_IA32_PMC0 + i, 0);
>
> This doesn't make much sense.  The kernel already has full access to the guest,
> I don't see what is gained by zeroing out the MSRs just to hide them from perf.
>
> Similarly, if perf enables a counter via PERF_GLOBAL_CTRL without first restoring
> the event selector, we gots problems.
>
> Same thing for the fixed counters below.  Can't this just be?
>
>	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
>		rdpmcl(i, pmu->gp_counters[i].counter);
>
>	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
>		rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i,
>		       pmu->fixed_counters[i].counter);
>
> > +	}
> > +
> > +	rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> > +	/*
> > +	 * Clear hardware FIXED_CTR_CTRL MSR to avoid information leakage and
> > +	 * also avoid these guest fixed counters getting accidentally enabled
> > +	 * during host running when the host enables global ctrl.
> > +	 */
> > +	if (pmu->fixed_ctr_ctrl)
> > +		wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> > +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > +		pmc = &pmu->fixed_counters[i];
> > +		rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i, pmc->counter);
> > +		if (pmc->counter)
> > +			wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
> > +	}
> >  }
> >
> >  static void intel_restore_pmu_context(struct kvm_vcpu *vcpu)
> >  {
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +	struct kvm_pmc *pmc;
> > +	u64 global_status;
> > +	int i;
> > +
> > +	if (pmu->version != 2) {
> > +		pr_warn("only PerfMon v2 is supported for passthrough PMU");
> > +		return;
> > +	}
> > +
> > +	/* Clear host global_ctrl and global_status MSR if non-zero. */
> > +	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>
> Why?  PERF_GLOBAL_CTRL will be auto-loaded at VM-Enter, why do it now?
>
> > +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
> > +	if (global_status)
> > +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status);
>
> This seems especially silly, isn't the full MSR being written below?  Or am I
> misunderstanding how these things work?

LOL! You expect CPU design to follow basic logic?!?

Writing a 1 to a bit in IA32_PERF_GLOBAL_STATUS_SET sets the
corresponding bit in IA32_PERF_GLOBAL_STATUS to 1.
Writing a 0 to a bit in IA32_PERF_GLOBAL_STATUS_SET is a no-op.

To clear a bit in IA32_PERF_GLOBAL_STATUS, you need to write a 1 to the
corresponding bit in IA32_PERF_GLOBAL_STATUS_RESET (aka
IA32_PERF_GLOBAL_OVF_CTRL).

> > +	wrmsrl(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status);
> > +
> > +	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > +		pmc = &pmu->gp_counters[i];
> > +		wrmsrl(MSR_IA32_PMC0 + i, pmc->counter);
> > +		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmc->eventsel);
> > +	}
> > +
> > +	wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> > +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > +		pmc = &pmu->fixed_counters[i];
> > +		wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, pmc->counter);
> > +	}
> >  }
> >
> >  struct kvm_pmu_ops intel_pmu_ops __initdata = {
> > --
> > 2.34.1
> >