All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/13] Intel Hardware P-States (HWP) support
@ 2022-08-10 19:29 Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only Jason Andryuk
                   ` (12 more replies)
  0 siblings, 13 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Henry Wang, Community Manager

Hi,

This patch series adds Hardware-Controlled Performance States (HWP) for
Intel processors to Xen.

With HWP, the processor makes its own determinations for frequency
selection, though users can set some parameters and preferences.  There
is also Turbo Boost which dynamically pushes the max frequency if
possible.

The existing governors don't work with HWP since they select frequencies
and HWP doesn't expose those.  Therefore a dummy hwp-interal governor is
used that doesn't do anything.

xenpm get-cpufreq-para is extended to show HWP parameters, and
set-cpufreq-hwp is added to set them.

A lightly loaded OpenXT laptop showed ~1W power savings according to
powertop.  A mostly idle Fedora system (dom0 only) showed a more modest
power savings.

This for for a 10th gen 6-core 1600 MHz base 4900 MHZ max cpu.  In the
default balance mode, Turbo Boost doesn't exceed 4GHz.  Tweaking the
energy_perf preference with `xenpm set-cpufreq-hwp balance ene:64`,
I've seen the CPU hit 4.7GHz before throttling down and bouncing around
between 4.3 and 4.5 GHz.  Curiously the other cores read ~4GHz when
turbo boost takes affect.  This was done after pinning all dom0 cores,
and using taskset to pin to vCPU/pCPU 11 and running a bash tightloop.

In v2, I think I addressed all comments for v1.  I kept patch 11 "xenpm:
Factor out a non-fatal cpuid_parse variant", with a v2 comment
explaining why I keep it.

HWP defaults to disabled and running with the existing HWP configuration
- it doesn't reconfigure by default.  It can be enabled with
cpufreq=xen:hwp.

Hardware Duty Cycling (HDC) is another feature to autonomously powerdown
things.  It defaults to enabled when HWP is enabled, but HDC can be
disabled on the command line.  cpufreq=xen:hwp,no-hdc

I've only tested on 8th gen and 10th gen systems with activity window
and energy_perf support.  So the pathes for CPUs lacking those features
are untested.

Fast MSR support was removed in v2.  The model specific checking was not
done properly, and I don't have hardware to test with.  Since writes are
expected to be infrequent, I just removed the code.

This changes the systcl_pm_op hypercall, so that wants review.

Regards,
Jason

Jason Andryuk (13):
  cpufreq: Allow restricting to internal governors only
  cpufreq: Add perf_freq to cpuinfo
  cpufreq: Export intel_feature_detect
  cpufreq: Add Hardware P-State (HWP) driver
  xenpm: Change get-cpufreq-para output for internal
  cpufreq: Export HWP parameters to userspace
  libxc: Include hwp_para in definitions
  xenpm: Print HWP parameters
  xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
  libxc: Add xc_set_cpufreq_hwp
  xenpm: Factor out a non-fatal cpuid_parse variant
  xenpm: Add set-cpufreq-hwp subcommand
  CHANGELOG: Add Intel HWP entry

 CHANGELOG.md                              |   3 +
 docs/misc/xen-command-line.pandoc         |   8 +-
 tools/include/xenctrl.h                   |   6 +
 tools/libs/ctrl/xc_pm.c                   |  18 +
 tools/misc/xenpm.c                        | 355 +++++++++++-
 xen/arch/x86/acpi/cpufreq/Makefile        |   1 +
 xen/arch/x86/acpi/cpufreq/cpufreq.c       |  15 +-
 xen/arch/x86/acpi/cpufreq/hwp.c           | 627 ++++++++++++++++++++++
 xen/arch/x86/include/asm/cpufeature.h     |  13 +-
 xen/arch/x86/include/asm/msr-index.h      |  13 +
 xen/drivers/acpi/pmstat.c                 |  28 +
 xen/drivers/cpufreq/cpufreq.c             |  37 ++
 xen/drivers/cpufreq/utility.c             |   1 +
 xen/include/acpi/cpufreq/cpufreq.h        |  14 +
 xen/include/acpi/cpufreq/processor_perf.h |   3 +
 xen/include/public/sysctl.h               |  57 ++
 16 files changed, 1171 insertions(+), 28 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/hwp.c

-- 
2.37.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-15 14:22   ` Jan Beulich
  2022-08-10 19:29 ` [PATCH v2 02/13] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Jan Beulich

For hwp, the standard governors are not usable, and only the internal
one is applicable.  Add the cpufreq_governor_internal boolean to
indicate when an internal governor, like hwp-internal, will be used.
This is set during presmp_initcall, so that it can suppress governor
registration during initcall.  Only a governor with a name containing
"-internal" will be allowed in that case.

This way, the unuseable governors are not registered, so they internal
one is the only one returned to userspace.  This means incompatible
governors won't be advertised to userspace.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Switch to "-internal"
Add blank line in header
---
 xen/drivers/cpufreq/cpufreq.c      | 5 +++++
 xen/include/acpi/cpufreq/cpufreq.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index a94520ee57..1fdd63d7b5 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -57,6 +57,7 @@ struct cpufreq_dom {
 };
 static LIST_HEAD_READ_MOSTLY(cpufreq_dom_list_head);
 
+bool __read_mostly cpufreq_governor_internal;
 struct cpufreq_governor *__read_mostly cpufreq_opt_governor;
 LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
 
@@ -122,6 +123,10 @@ int __init cpufreq_register_governor(struct cpufreq_governor *governor)
     if (!governor)
         return -EINVAL;
 
+    if (cpufreq_governor_internal &&
+        strstr(governor->name, "-internal") == NULL)
+        return -EINVAL;
+
     if (__find_governor(governor->name) != NULL)
         return -EEXIST;
 
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 35dcf21e8f..0da32ef519 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -114,6 +114,8 @@ extern struct cpufreq_governor cpufreq_gov_userspace;
 extern struct cpufreq_governor cpufreq_gov_performance;
 extern struct cpufreq_governor cpufreq_gov_powersave;
 
+extern bool cpufreq_governor_internal;
+
 extern struct list_head cpufreq_governor_list;
 
 extern int cpufreq_register_governor(struct cpufreq_governor *governor);
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 02/13] cpufreq: Add perf_freq to cpuinfo
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-15 14:29   ` Jan Beulich
  2022-08-10 19:29 ` [PATCH v2 03/13] cpufreq: Export intel_feature_detect Jason Andryuk
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu

acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
needs to scale by base frequency.  Settings max_freq to base_freq
"works" but the code is not obvious, and returning values to userspace
is tricky.  Add an additonal perf_freq member which is used for scaling
aperf/mperf measurements.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
I don't like this, but it seems the best way to re-use the common
aperf/mperf code.  The other option would be to add wrappers that then
do the acpi vs. hwp scaling.
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 2 +-
 xen/drivers/cpufreq/utility.c       | 1 +
 xen/include/acpi/cpufreq/cpufreq.h  | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index c27cbb2304..ded0150b3b 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -317,7 +317,7 @@ unsigned int get_measured_perf(unsigned int cpu, unsigned int flag)
     else
         perf_percent = 0;
 
-    return policy->cpuinfo.max_freq * perf_percent / 100;
+    return policy->cpuinfo.perf_freq * perf_percent / 100;
 }
 
 static unsigned int cf_check get_cur_freq_on_cpu(unsigned int cpu)
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 9eb7ecedcd..6831f62851 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -236,6 +236,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
 
     policy->min = policy->cpuinfo.min_freq = min_freq;
     policy->max = policy->cpuinfo.max_freq = max_freq;
+    policy->cpuinfo.perf_freq = max_freq;
     policy->cpuinfo.second_max_freq = second_max_freq;
 
     if (policy->min == ~0)
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 0da32ef519..a06aa92f62 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -37,6 +37,9 @@ extern struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS];
 struct cpufreq_cpuinfo {
     unsigned int        max_freq;
     unsigned int        second_max_freq;    /* P1 if Turbo Mode is on */
+    unsigned int        perf_freq; /* Scaling freq for aperf/mpref.
+                                      acpi-cpufreq uses max_freq, but HWP uses
+                                      base_freq.*/
     unsigned int        min_freq;
     unsigned int        transition_latency; /* in 10^(-9) s = nanoseconds */
 };
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 03/13] cpufreq: Export intel_feature_detect
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 02/13] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-15 14:34   ` Jan Beulich
  2022-08-10 19:29 ` [PATCH v2 04/13] cpufreq: Add Hardware P-State (HWP) driver Jason Andryuk
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu

Export feature_detect as intel_feature_detect so it can be re-used by
HWP.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2
export intel_feature_detect with typed pointer
Move intel_feature_detect to acpi/cpufreq/cpufreq.h since the
declaration now contains struct cpufreq_policy *.
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 8 ++++++--
 xen/include/acpi/cpufreq/cpufreq.h  | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index ded0150b3b..b5eb869227 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -340,9 +340,8 @@ static unsigned int cf_check get_cur_freq_on_cpu(unsigned int cpu)
     return extract_freq(get_cur_val(cpumask_of(cpu)), data);
 }
 
-static void cf_check feature_detect(void *info)
+void intel_feature_detect(struct cpufreq_policy *policy)
 {
-    struct cpufreq_policy *policy = info;
     unsigned int eax;
 
     eax = cpuid_eax(6);
@@ -354,6 +353,11 @@ static void cf_check feature_detect(void *info)
     }
 }
 
+static void cf_check feature_detect(void *info)
+{
+    intel_feature_detect((struct cpufreq_policy *)info);
+}
+
 static unsigned int check_freqs(const cpumask_t *mask, unsigned int freq,
                                 struct acpi_cpufreq_data *data)
 {
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index a06aa92f62..0f334d2a43 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -243,4 +243,6 @@ int write_userspace_scaling_setspeed(unsigned int cpu, unsigned int freq);
 void cpufreq_dbs_timer_suspend(void);
 void cpufreq_dbs_timer_resume(void);
 
+void intel_feature_detect(struct cpufreq_policy *policy);
+
 #endif /* __XEN_CPUFREQ_PM_H__ */
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 04/13] cpufreq: Add Hardware P-State (HWP) driver
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (2 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 03/13] cpufreq: Export intel_feature_detect Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 05/13] xenpm: Change get-cpufreq-para output for internal Jason Andryuk
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné

From the Intel SDM: "Hardware-Controlled Performance States (HWP), which
autonomously selects performance states while utilizing OS supplied
performance guidance hints."

Enable HWP to run in autonomous mode by poking the correct MSRs.
cpufreq=xen:hwp enables and cpufreq=xen:hwp=0 disables.  The same for
hdc.

There is no interface to configure - xen_sysctl_pm_op/xenpm will
be to be extended to configure in subsequent patches.  It will run with
the default values, which should be the default 0x80 (out
of 0x0-0xff) energy/performance preference.

Unscientific powertop measurement of an mostly idle, customized OpenXT
install:
A 10th gen 6-core laptop showed battery discharge drop from ~9.x to
~7.x watts.
A 8th gen 4-core laptop dropped from ~10 to ~9

Power usage depends on many factors, especially display brightness, but
this does show an power saving in balanced mode when CPU utilization is
low.

HWP isn't compatible with an external governor - it doesn't take
explicit frequency requests.  Therefore a minimal internal governor,
hwp-internal, is also added as a placeholder.

While adding to the xen-command-line.pandoc entry, un-nest verbose from
minfreq.  They are independent.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---

We disable on cpuid_level < 0x16.  cpuid(0x16) is used to get the cpu
frequencies for calculating the APERF/MPERF.  Without it, things would
still work, but the averge cpufrequency output would be wrong.

My 8th & 10th gen test systems both report:
(XEN) HWP: 1 notify: 1 act_window: 1 energy_perf: 1 pkg_level: 0 peci: 0
(XEN) HWP: Hardware Duty Cycling (HDC) supported
(XEN) HWP: HW_FEEDBACK not supported

IA32_ENERGY_PERF_BIAS has not been tested.

For cpufreq=xen:hwp, placing the option inside the governor wouldn't
work.  Users would have to select the hwp-internal governor to turn off
hwp support.  hwp-internal isn't usable without hwp, and users wouldn't
be able to select a different governor.  That doesn't matter while hwp
defaults off, but it would if or when hwp defaults to enabled.

We can't use parse_boolean() since it requires a single name=val string
and cpufreq_handle_common_option is provided two strings.  Use
parse_bool() and manual handle no-hwp.

Write to disable the interrupt - the linux pstate driver does this.  We
don't use the interrupts, so we can just turn them off.  We aren't ready
to handle them, so we don't want any.  Unclear if this is necessary.
SDM says it's default disabled.

FAST_IA32_HWP_REQUEST was removed in v2.  The check in v1 was wrong,
it's a model specific feature and the CPUID bit is only available
after enabling via the MSR.  Support was untested since I don't have
hardware with the feature.  Writes are expected to be infrequent, so
just leave it out.

---
v2:
Alphabetize headers
Re-work driver registration
name hwp_drv_data anonymous union "hw"
Drop hwp_verbose_cont
style cleanups
Condense hwp_governor switch
hwp_cpufreq_target remove .raw from hwp_req assignment
Use typed-pointer in a few functions
Pass type to xzalloc
Add HWP_ENERGY_PERF_BALANCE/IA32_ENERGY_BIAS_BALANCE defines
Add XEN_HWP_GOVERNOR define for "hwp-internal"
Capitalize CPUID and MSR defines
Change '_' to '-' for energy-perf & act-window
Read-modify-write MSRs updates
Use FAST_IA32_HWP_REQUEST_MSR_ENABLE define
constify pointer in hwp_set_misc_turbo
Add space after non-fallthrough break in governor switch
Add IA32_ENERGY_BIAS_MASK define
Check CPUID_PM_LEAK for energy bias when needed
Fail initialization with curr_req = -1
Fold hwp_read_capabilities into hwp_init_msrs
Add command line cpufreq=xen:hwp
Add command line cpufreq=xen:hdc
Use per_cpu for hwp_drv_data pointers
Move hwp_energy_perf_bias call into hwp_write_request
energy_perf 0 is valid, so hwp_energy_perf_bias cannot be skipped
Ensure we don't generate interrupts
Remove Fast Write of Uncore MSR
Initialize hwp_drv_data from curr_req
Use SPDX line instead of license text in hwp.c
---
 docs/misc/xen-command-line.pandoc         |   8 +-
 xen/arch/x86/acpi/cpufreq/Makefile        |   1 +
 xen/arch/x86/acpi/cpufreq/cpufreq.c       |   5 +-
 xen/arch/x86/acpi/cpufreq/hwp.c           | 506 ++++++++++++++++++++++
 xen/arch/x86/include/asm/cpufeature.h     |  13 +-
 xen/arch/x86/include/asm/msr-index.h      |  13 +
 xen/drivers/cpufreq/cpufreq.c             |  32 ++
 xen/include/acpi/cpufreq/cpufreq.h        |   3 +
 xen/include/acpi/cpufreq/processor_perf.h |   3 +
 xen/include/public/sysctl.h               |   1 +
 10 files changed, 581 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/hwp.c

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index fa56e880b3..2ac5b8f5aa 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -492,7 +492,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
 available support.
 
 ### cpufreq
-> `= none | {{ <boolean> | xen } [:[powersave|performance|ondemand|userspace][,<maxfreq>][,[<minfreq>][,[verbose]]]]} | dom0-kernel`
+> `= none | {{ <boolean> | xen } [:[powersave|performance|ondemand|userspace][,<hdc>][,[<hwp>]][,[<maxfreq>]][,[<minfreq>]][,[verbose]]]} | dom0-kernel`
 
 > Default: `xen`
 
@@ -503,6 +503,12 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
 * `<maxfreq>` and `<minfreq>` are integers which represent max and min processor frequencies
   respectively.
 * `verbose` option can be included as a string or also as `verbose=<integer>`
+* `<hwp>` is a boolean to enable Hardware-Controlled Performance States (HWP)
+  on supported Intel hardware.  HWP is a Skylake+ feature which provides better
+  CPU power management.  The default is disabled.
+* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC).  HDC enables the
+  processor to autonomously force physical package components into idle state.
+  The default is enabled, but the option only applies when `<hwp>` is enabled.
 
 ### cpuid (x86)
 > `= List of comma separated booleans`
diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile
index f75da9b9ca..db83aa6b14 100644
--- a/xen/arch/x86/acpi/cpufreq/Makefile
+++ b/xen/arch/x86/acpi/cpufreq/Makefile
@@ -1,2 +1,3 @@
 obj-y += cpufreq.o
+obj-y += hwp.o
 obj-y += powernow.o
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index b5eb869227..baafaf9b90 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -643,7 +643,10 @@ static int __init cf_check cpufreq_driver_init(void)
         switch ( boot_cpu_data.x86_vendor )
         {
         case X86_VENDOR_INTEL:
-            ret = cpufreq_register_driver(&acpi_cpufreq_driver);
+            if ( hwp_available() )
+                ret = hwp_register_driver();
+            else
+                ret = cpufreq_register_driver(&acpi_cpufreq_driver);
             break;
 
         case X86_VENDOR_AMD:
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
new file mode 100644
index 0000000000..405cbae334
--- /dev/null
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -0,0 +1,506 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * hwp.c cpufreq driver to run Intel Hardware P-States (HWP)
+ *
+ * Copyright (C) 2021 Jason Andryuk <jandryuk@gmail.com>
+ */
+
+#include <xen/cpumask.h>
+#include <xen/init.h>
+#include <xen/param.h>
+#include <xen/xmalloc.h>
+#include <asm/io.h>
+#include <asm/msr.h>
+#include <acpi/cpufreq/cpufreq.h>
+
+static bool feature_hwp;
+static bool feature_hwp_notification;
+static bool feature_hwp_activity_window;
+static bool feature_hwp_energy_perf;
+static bool feature_hwp_pkg_level_ctl;
+static bool feature_hwp_peci;
+
+static bool feature_hdc;
+
+__initdata bool opt_cpufreq_hwp = false;
+__initdata bool opt_cpufreq_hdc = true;
+
+#define HWP_ENERGY_PERF_BALANCE         0x80
+#define IA32_ENERGY_BIAS_BALANCE        0x7
+#define IA32_ENERGY_BIAS_MAX_POWERSAVE  0xf
+#define IA32_ENERGY_BIAS_MASK           0xf
+
+union hwp_request
+{
+    struct
+    {
+        uint64_t min_perf:8;
+        uint64_t max_perf:8;
+        uint64_t desired:8;
+        uint64_t energy_perf:8;
+        uint64_t activity_window:10;
+        uint64_t package_control:1;
+        uint64_t reserved:16;
+        uint64_t activity_window_valid:1;
+        uint64_t energy_perf_valid:1;
+        uint64_t desired_valid:1;
+        uint64_t max_perf_valid:1;
+        uint64_t min_perf_valid:1;
+    };
+    uint64_t raw;
+};
+
+struct hwp_drv_data
+{
+    union
+    {
+        uint64_t hwp_caps;
+        struct
+        {
+            uint64_t highest:8;
+            uint64_t guaranteed:8;
+            uint64_t most_efficient:8;
+            uint64_t lowest:8;
+            uint64_t reserved:32;
+        } hw;
+    };
+    union hwp_request curr_req;
+    uint16_t activity_window;
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf;
+};
+DEFINE_PER_CPU_READ_MOSTLY(struct hwp_drv_data *, hwp_drv_data);
+
+#define hwp_err(...)     printk(XENLOG_ERR __VA_ARGS__)
+#define hwp_info(...)    printk(XENLOG_INFO __VA_ARGS__)
+#define hwp_verbose(...)                   \
+({                                         \
+    if ( cpufreq_verbose )                 \
+        printk(XENLOG_DEBUG __VA_ARGS__);  \
+})
+
+static int cf_check hwp_governor(struct cpufreq_policy *policy,
+                                 unsigned int event)
+{
+    int ret;
+
+    if ( policy == NULL )
+        return -EINVAL;
+
+    switch ( event )
+    {
+    case CPUFREQ_GOV_START:
+    case CPUFREQ_GOV_LIMITS:
+        ret = 0;
+        break;
+
+    case CPUFREQ_GOV_STOP:
+    default:
+        ret = -EINVAL;
+        break;
+    }
+
+    return ret;
+}
+
+static struct cpufreq_governor hwp_cpufreq_governor =
+{
+    .name          = XEN_HWP_GOVERNOR,
+    .governor      = hwp_governor,
+};
+
+static int __init cpufreq_gov_hwp_init(void)
+{
+    return cpufreq_register_governor(&hwp_cpufreq_governor);
+}
+__initcall(cpufreq_gov_hwp_init);
+
+bool __init hwp_available(void)
+{
+    unsigned int eax, ecx, unused;
+    bool use_hwp;
+
+    if ( boot_cpu_data.cpuid_level < CPUID_PM_LEAF )
+    {
+        hwp_verbose("cpuid_level (%u) lacks HWP support\n",
+                    boot_cpu_data.cpuid_level);
+        return false;
+    }
+
+    if ( boot_cpu_data.cpuid_level < 0x16 )
+    {
+        hwp_info("HWP disabled: cpuid_level %x < 0x16 lacks CPU freq info\n",
+                 boot_cpu_data.cpuid_level);
+        return false;
+    }
+
+    cpuid(CPUID_PM_LEAF, &eax, &unused, &ecx, &unused);
+
+    if ( !(eax & CPUID6_EAX_HWP_ENERGY_PERFORMANCE_PREFERENCE) &&
+         !(ecx & CPUID6_ECX_IA32_ENERGY_PERF_BIAS) )
+    {
+        hwp_verbose("HWP disabled: No energy/performance preference available");
+        return false;
+    }
+
+    feature_hwp                 = eax & CPUID6_EAX_HWP;
+    feature_hwp_notification    = eax & CPUID6_EAX_HWP_NOTIFICATION;
+    feature_hwp_activity_window = eax & CPUID6_EAX_HWP_ACTIVITY_WINDOW;
+    feature_hwp_energy_perf     =
+        eax & CPUID6_EAX_HWP_ENERGY_PERFORMANCE_PREFERENCE;
+    feature_hwp_pkg_level_ctl   = eax & CPUID6_EAX_HWP_PACKAGE_LEVEL_REQUEST;
+    feature_hwp_peci            = eax & CPUID6_EAX_HWP_PECI;
+
+    hwp_verbose("HWP: %d notify: %d act-window: %d energy-perf: %d pkg-level: %d peci: %d\n",
+                feature_hwp, feature_hwp_notification,
+                feature_hwp_activity_window, feature_hwp_energy_perf,
+                feature_hwp_pkg_level_ctl, feature_hwp_peci);
+
+    if ( !feature_hwp )
+        return false;
+
+    feature_hdc = eax & CPUID6_EAX_HDC;
+
+    hwp_verbose("HWP: Hardware Duty Cycling (HDC) %ssupported%s\n",
+                feature_hdc ? "" : "not ",
+                feature_hdc ? opt_cpufreq_hdc ? ", enabled" : ", disabled"
+                            : "");
+
+    feature_hdc = feature_hdc && opt_cpufreq_hdc;
+
+    hwp_verbose("HWP: HW_FEEDBACK %ssupported\n",
+                (eax & CPUID6_EAX_HW_FEEDBACK) ? "" : "not ");
+
+    use_hwp = feature_hwp && opt_cpufreq_hwp;
+    cpufreq_governor_internal = use_hwp;
+
+    if ( use_hwp )
+        hwp_info("Using HWP for cpufreq\n");
+
+    return use_hwp;
+}
+
+static void hdc_set_pkg_hdc_ctl(bool val)
+{
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_IA32_PKG_HDC_CTL, msr) )
+    {
+        hwp_err("error rdmsr_safe(MSR_IA32_PKG_HDC_CTL)\n");
+
+        return;
+    }
+
+    if ( val )
+        msr |= IA32_PKG_HDC_CTL_HDC_PKG_ENABLE;
+    else
+        msr &= ~IA32_PKG_HDC_CTL_HDC_PKG_ENABLE;
+
+    if ( wrmsr_safe(MSR_IA32_PKG_HDC_CTL, msr) )
+        hwp_err("error wrmsr_safe(MSR_IA32_PKG_HDC_CTL): %016lx\n", msr);
+}
+
+static void hdc_set_pm_ctl1(bool val)
+{
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_IA32_PM_CTL1, msr) )
+    {
+        hwp_err("error rdmsr_safe(MSR_IA32_PM_CTL1)\n");
+
+        return;
+    }
+
+    if ( val )
+        msr |= IA32_PM_CTL1_HDC_ALLOW_BLOCK;
+    else
+        msr &= ~IA32_PM_CTL1_HDC_ALLOW_BLOCK;
+
+    if ( wrmsr_safe(MSR_IA32_PM_CTL1, msr) )
+        hwp_err("error wrmsr_safe(MSR_IA32_PM_CTL1): %016lx\n", msr);
+}
+
+static void hwp_get_cpu_speeds(struct cpufreq_policy *policy)
+{
+    uint32_t base_khz, max_khz, bus_khz, edx;
+
+    cpuid(0x16, &base_khz, &max_khz, &bus_khz, &edx);
+
+    /* aperf/mperf scales base. */
+    policy->cpuinfo.perf_freq = base_khz * 1000;
+    policy->cpuinfo.min_freq = base_khz * 1000;
+    policy->cpuinfo.max_freq = max_khz * 1000;
+    policy->min = base_khz * 1000;
+    policy->max = max_khz * 1000;
+    policy->cur = 0;
+}
+
+static void cf_check hwp_init_msrs(void *info)
+{
+    struct cpufreq_policy *policy = info;
+    struct hwp_drv_data *data = this_cpu(hwp_drv_data);
+    uint64_t val;
+
+    /*
+     * Package level MSR, but we don't have a good idea of packages here, so
+     * just do it everytime.
+     */
+    if ( rdmsr_safe(MSR_IA32_PM_ENABLE, val) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_PM_ENABLE)\n", policy->cpu);
+        data->curr_req.raw = -1;
+        return;
+    }
+
+    /* Ensure we don't generate interrupts */
+    if ( feature_hwp_notification )
+        wrmsr_safe(MSR_IA32_HWP_INTERRUPT, 0);
+
+    hwp_verbose("CPU%u: MSR_IA32_PM_ENABLE: %016lx\n", policy->cpu, val);
+    if ( !(val & IA32_PM_ENABLE_HWP_ENABLE) )
+    {
+        val |= IA32_PM_ENABLE_HWP_ENABLE;
+        if ( wrmsr_safe(MSR_IA32_PM_ENABLE, val) )
+        {
+            hwp_err("CPU%u: error wrmsr_safe(MSR_IA32_PM_ENABLE, %lx)\n",
+                    policy->cpu, val);
+            data->curr_req.raw = -1;
+            return;
+        }
+    }
+
+    if ( rdmsr_safe(MSR_IA32_HWP_CAPABILITIES, data->hwp_caps) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_HWP_CAPABILITIES)\n",
+                policy->cpu);
+        data->curr_req.raw = -1;
+        return;
+    }
+
+    if ( rdmsr_safe(MSR_IA32_HWP_REQUEST, data->curr_req.raw) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_HWP_REQUEST)\n", policy->cpu);
+        data->curr_req.raw = -1;
+        return;
+    }
+
+    if ( !feature_hwp_energy_perf ) {
+        if ( rdmsr_safe(MSR_IA32_ENERGY_PERF_BIAS, val) )
+        {
+            hwp_err("error rdmsr_safe(MSR_IA32_ENERGY_PERF_BIAS)\n");
+            data->curr_req.raw = -1;
+
+            return;
+        }
+
+        data->energy_perf = val & IA32_ENERGY_BIAS_MASK;
+    }
+
+    /*
+     * Check for APERF/MPERF support in hardware
+     * also check for boost/turbo support
+     */
+    intel_feature_detect(policy);
+
+    if ( feature_hdc )
+    {
+        hdc_set_pkg_hdc_ctl(true);
+        hdc_set_pm_ctl1(true);
+    }
+
+    hwp_get_cpu_speeds(policy);
+}
+
+static int cf_check hwp_cpufreq_verify(struct cpufreq_policy *policy)
+{
+    struct hwp_drv_data *data = per_cpu(hwp_drv_data, policy->cpu);
+
+    if ( !feature_hwp_energy_perf && data->energy_perf )
+    {
+        if ( data->energy_perf > IA32_ENERGY_BIAS_MAX_POWERSAVE )
+        {
+            hwp_err("energy_perf %d exceeds IA32_ENERGY_PERF_BIAS range 0-15\n",
+                    data->energy_perf);
+
+            return -EINVAL;
+        }
+    }
+
+    if ( !feature_hwp_activity_window && data->activity_window )
+    {
+        hwp_err("HWP activity window not supported\n");
+
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/* val 0 - highest performance, 15 - maximum energy savings */
+static void hwp_energy_perf_bias(const struct hwp_drv_data *data)
+{
+    uint64_t msr;
+    uint8_t val = data->energy_perf;
+
+    ASSERT(val <= IA32_ENERGY_BIAS_MAX_POWERSAVE);
+
+    if ( rdmsr_safe(MSR_IA32_ENERGY_PERF_BIAS, msr) )
+    {
+        hwp_err("error rdmsr_safe(MSR_IA32_ENERGY_PERF_BIAS)\n");
+
+        return;
+    }
+
+    msr &= ~IA32_ENERGY_BIAS_MASK;
+    msr |= val;
+
+    if ( wrmsr_safe(MSR_IA32_ENERGY_PERF_BIAS, msr) )
+        hwp_err("error wrmsr_safe(MSR_IA32_ENERGY_PERF_BIAS): %016lx\n", msr);
+}
+
+static void cf_check hwp_write_request(void *info)
+{
+    struct cpufreq_policy *policy = info;
+    struct hwp_drv_data *data = this_cpu(hwp_drv_data);
+    union hwp_request hwp_req = data->curr_req;
+
+    BUILD_BUG_ON(sizeof(union hwp_request) != sizeof(uint64_t));
+    if ( wrmsr_safe(MSR_IA32_HWP_REQUEST, hwp_req.raw) )
+    {
+        hwp_err("CPU%u: error wrmsr_safe(MSR_IA32_HWP_REQUEST, %lx)\n",
+                policy->cpu, hwp_req.raw);
+        rdmsr_safe(MSR_IA32_HWP_REQUEST, data->curr_req.raw);
+    }
+
+    if ( !feature_hwp_energy_perf )
+        hwp_energy_perf_bias(data);
+
+}
+
+static int cf_check hwp_cpufreq_target(struct cpufreq_policy *policy,
+                                       unsigned int target_freq,
+                                       unsigned int relation)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+    /* Zero everything to ensure reserved bits are zero... */
+    union hwp_request hwp_req = { .raw = 0 };
+
+    /* .. and update from there */
+    hwp_req.min_perf = data->minimum;
+    hwp_req.max_perf = data->maximum;
+    hwp_req.desired = data->desired;
+    if ( feature_hwp_energy_perf )
+        hwp_req.energy_perf = data->energy_perf;
+    if ( feature_hwp_activity_window )
+        hwp_req.activity_window = data->activity_window;
+
+    if ( hwp_req.raw == data->curr_req.raw )
+        return 0;
+
+    data->curr_req = hwp_req;
+
+    hwp_verbose("CPU%u: wrmsr HWP_REQUEST %016lx\n", cpu, hwp_req.raw);
+    on_selected_cpus(cpumask_of(cpu), hwp_write_request, policy, 1);
+
+    return 0;
+}
+
+static int cf_check hwp_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data;
+
+    data = xzalloc(struct hwp_drv_data);
+    if ( !data )
+        return -ENOMEM;
+
+    if ( cpufreq_opt_governor )
+        printk(XENLOG_WARNING
+               "HWP: governor \"%s\" is incompatible with hwp. Using default \"%s\"\n",
+               cpufreq_opt_governor->name, hwp_cpufreq_governor.name);
+    policy->governor = &hwp_cpufreq_governor;
+
+    per_cpu(hwp_drv_data, cpu) = data;
+
+    on_selected_cpus(cpumask_of(cpu), hwp_init_msrs, policy, 1);
+
+    if ( data->curr_req.raw == -1 )
+    {
+        hwp_err("CPU%u: Could not initialize HWP properly\n", cpu);
+        XFREE(per_cpu(hwp_drv_data, cpu));
+        return -ENODEV;
+    }
+
+    data->minimum = data->curr_req.min_perf;
+    data->maximum = data->curr_req.max_perf;
+    data->desired = data->curr_req.desired;
+    /* the !feature_hwp_energy_perf case was handled in hwp_init_msrs(). */
+    if ( feature_hwp_energy_perf )
+        data->energy_perf = data->curr_req.energy_perf;
+
+    hwp_verbose("CPU%u: IA32_HWP_CAPABILITIES: %016lx\n", cpu, data->hwp_caps);
+
+    hwp_verbose("CPU%u: rdmsr HWP_REQUEST %016lx\n", cpu, data->curr_req.raw);
+
+    return 0;
+}
+
+static int cf_check hwp_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+    XFREE(per_cpu(hwp_drv_data, policy->cpu));
+
+    return 0;
+}
+
+/*
+ * The SDM reads like turbo should be disabled with MSR_IA32_PERF_CTL and
+ * PERF_CTL_TURBO_DISENGAGE, but that does not seem to actually work, at least
+ * with my HWP testing.  MSR_IA32_MISC_ENABLE and MISC_ENABLE_TURBO_DISENGAGE
+ * is what Linux uses and seems to work.
+ */
+static void cf_check hwp_set_misc_turbo(void *info)
+{
+    const struct cpufreq_policy *policy = info;
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_IA32_MISC_ENABLE, msr) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_MISC_ENABLE)\n", policy->cpu);
+
+        return;
+    }
+
+    if ( policy->turbo == CPUFREQ_TURBO_ENABLED )
+        msr &= ~MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE;
+    else
+        msr |= MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE;
+
+    if ( wrmsr_safe(MSR_IA32_MISC_ENABLE, msr) )
+        hwp_err("CPU%u: error wrmsr_safe(MSR_IA32_MISC_ENABLE): %016lx\n",
+                policy->cpu, msr);
+}
+
+static int cf_check hwp_cpufreq_update(int cpuid, struct cpufreq_policy *policy)
+{
+    on_selected_cpus(cpumask_of(cpuid), hwp_set_misc_turbo, policy, 1);
+
+    return 0;
+}
+
+static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
+{
+    .name   = "hwp-cpufreq",
+    .verify = hwp_cpufreq_verify,
+    .target = hwp_cpufreq_target,
+    .init   = hwp_cpufreq_cpu_init,
+    .exit   = hwp_cpufreq_cpu_exit,
+    .update = hwp_cpufreq_update,
+};
+
+int __init hwp_register_driver(void)
+{
+    return cpufreq_register_driver(&hwp_cpufreq_driver);
+}
diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index f2c6f255ac..1aebf3f750 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,17 @@
 #define cpu_has(c, bit)		test_bit(bit, (c)->x86_capability)
 #define boot_cpu_has(bit)	test_bit(bit, boot_cpu_data.x86_capability)
 
-#define CPUID_PM_LEAF                    6
-#define CPUID6_ECX_APERFMPERF_CAPABILITY 0x1
+#define CPUID_PM_LEAF                                6
+#define CPUID6_EAX_HWP                               (_AC(1, U) <<  7)
+#define CPUID6_EAX_HWP_NOTIFICATION                  (_AC(1, U) <<  8)
+#define CPUID6_EAX_HWP_ACTIVITY_WINDOW               (_AC(1, U) <<  9)
+#define CPUID6_EAX_HWP_ENERGY_PERFORMANCE_PREFERENCE (_AC(1, U) << 10)
+#define CPUID6_EAX_HWP_PACKAGE_LEVEL_REQUEST         (_AC(1, U) << 11)
+#define CPUID6_EAX_HDC                               (_AC(1, U) << 13)
+#define CPUID6_EAX_HWP_PECI                          (_AC(1, U) << 16)
+#define CPUID6_EAX_HW_FEEDBACK                       (_AC(1, U) << 19)
+#define CPUID6_ECX_APERFMPERF_CAPABILITY             0x1
+#define CPUID6_ECX_IA32_ENERGY_PERF_BIAS             0x8
 
 /* CPUID level 0x00000001.edx */
 #define cpu_has_fpu             1
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 1a928ea6af..3961bac614 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -147,6 +147,13 @@
 #define MSR_PL3_SSP                         0x000006a7
 #define MSR_INTERRUPT_SSP_TABLE             0x000006a8
 
+#define MSR_IA32_PM_ENABLE                  0x00000770
+#define  IA32_PM_ENABLE_HWP_ENABLE          (_AC(1, ULL) <<  0)
+
+#define MSR_IA32_HWP_CAPABILITIES           0x00000771
+#define MSR_IA32_HWP_INTERRUPT              0x00000773
+#define MSR_IA32_HWP_REQUEST                0x00000774
+
 #define MSR_X2APIC_FIRST                    0x00000800
 #define MSR_X2APIC_LAST                     0x000008ff
 
@@ -161,6 +168,11 @@
 #define  PASID_PASID_MASK                   0x000fffff
 #define  PASID_VALID                        (_AC(1, ULL) << 31)
 
+#define MSR_IA32_PKG_HDC_CTL                0x00000db0
+#define  IA32_PKG_HDC_CTL_HDC_PKG_ENABLE    (_AC(1, ULL) <<  0)
+#define MSR_IA32_PM_CTL1                    0x00000db1
+#define  IA32_PM_CTL1_HDC_ALLOW_BLOCK       (_AC(1, ULL) <<  0)
+
 #define MSR_UARCH_MISC_CTRL                 0x00001b01
 #define  UARCH_CTRL_DOITM                   (_AC(1, ULL) <<  0)
 
@@ -496,6 +508,7 @@
 #define MSR_IA32_MISC_ENABLE_LIMIT_CPUID  (1<<22)
 #define MSR_IA32_MISC_ENABLE_XTPR_DISABLE (1<<23)
 #define MSR_IA32_MISC_ENABLE_XD_DISABLE	(1ULL << 34)
+#define MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE (1ULL << 38)
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 1fdd63d7b5..634f0a8a49 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -563,6 +563,38 @@ static void cpufreq_cmdline_common_para(struct cpufreq_policy *new_policy)
 
 static int __init cpufreq_handle_common_option(const char *name, const char *val)
 {
+    if (!strcmp(name, "hdc")) {
+        if (val) {
+            int ret = parse_bool(val, NULL);
+            if (ret != -1) {
+                opt_cpufreq_hdc = ret;
+                return 1;
+            }
+        } else {
+            opt_cpufreq_hdc = true;
+            return 1;
+        }
+    } else if (!strcmp(name, "no-hdc")) {
+        opt_cpufreq_hdc = false;
+        return 1;
+    }
+
+    if (!strcmp(name, "hwp")) {
+        if (val) {
+            int ret = parse_bool(val, NULL);
+            if (ret != -1) {
+                opt_cpufreq_hwp = ret;
+                return 1;
+            }
+        } else {
+            opt_cpufreq_hwp = true;
+            return 1;
+        }
+    } else if (!strcmp(name, "no-hwp")) {
+        opt_cpufreq_hwp = false;
+        return 1;
+    }
+
     if (!strcmp(name, "maxfreq") && val) {
         usr_max_freq = simple_strtoul(val, NULL, 0);
         return 1;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 0f334d2a43..29a712a4f1 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -245,4 +245,7 @@ void cpufreq_dbs_timer_resume(void);
 
 void intel_feature_detect(struct cpufreq_policy *policy);
 
+extern bool opt_cpufreq_hwp;
+extern bool opt_cpufreq_hdc;
+
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index d8a1ba68a6..b751ca4937 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -7,6 +7,9 @@
 
 #define XEN_PX_INIT 0x80000000
 
+bool hwp_available(void);
+int hwp_register_driver(void);
+
 int powernow_cpufreq_init(void);
 unsigned int powernow_register_driver(void);
 unsigned int get_measured_perf(unsigned int cpu, unsigned int flag);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5672906729..3d6dd14377 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -316,6 +316,7 @@ struct xen_ondemand {
     uint32_t up_threshold;
 };
 
+#define XEN_HWP_GOVERNOR "hwp-internal"
 /*
  * cpufreq para name of this structure named
  * same as sysfs file name of native linux
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 05/13] xenpm: Change get-cpufreq-para output for internal
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (3 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 04/13] cpufreq: Add Hardware P-State (HWP) driver Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 06/13] cpufreq: Export HWP parameters to userspace Jason Andryuk
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Wei Liu, Anthony PERARD

When using HWP, some of the returned data is not applicable.  In that
case, we should just omit it to avoid confusing the user.  So switch to
printing the base and turbo frequencies since those are relevant to HWP.
Similarly, stop printing the CPU frequencies since those do not apply.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Use full governor name XEN_HWP_GOVERNOR to change output
Style fixes
---
 tools/misc/xenpm.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 4f8cde690a..179f9f1f96 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -711,6 +711,7 @@ void start_gather_func(int argc, char *argv[])
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 {
+    bool internal = strstr(p_cpufreq->scaling_governor, XEN_HWP_GOVERNOR);
     int i;
 
     printf("cpu id               : %d\n", cpuid);
@@ -720,10 +721,15 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
         printf(" %d", p_cpufreq->affected_cpus[i]);
     printf("\n");
 
-    printf("cpuinfo frequency    : max [%u] min [%u] cur [%u]\n",
-           p_cpufreq->cpuinfo_max_freq,
-           p_cpufreq->cpuinfo_min_freq,
-           p_cpufreq->cpuinfo_cur_freq);
+    if ( internal )
+        printf("cpuinfo frequency    : base [%u] turbo [%u]\n",
+               p_cpufreq->cpuinfo_min_freq,
+               p_cpufreq->cpuinfo_max_freq);
+    else
+        printf("cpuinfo frequency    : max [%u] min [%u] cur [%u]\n",
+               p_cpufreq->cpuinfo_max_freq,
+               p_cpufreq->cpuinfo_min_freq,
+               p_cpufreq->cpuinfo_cur_freq);
 
     printf("scaling_driver       : %s\n", p_cpufreq->scaling_driver);
 
@@ -750,19 +756,22 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
                p_cpufreq->u.ondemand.up_threshold);
     }
 
-    printf("scaling_avail_freq   :");
-    for ( i = 0; i < p_cpufreq->freq_num; i++ )
-        if ( p_cpufreq->scaling_available_frequencies[i] ==
-             p_cpufreq->scaling_cur_freq )
-            printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
-        else
-            printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
-    printf("\n");
+    if ( !internal )
+    {
+        printf("scaling_avail_freq   :");
+        for ( i = 0; i < p_cpufreq->freq_num; i++ )
+            if ( p_cpufreq->scaling_available_frequencies[i] ==
+                 p_cpufreq->scaling_cur_freq )
+                printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
+            else
+                printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
+        printf("\n");
 
-    printf("scaling frequency    : max [%u] min [%u] cur [%u]\n",
-           p_cpufreq->scaling_max_freq,
-           p_cpufreq->scaling_min_freq,
-           p_cpufreq->scaling_cur_freq);
+        printf("scaling frequency    : max [%u] min [%u] cur [%u]\n",
+               p_cpufreq->scaling_max_freq,
+               p_cpufreq->scaling_min_freq,
+               p_cpufreq->scaling_cur_freq);
+    }
 
     printf("turbo mode           : %s\n",
            p_cpufreq->turbo_enabled ? "enabled" : "disabled or n/a");
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 06/13] cpufreq: Export HWP parameters to userspace
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (4 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 05/13] xenpm: Change get-cpufreq-para output for internal Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 07/13] libxc: Include hwp_para in definitions Jason Andryuk
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini

Extend xen_get_cpufreq_para to return hwp parameters.  These match the
hardware rather closely.

We need the features bitmask to indicated fields supported by the actual
hardware.

The use of uint8_t parameters matches the hardware size.  uint32_t
entries grows the sysctl_t past the build assertion in setup.c.  The
uint8_t ranges are supported across multiple generations, so hopefully
they won't change.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Style fixes
Don't bump XEN_SYSCTL_INTERFACE_VERSION
Drop cpufreq.h comment divider
Expand xen_hwp_para comment
Add HWP activity window mantissa/exponent defines
Handle union rename
Add const to get_hwp_para
Remove hw_ prefix from xen_hwp_para members
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 25 +++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          |  5 +++++
 xen/include/acpi/cpufreq/cpufreq.h |  2 ++
 xen/include/public/sysctl.h        | 26 ++++++++++++++++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 405cbae334..233089b6c9 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -500,6 +500,31 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
     .update = hwp_cpufreq_update,
 };
 
+int get_hwp_para(const struct cpufreq_policy *policy,
+                 struct xen_hwp_para *hwp_para)
+{
+    unsigned int cpu = policy->cpu;
+    const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    hwp_para->features        =
+        (feature_hwp_activity_window ? XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  : 0) |
+        (feature_hwp_energy_perf     ? XEN_SYSCTL_HWP_FEAT_ENERGY_PERF : 0);
+    hwp_para->lowest          = data->hw.lowest;
+    hwp_para->most_efficient  = data->hw.most_efficient;
+    hwp_para->guaranteed      = data->hw.guaranteed;
+    hwp_para->highest         = data->hw.highest;
+    hwp_para->minimum         = data->minimum;
+    hwp_para->maximum         = data->maximum;
+    hwp_para->energy_perf     = data->energy_perf;
+    hwp_para->activity_window = data->activity_window;
+    hwp_para->desired         = data->desired;
+
+    return 0;
+}
+
 int __init hwp_register_driver(void)
 {
     return cpufreq_register_driver(&hwp_cpufreq_driver);
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 1bae635101..67fd9dabd4 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -290,6 +290,11 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
             &op->u.get_para.u.ondemand.sampling_rate,
             &op->u.get_para.u.ondemand.up_threshold);
     }
+
+    if ( !strncasecmp(op->u.get_para.scaling_governor, XEN_HWP_GOVERNOR,
+                      CPUFREQ_NAME_LEN) )
+        ret = get_hwp_para(policy, &op->u.get_para.u.hwp_para);
+
     op->u.get_para.turbo_enabled = cpufreq_get_turbo_status(op->cpuid);
 
     return ret;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 29a712a4f1..92b4c7e79c 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -247,5 +247,7 @@ void intel_feature_detect(struct cpufreq_policy *policy);
 
 extern bool opt_cpufreq_hwp;
 extern bool opt_cpufreq_hdc;
+int get_hwp_para(const struct cpufreq_policy *policy,
+                 struct xen_hwp_para *hwp_para);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 3d6dd14377..12a7a7db29 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -316,6 +316,31 @@ struct xen_ondemand {
     uint32_t up_threshold;
 };
 
+struct xen_hwp_para {
+    /*
+     * bits 6:0   - 7bit mantissa
+     * bits 9:7   - 3bit base-10 exponent
+     * btis 15:10 - Unused - must be 0
+     */
+#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
+#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
+#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
+    uint16_t activity_window;
+    /* energy_perf range 0-255 if 1. Otherwise 0-15 */
+#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
+    /* activity_window supported if 1 */
+#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
+    uint8_t features; /* bit flags for features */
+    uint8_t lowest;
+    uint8_t most_efficient;
+    uint8_t guaranteed;
+    uint8_t highest;
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf;
+};
+
 #define XEN_HWP_GOVERNOR "hwp-internal"
 /*
  * cpufreq para name of this structure named
@@ -348,6 +373,7 @@ struct xen_get_cpufreq_para {
     union {
         struct  xen_userspace userspace;
         struct  xen_ondemand ondemand;
+        struct  xen_hwp_para hwp_para;
     } u;
 
     int32_t turbo_enabled;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 07/13] libxc: Include hwp_para in definitions
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (5 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 06/13] cpufreq: Export HWP parameters to userspace Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 08/13] xenpm: Print HWP parameters Jason Andryuk
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Wei Liu, Anthony PERARD, Juergen Gross

Expose the hwp_para fields through libxc.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 tools/include/xenctrl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 0c8b4c3aa7..9586eca4c0 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1900,6 +1900,7 @@ int xc_smt_disable(xc_interface *xch);
  */
 typedef struct xen_userspace xc_userspace_t;
 typedef struct xen_ondemand xc_ondemand_t;
+typedef struct xen_hwp_para xc_hwp_para_t;
 
 struct xc_get_cpufreq_para {
     /* IN/OUT variable */
@@ -1927,6 +1928,7 @@ struct xc_get_cpufreq_para {
     union {
         xc_userspace_t userspace;
         xc_ondemand_t ondemand;
+        xc_hwp_para_t hwp_para;
     } u;
 
     int32_t turbo_enabled;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 08/13] xenpm: Print HWP parameters
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (6 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 07/13] libxc: Include hwp_para in definitions Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 09/13] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op Jason Andryuk
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Wei Liu, Anthony PERARD

Print HWP-specific parameters.  Some are always present, but others
depend on hardware support.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Style fixes
Declare i outside loop
Replace repearted hardware/configured limits with spaces
Fixup for hw_ removal
Use XEN_HWP_GOVERNOR
Use HWP_ACT_WINDOW_EXPONENT_*
Remove energy_perf hw autonomous - 0 doesn't mean autonomous
---
 tools/misc/xenpm.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 179f9f1f96..610a516213 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -708,6 +708,44 @@ void start_gather_func(int argc, char *argv[])
     pause();
 }
 
+static void calculate_hwp_activity_window(const xc_hwp_para_t *hwp,
+                                          unsigned int *activity_window,
+                                          const char **units)
+{
+    unsigned int mantissa = hwp->activity_window & HWP_ACT_WINDOW_MANTISSA_MASK;
+    unsigned int exponent =
+        (hwp->activity_window >> HWP_ACT_WINDOW_EXPONENT_SHIFT) &
+            HWP_ACT_WINDOW_EXPONENT_MASK;
+    unsigned int multiplier = 1;
+    unsigned int i;
+
+    if ( hwp->activity_window == 0 )
+    {
+        *units = "hardware selected";
+        *activity_window = 0;
+
+        return;
+    }
+
+    if ( exponent >= 6 )
+    {
+        *units = "s";
+        exponent -= 6;
+    }
+    else if ( exponent >= 3 )
+    {
+        *units = "ms";
+        exponent -= 3;
+    }
+    else
+        *units = "us";
+
+    for ( i = 0; i < exponent; i++ )
+        multiplier *= 10;
+
+    *activity_window = mantissa * multiplier;
+}
+
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 {
@@ -773,6 +811,33 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
                p_cpufreq->scaling_cur_freq);
     }
 
+    if ( strcmp(p_cpufreq->scaling_governor, XEN_HWP_GOVERNOR) == 0 )
+    {
+        const xc_hwp_para_t *hwp = &p_cpufreq->u.hwp_para;
+
+        printf("hwp variables        :\n");
+        printf("  hardware limits    : lowest [%u] most_efficient [%u]\n",
+               hwp->lowest, hwp->most_efficient);
+        printf("                     : guaranteed [%u] highest [%u]\n",
+               hwp->guaranteed, hwp->highest);
+        printf("  configured limits  : min [%u] max [%u] energy_perf [%u]\n",
+               hwp->minimum, hwp->maximum, hwp->energy_perf);
+
+        if ( hwp->features & XEN_SYSCTL_HWP_FEAT_ACT_WINDOW )
+        {
+            unsigned int activity_window;
+            const char *units;
+
+            calculate_hwp_activity_window(hwp, &activity_window, &units);
+            printf("                     : activity_window [%u %s]\n",
+                   activity_window, units);
+        }
+
+        printf("                     : desired [%u%s]\n",
+               hwp->desired,
+               hwp->desired ? "" : " hw autonomous");
+    }
+
     printf("turbo mode           : %s\n",
            p_cpufreq->turbo_enabled ? "enabled" : "disabled or n/a");
     printf("\n");
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 09/13] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (7 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 08/13] xenpm: Print HWP parameters Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 10/13] libxc: Add xc_set_cpufreq_hwp Jason Andryuk
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini

Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters.  The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits.  This allows atomically applying new HWP configuration
via a single wrmsr.

XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets.  Setting them depends on hardware limits which the
hypervisor is already caching.  So using them allows skipping a
hypercall to query the limits (lowest/highest) to then set those same
values.  The code is organized to allow a preset to be refined with
additional stuff if desired.

"most_efficient" and "guaranteed" could be additional presets in the
future, but the are not added now.  Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---
v2:
Update for naming anonymous union
Drop hwp_err for invalid input in set_hwp_para()
Drop uint16_t cast in XEN_SYSCTL_HWP_SET_PARAM_MASK
Drop parens for HWP_SET_PRESET defines
Reference activity_window format comment
Place SET_CPUFREQ_HWP after SET_CPUFREQ_PARA
Add {HWP,IA32}_ENERGY_PERF_MAX_{PERFORMANCE,POWERSAVE} defines
Order defines before fields in sysctl.h
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 96 ++++++++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          | 23 +++++++
 xen/include/acpi/cpufreq/cpufreq.h |  2 +
 xen/include/public/sysctl.h        | 30 ++++++++++
 4 files changed, 151 insertions(+)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 233089b6c9..2466933407 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -25,7 +25,9 @@ static bool feature_hdc;
 __initdata bool opt_cpufreq_hwp = false;
 __initdata bool opt_cpufreq_hdc = true;
 
+#define HWP_ENERGY_PERF_MAX_PERFORMANCE 0
 #define HWP_ENERGY_PERF_BALANCE         0x80
+#define HWP_ENERGY_PERF_MAX_POWERSAVE   0xff
 #define IA32_ENERGY_BIAS_BALANCE        0x7
 #define IA32_ENERGY_BIAS_MAX_POWERSAVE  0xf
 #define IA32_ENERGY_BIAS_MASK           0xf
@@ -525,6 +527,100 @@ int get_hwp_para(const struct cpufreq_policy *policy,
     return 0;
 }
 
+int set_hwp_para(struct cpufreq_policy *policy,
+                 struct xen_set_hwp_para *set_hwp)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    /* Validate all parameters first */
+    if ( set_hwp->set_params & ~XEN_SYSCTL_HWP_SET_PARAM_MASK )
+        return -EINVAL;
+
+    if ( set_hwp->activity_window & ~XEN_SYSCTL_HWP_ACT_WINDOW_MASK )
+        return -EINVAL;
+
+    if ( !feature_hwp_energy_perf &&
+         (set_hwp->set_params & XEN_SYSCTL_HWP_SET_ENERGY_PERF) &&
+         set_hwp->energy_perf > IA32_ENERGY_BIAS_MAX_POWERSAVE )
+        return -EINVAL;
+
+    if ( (set_hwp->set_params & XEN_SYSCTL_HWP_SET_DESIRED) &&
+         set_hwp->desired != 0 &&
+         (set_hwp->desired < data->hw.lowest ||
+          set_hwp->desired > data->hw.highest) )
+        return -EINVAL;
+
+    /*
+     * minimum & maximum are not validated as hardware doesn't seem to care
+     * and the SDM says CPUs will clip internally.
+     */
+
+    /* Apply presets */
+    switch ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_PRESET_MASK )
+    {
+    case XEN_SYSCTL_HWP_SET_PRESET_POWERSAVE:
+        data->minimum = data->hw.lowest;
+        data->maximum = data->hw.lowest;
+        data->activity_window = 0;
+        if ( feature_hwp_energy_perf )
+            data->energy_perf = HWP_ENERGY_PERF_MAX_POWERSAVE;
+        else
+            data->energy_perf = IA32_ENERGY_BIAS_MAX_POWERSAVE;
+        data->desired = 0;
+        break;
+
+    case XEN_SYSCTL_HWP_SET_PRESET_PERFORMANCE:
+        data->minimum = data->hw.highest;
+        data->maximum = data->hw.highest;
+        data->activity_window = 0;
+        data->energy_perf = HWP_ENERGY_PERF_MAX_PERFORMANCE;
+        data->desired = 0;
+        break;
+
+    case XEN_SYSCTL_HWP_SET_PRESET_BALANCE:
+        data->minimum = data->hw.lowest;
+        data->maximum = data->hw.highest;
+        data->activity_window = 0;
+        if ( feature_hwp_energy_perf )
+            data->energy_perf = HWP_ENERGY_PERF_BALANCE;
+        else
+            data->energy_perf = IA32_ENERGY_BIAS_BALANCE;
+        data->desired = 0;
+        break;
+
+    case XEN_SYSCTL_HWP_SET_PRESET_NONE:
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    /* Further customize presets if needed */
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_MINIMUM )
+        data->minimum = set_hwp->minimum;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_MAXIMUM )
+        data->maximum = set_hwp->maximum;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_ENERGY_PERF )
+        data->energy_perf = set_hwp->energy_perf;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_DESIRED )
+        data->desired = set_hwp->desired;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_ACT_WINDOW )
+        data->activity_window = set_hwp->activity_window &
+                                XEN_SYSCTL_HWP_ACT_WINDOW_MASK;
+
+    hwp_cpufreq_target(policy, 0, 0);
+
+    return 0;
+}
+
 int __init hwp_register_driver(void)
 {
     return cpufreq_register_driver(&hwp_cpufreq_driver);
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 67fd9dabd4..55e2b285b5 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -398,6 +398,25 @@ static int set_cpufreq_para(struct xen_sysctl_pm_op *op)
     return ret;
 }
 
+static int set_cpufreq_hwp(struct xen_sysctl_pm_op *op)
+{
+    struct cpufreq_policy *policy;
+
+    if ( !cpufreq_governor_internal )
+        return -EINVAL;
+
+    policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
+
+    if ( !policy || !policy->governor )
+        return -EINVAL;
+
+    if ( strncasecmp(policy->governor->name, XEN_HWP_GOVERNOR,
+                     CPUFREQ_NAME_LEN) )
+        return -EINVAL;
+
+    return set_hwp_para(policy, &op->u.set_hwp);
+}
+
 int do_pm_op(struct xen_sysctl_pm_op *op)
 {
     int ret = 0;
@@ -470,6 +489,10 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
         break;
     }
 
+    case SET_CPUFREQ_HWP:
+        ret = set_cpufreq_hwp(op);
+        break;
+
     case GET_CPUFREQ_AVGFREQ:
     {
         op->u.get_avgfreq = cpufreq_driver_getavg(op->cpuid, USR_GETAVG);
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 92b4c7e79c..b8831b2cd3 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -249,5 +249,7 @@ extern bool opt_cpufreq_hwp;
 extern bool opt_cpufreq_hdc;
 int get_hwp_para(const struct cpufreq_policy *policy,
                  struct xen_hwp_para *hwp_para);
+int set_hwp_para(struct cpufreq_policy *policy,
+                 struct xen_set_hwp_para *set_hwp);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 12a7a7db29..08e25e15a6 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -341,6 +341,34 @@ struct xen_hwp_para {
     uint8_t energy_perf;
 };
 
+/* set multiple values simultaneously when set_args bit is set */
+struct xen_set_hwp_para {
+#define XEN_SYSCTL_HWP_SET_DESIRED              (1U << 0)
+#define XEN_SYSCTL_HWP_SET_ENERGY_PERF          (1U << 1)
+#define XEN_SYSCTL_HWP_SET_ACT_WINDOW           (1U << 2)
+#define XEN_SYSCTL_HWP_SET_MINIMUM              (1U << 3)
+#define XEN_SYSCTL_HWP_SET_MAXIMUM              (1U << 4)
+#define XEN_SYSCTL_HWP_SET_PRESET_MASK          0xf000
+#define XEN_SYSCTL_HWP_SET_PRESET_NONE          0x0000
+#define XEN_SYSCTL_HWP_SET_PRESET_BALANCE       0x1000
+#define XEN_SYSCTL_HWP_SET_PRESET_POWERSAVE     0x2000
+#define XEN_SYSCTL_HWP_SET_PRESET_PERFORMANCE   0x3000
+#define XEN_SYSCTL_HWP_SET_PARAM_MASK ( \
+                                  XEN_SYSCTL_HWP_SET_PRESET_MASK | \
+                                  XEN_SYSCTL_HWP_SET_DESIRED     | \
+                                  XEN_SYSCTL_HWP_SET_ENERGY_PERF | \
+                                  XEN_SYSCTL_HWP_SET_ACT_WINDOW  | \
+                                  XEN_SYSCTL_HWP_SET_MINIMUM     | \
+                                  XEN_SYSCTL_HWP_SET_MAXIMUM     )
+    uint16_t set_params; /* bitflags for valid values */
+#define XEN_SYSCTL_HWP_ACT_WINDOW_MASK          0x03ff
+    uint16_t activity_window; /* See comment in struct xen_hwp_para */
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf; /* 0-255 or 0-15 depending on HW support */
+};
+
 #define XEN_HWP_GOVERNOR "hwp-internal"
 /*
  * cpufreq para name of this structure named
@@ -403,6 +431,7 @@ struct xen_sysctl_pm_op {
     #define SET_CPUFREQ_GOV            (CPUFREQ_PARA | 0x02)
     #define SET_CPUFREQ_PARA           (CPUFREQ_PARA | 0x03)
     #define GET_CPUFREQ_AVGFREQ        (CPUFREQ_PARA | 0x04)
+    #define SET_CPUFREQ_HWP            (CPUFREQ_PARA | 0x05)
 
     /* set/reset scheduler power saving option */
     #define XEN_SYSCTL_pm_op_set_sched_opt_smt    0x21
@@ -429,6 +458,7 @@ struct xen_sysctl_pm_op {
         struct xen_get_cpufreq_para get_para;
         struct xen_set_cpufreq_gov  set_gov;
         struct xen_set_cpufreq_para set_para;
+        struct xen_set_hwp_para     set_hwp;
         uint64_aligned_t get_avgfreq;
         uint32_t                    set_sched_opt_smt;
 #define XEN_SYSCTL_CX_UNLIMITED 0xffffffff
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 10/13] libxc: Add xc_set_cpufreq_hwp
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (8 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 09/13] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 11/13] xenpm: Factor out a non-fatal cpuid_parse variant Jason Andryuk
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Wei Liu, Anthony PERARD, Juergen Gross

Add xc_set_cpufreq_hwp to allow calling xen_systctl_pm_op
SET_CPUFREQ_HWP.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---
v2:
Mark xc_set_hwp_para_t const
---
 tools/include/xenctrl.h |  4 ++++
 tools/libs/ctrl/xc_pm.c | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9586eca4c0..2d6c112d44 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1934,11 +1934,15 @@ struct xc_get_cpufreq_para {
     int32_t turbo_enabled;
 };
 
+typedef struct xen_set_hwp_para xc_set_hwp_para_t;
+
 int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
                         struct xc_get_cpufreq_para *user_para);
 int xc_set_cpufreq_gov(xc_interface *xch, int cpuid, char *govname);
 int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
                         int ctrl_type, int ctrl_value);
+int xc_set_cpufreq_hwp(xc_interface *xch, int cpuid,
+                       const xc_set_hwp_para_t *set_hwp);
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
 
 int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 76d7eb7f26..87c636f820 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -330,6 +330,24 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
     return xc_sysctl(xch, &sysctl);
 }
 
+int xc_set_cpufreq_hwp(xc_interface *xch, int cpuid,
+                       const xc_set_hwp_para_t *set_hwp)
+{
+    DECLARE_SYSCTL;
+
+    if ( !xch )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+    sysctl.cmd = XEN_SYSCTL_pm_op;
+    sysctl.u.pm_op.cmd = SET_CPUFREQ_HWP;
+    sysctl.u.pm_op.cpuid = cpuid;
+    sysctl.u.pm_op.u.set_hwp = *set_hwp;
+
+    return xc_sysctl(xch, &sysctl);
+}
+
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
 {
     int ret = 0;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 11/13] xenpm: Factor out a non-fatal cpuid_parse variant
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (9 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 10/13] libxc: Add xc_set_cpufreq_hwp Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 12/13] xenpm: Add set-cpufreq-hwp subcommand Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry Jason Andryuk
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Wei Liu, Anthony PERARD

Allow cpuid_parse to be re-used without terminating xenpm.  HWP will
re-use it to optionally parse a cpuid.  Unlike other uses of
cpuid_parse, parse_hwp_opts will take a variable number of arguments and
cannot just check argc.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Retained because cpuid_parse handles numeric cpu numbers and "all".
---
 tools/misc/xenpm.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 610a516213..5b28e2f6dd 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -79,17 +79,26 @@ void help_func(int argc, char *argv[])
     show_help();
 }
 
-static void parse_cpuid(const char *arg, int *cpuid)
+static int parse_cpuid_non_fatal(const char *arg, int *cpuid)
 {
     if ( sscanf(arg, "%d", cpuid) != 1 || *cpuid < 0 )
     {
         if ( strcasecmp(arg, "all") )
-        {
-            fprintf(stderr, "Invalid CPU identifier: '%s'\n", arg);
-            exit(EINVAL);
-        }
+            return -1;
+
         *cpuid = -1;
     }
+
+    return 0;
+}
+
+static void parse_cpuid(const char *arg, int *cpuid)
+{
+    if ( parse_cpuid_non_fatal(arg, cpuid) )
+    {
+        fprintf(stderr, "Invalid CPU identifier: '%s'\n", arg);
+        exit(EINVAL);
+    }
 }
 
 static void parse_cpuid_and_int(int argc, char *argv[],
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 12/13] xenpm: Add set-cpufreq-hwp subcommand
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (10 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 11/13] xenpm: Factor out a non-fatal cpuid_parse variant Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-10 19:29 ` [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry Jason Andryuk
  12 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Wei Liu, Anthony PERARD

set-cpufreq-hwp allows setting the Hardware P-State (HWP) parameters.

It can be run on all or just a single cpu.  There are presets of
balance, powersave & performance.  Those can be further tweaked by
param:val arguments as explained in the usage description.

Parameter names are just checked to the first 3 characters to shorten
typing.

Some options are hardware dependent, and ranges can be found in
get-cpufreq-para.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Compare provided parameter name and not just 3 characters.
Use "-" in parameter names
Remove hw_
Replace sscanf with strchr & strtoul.
Remove toplevel error message with lower level ones.
Help text s/127/128/
Help text mention truncation.
Avoid some truncation rounding down by adding 5 before division.
Help test mention default microseconds
Also comment the limit check written to avoid overflow.
---
 tools/misc/xenpm.c | 230 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 230 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 5b28e2f6dd..c66ba7c959 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -16,6 +16,7 @@
  */
 #define MAX_NR_CPU 512
 
+#include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -67,6 +68,27 @@ void show_help(void)
             " set-max-cstate        <num>|'unlimited' [<num2>|'unlimited']\n"
             "                                     set the C-State limitation (<num> >= 0) and\n"
             "                                     optionally the C-sub-state limitation (<num2> >= 0)\n"
+            " set-cpufreq-hwp       [cpuid] [balance|performance|powersave] <param:val>*\n"
+            "                                     set Hardware P-State (HWP) parameters\n"
+            "                                     optionally a preset of one of\n"
+            "                                       balance|performance|powersave\n"
+            "                                     an optional list of param:val arguments\n"
+            "                                       minimum:N  lowest ... highest\n"
+            "                                       maximum:N  lowest ... highest\n"
+            "                                       desired:N  lowest ... highest\n"
+            "                                           Set explicit performance target.\n"
+            "                                           non-zero disables auto-HWP mode.\n"
+            "                                       energy-perf:0-255 (or 0-15)\n"
+            "                                                   energy/performance hint\n"
+            "                                                   lower - favor performance\n"
+            "                                                   higher - favor powersave\n"
+            "                                                   128 (or 7) - balance\n"
+            "                                       act-window:N{,m,u}s range 1us-1270s\n"
+            "                                           window for internal calculations.\n"
+            "                                           Defaults to us without units.\n"
+            "                                           Truncates un-representable values.\n"
+            "                                           0 lets the hardware decide.\n"
+            "                                     get-cpufreq-para returns lowest/highest.\n"
             " start [seconds]                     start collect Cx/Px statistics,\n"
             "                                     output after CTRL-C or SIGINT or several seconds.\n"
             " enable-turbo-mode     [cpuid]       enable Turbo Mode for processors that support it.\n"
@@ -1299,6 +1321,213 @@ void disable_turbo_mode(int argc, char *argv[])
                 errno, strerror(errno));
 }
 
+/*
+ * Parse activity_window:NNN{us,ms,s} and validate range.
+ *
+ * Activity window is a 7bit mantissa (0-127) with a 3bit exponent (0-7) base
+ * 10 in microseconds.  So the range is 1 microsecond to 1270 seconds.  A value
+ * of 0 lets the hardware autonomously select the window.
+ *
+ * Return 0 on success
+ *       -1 on error
+ */
+static int parse_activity_window(xc_set_hwp_para_t *set_hwp, unsigned long u,
+                                 const char *suffix)
+{
+    unsigned int exponent = 0;
+    unsigned int multiplier = 1;
+
+    if ( suffix && suffix[0] )
+    {
+        if ( strcasecmp(suffix, "s") == 0 )
+        {
+            multiplier = 1000 * 1000;
+            exponent = 6;
+        }
+        else if ( strcasecmp(suffix, "ms") == 0 )
+        {
+            multiplier = 1000;
+            exponent = 3;
+        }
+        else if ( strcasecmp(suffix, "us") == 0 )
+        {
+            multiplier = 1;
+            exponent = 0;
+        }
+        else
+        {
+            fprintf(stderr, "invalid activity window units: \"%s\"\n", suffix);
+
+            return -1;
+        }
+    }
+
+    /* u * multipler > 1270 * 1000 * 1000 transformed to avoid overflow. */
+    if ( u > 1270 * 1000 * 1000 / multiplier )
+    {
+        fprintf(stderr, "activity window is too large\n");
+
+        return -1;
+    }
+
+    /* looking for 7 bits of mantissa and 3 bits of exponent */
+    while ( u > 127 )
+    {
+        u += 5; /* Round up to mitigate truncation rounding down
+                   e.g. 128 -> 120 vs 128 -> 130. */
+        u /= 10;
+        exponent += 1;
+    }
+
+    set_hwp->activity_window = (exponent & HWP_ACT_WINDOW_EXPONENT_MASK) <<
+                                   HWP_ACT_WINDOW_EXPONENT_SHIFT |
+                               (u & HWP_ACT_WINDOW_MANTISSA_MASK);
+    set_hwp->set_params |= XEN_SYSCTL_HWP_SET_ACT_WINDOW;
+
+    return 0;
+}
+
+static int parse_hwp_opts(xc_set_hwp_para_t *set_hwp, int *cpuid,
+                          int argc, char *argv[])
+{
+    int i = 0;
+
+    if ( argc < 1 ) {
+        fprintf(stderr, "Missing arguments\n");
+        return -1;
+    }
+
+    if ( parse_cpuid_non_fatal(argv[i], cpuid) == 0 )
+    {
+        i++;
+    }
+
+    if ( i == argc ) {
+        fprintf(stderr, "Missing arguments\n");
+        return -1;
+    }
+
+    if ( strcasecmp(argv[i], "powersave") == 0 )
+    {
+        set_hwp->set_params = XEN_SYSCTL_HWP_SET_PRESET_POWERSAVE;
+        i++;
+    }
+    else if ( strcasecmp(argv[i], "performance") == 0 )
+    {
+        set_hwp->set_params = XEN_SYSCTL_HWP_SET_PRESET_PERFORMANCE;
+        i++;
+    }
+    else if ( strcasecmp(argv[i], "balance") == 0 )
+    {
+        set_hwp->set_params = XEN_SYSCTL_HWP_SET_PRESET_BALANCE;
+        i++;
+    }
+
+    for ( ; i < argc; i++)
+    {
+        unsigned long val;
+        char *param = argv[i];
+        char *value;
+        char *suffix;
+        int ret;
+
+        value = strchr(param, ':');
+        if ( value == NULL )
+        {
+            fprintf(stderr, "\"%s\" is an invalid hwp parameter\n", argv[i]);
+            return -1;
+        }
+
+        value[0] = '\0';
+        value++;
+
+        errno = 0;
+        val = strtoul(value, &suffix, 10);
+        if ( (errno && val == ULONG_MAX) || value == suffix )
+        {
+            fprintf(stderr, "Could not parse number \"%s\"\n", value);
+            return -1;
+        }
+
+        if ( strncasecmp(param, "act-window", strlen(param)) == 0 )
+        {
+            ret = parse_activity_window(set_hwp, val, suffix);
+            if (ret)
+                return -1;
+
+            continue;
+        }
+
+        if ( val > 255 )
+        {
+            fprintf(stderr, "\"%s\" value \"%lu\" is out of range\n", param,
+                    val);
+            return -1;
+        }
+
+        if ( suffix && suffix[0] )
+        {
+            fprintf(stderr, "Suffix \"%s\" is invalid\n", suffix);
+            return -1;
+        }
+
+        if ( strncasecmp(param, "minimum", MAX(2, strlen(param))) == 0 )
+        {
+            set_hwp->minimum = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_MINIMUM;
+        }
+        else if ( strncasecmp(param, "maximum", MAX(2, strlen(param))) == 0 )
+        {
+            set_hwp->maximum = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_MAXIMUM;
+        }
+        else if ( strncasecmp(param, "desired", strlen(param)) == 0 )
+        {
+            set_hwp->desired = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_DESIRED;
+        }
+        else if ( strncasecmp(param, "energy-perf", strlen(param)) == 0 )
+        {
+            set_hwp->energy_perf = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_ENERGY_PERF;
+        }
+        else
+        {
+            fprintf(stderr, "\"%s\" is an invalid parameter\n", param);
+            return -1;
+        }
+    }
+
+    if ( set_hwp->set_params == 0 )
+    {
+        fprintf(stderr, "No parameters set in request\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static void hwp_set_func(int argc, char *argv[])
+{
+    xc_set_hwp_para_t set_hwp = {};
+    int cpuid = -1;
+    int i = 0;
+
+    if ( parse_hwp_opts(&set_hwp, &cpuid, argc, argv) )
+        exit(EINVAL);
+
+    if ( cpuid != -1 )
+    {
+        i = cpuid;
+        max_cpu_nr = i + 1;
+    }
+
+    for ( ; i < max_cpu_nr; i++ )
+        if ( xc_set_cpufreq_hwp(xc_handle, i, &set_hwp) )
+            fprintf(stderr, "[CPU%d] failed to set hwp params (%d - %s)\n",
+                    i, errno, strerror(errno));
+}
+
 struct {
     const char *name;
     void (*function)(int argc, char *argv[]);
@@ -1309,6 +1538,7 @@ struct {
     { "get-cpufreq-average", cpufreq_func },
     { "start", start_gather_func },
     { "get-cpufreq-para", cpufreq_para_func },
+    { "set-cpufreq-hwp", hwp_set_func },
     { "set-scaling-maxfreq", scaling_max_freq_func },
     { "set-scaling-minfreq", scaling_min_freq_func },
     { "set-scaling-governor", scaling_governor_func },
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry
  2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (11 preceding siblings ...)
  2022-08-10 19:29 ` [PATCH v2 12/13] xenpm: Add set-cpufreq-hwp subcommand Jason Andryuk
@ 2022-08-10 19:29 ` Jason Andryuk
  2022-08-11  1:56   ` Henry Wang
  2022-08-11  6:51   ` Jan Beulich
  12 siblings, 2 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-10 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Henry Wang, Community Manager

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---
v2:
Add blank line
---
 CHANGELOG.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a87a193e15..3d982bf96b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 
 ## [unstable UNRELEASED](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD
 
+### Added
+ - Intel Hardware P-States (HWP) cpufreq driver
+
 ### Changed
  - On x86 "vga=current" can now be used together with GrUB2's gfxpayload setting. Note that
    this requires use of "multiboot2" (and "module2") as the GrUB commands loading Xen.
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry
  2022-08-10 19:29 ` [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry Jason Andryuk
@ 2022-08-11  1:56   ` Henry Wang
  2022-08-11  6:51   ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Henry Wang @ 2022-08-11  1:56 UTC (permalink / raw)
  To: Jason Andryuk, xen-devel@lists.xenproject.org; +Cc: Community Manager

Hi Jason,

> -----Original Message-----
> From: Jason Andryuk <jandryuk@gmail.com>
> Subject: [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

Thanks for the patch,

Acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry
  2022-08-10 19:29 ` [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry Jason Andryuk
  2022-08-11  1:56   ` Henry Wang
@ 2022-08-11  6:51   ` Jan Beulich
  2022-08-11 11:30     ` Jason Andryuk
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-08-11  6:51 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Henry Wang, Community Manager, xen-devel

On 10.08.2022 21:29, Jason Andryuk wrote:
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -6,6 +6,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
>  
>  ## [unstable UNRELEASED](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD
>  
> +### Added
> + - Intel Hardware P-States (HWP) cpufreq driver

Note that there already is ...

>  ### Changed
>   - On x86 "vga=current" can now be used together with GrUB2's gfxpayload setting. Note that
>     this requires use of "multiboot2" (and "module2") as the GrUB commands loading Xen.

... a "Added / support upgraded" section right below here.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry
  2022-08-11  6:51   ` Jan Beulich
@ 2022-08-11 11:30     ` Jason Andryuk
  2022-08-11 12:16       ` Henry Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Andryuk @ 2022-08-11 11:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Henry Wang, Community Manager, xen-devel

On Thu, Aug 11, 2022 at 2:51 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.08.2022 21:29, Jason Andryuk wrote:
> > --- a/CHANGELOG.md
> > +++ b/CHANGELOG.md
> > @@ -6,6 +6,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
> >
> >  ## [unstable UNRELEASED](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD
> >
> > +### Added
> > + - Intel Hardware P-States (HWP) cpufreq driver
>
> Note that there already is ...
>
> >  ### Changed
> >   - On x86 "vga=current" can now be used together with GrUB2's gfxpayload setting. Note that
> >     this requires use of "multiboot2" (and "module2") as the GrUB commands loading Xen.
>
> ... a "Added / support upgraded" section right below here.

Whoops, sorry about that.  Thanks for catching it.

For v3, I'll move to the existing section while keeping Henry's Ack.

Regards,
Jason


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry
  2022-08-11 11:30     ` Jason Andryuk
@ 2022-08-11 12:16       ` Henry Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Henry Wang @ 2022-08-11 12:16 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Community Manager, xen-devel, Jan Beulich

Hi Jason,

> -----Original Message-----
> From: Jason Andryuk <jandryuk@gmail.com>
> For v3, I'll move to the existing section while keeping Henry's Ack.

Thank you. Keeping the Ack is ok to me.

Kind regards,
Henry

> 
> Regards,
> Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only
  2022-08-10 19:29 ` [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only Jason Andryuk
@ 2022-08-15 14:22   ` Jan Beulich
  2022-08-15 14:47     ` Jason Andryuk
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-08-15 14:22 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: xen-devel

On 10.08.2022 21:29, Jason Andryuk wrote:
> For hwp, the standard governors are not usable, and only the internal
> one is applicable.  Add the cpufreq_governor_internal boolean to
> indicate when an internal governor, like hwp-internal, will be used.
> This is set during presmp_initcall, so that it can suppress governor
> registration during initcall.

At which point ...

>  Only a governor with a name containing
> "-internal" will be allowed in that case.
> 
> This way, the unuseable governors are not registered, so they internal
> one is the only one returned to userspace.  This means incompatible
> governors won't be advertised to userspace.
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> ---
> v2:
> Switch to "-internal"
> Add blank line in header
> ---
>  xen/drivers/cpufreq/cpufreq.c      | 5 +++++
>  xen/include/acpi/cpufreq/cpufreq.h | 2 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
> index a94520ee57..1fdd63d7b5 100644
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -57,6 +57,7 @@ struct cpufreq_dom {
>  };
>  static LIST_HEAD_READ_MOSTLY(cpufreq_dom_list_head);
>  
> +bool __read_mostly cpufreq_governor_internal;

... I expect this can be __initdata? With that adjustment
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 02/13] cpufreq: Add perf_freq to cpuinfo
  2022-08-10 19:29 ` [PATCH v2 02/13] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
@ 2022-08-15 14:29   ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-08-15 14:29 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 10.08.2022 21:29, Jason Andryuk wrote:
> acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
> needs to scale by base frequency.  Settings max_freq to base_freq
> "works" but the code is not obvious, and returning values to userspace
> is tricky.  Add an additonal perf_freq member which is used for scaling
> aperf/mperf measurements.
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> ---
> I don't like this, but it seems the best way to re-use the common
> aperf/mperf code.  The other option would be to add wrappers that then
> do the acpi vs. hwp scaling.

I think that's okay-ish, so
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 03/13] cpufreq: Export intel_feature_detect
  2022-08-10 19:29 ` [PATCH v2 03/13] cpufreq: Export intel_feature_detect Jason Andryuk
@ 2022-08-15 14:34   ` Jan Beulich
  2022-08-15 14:58     ` Jason Andryuk
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-08-15 14:34 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 10.08.2022 21:29, Jason Andryuk wrote:
> Export feature_detect as intel_feature_detect so it can be re-used by
> HWP.
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> ---
> v2
> export intel_feature_detect with typed pointer
> Move intel_feature_detect to acpi/cpufreq/cpufreq.h since the
> declaration now contains struct cpufreq_policy *.

I don't mind the new placement, but I don't follow the reasoning.

> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> @@ -340,9 +340,8 @@ static unsigned int cf_check get_cur_freq_on_cpu(unsigned int cpu)
>      return extract_freq(get_cur_val(cpumask_of(cpu)), data);
>  }
>  
> -static void cf_check feature_detect(void *info)
> +void intel_feature_detect(struct cpufreq_policy *policy)
>  {
> -    struct cpufreq_policy *policy = info;
>      unsigned int eax;
>  
>      eax = cpuid_eax(6);
> @@ -354,6 +353,11 @@ static void cf_check feature_detect(void *info)
>      }
>  }
>  
> +static void cf_check feature_detect(void *info)

This function is invoked indirectly via on_selected_cpus() (hence
the cf_check attribute) - I wonder how you get away without that for
HWP. Or else why we need this as a wrapper here when then you'd have
another similar wrapper elsewhere.

> +{
> +    intel_feature_detect((struct cpufreq_policy *)info);

Why the cast?

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only
  2022-08-15 14:22   ` Jan Beulich
@ 2022-08-15 14:47     ` Jason Andryuk
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Andryuk @ 2022-08-15 14:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Mon, Aug 15, 2022 at 10:22 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.08.2022 21:29, Jason Andryuk wrote:
> > For hwp, the standard governors are not usable, and only the internal
> > one is applicable.  Add the cpufreq_governor_internal boolean to
> > indicate when an internal governor, like hwp-internal, will be used.
> > This is set during presmp_initcall, so that it can suppress governor
> > registration during initcall.
>
> At which point ...
>
> >  Only a governor with a name containing
> > "-internal" will be allowed in that case.
> >
> > This way, the unuseable governors are not registered, so they internal
> > one is the only one returned to userspace.  This means incompatible
> > governors won't be advertised to userspace.
> >
> > Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> > ---
> > v2:
> > Switch to "-internal"
> > Add blank line in header
> > ---
> >  xen/drivers/cpufreq/cpufreq.c      | 5 +++++
> >  xen/include/acpi/cpufreq/cpufreq.h | 2 ++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
> > index a94520ee57..1fdd63d7b5 100644
> > --- a/xen/drivers/cpufreq/cpufreq.c
> > +++ b/xen/drivers/cpufreq/cpufreq.c
> > @@ -57,6 +57,7 @@ struct cpufreq_dom {
> >  };
> >  static LIST_HEAD_READ_MOSTLY(cpufreq_dom_list_head);
> >
> > +bool __read_mostly cpufreq_governor_internal;
>
> ... I expect this can be __initdata? With that adjustment

Patch 09 "xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op" uses it at
runtime.  But I think it can be dropped since there is a check of the
governor name.  I'll add __initdata.

> Acked-by: Jan Beulich <jbeulich@suse.com>

Thank you,
Jason


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 03/13] cpufreq: Export intel_feature_detect
  2022-08-15 14:34   ` Jan Beulich
@ 2022-08-15 14:58     ` Jason Andryuk
  2022-08-15 15:23       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Andryuk @ 2022-08-15 14:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On Mon, Aug 15, 2022 at 10:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.08.2022 21:29, Jason Andryuk wrote:
> > Export feature_detect as intel_feature_detect so it can be re-used by
> > HWP.
> >
> > Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> > ---
> > v2
> > export intel_feature_detect with typed pointer
> > Move intel_feature_detect to acpi/cpufreq/cpufreq.h since the
> > declaration now contains struct cpufreq_policy *.
>
> I don't mind the new placement, but I don't follow the reasoning.

v1 added
void intel_feature_detect(void *info);
to acpi/cpufreq/processor_perf.h

v2 adds
void intel_feature_detect(struct cpufreq_policy *policy)
to acpi/cpufreq/cpufreq.h, which was selected to have the type available.

> > --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > @@ -340,9 +340,8 @@ static unsigned int cf_check get_cur_freq_on_cpu(unsigned int cpu)
> >      return extract_freq(get_cur_val(cpumask_of(cpu)), data);
> >  }
> >
> > -static void cf_check feature_detect(void *info)
> > +void intel_feature_detect(struct cpufreq_policy *policy)
> >  {
> > -    struct cpufreq_policy *policy = info;
> >      unsigned int eax;
> >
> >      eax = cpuid_eax(6);
> > @@ -354,6 +353,11 @@ static void cf_check feature_detect(void *info)
> >      }
> >  }
> >
> > +static void cf_check feature_detect(void *info)
>
> This function is invoked indirectly via on_selected_cpus() (hence
> the cf_check attribute) - I wonder how you get away without that for
> HWP. Or else why we need this as a wrapper here when then you'd have
> another similar wrapper elsewhere.

HWP calls hwp_init_msrs via on_selected_cpus, which then directly
calls intel_feature_detect().  ACPI needs the cf_check on
feature_detect, but intel_feature_detect is only called directly.

> > +{
> > +    intel_feature_detect((struct cpufreq_policy *)info);
>
> Why the cast?

I thought it was necessary to keep the compiler happy.  Double
checking, you are correct that it is not needed.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 03/13] cpufreq: Export intel_feature_detect
  2022-08-15 14:58     ` Jason Andryuk
@ 2022-08-15 15:23       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-08-15 15:23 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 15.08.2022 16:58, Jason Andryuk wrote:
> On Mon, Aug 15, 2022 at 10:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 10.08.2022 21:29, Jason Andryuk wrote:
>>> Export feature_detect as intel_feature_detect so it can be re-used by
>>> HWP.
>>>
>>> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
>>> ---
>>> v2
>>> export intel_feature_detect with typed pointer
>>> Move intel_feature_detect to acpi/cpufreq/cpufreq.h since the
>>> declaration now contains struct cpufreq_policy *.
>>
>> I don't mind the new placement, but I don't follow the reasoning.
> 
> v1 added
> void intel_feature_detect(void *info);
> to acpi/cpufreq/processor_perf.h
> 
> v2 adds
> void intel_feature_detect(struct cpufreq_policy *policy)
> to acpi/cpufreq/cpufreq.h, which was selected to have the type available.

But you don't need the type to be available just for a function
declaration. For such a purpose a forward decl of the struct is
sufficient. Anyway - not an issue here.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-08-15 15:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-10 19:29 [PATCH v2 00/13] Intel Hardware P-States (HWP) support Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 01/13] cpufreq: Allow restricting to internal governors only Jason Andryuk
2022-08-15 14:22   ` Jan Beulich
2022-08-15 14:47     ` Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 02/13] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
2022-08-15 14:29   ` Jan Beulich
2022-08-10 19:29 ` [PATCH v2 03/13] cpufreq: Export intel_feature_detect Jason Andryuk
2022-08-15 14:34   ` Jan Beulich
2022-08-15 14:58     ` Jason Andryuk
2022-08-15 15:23       ` Jan Beulich
2022-08-10 19:29 ` [PATCH v2 04/13] cpufreq: Add Hardware P-State (HWP) driver Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 05/13] xenpm: Change get-cpufreq-para output for internal Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 06/13] cpufreq: Export HWP parameters to userspace Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 07/13] libxc: Include hwp_para in definitions Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 08/13] xenpm: Print HWP parameters Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 09/13] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 10/13] libxc: Add xc_set_cpufreq_hwp Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 11/13] xenpm: Factor out a non-fatal cpuid_parse variant Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 12/13] xenpm: Add set-cpufreq-hwp subcommand Jason Andryuk
2022-08-10 19:29 ` [PATCH v2 13/13] CHANGELOG: Add Intel HWP entry Jason Andryuk
2022-08-11  1:56   ` Henry Wang
2022-08-11  6:51   ` Jan Beulich
2022-08-11 11:30     ` Jason Andryuk
2022-08-11 12:16       ` Henry Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.