* [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
@ 2024-09-04 22:21 Babu Moger
  2024-09-04 22:21 ` [PATCH v7 01/24] x86/cpufeatures: Add support for " Babu Moger
                   ` (24 more replies)
  0 siblings, 25 replies; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse


This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC). The feature is also called QoS RMID Pinning.

The series is written so that it is easy to add support for other
assignable-counter features from different vendors.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
a85536e1bce7 (tip/master) Merge branch into tip/master: 'x86/timers'

# Introduction

Users can create as many monitor groups as there are RMIDs supported by
the hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs which are no longer being tracked
will be reset to zero. The MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. So only a limited number of groups
can provide guaranteed monitoring numbers. With ever-changing
configurations there is no way to know for certain which of these groups
are being tracked at a given point in time. Users do not have the option
to monitor a group or set of groups for a certain period of time without
worrying about the RMID being reset in between.
    
The ABMC feature lets the user assign a hardware counter to an RMID and
monitor the bandwidth for as long as it is assigned. The assigned RMID
will be tracked by the hardware until the user unassigns it manually.
There is no need to worry about counters being reset during this period.
Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring works in the current mode, without the
assignment option.

# Linux Implementation

Create a generic interface aimed at supporting user space assignment
of scarce counters used for monitoring. The first user of the interface
is ABMC, with the option to expand usage to "soft-ABMC" and MPAM
counters in the future.

The feature adds the following interface files:

/sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
monitoring features supported. The enclosed brackets indicate which
feature is enabled.

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.

/sys/fs/resctrl/info/L3_MON/mbm_assign_control: Reports the resctrl group and monitor
status of each group. Assignment state can be updated by writing to the
interface.

# Examples

a. Check if ABMC support is available
	#mount -t resctrl resctrl /sys/fs/resctrl/

	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	[mbm_cntr_assign]
	default

	The ABMC feature is detected and enabled.

b. Check how many ABMC counters are available. 

	#cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
	32

c. Create a few resctrl groups.

	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp


d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   to list and modify the groups' monitoring states. The file provides a single place
   to list the monitoring states of all the resctrl groups. It makes it easier for
   user space to learn about the used counters without needing to traverse
   all the groups, thus reducing the number of file system calls.

	The list uses the following format:

	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Format for specific types of groups:

	* Default CTRL_MON group:
	        "//<domain_id>=<flags>"

	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id>=<flags>"

	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id>=<flags>"

	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Flags can be one of the following:

        t  MBM total event is enabled.
        l  MBM local event is enabled.
        tl Both total and local MBM events are enabled.
        _  None of the MBM events are enabled.

	Examples:

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;
	
	There are four groups and all the groups have the local and total
	events enabled on domains 0 and 1.
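For user space tools, one line of the listing can be split into its fields
with plain shell parameter expansion. The following is a minimal sketch,
assuming POSIX sh and the listing format shown above; parse_assign_line and
its output layout are hypothetical and not part of this series.

```sh
# Split one mbm_assign_control listing line into its parts and print one
# record per domain. parse_assign_line is a hypothetical helper name.
# Intended for listing output only (no '*' domain wildcard in the input).
parse_assign_line() {
	line=$1
	ctrl=${line%%/*}    # text before the first '/', empty for the default CTRL_MON group
	rest=${line#*/}
	mon=${rest%%/*}     # text between the two '/', empty when there is no child MON group
	doms=${rest#*/}     # remaining "<domain_id>=<flags>;" entries

	old_ifs=$IFS
	IFS=';'
	for entry in $doms; do
		[ -n "$entry" ] || continue
		printf 'ctrl=%s mon=%s domain=%s flags=%s\n' \
			"${ctrl:-<default>}" "${mon:-<none>}" \
			"${entry%%=*}" "${entry#*=}"
	done
	IFS=$old_ifs
}
```

For example, feeding it "//0=t;1=tl;" prints two records for the default
CTRL_MON group, one per domain. On a system with resctrl mounted, each line
read from mbm_assign_control could be passed to the helper in a
"while read -r" loop.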

e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.

	The write format is similar to the above list format with the addition
	of an opcode for the assignment operation.

	"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

	
	* Default CTRL_MON group:
	        "//<domain_id><opcode><flags>"
	
	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id><opcode><flags>"
	
	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id><opcode><flags>"
	
	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
	
	Opcode can be one of the following:
	
	= Update the assignment to match the flags.
	+ Assign a new MBM event without impacting existing assignments.
	- Unassign an MBM event from currently assigned events.

	Flags can be one of the following:

        t  MBM total event.
        l  MBM local event.
        tl Both total and local MBM events.
        _  None of the MBM events. Only works with '=' opcode.
	
	Initial group status:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the default group to enable only total event on domain 0:
	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the MON group child_default_mon_grp to remove total event on domain 1:
	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
	remove both local and total events on domain 1:
	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > \
	       /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the default group to add the local event on domain 0:
	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the non-default CTRL_MON group non_default_ctrl_mon_grp to unassign all
	the MBM events on all the domains:
	# echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=_;1=_;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=l;
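The effect of the three opcodes on one domain's flags can be modelled in a
few lines of shell. This is only a sketch of the semantics described above,
assuming POSIX sh; apply_opcode is a hypothetical name, and the sketch does
not model counter exhaustion or any other reason the kernel might reject a
write.

```sh
# apply_opcode CURRENT OPCODE FLAGS
# Prints the flags a domain would show after the write, or returns 1 for an
# invalid combination. Hypothetical helper, for illustration only.
apply_opcode() {
	cur=$1 op=$2 req=$3
	t=0; l=0
	case $cur in *t*) t=1 ;; esac
	case $cur in *l*) l=1 ;; esac

	# '_' names no event, so it is only meaningful with the '=' opcode.
	if [ "$req" = "_" ] && [ "$op" != "=" ]; then
		return 1
	fi

	case $op in
	'=') t=0; l=0
	     case $req in *t*) t=1 ;; esac
	     case $req in *l*) l=1 ;; esac ;;
	'+') case $req in *t*) t=1 ;; esac
	     case $req in *l*) l=1 ;; esac ;;
	'-') case $req in *t*) t=0 ;; esac
	     case $req in *l*) l=0 ;; esac ;;
	*)   return 1 ;;
	esac

	out=
	if [ "$t" -eq 1 ]; then out=t; fi
	if [ "$l" -eq 1 ]; then out=${out}l; fi
	printf '%s\n' "${out:-_}"
}
```

The calls "apply_opcode tl = t", "apply_opcode tl - t", "apply_opcode tl = _"
and "apply_opcode t + l" reproduce the four single-domain transitions in the
examples above.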


f. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
   Reading the events does not change with ABMC. If the event is unassigned
   when it is read, the read returns "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
	
g. Check the bandwidth configuration for the group. Note that the bandwidth
   configuration has domain scope. The total event defaults to 0x7f (to
   count all the events) and the local event defaults to 0x15 (to count all
   the local NUMA events). The event bitmap decoding is available at
   https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in section "mbm_total_bytes_config", "mbm_local_bytes_config":
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x7f;1=0x7f
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
	0=0x15;1=0x15
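The configuration values are easier to read once decoded bit by bit. The
sketch below assumes the seven-bit event layout given in the
"mbm_total_bytes_config" section of the resctrl documentation referenced
above; decode_mbm_config is a hypothetical helper name.

```sh
# decode_mbm_config VALUE
# Print the bandwidth event types selected by a config value such as 0x15.
# Bit descriptions follow the resctrl documentation; helper name is made up.
decode_mbm_config() {
	mask=$(($1))
	bit=0
	while [ "$bit" -le 6 ]; do
		if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
			case $bit in
			0) desc='Reads to memory in the local NUMA domain' ;;
			1) desc='Reads to memory in the non-local NUMA domain' ;;
			2) desc='Non-temporal writes to the local NUMA domain' ;;
			3) desc='Non-temporal writes to the non-local NUMA domain' ;;
			4) desc='Reads to slow memory in the local NUMA domain' ;;
			5) desc='Reads to slow memory in the non-local NUMA domain' ;;
			6) desc='Dirty victims from the QOS domain to all types of memory' ;;
			esac
			printf 'bit %d: %s\n' "$bit" "$desc"
		fi
		bit=$((bit + 1))
	done
}
```

Decoding 0x15 lists bits 0, 2 and 4, which are the three local NUMA event
types, while 0x7f lists all seven.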
	
h. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change affects the total events on domain 0.
	
	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x33;1=0x7f
	
i. Now read the total event again. The first read will come back with "Unavailable"
   status. The subsequent read of mbm_total_bytes will display only the read events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101

j. Users have the option to go back to the 'default' mbm_assign_mode if required.
   This can be done using the following command. Note that switching the
   mbm_assign_mode resets all the MBM counters of all resctrl groups.

	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	mbm_cntr_assign
	[default]

	
k. Unmount the resctrl filesystem
	 
	#umount /sys/fs/resctrl/
---
v7:
   Major changes are related to the separation of FS and arch code.
   Changed a few interface names based on feedback.
   Here is the summary; each patch lists the changes specific to that patch.

   Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the bitmap.
   WARN_ON is not required anymore.
 
   Renamed the function resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

   Merged resctrl_arch_mbm_cntr_assign_enable() and resctrl_arch_mbm_cntr_assign_disable()
   and renamed the result to resctrl_arch_mbm_cntr_assign_set(). Passed the struct
   rdt_resource to these functions.

   Removed resctrl_arch_reset_rmid_all() from arch code. This will be done by the caller in the FS code.

   Updated the descriptions/commit log in resctrl.rst to generic text. Removed ABMC references.
   Renamed mbm_mode to mbm_assign_mode.
   Renamed mbm_control to  mbm_assign_control.
   Introduced mutex lock in rdtgroup_mbm_mode_show().
 
   The 'legacy' mode is called 'default' mode. 

   Removed the static allocation and now allocating bitmap mbm_cntr_free_map dynamically.

   Merged rdtgroup_assign_cntr(), rdtgroup_alloc_cntr() into one.
   Merged rdtgroup_unassign_cntr(), rdtgroup_free_cntr() into one.
   
  Added struct rdt_resource to the interface functions resctrl_arch_assign_cntr()
  and resctrl_arch_unassign_cntr().
  Renamed rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
   
  Added a new patch to fix counter assignment on event config changes.

  Removed the references of ABMC from user interfaces.

  Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
  Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.

  Thomas Gleixner asked us to update  https://gitlab.com/x86-cpuid.org/x86-cpuid-db. 
  It needs internal approval. We are working on it.

v6:
  We still need to finalize a few interface details on mbm_assign_mode and mbm_assign_control
  in case of ABMC and Soft-ABMC. We can continue the discussion with this series.

  Added support for domain-id '*' to update all the domains at once.
  Fixed assign interface to allocate the counter if counter is
  not assigned.   
  Fixed unassign interface to free the counter if the counter is not
  assigned in any of the domains.

  Renamed abmc_capable to mbm_cntr_assignable.

  Renamed abmc_enabled to mbm_cntr_assign_enabled.
  Used msr_set_bit and msr_clear_bit for msr updates.
  Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
  Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().

  Changed the display name from num_cntrs to num_mbm_cntrs.

  Removed the variable mbm_cntrs_free_map_len. This is not required.
  Removed the call mbm_cntrs_init() in arch code. This needs to be done at higher level.
  Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
  Removed unused config value definitions.

  Introduced mbm_cntr_map to track counters at domain level. With this
  we don't need to issue an MSR read to get the counter configuration.

  Separated all the counter id management to upper level in FS code.

  Added checks to detect "Unassigned" before reading the RMID.

  More details in each patch.

v5:
  Rebase changes (because of SNC support)

  Interface changes.
   /sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_mode.
   /sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_control.

  Added a few arch-specific routines.
  resctrl_arch_get_abmc_enabled.
  resctrl_arch_abmc_enable.
  resctrl_arch_abmc_disable.

  A few renames
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config

  Introduced resctrl_arch_event_config_get and
    resctrl_arch_event_config_set() to update event configuration.

  Removed the mon_state field from mongroup. Added MON_CNTR_UNSET to initialize counters.

  Renamed ctr_id to cntr_id for the hardware counter.
 
  Report "Unassigned" in case the user attempts to read the events without assigning the counter.
  
  ABMC is enabled during the boot up. Can be enabled or disabled later.

  Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.

 Addressed all the comments as far as I know. If I missed something, it is not intentional.

v4: 
  Main change is domain specific event assignment.
  Kept the ABMC feature as a default.
  Dynamic switching between ABMC and mbm_legacy is still allowed.
  We are still not clear about the mount option.
  Moved the monitoring related data from rdt_resource into the resctrl_mon structure.
  Fixed the display of legacy and ABMC mode.
  Used bitmap APIs when possible.
  Removed event configuration read from MSRs. We can use the
  internal saved data (patch 12).
  Added more comments about L3_QOS_ABMC_CFG MSR.
  Added IPIs to read the assignment status for each domain (patch 18 and 19)
  More details in each patch.

v3:
   This series adds the support for global assignment mode discussed in
   the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switch between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment mode and modifying the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedback discussed. If I missed
      something then it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. Major change is the way ABMC is enabled. Earlier, the user needed to remount
      with -o abmc to enable the ABMC feature. Removed that option now.
      Now users can enable ABMC with "echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

v6:
  https://lore.kernel.org/lkml/cover.1722981659.git.babu.moger@amd.com/

v5:
  https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/

v4:
  https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/

v3:
 https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/  

v2:
  https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

v1:
   https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/


Babu Moger (24):
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable
    counters
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct
    rdt_hw_mon_domain
  x86/resctrl: Remove MSR reading of event configuration value
  x86/resctrl: Introduce mbm_cntr_map to track counters at domain
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Introduce cntr_id in mongroup for assignments
  x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter
    with ABMC
  x86/resctrl: Add the interface to assign/update counter assignment
  x86/resctrl: Add the interface to unassign a MBM counter
  x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is
    enabled
  x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign
    mode
  x86/resctrl: Introduce the interface to switch between monitor modes
  x86/resctrl: Configure mbm_cntr_assign mode if supported
  x86/resctrl: Update assignments on event configuration changes
  x86/resctrl: Introduce interface to list assignment states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 198 ++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  19 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  13 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  77 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  90 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 875 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  31 +-
 12 files changed, 1227 insertions(+), 85 deletions(-)

-- 
2.34.1



* [PATCH v7 01/24] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-04 22:21 ` [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options Babu Moger
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Users can create as many monitor groups as there are RMIDs supported by
the hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs which are no longer being tracked
will be reset to zero. The MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. So only a limited number of groups
can provide guaranteed monitoring numbers. With ever-changing
configurations there is no way to know for certain which of these groups
are being tracked at a given point in time. Users do not have the option
to monitor a group or set of groups for a certain period of time without
worrying about the RMID being reset in between.

The ABMC feature lets the user assign a hardware counter to an
(RMID, event) pair and monitor the bandwidth for as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask identifying
the specific bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring works in the current mode, without the
assignment option.

The Linux resctrl subsystem provides an interface to count a maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can enable a maximum
of 2 ABMC counters per group. Users also have the option to enable only
one counter for a group. If the system runs out of assignable ABMC
counters, the kernel will display an error. Users need to disable an already
enabled counter to make space for new assignments.
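The bookkeeping implied by a fixed pool of counters can be sketched with a
bitmask. This is an illustration only, assuming POSIX sh; NUM_CNTRS, used_map,
cntr_alloc and cntr_free are hypothetical names, the pool size of 4 is
arbitrary, and the sketch is not the allocator implemented by this series.

```sh
NUM_CNTRS=4    # illustrative pool size; real hardware reports its own count
used_map=0     # bit n set means counter n is currently assigned

# Claim the lowest free counter. On success the id is left in $cntr_id;
# when the pool is exhausted an error is printed and 1 is returned.
cntr_alloc() {
	n=0
	while [ "$n" -lt "$NUM_CNTRS" ]; do
		if [ $(( (used_map >> n) & 1 )) -eq 0 ]; then
			used_map=$(( used_map | (1 << n) ))
			cntr_id=$n
			return 0
		fi
		n=$((n + 1))
	done
	echo "out of assignable counters" >&2
	return 1
}

# Return counter $1 to the pool so that it can be assigned again.
cntr_free() {
	used_map=$(( used_map & ~(1 << $1) ))
}
```

Once every bit is set, cntr_alloc fails until some counter is released with
cntr_free, which mirrors the need to disable an enabled counter before a new
assignment can succeed.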

The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5    ABMC (Assignable Bandwidth Monitoring Counters)
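Given the EBX value returned by CPUID leaf 0x80000020, subleaf 0, the check
is a single bit test. A minimal sketch, assuming POSIX sh; has_abmc_bit is a
hypothetical helper, and obtaining the EBX value (for example from a CPUID
dump utility) is left out because it depends on the tools available.

```sh
# has_abmc_bit EBX
# Succeed when bit 5 (ABMC) is set in the given EBX value of
# CPUID_Fn80000020 subleaf 0. Hypothetical helper for illustration.
has_abmc_bit() {
	[ $(( ($1 >> 5) & 1 )) -eq 1 ]
}
```

For instance, "has_abmc_bit 0x20" succeeds, while "has_abmc_bit 0x08" (only
bit 3, BMEC, set) fails.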

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.

v7: Removed "" from feature flags. Not required anymore.
    https://lore.kernel.org/lkml/20240817145058.GCZsC40neU4wkPXeVR@fat_crate.local/

v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.

v5: Minor rebase change and subject line update.

v4: Changes because of rebase. Feature word 21 has a few more additions now.
    Changed the text to "tracked by hardware" instead of active.

v3: Change because of rebase. Actual patch did not change.

v2: Added dependency on X86_FEATURE_BMEC.
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 3 +++
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dd4682857c12..4c514cb245ff 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -473,6 +473,7 @@
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
 #define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* AMD Fast CPPC */
+#define X86_FEATURE_ABMC		(21*32 + 6) /* Assignable Bandwidth Monitoring Counters */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 8bd84114c2d9..7e4d63b381d6 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -70,6 +70,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_BMEC      },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index c84c30188fdf..87f63e6b2994 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
+	{ X86_FEATURE_ABMC,		CPUID_EBX,  5, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
-- 
2.34.1



* [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2024-09-04 22:21 ` [PATCH v7 01/24] x86/cpufeatures: Add support for " Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:00   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 03/24] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Add the command line option to enable or disable the new resctrl feature
ABMC (Assignable Bandwidth Monitoring Counters).
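The rdt= option takes a comma-separated list in which a leading '!' turns a
feature off, as in the documented example rdt=cmt,!mba. The sketch below
shows how such a list reads once split, assuming POSIX sh; parse_rdt_option
is a hypothetical user space helper and not the kernel's parser.

```sh
# parse_rdt_option LIST
# Print "<feature>=on" or "<feature>=off" for each entry of an rdt= list.
# Hypothetical helper; it only mirrors the documented "name" / "!name" syntax.
parse_rdt_option() {
	old_ifs=$IFS
	IFS=','
	for tok in $1; do
		case $tok in
		'!'*) printf '%s=off\n' "${tok#?}" ;;   # drop the leading '!'
		*)    printf '%s=on\n' "$tok" ;;
		esac
	done
	IFS=$old_ifs
}
```

With this patch applied, a list such as "bmec,!abmc" would read as bmec=on
and abmc=off.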

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: No changes

v6: No changes

v5: No changes

v4: No changes

v3: No changes

v2: No changes
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 Documentation/arch/x86/resctrl.rst              | 1 +
 arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 09126bb8cc9f..12cc0a26c82a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5604,7 +5604,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec.
+			mba, smba, bmec, abmc.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..30586728a4cd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
 SMBA (Slow Memory Bandwidth Allocation)         ""
 BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
 ===============================================	================================
 
 Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8591d53c144b..668148ceda0b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -809,6 +809,7 @@ enum {
 	RDT_FLAG_MBA,
 	RDT_FLAG_SMBA,
 	RDT_FLAG_BMEC,
+	RDT_FLAG_ABMC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -834,6 +835,7 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
 	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
+	RDT_OPT(RDT_FLAG_ABMC,	    "abmc",	X86_FEATURE_ABMC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 
-- 
2.34.1



* [PATCH v7 03/24] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2024-09-04 22:21 ` [PATCH v7 01/24] x86/cpufeatures: Add support for " Babu Moger
  2024-09-04 22:21 ` [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:03   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The cache allocation and memory bandwidth allocation feature properties
are consolidated into struct resctrl_cache and struct resctrl_membw
respectively.

In preparation for more monitoring properties that will clutter the
existing resource struct further, reorganize the monitoring-specific
properties into a separate structure as well.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Added kernel doc for data structure. Minor text update.

v6: Update commit message and update kernel doc for rdt_resource.

v5: Commit message update.
    Also changes related to data structure updates due to SNC support.

v4: New patch.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  | 18 +++++++++---------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  8 ++++----
 include/linux/resctrl.h                | 16 ++++++++++++----
 4 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 668148ceda0b..73bfc8d7a438 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -124,7 +124,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
 	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return r->num_rmid;
+	return r->mon.num_rmid;
 }
 
 /*
@@ -625,7 +625,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
-	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 851b561850e0..795fe91a8feb 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -222,7 +222,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
 	if (snc_nodes_per_l3_cache == 1)
 		return lrmid;
 
-	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
 }
 
 static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -297,11 +297,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 
 	if (is_mbm_total_enabled())
 		memset(hw_dom->arch_mbm_total, 0,
-		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
 
 	if (is_mbm_local_enabled())
 		memset(hw_dom->arch_mbm_local, 0,
-		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
 }
 
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -1083,14 +1083,14 @@ static struct mon_evt mbm_local_event = {
  */
 static void l3_mon_evt_init(struct rdt_resource *r)
 {
-	INIT_LIST_HEAD(&r->evt_list);
+	INIT_LIST_HEAD(&r->mon.evt_list);
 
 	if (is_llc_occupancy_enabled())
-		list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+		list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
 	if (is_mbm_total_enabled())
-		list_add_tail(&mbm_total_event.list, &r->evt_list);
+		list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
 	if (is_mbm_local_enabled())
-		list_add_tail(&mbm_local_event.list, &r->evt_list);
+		list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
 }
 
 /*
@@ -1186,7 +1186,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
-	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+	r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1201,7 +1201,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+	threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
 
 	/*
 	 * Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d7163b764c62..f9f3b5db1987 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1097,7 +1097,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = of->kn->parent->priv;
 
-	seq_printf(seq, "%d\n", r->num_rmid);
+	seq_printf(seq, "%d\n", r->mon.num_rmid);
 
 	return 0;
 }
@@ -1108,7 +1108,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
 		if (mevt->configurable)
 			seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,13 +3057,13 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 	struct mon_evt *mevt;
 	int ret;
 
-	if (WARN_ON(list_empty(&r->evt_list)))
+	if (WARN_ON(list_empty(&r->mon.evt_list)))
 		return -EPERM;
 
 	priv.u.rid = r->rid;
 	priv.u.domid = do_sum ? d->ci->id : d->hdr.id;
 	priv.u.sum = do_sum;
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		priv.u.evtid = mevt->evtid;
 		ret = mon_addfile(kn, mevt->name, priv.priv);
 		if (ret)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d94abba1c716..3c2307c7c106 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -182,16 +182,26 @@ enum resctrl_scope {
 	RESCTRL_L3_NODE,
 };
 
+/**
+ * struct resctrl_mon - Monitoring related data of a resctrl resource
+ * @num_rmid:		Number of RMIDs available
+ * @evt_list:		List of monitoring events
+ */
+struct resctrl_mon {
+	int			num_rmid;
+	struct list_head	evt_list;
+};
+
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
- * @num_rmid:		Number of RMIDs available
  * @ctrl_scope:		Scope of this resource for control functions
  * @mon_scope:		Scope of this resource for monitor functions
  * @cache:		Cache allocation related data
  * @membw:		If the component has bandwidth controls, their properties.
+ * @mon:		Monitoring related data.
  * @ctrl_domains:	RCU list of all control domains for this resource
  * @mon_domains:	RCU list of all monitor domains for this resource
  * @name:		Name to use in "schemata" file.
@@ -199,7 +209,6 @@ enum resctrl_scope {
  * @default_ctrl:	Specifies default cache cbm or memory B/W percent.
  * @format_str:		Per resource format string to show domain value
  * @parse_ctrlval:	Per resource function pointer to parse control values
- * @evt_list:		List of monitoring events
  * @fflags:		flags to choose base and info files
  * @cdp_capable:	Is the CDP feature available on this resource
  */
@@ -207,11 +216,11 @@ struct rdt_resource {
 	int			rid;
 	bool			alloc_capable;
 	bool			mon_capable;
-	int			num_rmid;
 	enum resctrl_scope	ctrl_scope;
 	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
+	struct resctrl_mon	mon;
 	struct list_head	ctrl_domains;
 	struct list_head	mon_domains;
 	char			*name;
@@ -221,7 +230,6 @@ struct rdt_resource {
 	int			(*parse_ctrlval)(struct rdt_parse_data *data,
 						 struct resctrl_schema *s,
 						 struct rdt_ctrl_domain *d);
-	struct list_head	evt_list;
 	unsigned long		fflags;
 	bool			cdp_capable;
 };
-- 
2.34.1



* [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (2 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 03/24] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:16   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 05/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.

Bits  Field     Description
15:0  MAX_ABMC  Maximum Supported Assignable Bandwidth
                Monitoring Counter ID + 1

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Detect the feature and number of assignable monitoring counters supported.
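
The field extraction can be sketched in user space as follows. The helper
names are hypothetical, and the functions take an EBX value as input instead
of executing CPUID. The second helper mirrors the computation in this patch
(masked field plus one). Since the description above already words the field
as "Counter ID + 1", whether the extra increment is wanted is worth checking
against the APM.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 15:0 of CPUID Fn8000_0020_EBX_x5 hold MAX_ABMC. */
#define DEMO_ABMC_MAX_MASK	0xFFFFu

/* Extract the raw MAX_ABMC field from an EBX value (hypothetical helper). */
static uint32_t demo_abmc_max_field(uint32_t ebx)
{
	return ebx & DEMO_ABMC_MAX_MASK;
}

/* Counter count as computed by this patch: masked field plus one. */
static uint32_t demo_abmc_num_cntrs(uint32_t ebx)
{
	uint32_t n = demo_abmc_max_field(ebx) + 1;

	/* A 16-bit field plus one can never leave this range. */
	assert(n >= 1 && n <= 0x10000u);
	return n;
}
```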

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the
    bitmap. WARN_ON is not required anymore.
    Removed redundant comments.

v6: Commit message update.
    Renamed abmc_capable to mbm_cntr_assignable.

v5: Name change num_cntrs to num_mbm_cntrs.
    Moved abmc_capable to resctrl_mon.

v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
    need to separate this as arch code.

v3: Removed changes related to mon_features.
    Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
    Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
    rdt_resource. (James)

v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++++
 include/linux/resctrl.h               | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 795fe91a8feb..6a792f06f5ce 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1229,6 +1229,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			mbm_local_event.configurable = true;
 			mbm_config_rftype_init("mbm_local_bytes_config");
 		}
+
+		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+			r->mon.mbm_cntr_assignable = true;
+			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
+		}
 	}
 
 	l3_mon_evt_init(r);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 3c2307c7c106..511cfce8fc21 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -185,10 +185,14 @@ enum resctrl_scope {
 /**
  * struct resctrl_mon - Monitoring related data of a resctrl resource
  * @num_rmid:		Number of RMIDs available
+ * @num_mbm_cntrs:	Number of assignable monitoring counters
+ * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
 	int			num_rmid;
+	int			num_mbm_cntrs;
+	bool			mbm_cntr_assignable;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1



* [PATCH v7 05/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (3 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-04 22:21 ` [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().
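
A minimal user-space model of the consolidation, assuming a small static file
table. The demo_* names and the flag values are hypothetical and do not match
the kernel's RFTYPE_* constants. One name-plus-flags helper covers what the
two removed functions did:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the kernel's RFTYPE_* constants differ. */
#define DEMO_RFTYPE_CTRL_INFO	(1UL << 0)
#define DEMO_RFTYPE_MON_INFO	(1UL << 1)
#define DEMO_RFTYPE_RES_CACHE	(1UL << 2)
#define DEMO_RFTYPE_RES_MB	(1UL << 3)

struct demo_rftype {
	const char	*name;
	unsigned long	fflags;
};

static struct demo_rftype demo_files[] = {
	{ .name = "thread_throttle_mode" },
	{ .name = "mbm_total_bytes_config" },
	{ .name = "mbm_local_bytes_config" },
};

/* Linear lookup by file name; returns NULL when the name is unknown. */
static struct demo_rftype *demo_get_rftype_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_files) / sizeof(demo_files[0]); i++) {
		if (!strcmp(demo_files[i].name, name))
			return &demo_files[i];
	}
	return NULL;
}

/* Single helper: the caller supplies both the file name and the flags. */
static void demo_file_fflags_init(const char *name, unsigned long fflags)
{
	struct demo_rftype *rft;

	assert(name != NULL);
	rft = demo_get_rftype_by_name(name);
	if (rft)
		rft->fflags = fflags;
}
```

As in the patch, an unknown file name is silently ignored, so a new file only
needs a table entry and one call with its own flags.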

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v7: No changes.

v6: Added Reviewed-by from Reinette.

v5: Commit message update.

v4: Commit message update.

v3: New patch to display ABMC capability.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  4 +++-
 arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  |  6 ++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +++-------------
 4 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 73bfc8d7a438..186d8047578b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -234,7 +234,9 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
 		r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
 	else
 		r->membw.throttle_mode = THREAD_THROTTLE_MAX;
-	thread_throttle_mode_init();
+
+	resctrl_file_fflags_init("thread_throttle_mode",
+				 RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
 
 	r->alloc_capable = true;
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca..2bd207624eec 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -647,8 +647,8 @@ void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_mon_domain *d);
 void __check_limbo(struct rdt_mon_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void __init resctrl_file_fflags_init(const char *config,
+				     unsigned long fflags);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6a792f06f5ce..71fab31e20da 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1223,11 +1223,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
 			mbm_total_event.configurable = true;
-			mbm_config_rftype_init("mbm_total_bytes_config");
+			resctrl_file_fflags_init("mbm_total_bytes_config",
+						 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		}
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
 			mbm_local_event.configurable = true;
-			mbm_config_rftype_init("mbm_local_bytes_config");
+			resctrl_file_fflags_init("mbm_local_bytes_config",
+						 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		}
 
 		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f9f3b5db1987..7e76f8d839fc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2020,24 +2020,14 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
 	return NULL;
 }
 
-void __init thread_throttle_mode_init(void)
-{
-	struct rftype *rft;
-
-	rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
-	if (!rft)
-		return;
-
-	rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void __init resctrl_file_fflags_init(const char *config,
+				     unsigned long fflags)
 {
 	struct rftype *rft;
 
 	rft = rdtgroup_get_rftype_by_name(config);
 	if (rft)
-		rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+		rft->fflags = fflags;
 }
 
 /**
-- 
2.34.1



* [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (4 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 05/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:22   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Add the functionality to enable/disable the AMD ABMC feature.

The AMD ABMC feature is enabled by setting the enable bit (bit 0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs
to be updated on all the logical processors in the QOS Domain.
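
The per-CPU update can be modeled in user space roughly as below. This is a
sketch under stated assumptions: the MSR is simulated by an array, the demo_*
names are hypothetical, and the cached state lives in the domain for brevity,
whereas the patch keeps it in struct rdt_hw_resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS		4
#define DEMO_ABMC_ENABLE_BIT	0	/* bit 0 of L3_QOS_EXT_CFG */

/* Simulated per-CPU copy of the L3_QOS_EXT_CFG MSR. */
static uint64_t demo_msr[DEMO_NR_CPUS];

struct demo_domain {
	uint32_t cpu_mask;	/* bit n set: CPU n belongs to the domain */
	bool	 enabled;	/* cached ABMC state (simplification) */
};

/* Set or clear the enable bit on one simulated CPU. */
static void demo_abmc_set_one(int cpu, bool enable)
{
	if (enable)
		demo_msr[cpu] |= 1ULL << DEMO_ABMC_ENABLE_BIT;
	else
		demo_msr[cpu] &= ~(1ULL << DEMO_ABMC_ENABLE_BIT);
}

/* Update every CPU of the domain, skipping the work if nothing changes. */
static void demo_abmc_set(struct demo_domain *d, bool enable)
{
	int cpu;

	assert(d != NULL);
	if (d->enabled == enable)
		return;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if (d->cpu_mask & (1u << cpu))
			demo_abmc_set_one(cpu, enable);
	}
	d->enabled = enable;
}
```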

Hardware counters will reset when ABMC state is changed. Reset the
architectural state maintained by resctrl so that the next read of a
hardware counter is not treated as an overflow on the next update.
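
To see why stale software state matters, consider a width-limited delta
computation of the kind typically used for such counters. The kernel's own
helper is mbm_overflow_count(); its body is not part of this patch, so the
function below is an assumption about the general technique, with a
hypothetical name. If the hardware counter drops to zero while the previously
saved value is kept, the delta is indistinguishable from a wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Delta between two reads of a counter that is `width` bits wide. */
static uint64_t demo_counter_delta(uint64_t prev, uint64_t cur,
				   unsigned int width)
{
	unsigned int shift;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;
	/* Shifting up discards bits above `width`, so a wrap cancels out. */
	return ((cur << shift) - (prev << shift)) >> shift;
}
```

With a 24-bit counter, a saved value of 1000 followed by a hardware reset to 0
yields a delta of 2^24 - 1000 instead of 0. Resetting the saved value together
with the hardware avoids that spurious jump.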

The ABMC feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7:
  Renamed the function
   resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

  Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
  and renamed to resctrl_arch_mbm_cntr_assign_set().

  Moved the function definition to linux/resctrl.h.

  Passed the struct rdt_resource to these functions.
  Removed resctrl_arch_reset_rmid_all() from arch code. This will be done
  from the caller.

v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
    Used msr_set_bit and msr_clear_bit for msr updates.
    Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
    Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
    Made _resctrl_abmc_enable to return void.

v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
    Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
    Introduced resctrl_arch_get_abmc_enabled to get abmc state from
    non-arch code.
    Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
    Modified commit log to make it clear about AMD ABMC feature.

v3: No changes.

v2: Few text changes in commit message.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/core.c     |  5 ++++
 arch/x86/kernel/cpu/resctrl/internal.h |  5 ++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 35 ++++++++++++++++++++++++++
 include/linux/resctrl.h                |  3 +++
 5 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 82c6a4d350e0..d86469bf5d41 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1182,6 +1182,7 @@
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
+#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 186d8047578b..49d147e2e4e5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -405,6 +405,11 @@ void rdt_ctrl_update(void *arg)
 	hw_res->msr_update(m);
 }
 
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
+}
+
 /*
  * rdt_find_domain - Search for a domain id in a resource domain list.
  *
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 2bd207624eec..a45ae410274c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
 /* Max event bits supported */
 #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
+#define ABMC_ENABLE_BIT			0
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
@@ -477,6 +480,7 @@ struct rdt_parse_data {
  * @mbm_cfg_mask:	Bandwidth sources that can be tracked when Bandwidth
  *			Monitoring Event Configuration (BMEC) is supported.
  * @cdp_enabled:	CDP state of this resource
+ * @mbm_cntr_assign_enabled:	ABMC feature is enabled
  *
  * Members of this structure are either private to the architecture
  * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -491,6 +495,7 @@ struct rdt_hw_resource {
 	unsigned int		mbm_width;
 	unsigned int		mbm_cfg_mask;
 	bool			cdp_enabled;
+	bool			mbm_cntr_assign_enabled;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7e76f8d839fc..0178555bf3f6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2402,6 +2402,41 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
 	return 0;
 }
 
+/*
+ * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the resource.
+ */
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+	bool *enable = arg;
+
+	if (*enable)
+		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+	else
+		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+}
+
+static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+	struct rdt_mon_domain *d;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list)
+		on_each_cpu_mask(&d->hdr.cpu_mask,
+				 resctrl_abmc_set_one_amd, &enable, 1);
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	if (r->mon.mbm_cntr_assignable &&
+	    hw_res->mbm_cntr_assign_enabled != enable) {
+		_resctrl_abmc_enable(r, enable);
+		hw_res->mbm_cntr_assign_enabled = enable;
+	}
+
+	return 0;
+}
+
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 511cfce8fc21..f11d6fdfd977 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -355,4 +355,7 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
+
 #endif /* _RESCTRL_H */
-- 
2.34.1



* [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (5 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:28   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce the interface file "mbm_assign_mode" to list the supported
monitor modes.

The "mbm_cntr_assign" mode provides the option to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.

On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature. "mbm_cntr_assign" mode
is enabled by default when supported.

The "default" mode is the existing monitoring mode. It works without
explicit counter assignment and instead relies on dynamic counter
assignment by hardware. Hardware may therefore not dedicate a counter to a
group, in which case reads of monitoring data return "Unavailable".

Provide an interface to display the monitor mode on the system.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default
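
The bracket convention can be sketched in user space as below. The function
name is hypothetical and it writes into a caller-supplied buffer instead of a
seq_file. The three cases follow the show handler added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Write the mode list into buf, bracketing the active mode.
 * Returns the snprintf() result (length of the formatted text).
 */
static int demo_format_assign_mode(char *buf, size_t len,
				   bool assignable, bool enabled)
{
	assert(buf != NULL);

	/* Without assignable counters only the default mode exists. */
	if (!assignable)
		return snprintf(buf, len, "[default]\n");
	if (enabled)
		return snprintf(buf, len, "[mbm_cntr_assign]\ndefault\n");
	return snprintf(buf, len, "mbm_cntr_assign\n[default]\n");
}
```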

Switching the mbm_assign_mode will reset all the MBM counters of all
resctrl groups.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Updated the descriptions/commit log in resctrl.rst to generic text.
    Thanks to James and Reinette.
    Rename mbm_mode to mbm_assign_mode.
    Introduced mutex lock in rdtgroup_mbm_mode_show().

v6: Added documentation for mbm_cntr_assign and legacy mode.
    Moved mbm_mode fflags initialization to static initialization.

v5: Changed interface name to mbm_mode.
    It will be always available even if ABMC feature is not supported.
    Added description in resctrl.rst about ABMC mode.
    Fixed display abmc and legacy consistently.

v4: Fixed the checks for legacy and abmc mode. Default is ABMC.

v3: New patch to display ABMC capability.
---
 Documentation/arch/x86/resctrl.rst     | 33 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 30586728a4cd..a7b17ad8acb9 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,39 @@ with the following files:
 	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
 	    0=0x30;1=0x30;3=0x15;4=0x15
 
+"mbm_assign_mode":
+	Reports the list of monitoring modes supported. The enclosed brackets
+	indicate which feature is enabled.
+	::
+
+	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+	  [mbm_cntr_assign]
+	  default
+
+	"mbm_cntr_assign":
+
+	In mbm_cntr_assign mode user-space is able to specify which control
+	or monitor groups in resctrl should have a hardware counter assigned
+	using the 'mbm_control' file. The number of hardware counters available
+	is described in the 'num_mbm_cntrs' file. Changing to this mode will
+	cause all counters on a resource to reset.
+
+	The feature is needed on platforms which support more control and monitor
+	groups than hardware counters, meaning 'unassigned' control or monitor
+	groups will report 'Unavailable' or not count all the traffic in an
+	unpredictable way.
+
+	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
+	enable this mode by default so that counters remain assigned even when the
+	corresponding RMID is not in use by any processor.
+
+	"default":
+
+	By default resctrl assumes each control and monitor group has a hardware counter.
+	Hardware without this property will still allow more control or monitor groups
+	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
+	if there is no hardware resource assigned.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0178555bf3f6..dbc8c5e63213 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
+					 struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	if (r->mon.mbm_cntr_assignable) {
+		if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+			seq_puts(s, "[mbm_cntr_assign]\n");
+			seq_puts(s, "default\n");
+		} else {
+			seq_puts(s, "mbm_cntr_assign\n");
+			seq_puts(s, "[default]\n");
+		}
+	} else {
+		seq_puts(s, "[default]\n");
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1901,6 +1925,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= mbm_local_bytes_config_show,
 		.write		= mbm_local_bytes_config_write,
 	},
+	{
+		.name		= "mbm_assign_mode",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_mbm_assign_mode_show,
+		.fflags		= RFTYPE_MON_INFO,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
-- 
2.34.1



* [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (6 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:32   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The mbm_cntr_assign mode lets the user assign a hardware counter to an
(RMID, event) pair and monitor the bandwidth as long as the counter is
assigned. The number of possible assignments depends on the number of
monitoring counters available.

Provide the interface to display the number of monitoring counters
supported.
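
As a small worked example of how the reported value bounds assignments,
assume one counter per (RMID, event) pair. The helper name and the sample
numbers are hypothetical and not taken from any particular system.

```c
#include <assert.h>

/*
 * Number of monitor groups that can have every requested event counted
 * at once, given one hardware counter per (RMID, event) pair.
 */
static int demo_max_assigned_groups(int num_mbm_cntrs, int events_per_group)
{
	assert(num_mbm_cntrs >= 0 && events_per_group > 0);
	return num_mbm_cntrs / events_per_group;
}
```

For instance, if num_mbm_cntrs read 32 and every group wanted both a total and
a local bandwidth event, at most 16 groups could be fully assigned at the same
time.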

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Minor commit log text changes.

v6: No changes.

v5: Changed the display name from num_cntrs to num_mbm_cntrs.
    Updated the commit message.
    Moved the patch after mbm_mode is introduced.

v4: Changed the counter name to num_cntrs. And few text changes.

v3: Changed the field name to mbm_assign_cntrs.

v2: Changed the field name to mbm_assignable_counters from abmc_counte
---
 Documentation/arch/x86/resctrl.rst     |  3 +++
 arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a7b17ad8acb9..3e9302971faf 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -290,6 +290,9 @@ with the following files:
 	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
 	if there is no hardware resource assigned.
 
+"num_mbm_cntrs":
+	The number of monitoring counters available for assignment.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 71fab31e20da..e3e71843401a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1236,6 +1236,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			r->mon.mbm_cntr_assignable = true;
 			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
+			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index dbc8c5e63213..ba737890d5c2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -869,6 +869,16 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
+				       struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1940,6 +1950,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_cpus_show,
 		.fflags		= RFTYPE_BASE,
 	},
+	{
+		.name		= "num_mbm_cntrs",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_num_mbm_cntrs_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (7 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:42   ` Reinette Chatre
  2024-09-24 16:25   ` Peter Newman
  2024-09-04 22:21 ` [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
                   ` (15 subsequent siblings)
  24 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hardware provides a set of counters when the mbm_cntr_assignable feature
is supported. These counters are used for assigning the events in a
resctrl group when the feature is enabled. The kernel must manage and
track the number of available counters.

Introduce the mbm_cntr_free_map bitmap to track available counters and a
set of routines to allocate and free the counters.
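
The allocate/free scheme described above can be sketched in user-space C
as follows. This is only an approximation under stated assumptions: a
single 64-bit word stands in for the kernel bitmap, NUM_CNTRS is an
assumed fixed count (the kernel reads it from CPUID), and the function
names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CNTRS 32u		/* assumed count, must be below 64 here */

static uint64_t cntr_free_map;	/* bit set = counter is free */

/* Mark every counter as free, like bitmap_fill() in the patch. */
static void cntrs_init(void)
{
	cntr_free_map = (1ULL << NUM_CNTRS) - 1;
}

/* Return the lowest free counter id, or -ENOSPC when none is left. */
static int cntr_alloc(void)
{
	for (unsigned int id = 0; id < NUM_CNTRS; id++) {
		if (cntr_free_map & (1ULL << id)) {
			cntr_free_map &= ~(1ULL << id);
			return (int)id;
		}
	}
	return -ENOSPC;
}

/* Hand a counter back to the free pool. */
static void cntr_free(unsigned int id)
{
	assert(id < NUM_CNTRS);
	cntr_free_map |= 1ULL << id;
}
```

Because allocation always picks the lowest set bit, a freed counter id is
reused before any higher-numbered free counter.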

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Removed the static allocation and now allocating bitmap mbm_cntr_free_map
    dynamically.
    Passed the struct rdt_resource to mbm_cntr_alloc and mbm_cntr_free.
    Removed the reference of ABMC and changed it to mbm_cntr_assign.
    Few other text changes.

v6: Removed the variable mbm_cntrs_free_map_len. This is not required.
    Removed the call mbm_cntrs_init() in arch code. This needs to be
    done at higher level.
    Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
    Moved all the counter interfaces mbm_cntr_alloc() and mbm_cntr_free()
    in here as part of separating arch and fs bits.

v5:
   Updated the comments and commit log.
   Few renames
    num_cntrs_free_map -> mbm_cntrs_free_map
    num_cntrs_init -> mbm_cntrs_init
    Added initialization in rdt_get_tree because the default ABMC
    enablement happens during the init.

v4: Changed the name to num_cntrs where applicable.
     Used bitmap apis.
     Added more comments for the globals.

v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
     from the name.

v2: Changed the bitmap name to assignable_counter_free_map from
     abmc_counter_free_map.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  2 +-
 arch/x86/kernel/cpu/resctrl/internal.h |  4 +++-
 arch/x86/kernel/cpu/resctrl/monitor.c  | 31 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++++
 include/linux/resctrl.h                |  2 ++
 5 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 49d147e2e4e5..00ad00258df2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1140,7 +1140,7 @@ static void __exit resctrl_exit(void)
 	rdtgroup_exit();
 
 	if (r->mon_capable)
-		rdt_put_mon_l3_config();
+		rdt_put_mon_l3_config(r);
 }
 
 __exitcall(resctrl_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a45ae410274c..99f9103a35ba 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -633,7 +633,7 @@ void closid_free(int closid);
 int alloc_rmid(u32 closid);
 void free_rmid(u32 closid, u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
-void __exit rdt_put_mon_l3_config(void);
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r);
 bool __init rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
@@ -654,6 +654,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
+int mbm_cntr_alloc(struct rdt_resource *r);
+void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e3e71843401a..f98cc5b9bebc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1175,6 +1175,30 @@ static __init int snc_get_config(void)
 	return ret;
 }
 
+/*
+ * Counter bitmap for tracking the available counters.
+ * 'mbm_cntr_assign' mode provides set of hardware counters for assigning
+ * RMID, event pair. Each RMID and event pair takes one hardware counter.
+ * Kernel needs to keep track of the number of available counters.
+ */
+static int mbm_cntrs_init(struct rdt_resource *r)
+{
+	if (r->mon.mbm_cntr_assignable) {
+		r->mon.mbm_cntr_free_map = bitmap_zalloc(r->mon.num_mbm_cntrs,
+							 GFP_KERNEL);
+		if (!r->mon.mbm_cntr_free_map)
+			return -ENOMEM;
+		bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
+	}
+	return 0;
+}
+
+static void __exit mbm_cntrs_exit(struct rdt_resource *r)
+{
+	if (r->mon.mbm_cntr_assignable)
+		bitmap_free(r->mon.mbm_cntr_free_map);
+}
+
 int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
@@ -1240,6 +1264,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		}
 	}
 
+	ret = mbm_cntrs_init(r);
+	if (ret)
+		return ret;
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
@@ -1247,9 +1275,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	return 0;
 }
 
-void __exit rdt_put_mon_l3_config(void)
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
 {
 	dom_data_exit();
+	mbm_cntrs_exit(r);
 }
 
 void __init intel_rdt_mbm_apply_quirk(void)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ba737890d5c2..a51992984832 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -185,6 +185,25 @@ bool closid_allocated(unsigned int closid)
 	return !test_bit(closid, &closid_free_map);
 }
 
+int mbm_cntr_alloc(struct rdt_resource *r)
+{
+	int cntr_id;
+
+	cntr_id = find_first_bit(r->mon.mbm_cntr_free_map,
+				 r->mon.num_mbm_cntrs);
+	if (cntr_id >= r->mon.num_mbm_cntrs)
+		return -ENOSPC;
+
+	__clear_bit(cntr_id, r->mon.mbm_cntr_free_map);
+
+	return cntr_id;
+}
+
+void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id)
+{
+	__set_bit(cntr_id, r->mon.mbm_cntr_free_map);
+}
+
 /**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid if the resource group
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f11d6fdfd977..aab22ff8e0c1 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -187,12 +187,14 @@ enum resctrl_scope {
  * @num_rmid:		Number of RMIDs available
  * @num_mbm_cntrs:	Number of assignable monitoring counters
  * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
+ * @mbm_cntr_free_map:	bitmap of number of assignable MBM counters
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
 	int			num_rmid;
 	int			num_mbm_cntrs;
 	bool			mbm_cntr_assignable;
+	unsigned long		*mbm_cntr_free_map;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (8 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:51   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured to track specific
events. The event configuration is domain specific. The ABMC (Assignable
Bandwidth Monitoring Counters) feature needs the event configuration
information to assign a hardware counter to an RMID. Event configurations
are currently not stored in resctrl but are instead always read from or
written to hardware directly when prompted by user space.

Read the event configuration from the hardware during the domain
initialization. Save the configuration value in rdt_hw_mon_domain,
so it can be used for counter assignment.
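
A minimal user-space sketch of this initialization step is shown below.
It assumes the MSR values have already been read and are passed in as
plain integers; the struct and function names are hypothetical. The 0x7F
mask corresponds to MAX_EVT_CONFIG_BITS, i.e. GENMASK(6, 0), and the
invalid marker corresponds to INVALID_CONFIG_VALUE (U32_MAX).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVT_CONFIG_MASK	0x7Fu		/* valid event configuration bits 6:0 */
#define INVALID_CONFIG	UINT32_MAX	/* event is not configurable */

/* Hypothetical per-domain cache of the two MBM event configurations. */
struct dom_cfg {
	uint32_t total_cfg;
	uint32_t local_cfg;
};

/* Cache the masked MSR value, or the invalid marker if not configurable. */
static void dom_cfg_init(struct dom_cfg *dom,
			 bool total_configurable, uint64_t total_msr,
			 bool local_configurable, uint64_t local_msr)
{
	assert(dom != NULL);

	dom->total_cfg = total_configurable ?
			 (uint32_t)(total_msr & EVT_CONFIG_MASK) : INVALID_CONFIG;
	dom->local_cfg = local_configurable ?
			 (uint32_t)(local_msr & EVT_CONFIG_MASK) : INVALID_CONFIG;
}
```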

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Fixed initializing INVALID_CONFIG_VALUE to mbm_local_cfg in case of error.

v6: Renamed resctrl_arch_mbm_evt_config -> resctrl_mbm_evt_config_init
    Initialized value to INVALID_CONFIG_VALUE if it is not configurable.
    Minor commit message update.

v5: Exported mon_event_config_index_get.
    Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.

v4: Read the configuration information from the hardware to initialize.
    Added few commit messages.
    Fixed the tab spaces.

v3: Minor changes related to rebase in mbm_config_write_domain.

v2: No changes.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
 arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 26 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 +---
 4 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 00ad00258df2..2a4be004a2df 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -632,6 +632,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
+	resctrl_mbm_evt_config_init(hw_dom);
+
 	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 99f9103a35ba..6107101f2d8a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
 /* Max event bits supported */
 #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
+#define INVALID_CONFIG_VALUE		U32_MAX
+#define INVALID_CONFIG_INDEX		UINT_MAX
+
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
 #define ABMC_ENABLE_BIT			0
 
@@ -401,6 +404,8 @@ struct rdt_hw_ctrl_domain {
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @arch_mbm_total:	arch private state for MBM total bandwidth
  * @arch_mbm_local:	arch private state for MBM local bandwidth
+ * @mbm_total_cfg:	MBM total bandwidth configuration
+ * @mbm_local_cfg:	MBM local bandwidth configuration
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
@@ -408,6 +413,8 @@ struct rdt_hw_mon_domain {
 	struct rdt_mon_domain		d_resctrl;
 	struct arch_mbm_state		*arch_mbm_total;
 	struct arch_mbm_state		*arch_mbm_local;
+	u32				mbm_total_cfg;
+	u32				mbm_local_cfg;
 };
 
 static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
@@ -656,6 +663,8 @@ void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
 int mbm_cntr_alloc(struct rdt_resource *r);
 void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
+void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
+unsigned int mon_event_config_index_get(u32 evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f98cc5b9bebc..09b1d8bb0aa0 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1275,6 +1275,32 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	return 0;
 }
 
+void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom)
+{
+	unsigned int index;
+	u64 msrval;
+
+	/*
+	 * Read the configuration registers QOS_EVT_CFG_n, where <n> is
+	 * the BMEC event number (EvtID).
+	 */
+	if (mbm_total_event.configurable) {
+		index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
+		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+		hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
+	} else {
+		hw_dom->mbm_total_cfg = INVALID_CONFIG_VALUE;
+	}
+
+	if (mbm_local_event.configurable) {
+		index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
+		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+		hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
+	} else {
+		hw_dom->mbm_local_cfg = INVALID_CONFIG_VALUE;
+	}
+}
+
 void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
 {
 	dom_data_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a51992984832..299722b3fd90 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1601,8 +1601,6 @@ struct mon_config_info {
 	u32 mon_config;
 };
 
-#define INVALID_CONFIG_INDEX   UINT_MAX
-
 /**
  * mon_event_config_index_get - get the hardware index for the
  *                              configurable event
@@ -1612,7 +1610,7 @@ struct mon_config_info {
  *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
  *         INVALID_CONFIG_INDEX for invalid evtid
  */
-static inline unsigned int mon_event_config_index_get(u32 evtid)
+unsigned int mon_event_config_index_get(u32 evtid)
 {
 	switch (evtid) {
 	case QOS_L3_MBM_TOTAL_EVENT_ID:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (9 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 16:55   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 12/24] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The event configuration is domain specific and initialized during domain
initialization. The values are stored in struct rdt_hw_mon_domain.

It is not required to read the configuration register every time the user
asks for it. Use the value stored in struct rdt_hw_mon_domain instead.

Introduce resctrl_arch_event_config_get() and
resctrl_arch_event_config_set() to get/set architecture domain specific
mbm_total_cfg/mbm_local_cfg values.
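
The read-from-cache and write-through behaviour can be approximated in
user-space C as below. All names are hypothetical; the hw_writes field
merely stands in for the wrmsr performed on a CPU of the domain, so the
sketch shows the control flow, not the actual MSR access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_CONFIG	UINT32_MAX

enum evt { EVT_TOTAL, EVT_LOCAL };

/* Hypothetical cached state for one monitoring domain. */
struct cfg_cache {
	uint32_t cfg[2];	/* indexed by enum evt */
	unsigned int hw_writes;	/* counts simulated MSR writes */
};

/* Return the cached value; no hardware access is needed. */
static uint32_t cfg_get(const struct cfg_cache *d, enum evt e)
{
	assert(d != NULL);
	return d->cfg[e];
}

/*
 * Update the configuration. Skip the write when the event is not
 * configurable or the value is unchanged; otherwise write the
 * (simulated) register and refresh the cache. Returns true on a write.
 */
static bool cfg_write(struct cfg_cache *d, enum evt e, uint32_t val)
{
	uint32_t cur = cfg_get(d, e);

	if (cur == INVALID_CONFIG || cur == val)
		return false;

	d->hw_writes++;
	d->cfg[e] = val;
	return true;
}
```

Keeping the cache update next to the register write means a later
cfg_get() never returns a stale value.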

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Removed check if (val == INVALID_CONFIG_VALUE) as resctrl_arch_event_config_get
    already prints warning.
    Kept the Event config value definitions as is.

v6: Fixed inconsistency with types. Made all the config value types
    u32.
    Removed few rdt_last_cmd_puts as it is not necessary.
    Removed unused config value definitions.
    Few more updates to commit message.

v5: Introduced resctrl_arch_event_config_get() and
    resctrl_arch_event_config_set() based on our discussion.
    https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/

v4: New patch.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 103 ++++++++++++++-----------
 include/linux/resctrl.h                |   4 +
 2 files changed, 62 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 299722b3fd90..cc101fbe8683 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1597,10 +1597,57 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 }
 
 struct mon_config_info {
+	struct rdt_mon_domain *d;
 	u32 evtid;
 	u32 mon_config;
 };
 
+u32 resctrl_arch_event_config_get(struct rdt_mon_domain *d,
+				  enum resctrl_event_id eventid)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		break;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return hw_dom->mbm_total_cfg;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return hw_dom->mbm_local_cfg;
+	}
+
+	/* Never expect to get here */
+	WARN_ON_ONCE(1);
+
+	return INVALID_CONFIG_VALUE;
+}
+
+void resctrl_arch_event_config_set(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	struct rdt_hw_mon_domain *hw_dom;
+	unsigned int index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_INDEX)
+		return;
+
+	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+
+	hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
+
+	switch (mon_info->evtid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		break;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		hw_dom->mbm_total_cfg = mon_info->mon_config;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		hw_dom->mbm_local_cfg =  mon_info->mon_config;
+		break;
+	}
+}
+
 /**
  * mon_event_config_index_get - get the hardware index for the
  *                              configurable event
@@ -1623,33 +1670,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
 	}
 }
 
-static void mon_event_config_read(void *info)
-{
-	struct mon_config_info *mon_info = info;
-	unsigned int index;
-	u64 msrval;
-
-	index = mon_event_config_index_get(mon_info->evtid);
-	if (index == INVALID_CONFIG_INDEX) {
-		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
-		return;
-	}
-	rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
-
-	/* Report only the valid event configuration bits */
-	mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
-}
-
-static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
-{
-	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
-}
-
 static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
 {
-	struct mon_config_info mon_info = {0};
 	struct rdt_mon_domain *dom;
 	bool sep = false;
+	u32 val;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
@@ -1658,11 +1683,8 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 		if (sep)
 			seq_puts(s, ";");
 
-		memset(&mon_info, 0, sizeof(struct mon_config_info));
-		mon_info.evtid = evtid;
-		mondata_config_read(dom, &mon_info);
-
-		seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
+		val = resctrl_arch_event_config_get(dom, evtid);
+		seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
 		sep = true;
 	}
 	seq_puts(s, "\n");
@@ -1693,33 +1715,23 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
-static void mon_event_config_write(void *info)
-{
-	struct mon_config_info *mon_info = info;
-	unsigned int index;
-
-	index = mon_event_config_index_get(mon_info->evtid);
-	if (index == INVALID_CONFIG_INDEX) {
-		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
-		return;
-	}
-	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
-}
 
 static void mbm_config_write_domain(struct rdt_resource *r,
 				    struct rdt_mon_domain *d, u32 evtid, u32 val)
 {
 	struct mon_config_info mon_info = {0};
+	u32 config_val;
 
 	/*
-	 * Read the current config value first. If both are the same then
+	 * Check the current config value first. If both are the same then
 	 * no need to write it again.
 	 */
-	mon_info.evtid = evtid;
-	mondata_config_read(d, &mon_info);
-	if (mon_info.mon_config == val)
+	config_val = resctrl_arch_event_config_get(d, evtid);
+	if (config_val == INVALID_CONFIG_VALUE || config_val == val)
 		return;
 
+	mon_info.d = d;
+	mon_info.evtid = evtid;
 	mon_info.mon_config = val;
 
 	/*
@@ -1728,7 +1740,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 	 * are scoped at the domain level. Writing any of these MSRs
 	 * on one CPU is observed by all the CPUs in the domain.
 	 */
-	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
+	smp_call_function_any(&d->hdr.cpu_mask,
+			      resctrl_arch_event_config_set,
 			      &mon_info, 1);
 
 	/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index aab22ff8e0c1..757708cf5d3c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -354,6 +354,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  */
 void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
 
+void resctrl_arch_event_config_set(void *info);
+u32 resctrl_arch_event_config_get(struct rdt_mon_domain *d,
+				  enum resctrl_event_id eventid);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v7 12/24] x86/resctrl: Introduce mbm_cntr_map to track counters at domain
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (10 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-04 22:21 ` [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

MBM counters are allocated globally and assigned to an RMID, event pair
in a resctrl group; the global allocation is tracked by
mbm_cntr_free_map. A counter is then assigned to a domain based on user
input, so the assignment also needs to be tracked at the domain level.

Add the mbm_cntr_map bitmap in struct rdt_mon_domain to keep track of
assignment at the domain level. The global counter in mbm_cntr_free_map
can be released when the assignments in all the domains are cleared.
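
One possible reading of the release rule above, sketched in user-space C:
a counter goes back to the global free map only after the last domain
drops it. This patch only adds the bitmap, so the helpers, the fixed
NUM_DOMAINS and the single-word bitmaps below are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_DOMAINS 4u	/* assumed number of monitoring domains */

static uint64_t global_free_map;	/* bit set = counter is free */
static uint64_t dom_map[NUM_DOMAINS];	/* bit set = assigned in domain */

/* Record that counter @cntr is assigned in domain @dom. */
static void dom_assign(unsigned int dom, unsigned int cntr)
{
	assert(dom < NUM_DOMAINS && cntr < 64);
	dom_map[dom] |= 1ULL << cntr;
}

/*
 * Clear the assignment in one domain. If no other domain still uses the
 * counter, return it to the global pool. Returns true when released.
 */
static bool dom_unassign(unsigned int dom, unsigned int cntr)
{
	assert(dom < NUM_DOMAINS && cntr < 64);
	dom_map[dom] &= ~(1ULL << cntr);

	for (unsigned int i = 0; i < NUM_DOMAINS; i++)
		if (dom_map[i] & (1ULL << cntr))
			return false;

	global_free_map |= 1ULL << cntr;
	return true;
}
```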

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map

v6: New patch to add domain level assignment.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++++++++++
 include/linux/resctrl.h                |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cc101fbe8683..a014d5f4c0b3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4092,6 +4092,7 @@ static void __init rdtgroup_setup_default(void)
 
 static void domain_destroy_mon_state(struct rdt_mon_domain *d)
 {
+	bitmap_free(d->mbm_cntr_map);
 	bitmap_free(d->rmid_busy_llc);
 	kfree(d->mbm_total);
 	kfree(d->mbm_local);
@@ -4165,6 +4166,15 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
 			return -ENOMEM;
 		}
 	}
+	if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+		d->mbm_cntr_map = bitmap_zalloc(r->mon.num_mbm_cntrs, GFP_KERNEL);
+		if (!d->mbm_cntr_map) {
+			bitmap_free(d->rmid_busy_llc);
+			kfree(d->mbm_total);
+			kfree(d->mbm_local);
+			return -ENOMEM;
+		}
+	}
 
 	return 0;
 }
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 757708cf5d3c..882a6ec55f27 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -105,6 +105,7 @@ struct rdt_ctrl_domain {
  * @cqm_limbo:		worker to periodically read CQM h/w counters
  * @mbm_work_cpu:	worker CPU for MBM h/w counters
  * @cqm_work_cpu:	worker CPU for CQM h/w counters
+ * @mbm_cntr_map:	bitmap to track domain counter assignment
  */
 struct rdt_mon_domain {
 	struct rdt_domain_hdr		hdr;
@@ -116,6 +117,7 @@ struct rdt_mon_domain {
 	struct delayed_work		cqm_limbo;
 	int				mbm_work_cpu;
 	int				cqm_work_cpu;
+	unsigned long			*mbm_cntr_map;
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (11 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 12/24] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:08   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 14/24] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as the
counter is assigned. The bandwidth events will be tracked by the hardware
until the user changes the configuration. Each resctrl group can configure
a maximum of two counters, one for the total event and one for the local
event.

The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
Configuration is done by setting the counter id, bandwidth source (RMID)
and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
Configuration).

Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.

Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(C000_03FDh):
=========================================================================
Bits    Mnemonic  Description                     Access  Reset
                                                  Type    Value
=========================================================================
63      CfgEn     Configuration Enable            R/W     0

62      CtrEn     Enable/disable Tracking         R/W     0

61:53   –         Reserved                        MBZ     0

52:48   CtrID     Counter Identifier              R/W     0

47      IsCOS     BwSrc field is a CLOSID         R/W     0
                  (not an RMID)

46:44   –         Reserved                        MBZ     0

43:32   BwSrc     Bandwidth Source                R/W     0
                  (RMID or CLOSID)

31:0    BwType    Bandwidth configuration         R/W     0
                  to track for this counter
=========================================================================

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
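
The register layout in the table above can also be expressed with
explicit shifts, which may help when checking the bit positions. The
sketch below is user-space C with a hypothetical function name; the patch
itself uses a bit-field union, which relies on the compiler's bit-field
layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build an L3_QOS_ABMC_CFG value from its fields.
 * Bit positions follow the table: BwType 31:0, BwSrc 43:32, IsCOS 47,
 * CtrID 52:48, CtrEn 62, CfgEn 63. Reserved bits are left as zero.
 */
static uint64_t abmc_cfg_pack(uint32_t bw_type, uint16_t bw_src, bool is_clos,
			      uint8_t cntr_id, bool cntr_en, bool cfg_en)
{
	assert(bw_src < (1u << 12));	/* BwSrc is 12 bits wide */
	assert(cntr_id < (1u << 5));	/* CtrID is 5 bits wide */

	return (uint64_t)bw_type |
	       ((uint64_t)bw_src << 32) |
	       ((uint64_t)is_clos << 47) |
	       ((uint64_t)cntr_id << 48) |
	       ((uint64_t)cntr_en << 62) |
	       ((uint64_t)cfg_en << 63);
}
```

For example, counter 2 tracking RMID 5 with BwType 0x7F, CfgEn=1 and
CtrEn=1 packs to 0xC00200050000007F.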

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
    Moved the configuration notes to kernel_doc.
    Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.

v6: Removed all the fs related changes.
    Added note on CfgEn,CtrEn.
    Removed the definitions which are not used.
    Removed cntr_id initialization.

v5: Moved assignment flags here (patch 10/19 of v4).
    Added MON_CNTR_UNSET definition to initialize cntr_id's.
    More details in commit log.
    Renamed few fields in l3_qos_abmc_cfg for readability.

v4: Added more descriptions.
    Changed the name abmc_ctr_id to ctr_id.
    Added L3_QOS_ABMC_DSC. Used for reading the configuration.

v3: No changes.

v2: No changes.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/internal.h | 30 ++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d86469bf5d41..dd988a082fa8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1183,6 +1183,7 @@
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
+#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 6107101f2d8a..27617fe592ed 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -602,6 +602,36 @@ union cpuid_0x10_x_edx {
 	unsigned int full;
 };
 
+/*
+ * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type		: Bandwidth configuration (supported by BMEC)
+ *			  tracked by the @cntr_id.
+ * @bw_src		: Bandwidth source (RMID or CLOSID).
+ * @reserved1		: Reserved.
+ * @is_clos		: @bw_src field is a CLOSID (not an RMID).
+ * @cntr_id		: Counter identifier.
+ * @reserved		: Reserved.
+ * @cntr_en		: Tracking enable bit.
+ * @cfg_en		: Configuration enable bit.
+ *
+ * Configuration and tracking:
+ * CfgEn=1,CtrEn=0 : Configure CtrID but do not track events yet.
+ * CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
+ */
+union l3_qos_abmc_cfg {
+	struct {
+		unsigned long bw_type  :32,
+			      bw_src   :12,
+			      reserved1: 3,
+			      is_clos  : 1,
+			      cntr_id  : 5,
+			      reserved : 9,
+			      cntr_en  : 1,
+			      cfg_en   : 1;
+	} split;
+	unsigned long full;
+};
+
 void rdt_last_cmd_clear(void);
 void rdt_last_cmd_puts(const char *s);
 __printf(1, 2)
-- 
2.34.1



* [PATCH v7 14/24] x86/resctrl: Introduce cntr_id in mongroup for assignments
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (12 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-04 22:21 ` [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC Babu Moger
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

mbm_cntr_assign feature provides an option to the user to assign a
hardware counter to an RMID and monitor the bandwidth as long as the
counter is assigned. There can be two counters per monitor group, one
for total event and another for local event.

Introduce cntr_id to manage the assignments.
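
As a rough userspace illustration of the bookkeeping (not the kernel code
itself), the per-group state can be modelled as a two-entry array that starts
out unset. The struct and helper names below are hypothetical, and the event
ID values 2 (total) and 3 (local) are an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CNTRS	2
#define MON_CNTR_UNSET	UINT32_MAX

/* Assumed event IDs: total = 2, local = 3. */
enum mbm_event_sketch { MBM_TOTAL_EVENT = 2, MBM_LOCAL_EVENT = 3 };

struct mon_group_sketch {
	uint32_t rmid;
	uint32_t cntr_id[MAX_CNTRS];	/* [0] = total, [1] = local */
};

/* Map an MBM event ID to its slot in cntr_id[]. */
static int mbm_event_index(enum mbm_event_sketch evt)
{
	assert(evt == MBM_TOTAL_EVENT || evt == MBM_LOCAL_EVENT);
	return (int)evt - (int)MBM_TOTAL_EVENT;
}

/* A new group starts with no counters assigned. */
static void mon_group_init(struct mon_group_sketch *g, uint32_t rmid)
{
	g->rmid = rmid;
	for (int i = 0; i < MAX_CNTRS; i++)
		g->cntr_id[i] = MON_CNTR_UNSET;
}

/* Returns 1 if a counter is currently recorded for the event, else 0. */
static int mon_group_has_cntr(const struct mon_group_sketch *g,
			      enum mbm_event_sketch evt)
{
	return g->cntr_id[mbm_event_index(evt)] != MON_CNTR_UNSET;
}
```

Using an all-ones sentinel rather than 0 matters here because 0 is a valid
counter ID.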

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Minor comment update for cntr_id.

v6: New patch.
    Separated FS and arch bits.
---
 arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 27617fe592ed..e0ae8b0b45b2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -62,6 +62,11 @@
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
 #define ABMC_ENABLE_BIT			0
 
+/* Maximum assignable counters per resctrl group */
+#define MAX_CNTRS			2
+
+#define MON_CNTR_UNSET			U32_MAX
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
@@ -231,12 +236,14 @@ enum rdtgrp_mode {
  * @parent:			parent rdtgrp
  * @crdtgrp_list:		child rdtgroup node list
  * @rmid:			rmid for this rdtgroup
+ * @cntr_id:			IDs of hardware counters assigned to monitor group
  */
 struct mongroup {
 	struct kernfs_node	*mon_data_kn;
 	struct rdtgroup		*parent;
 	struct list_head	crdtgrp_list;
 	u32			rmid;
+	u32			cntr_id[MAX_CNTRS];
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a014d5f4c0b3..7fa92143daa7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3529,6 +3529,9 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 	}
 	rdtgrp->mon.rmid = ret;
 
+	rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+	rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+
 	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
 	if (ret) {
 		rdt_last_cmd_puts("kernfs subdir error\n");
@@ -4083,6 +4086,9 @@ static void __init rdtgroup_setup_default(void)
 	rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
 	rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
 	rdtgroup_default.type = RDTCTRL_GROUP;
+	rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
+	rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
+
 	INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
 
 	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
-- 
2.34.1



* [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (13 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 14/24] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:13   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually.

Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
specifying the counter id, bandwidth source, and bandwidth types.

Provide the interface to assign the counter ids to RMID.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
    Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Separated arch and fs functions. This patch only has arch implementation.
    Added struct rdt_resource to the interface resctrl_arch_assign_cntr.
    Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().

v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
    separate.
    Added code to update the counter assignment at domain level.

v5: Few name changes to match cntr_id.
    Changed the function names to
      rdtgroup_assign_cntr
      resctrl_arch_assign_cntr
      More comments on commit log.
      Added function summary.

v4: Commit message update.
      Used bitmap APIs where applicable.
      Changed the interfaces considering MPAM(arm).
      Added domain specific assignment.

v3: Removed the static from the prototype of rdtgroup_assign_abmc.
      The function is not called directly from user anymore. These
      changes are related to global assignment interface.

v2: Minor text changes in commit message.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  3 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 ++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e0ae8b0b45b2..57c31615eae7 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -702,6 +702,9 @@ int mbm_cntr_alloc(struct rdt_resource *r);
 void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
 void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
 unsigned int mon_event_config_index_get(u32 evtid);
+int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			     u32 cntr_id, bool assign);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7fa92143daa7..7ad653b4e768 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1853,6 +1853,51 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void resctrl_abmc_config_one_amd(void *info)
+{
+	u64 *msrval = info;
+
+	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+}
+
+/*
+ * Send an IPI to the domain to assign the counter to RMID, event pair.
+ */
+int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			     u32 cntr_id, bool assign)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	union l3_qos_abmc_cfg abmc_cfg = { 0 };
+	struct arch_mbm_state *arch_mbm;
+
+	abmc_cfg.split.cfg_en = 1;
+	abmc_cfg.split.cntr_en = assign ? 1 : 0;
+	abmc_cfg.split.cntr_id = cntr_id;
+	abmc_cfg.split.bw_src = rmid;
+
+	/* Update the event configuration from the domain */
+	if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+		abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
+		arch_mbm = &hw_dom->arch_mbm_total[rmid];
+	} else {
+		abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
+		arch_mbm = &hw_dom->arch_mbm_local[rmid];
+	}
+
+	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd,
+			      &abmc_cfg, 1);
+
+	/*
+	 * Reset the architectural state so that reading of hardware
+	 * counter is not considered as an overflow in next update.
+	 */
+	if (arch_mbm)
+		memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1



* [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (14 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:20   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The mbm_cntr_assign mode offers several hardware counters that can be
assigned to an RMID-event pair and monitor the bandwidth as long as it
is assigned.

Counters are managed at two levels. The global assignment is tracked
using the mbm_cntr_free_map field in the struct resctrl_mon, while
domain-specific assignments are tracked using the mbm_cntr_map field
in the struct rdt_mon_domain. Allocation begins at the global level
and is then applied individually to each domain.

Introduce an interface to allocate these counters and update the
corresponding domains accordingly.
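
The two levels can be pictured with a small userspace model. This is only a
sketch: the names, the 32-counter limit, the two-domain setup and the
"bit set means free" convention for the global map are all assumptions made
for the example, not details taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS	32	/* hypothetical number of hardware counters */
#define NUM_DOMAINS	2	/* hypothetical number of monitoring domains */
#define CNTR_UNSET	(-1)
#define ALL_DOMAINS	(-1)

struct cntr_pool_sketch {
	uint32_t free_map;		/* global: bit set means the counter is free */
	uint32_t dom_map[NUM_DOMAINS];	/* per domain: bit set means in use there */
};

static void pool_init(struct cntr_pool_sketch *p)
{
	p->free_map = UINT32_MAX;
	for (int d = 0; d < NUM_DOMAINS; d++)
		p->dom_map[d] = 0;
}

/* Global step: take the lowest free counter ID, or return -1 if none is left. */
static int cntr_alloc(struct cntr_pool_sketch *p)
{
	for (int id = 0; id < NUM_CNTRS; id++) {
		if (p->free_map & (UINT32_C(1) << id)) {
			p->free_map &= ~(UINT32_C(1) << id);
			return id;
		}
	}
	return -1;
}

/* Domain step: mark the group's counter in one domain or in all of them. */
static int cntr_assign(struct cntr_pool_sketch *p, int *group_cntr, int dom)
{
	assert(dom == ALL_DOMAINS || (dom >= 0 && dom < NUM_DOMAINS));

	if (*group_cntr == CNTR_UNSET) {
		int id = cntr_alloc(p);

		if (id < 0)
			return -1;	/* out of assignable counters */
		*group_cntr = id;
	}

	for (int d = 0; d < NUM_DOMAINS; d++)
		if (dom == ALL_DOMAINS || dom == d)
			p->dom_map[d] |= UINT32_C(1) << *group_cntr;

	return 0;
}
```

A group that already holds a counter ID skips the global step, so assigning
the same group to a second domain later reuses the ID instead of consuming
another counter.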

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: New patch. Moved all the FS code here.
    Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
    Added new #define MBM_EVENT_ARRAY_INDEX.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 46 ++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 57c31615eae7..6a90fc20be5b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -705,6 +705,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
 int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
 			     u32 cntr_id, bool assign);
+int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7ad653b4e768..1d45120ff2b5 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -864,6 +864,13 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+/*
+ * Get the counter index for the assignable counter
+ * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ */
+#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
+
 static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 					 struct seq_file *s, void *v)
 {
@@ -1898,6 +1905,45 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 	return 0;
 }
 
+/*
+ * Assign a hardware counter to the group.
+ * The counter is assigned to all domains if @d is NULL; otherwise it
+ * is assigned only to the specified domain.
+ */
+int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+			 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtgrp->mon.cntr_id[index];
+
+	/*
+	 * Allocate a new counter id to the group if one is not
+	 * assigned already.
+	 */
+	if (cntr_id == MON_CNTR_UNSET) {
+		cntr_id = mbm_cntr_alloc(r);
+		if (cntr_id < 0) {
+			rdt_last_cmd_puts("Out of MBM assignable counters\n");
+			return -ENOSPC;
+		}
+		rdtgrp->mon.cntr_id[index] = cntr_id;
+	}
+
+	if (!d) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			resctrl_arch_assign_cntr(r, d, evtid, rdtgrp->mon.rmid,
+						 rdtgrp->closid, cntr_id, true);
+			set_bit(cntr_id, d->mbm_cntr_map);
+		}
+	} else {
+		resctrl_arch_assign_cntr(r, d, evtid, rdtgrp->mon.rmid,
+					 rdtgrp->closid, cntr_id, true);
+		set_bit(cntr_id, d->mbm_cntr_map);
+	}
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1



* [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (15 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:26   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The mbm_cntr_assign mode provides a limited number of hardware counters
that can be assigned to an RMID-event pair to monitor bandwidth while
assigned. If all counters are in use, the kernel will show an error
message: "Out of MBM assignable counters" when a new assignment is
requested. To make space for a new assignment, users must unassign an
already assigned counter.

Introduce an interface that allows for the unassignment of counter IDs
from both the group and the domain. Additionally, ensure that the global
counter is released if it is no longer assigned to any domains.
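
The release rule can be sketched in userspace as follows. The names, the
two-domain setup and the "bit set means free" convention for the global map
are assumptions of this example only.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DOMAINS	2	/* hypothetical number of monitoring domains */
#define CNTR_UNSET	(-1)
#define ALL_DOMAINS	(-1)

struct cntr_maps_sketch {
	uint32_t free_map;		/* global: bit set means the counter is free */
	uint32_t dom_map[NUM_DOMAINS];	/* per domain: bit set means in use there */
};

/* Returns 1 if any domain still has the counter assigned, else 0. */
static int cntr_in_use(const struct cntr_maps_sketch *m, int id)
{
	for (int d = 0; d < NUM_DOMAINS; d++)
		if (m->dom_map[d] & (UINT32_C(1) << id))
			return 1;
	return 0;
}

/*
 * Clear the counter in one domain or in all of them, then hand it back
 * to the global pool only when no domain references it any more.
 */
static void cntr_unassign(struct cntr_maps_sketch *m, int *group_cntr, int dom)
{
	if (*group_cntr == CNTR_UNSET)
		return;

	assert(*group_cntr >= 0 && *group_cntr < 32);

	for (int d = 0; d < NUM_DOMAINS; d++)
		if (dom == ALL_DOMAINS || dom == d)
			m->dom_map[d] &= ~(UINT32_C(1) << *group_cntr);

	if (!cntr_in_use(m, *group_cntr)) {
		m->free_map |= UINT32_C(1) << *group_cntr;
		*group_cntr = CNTR_UNSET;
	}
}
```

Unassigning from a single domain therefore leaves the group's counter ID in
place as long as another domain still uses it.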

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
    Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
    Reworded the commit log a little bit.

v6: Removed mbm_cntr_free from this patch.
    Added counter test in all the domains and free if it is not assigned to
    any domains.

v5: Few name changes to match cntr_id.
    Changed the function names to rdtgroup_unassign_cntr
    More comments on commit log.

v4: Added domain specific unassign feature.
    Few name changes.

v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
    The function is not called directly from user anymore. These
    changes are related to global assignment interface.

v2: No changes.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 6a90fc20be5b..9a65a13ccbe9 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -707,6 +707,8 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			     u32 cntr_id, bool assign);
 int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+			   struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1d45120ff2b5..21b9ca4ce493 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1944,6 +1944,55 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 	return 0;
 }
 
+static int rdtgroup_mbm_cntr_is_assigned(struct rdt_resource *r, u32 cntr_id)
+{
+	struct rdt_mon_domain *d;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list)
+		if (test_bit(cntr_id, d->mbm_cntr_map))
+			return 1;
+
+	return 0;
+}
+
+/*
+ * Unassign a hardware counter from the domain and the group. Global
+ * counter will be freed once it is unassigned from all the domains.
+ */
+int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+			   struct rdt_mon_domain *d,
+			   enum resctrl_event_id evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtgrp->mon.cntr_id[index];
+
+	if (cntr_id != MON_CNTR_UNSET) {
+		if (!d) {
+			list_for_each_entry(d, &r->mon_domains, hdr.list) {
+				resctrl_arch_assign_cntr(r, d, evtid,
+							 rdtgrp->mon.rmid,
+							 rdtgrp->closid,
+							 cntr_id, false);
+				clear_bit(cntr_id, d->mbm_cntr_map);
+			}
+		} else {
+			resctrl_arch_assign_cntr(r, d, evtid,
+						 rdtgrp->mon.rmid,
+						 rdtgrp->closid,
+						 cntr_id, false);
+			clear_bit(cntr_id, d->mbm_cntr_map);
+		}
+
+		/* Update the counter bitmap */
+		if (!rdtgroup_mbm_cntr_is_assigned(r, cntr_id)) {
+			mbm_cntr_free(r, cntr_id);
+			rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
+		}
+	}
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1



* [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (16 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:29   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Assign/unassign counters on resctrl group creation/deletion. Two counters
are required per group, one for MBM total event and one for MBM local
event.

There are a limited number of counters available for assignment. If these
counters are exhausted, the kernel will display the error message: "Out of
MBM assignable counters". However, it is not necessary to fail the
creation of a group due to assignment failures. Users have the flexibility
to modify the assignments at a later time.
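
The "do not fail group creation" behaviour can be modelled in a few lines of
userspace C. Everything named here is hypothetical; the pool is reduced to a
simple running index to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define CNTR_UNSET	(-1)
#define EVENTS_PER_GRP	2	/* total and local */

struct pool_sketch {
	int next_id;	/* next unused counter ID */
	int num_cntrs;	/* hypothetical hardware limit */
};

struct group_sketch {
	int cntr_id[EVENTS_PER_GRP];
	int created;
};

static int pool_alloc(struct pool_sketch *p)
{
	return p->next_id < p->num_cntrs ? p->next_id++ : -1;
}

/* Try total first, then local; stop at the first failure. */
static int group_assign(struct pool_sketch *p, struct group_sketch *g)
{
	for (int i = 0; i < EVENTS_PER_GRP; i++) {
		int id = pool_alloc(p);

		if (id < 0)
			return -1;
		g->cntr_id[i] = id;
	}
	return 0;
}

/* Group creation succeeds even when no counter could be assigned. */
static int group_create(struct pool_sketch *p, struct group_sketch *g)
{
	assert(p != NULL && g != NULL);

	for (int i = 0; i < EVENTS_PER_GRP; i++)
		g->cntr_id[i] = CNTR_UNSET;

	(void)group_assign(p, g);	/* failure is deliberately not fatal */
	g->created = 1;
	return 0;
}
```

With three counters available, a first group gets IDs 0 and 1, and a second
group is still created but ends up with ID 2 for the total event and no
counter for the local event.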

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Reworded the commit message.
    Removed the reference of ABMC with mbm_cntr_assign.
    Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.

v6: Removed the redundant comments on all the calls of
    rdtgroup_assign_cntrs. Updated the commit message.
    Dropped printing error message on every call of rdtgroup_assign_cntrs.

v5: Removed the code to enable/disable ABMC during the mount.
    That will be another patch.
    Added arch callers to get the arch specific data.
    Renamed functions to match the other abmc function.
    Added code comments for assignment failures.

v4: Few name changes based on the upstream discussion.
    Commit message update.

v3: This is a new patch. Patch addresses the upstream comment to enable
    ABMC feature by default if the feature is available.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 ++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 21b9ca4ce493..bf94e4e05540 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2866,6 +2866,52 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Called when a new group is created. If `mbm_cntr_assign` mode is enabled,
+ * counters are automatically assigned. Each group requires two counters:
+ * one for the total event and one for the local event. Due to the limited
+ * number of counters, assignments may fail in some cases. However, it is
+ * not necessary to fail the group creation. Users have the option to
+ * modify the assignments after the group has been created.
+ */
+static int rdtgroup_assign_grp(struct rdtgroup *rdtgrp)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	int ret = 0;
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+		return 0;
+
+	if (is_mbm_total_enabled())
+		ret = rdtgroup_assign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (!ret && is_mbm_local_enabled())
+		ret = rdtgroup_assign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret;
+}
+
+/*
+ * Called when a group is deleted. Counters are unassigned if they were in
+ * the assigned state.
+ */
+static int rdtgroup_unassign_grp(struct rdtgroup *rdtgrp)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	int ret = 0;
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+		return 0;
+
+	if (is_mbm_total_enabled())
+		ret = rdtgroup_unassign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (!ret && is_mbm_local_enabled())
+		ret = rdtgroup_unassign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret;
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2925,6 +2971,8 @@ static int rdt_get_tree(struct fs_context *fc)
 		if (ret < 0)
 			goto out_mongrp;
 		rdtgroup_default.mon.mon_data_kn = kn_mondata;
+
+		rdtgroup_assign_grp(&rdtgroup_default);
 	}
 
 	ret = rdt_pseudo_lock_init();
@@ -2955,6 +3003,7 @@ static int rdt_get_tree(struct fs_context *fc)
 out_psl:
 	rdt_pseudo_lock_release();
 out_mondata:
+	rdtgroup_unassign_grp(&rdtgroup_default);
 	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
 out_mongrp:
@@ -3214,6 +3263,8 @@ static void rdt_kill_sb(struct super_block *sb)
 		resctrl_arch_disable_alloc();
 	if (resctrl_arch_mon_capable())
 		resctrl_arch_disable_mon();
+
+	rdtgroup_unassign_grp(&rdtgroup_default);
 	resctrl_mounted = false;
 	kernfs_kill_sb(sb);
 	mutex_unlock(&rdtgroup_mutex);
@@ -3805,6 +3856,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
 		goto out_unlock;
 	}
 
+	rdtgroup_assign_grp(rdtgrp);
+
 	kernfs_activate(rdtgrp->kn);
 
 	/*
@@ -3849,6 +3902,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 	if (ret)
 		goto out_closid_free;
 
+	rdtgroup_assign_grp(rdtgrp);
+
 	kernfs_activate(rdtgrp->kn);
 
 	ret = rdtgroup_init_alloc(rdtgrp);
@@ -3874,6 +3929,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 out_del_list:
 	list_del(&rdtgrp->rdtgroup_list);
 out_rmid_free:
+	rdtgroup_unassign_grp(rdtgrp);
 	mkdir_rdt_prepare_rmid_free(rdtgrp);
 out_closid_free:
 	closid_free(closid);
@@ -3944,6 +4000,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	update_closid_rmid(tmpmask, NULL);
 
 	rdtgrp->flags = RDT_DELETED;
+
+	rdtgroup_unassign_grp(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 	/*
@@ -3990,6 +4049,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	update_closid_rmid(tmpmask, NULL);
 
+	rdtgroup_unassign_grp(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 	closid_free(rdtgrp->closid);
 
-- 
2.34.1



* [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (17 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:31   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

In mbm_cntr_assign mode, a hardware counter must be assigned to an event
before that MBM event can be read.

Report "Unassigned" in case the user attempts to read the events without
assigning the counter.
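
From the point of view of a userspace reader, the event file then holds
either a number or a status word. The following sketch mimics that mapping
for the two status strings visible in this patch; the function names are
hypothetical and other error codes are intentionally not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Render an event read result the way the event file would show it. */
static void format_mon_value(char *buf, size_t len, int err,
			     unsigned long long val)
{
	assert(buf != NULL && len > 0);

	if (err == -EINVAL)
		snprintf(buf, len, "Unavailable");
	else if (err == -ENOENT)
		snprintf(buf, len, "Unassigned");
	else
		snprintf(buf, len, "%llu", val);
}

/* Reader-side check: was the event read without a counter assigned? */
static int mon_value_is_unassigned(const char *text)
{
	return strcmp(text, "Unassigned") == 0;
}
```

A monitoring tool could use a check like mon_value_is_unassigned() to decide
whether it needs to assign a counter before retrying the read.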

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Moved the documentation under "mon_data".
    Updated the text little bit.

v6: Added more explanation in the resctrl.rst
    Added checks to detect "Unassigned" before reading RMID.

v5: New patch.
---
 Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 3e9302971faf..ff5397d19704 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -417,6 +417,16 @@ When monitoring is enabled all MON groups will also contain:
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
 	where "YY" is the node number.
 
+	The mbm_cntr_assign mode allows users to assign a hardware counter
+	to an RMID-event pair, enabling bandwidth monitoring for as long
+	as the counter remains assigned. The hardware will continue tracking
+	the assigned RMID until the user manually unassigns it, ensuring
+	that counters are not reset during this period. With a limited number
+	of counters, the system may run out of assignable resources. In
+	mbm_cntr_assign mode, MBM event counters will return "Unassigned"
+	if the counter is not allocated to the event when read. Users must
+	manually assign a counter to read the events.
+
 "mon_hw_id":
 	Available only with debug option. The identifier used by hardware
 	for the monitor group. On x86 this is the RMID.
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 50fa1fe9a073..fc19b1d131b2 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 	union mon_data_bits md;
-	int ret = 0;
+	int ret = 0, index;
 
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
 	if (!rdtgrp) {
@@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	evtid = md.u.evtid;
 	r = &rdt_resources_all[resid].r_resctrl;
 
+	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
+		index = mon_event_config_index_get(evtid);
+		if (index != INVALID_CONFIG_INDEX &&
+		    rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
+			rr.err = -ENOENT;
+			goto checkresult;
+		}
+	}
+
 	if (md.u.sum) {
 		/*
 		 * This file requires summing across all domains that share
@@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
+	else if (rr.err == -ENOENT)
+		seq_puts(m, "Unassigned\n");
 	else
 		seq_printf(m, "%llu\n", rr.val);
 
-- 
2.34.1



* [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (18 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:38   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce an interface to switch between the mbm_cntr_assign and default modes.

$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default

To enable the "mbm_cntr_assign" mode:
$ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

To enable the default monitoring mode:
$ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

MBM event counters will reset when mbm_assign_mode is changed.
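
As an illustration only, the input handling for this file can be modelled in
userspace roughly as below. This is a sketch, not the kernel code: the names
parse_assign_mode() and enum assign_mode are hypothetical. It mirrors the two
rules in the patch: the write must end with a newline, and only the strings
"default" and "mbm_cntr_assign" are accepted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of the mode strings accepted by the file. */
enum assign_mode {
	MODE_INVALID = -1,
	MODE_DEFAULT = 0,
	MODE_CNTR_ASSIGN = 1,
};

/*
 * Parse one write to mbm_assign_mode. The buffer must end with a
 * newline, which is overwritten with a NUL before the comparison.
 */
enum assign_mode parse_assign_mode(char *buf, size_t nbytes)
{
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return MODE_INVALID;

	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "default"))
		return MODE_DEFAULT;
	if (!strcmp(buf, "mbm_cntr_assign"))
		return MODE_CNTR_ASSIGN;

	return MODE_INVALID;
}
```

In this model a write of "default" without the trailing newline is rejected,
which matches what "echo -n" would run into with the handler in this patch.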

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Changed the interface name to mbm_assign_mode.
    Removed the references to ABMC.
    Added the changes to reset global and domain bitmaps.
    Added the changes to reset rmid.

v6: Changed the mode name to mbm_cntr_assign.
    Moved all the FS related code here.
    Added changes to reset mbm_cntr_map and resctrl group counters.

v5: Change log and mode description text correction.

v4: Minor commit text changes. Keep the default to ABMC when supported.
    Fixed comments to reflect changed interface "mbm_mode".

v3: New patch to address the review comments from upstream.
---
 Documentation/arch/x86/resctrl.rst     | 15 ++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 74 +++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index ff5397d19704..743c0b64a330 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -290,6 +290,21 @@ with the following files:
 	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
 	if there is no hardware resource assigned.
 
+	* To enable ABMC feature:
+	  ::
+
+	    # echo  "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+	* To enable the legacy monitoring feature:
+	  ::
+
+	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+	The MBM event counters will reset when mbm_assign_mode is changed. Moving to
+	mbm_cntr_assign will require users to assign the counters to the events to
+	read the events. Otherwise, the MBM event counters will return "Unassigned"
+	when read.
+
 "num_mbm_cntrs":
 	The number of monitoring counters available for assignment.
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bf94e4e05540..7a8ece12d7da 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -895,6 +895,77 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static void rdtgroup_mbm_cntr_reset(struct rdt_resource *r)
+{
+	struct rdtgroup *prgrp, *crgrp;
+	struct rdt_mon_domain *dom;
+
+	/*
+	 * Hardware counters will reset after switching the monitor mode.
+	 * Reset the architectural state so that reading of hardware
+	 * counter is not considered as an overflow in the next update.
+	 * Also reset the domain counter bitmap.
+	 */
+	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+		bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
+		resctrl_arch_reset_rmid_all(r, dom);
+	}
+
+	/* Reset global MBM counter map */
+	bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
+
+	/* Reset the cntr_id's for all the monitor groups */
+	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
+				    mon.crdtgrp_list) {
+			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+		}
+	}
+}
+
+static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
+					      char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret = 0;
+	bool enable;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!strcmp(buf, "default")) {
+		enable = 0;
+	} else if (!strcmp(buf, "mbm_cntr_assign")) {
+		enable = 1;
+	} else {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("Unsupported assign mode\n");
+		goto write_exit;
+	}
+
+	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdtgroup_mbm_cntr_reset(r);
+		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
+	}
+
+write_exit:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
 				       struct seq_file *s, void *v)
 {
@@ -2107,9 +2178,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_assign_mode",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_mbm_assign_mode_show,
+		.write		= rdtgroup_mbm_assign_mode_write,
 		.fflags		= RFTYPE_MON_INFO,
 	},
 	{
-- 
2.34.1



* [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (19 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:43   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Configure mbm_cntr_assign on AMD.

On AMD, 'mbm_cntr_assign' mode is ABMC (Assignable Bandwidth Monitoring
Counters). When ABMC is enabled or disabled, the change must be applied on
all logical processors in the resctrl domain.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
    Moved the default settings to rdt_get_mon_l3_config(). It should be
    done before the hotplug handler is called. It cannot be done at
    rdtgroup_init().

v6: Keeping the default enablement in arch init code for now.
     This may need some discussion.
     Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.

v5: New patch to enable ABMC by default.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9a65a13ccbe9..3250561f0187 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -709,6 +709,7 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 			   struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+void resctrl_arch_mbm_cntr_assign_configure(struct rdt_resource *r);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 09b1d8bb0aa0..314c0b297470 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1261,6 +1261,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
 			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
+			hw_res->mbm_cntr_assign_enabled = true;
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7a8ece12d7da..1054583bef9d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2726,6 +2726,13 @@ int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
 	return 0;
 }
 
+void resctrl_arch_mbm_cntr_assign_configure(struct rdt_resource *r)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
+}
+
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
@@ -4510,9 +4517,13 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
 
 void resctrl_online_cpu(unsigned int cpu)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
 	mutex_lock(&rdtgroup_mutex);
 	/* The CPU is set in default rdtgroup after online. */
 	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+	if (r->mon.mbm_cntr_assignable)
+		resctrl_arch_mbm_cntr_assign_configure(r);
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-- 
2.34.1



* [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (20 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:45   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Users can modify the configuration of assignable events. Whenever the
event configuration is updated, MBM assignments must be revised across
all monitor groups within the impacted domains.
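
A rough userspace sketch of the check that decides which groups need to be
revised is shown below. It is an assumption-laden model, not the patch itself:
struct mon_group, struct mon_domain and count_groups_to_update() are
hypothetical, the counter map is a single 64-bit word (so at most 64 counters),
and the event IDs 2 (total) and 3 (local) are assumed from the existing
MBM_EVENT_ARRAY_INDEX() definition. An event counts as assigned only when the
group holds a counter ID and that counter's bit is set in the domain's map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MON_CNTR_UNSET			UINT32_MAX
#define MBM_EVENT_ARRAY_INDEX(_event)	((_event) - 2)

/* Assumed event IDs; chosen so the index macro yields 0 and 1. */
enum { MBM_TOTAL_EVENT_ID = 2, MBM_LOCAL_EVENT_ID = 3 };

/* Hypothetical stand-ins for the per-group and per-domain state. */
struct mon_group { uint32_t cntr_id[2]; };
struct mon_domain { uint64_t cntr_map; };	/* one bit per counter */

/* True if the group's counter for evtid is marked in this domain. */
bool event_assigned(const struct mon_group *g, const struct mon_domain *d,
		    unsigned int evtid)
{
	uint32_t cntr_id = g->cntr_id[MBM_EVENT_ARRAY_INDEX(evtid)];

	return cntr_id != MON_CNTR_UNSET && cntr_id < 64 &&
	       ((d->cntr_map >> cntr_id) & 1);
}

/* Count the groups whose assignment would be rewritten for one domain. */
size_t count_groups_to_update(const struct mon_group *groups, size_t n,
			      const struct mon_domain *d, unsigned int evtid)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (event_assigned(&groups[i], d, evtid))
			count++;

	return count;
}
```

Groups that never had a counter for the event are skipped, so a configuration
change in this model only touches assignments that already exist.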

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: New patch to update the assignments. Missed it earlier.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 53 ++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1054583bef9d..0b1490d71e77 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -871,6 +871,15 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
  */
 #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
 
+static bool resctrl_mbm_event_assigned(struct rdtgroup *rdtg,
+				       struct rdt_mon_domain *d, u32 evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtg->mon.cntr_id[index];
+
+	return cntr_id != MON_CNTR_UNSET && test_bit(cntr_id, d->mbm_cntr_map);
+}
+
 static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 					 struct seq_file *s, void *v)
 {
@@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
+					   struct rdt_mon_domain *d, u32 evtid)
+{
+	struct rdt_mon_domain *dom;
+	struct rdtgroup *rdtg;
+	int ret = 0;
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+		return ret;
+
+	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+		struct rdtgroup *crg;
+
+		list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+			if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
+				ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
+				if (ret)
+					goto out_done;
+			}
+		}
+
+		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
+			list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+				if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
+					ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
+					if (ret)
+						goto out_done;
+				}
+			}
+		}
+	}
+
+out_done:
+	return ret;
+}
 
 static void mbm_config_write_domain(struct rdt_resource *r,
 				    struct rdt_mon_domain *d, u32 evtid, u32 val)
 {
 	struct mon_config_info mon_info = {0};
 	u32 config_val;
+	int ret;
 
 	/*
 	 * Check the current config value first. If both are the same then
@@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 			      resctrl_arch_event_config_set,
 			      &mon_info, 1);
 
+	/*
+	 * Counter assignments need to be updated to match the event
+	 * configuration.
+	 */
+	ret = resctrl_mbm_event_update_assign(r, d, evtid);
+	if (ret)
+		rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
+
 	/*
 	 * When an Event Configuration is changed, the bandwidth counters
 	 * for all RMIDs and Events will be cleared by the hardware. The
-- 
2.34.1



* [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (21 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:53   ` Reinette Chatre
  2024-09-04 22:21 ` [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
  2024-09-19 18:00 ` [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Provide the interface to list the assignment states of all the resctrl
groups in mbm_cntr_assign mode.

Example:
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control

The list uses the following format:

"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Format for specific type of groups:

- Default CTRL_MON group:
  "//<domain_id>=<flags>"

- Non-default CTRL_MON group:
  "<CTRL_MON group>//<domain_id>=<flags>"

- Child MON group of default CTRL_MON group:
  "/<MON group>/<domain_id>=<flags>"

- Child MON group of non-default CTRL_MON group:
  "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Flags can be one of the following:
t  MBM total event is enabled
l  MBM local event is enabled
tl Both total and local MBM events are enabled
_  None of the MBM events are enabled
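
To make the list format concrete, here is a small userspace sketch of how one
"<domain_id>=<flags>;" entry could be built. The helpers mon_state_to_str()
and format_entry() are hypothetical names for this illustration; the flag
letters and the "_" fallback follow the table above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the flag string for one group in one domain. buf must hold at
 * least 3 bytes: up to two flag characters plus the terminating NUL.
 */
char *mon_state_to_str(bool total, bool local, char *buf)
{
	char *p = buf;

	if (total)
		*p++ = 't';
	if (local)
		*p++ = 'l';
	if (p == buf)		/* neither event is assigned */
		*p++ = '_';

	*p = '\0';
	return buf;
}

/*
 * Format "<CTRL_MON group>/<MON group>/<domain_id>=<flags>;" into out.
 * Empty strings stand for the default CTRL_MON group or for "no MON
 * group". Returns the snprintf() result.
 */
int format_entry(char *out, size_t len, const char *ctrl, const char *mon,
		 int dom_id, bool total, bool local)
{
	char flags[3];

	return snprintf(out, len, "%s/%s/%d=%s;", ctrl, mon, dom_id,
			mon_state_to_str(total, local, flags));
}
```

With both group names empty the prefix collapses to "//", which is how the
default CTRL_MON group appears in the list.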

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Renamed the interface name from 'mbm_control' to 'mbm_assign_control'
    to match 'mbm_assign_mode'.
    Removed Arch references from FS code.
    Added rdt_last_cmd_clear() before the command processing.
    Added rdtgroup_mutex before all the calls.
    Removed references of ABMC from FS code.

v6: The domain specific assignment can be determined looking at mbm_cntr_map.
    Removed rdtgroup_abmc_dom_cfg() and rdtgroup_abmc_dom_state().
    Removed the switch statement for the domain_state detection.
    Determined the flags incrementally.
    Removed special handling of default group while printing.

v5: Replaced "assignment flags" with "flags".
    Changes related to mon structure.
    Changes related to renaming the interface from mbm_assign_control to
    mbm_control.

v4: Added functionality to query domain specific assignment in
    rdtgroup_abmc_dom_state().

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
    https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
 Documentation/arch/x86/resctrl.rst     | 44 +++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 68 ++++++++++++++++++++++++++
 3 files changed, 113 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 743c0b64a330..a72cb3a6b07a 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -308,6 +308,50 @@ with the following files:
 "num_mbm_cntrs":
 	The number of monitoring counters available for assignment.
 
+"mbm_assign_control":
+	Reports the resctrl group and monitor status of each group.
+
+	The list uses the following format:
+		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+	Format for specific type of groups:
+
+	* Default CTRL_MON group:
+		"//<domain_id>=<flags>"
+
+	* Non-default CTRL_MON group:
+		"<CTRL_MON group>//<domain_id>=<flags>"
+
+	* Child MON group of default CTRL_MON group:
+		"/<MON group>/<domain_id>=<flags>"
+
+	* Child MON group of non-default CTRL_MON group:
+		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+	Flags can be one of the following:
+	::
+
+	 t  MBM total event is assigned.
+	 l  MBM local event is assigned.
+	 tl Both total and local MBM events are assigned.
+	 _  None of the MBM events are assigned.
+
+	Examples:
+	::
+
+	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
+	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
+	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
+
+	 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	 non_default_ctrl_mon_grp//0=tl;1=tl;
+	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	 //0=tl;1=tl;
+	 /child_default_mon_grp/0=tl;1=tl;
+
+	 There are four resctrl groups. All the groups have total and local MBM events
+	 assigned on domain 0 and 1.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 314c0b297470..74db63402482 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1262,6 +1262,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
 			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
 			hw_res->mbm_cntr_assign_enabled = true;
+			resctrl_file_fflags_init("mbm_assign_control", RFTYPE_MON_INFO);
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0b1490d71e77..ffa0ed98efbe 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -985,6 +985,68 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
+				       struct rdt_mon_domain *d, char *str)
+{
+	char *tmp = str;
+
+	/* Query the total and local event flags for the domain */
+	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID))
+		*tmp++ = 't';
+
+	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID))
+		*tmp++ = 'l';
+
+	if (tmp == str)
+		*tmp++ = '_';
+
+	*tmp = '\0';
+	return str;
+}
+
+static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
+					    struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_mon_domain *dom;
+	struct rdtgroup *rdtg;
+	char str[10];
+
+	mutex_lock(&rdtgroup_mutex);
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		mutex_unlock(&rdtgroup_mutex);
+		return -EINVAL;
+	}
+
+	rdt_last_cmd_clear();
+
+	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+		struct rdtgroup *crg;
+
+		seq_printf(s, "%s//", rdtg->kn->name);
+
+		list_for_each_entry(dom, &r->mon_domains, hdr.list)
+			seq_printf(s, "%d=%s;", dom->hdr.id,
+				   rdtgroup_mon_state_to_str(rdtg, dom, str));
+		seq_putc(s, '\n');
+
+		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+				    mon.crdtgrp_list) {
+			seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
+
+			list_for_each_entry(dom, &r->mon_domains, hdr.list)
+				seq_printf(s, "%d=%s;", dom->hdr.id,
+					   rdtgroup_mon_state_to_str(crg, dom, str));
+			seq_putc(s, '\n');
+		}
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2251,6 +2313,12 @@ static struct rftype res_common_files[] = {
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_num_mbm_cntrs_show,
 	},
+	{
+		.name		= "mbm_assign_control",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_mbm_assign_control_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
-- 
2.34.1



* [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (22 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
@ 2024-09-04 22:21 ` Babu Moger
  2024-09-19 17:59   ` Reinette Chatre
  2024-09-19 18:00 ` [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
  24 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-09-04 22:21 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce the interface to assign MBM events in mbm_cntr_assign mode.

Events can be enabled or disabled by writing to file
/sys/fs/resctrl/info/L3_MON/mbm_assign_control

Format is similar to the list format with addition of opcode for the
assignment operation.
 "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

Format for specific type of groups:

 * Default CTRL_MON group:
         "//<domain_id><opcode><flags>"

 * Non-default CTRL_MON group:
         "<CTRL_MON group>//<domain_id><opcode><flags>"

 * Child MON group of default CTRL_MON group:
         "/<MON group>/<domain_id><opcode><flags>"

 * Child MON group of non-default CTRL_MON group:
         "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

Domain_id '*' will apply the flags on all the domains.

Opcode can be one of the following:

 = Update the assignment to match the flags
 + Assign a new MBM event without impacting existing assignments.
 - Unassign an MBM event from currently assigned events.

Assignment flags can be one of the following:
 t  MBM total event
 l  MBM local event
 tl Both total and local MBM events
 _  None of the MBM events. Valid only with '=' opcode.
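
The per-domain token "<domain_id><opcode><flags>" can be modelled in userspace
as sketched below. This is only an illustration of the rules described above,
not the code in this patch: parse_token(), flags_to_state() and apply_opcode()
are hypothetical names, the ASSIGN_* bit values are chosen for the sketch, and
a domain ID of -1 is used here to stand for '*'.

```c
#include <stdlib.h>
#include <string.h>

/* Bit values chosen for this sketch only. */
enum { ASSIGN_NONE = 0, ASSIGN_TOTAL = 1 << 0, ASSIGN_LOCAL = 1 << 1 };

/* Convert flags such as "tl" or "_" to a bitmask; -1 on an unknown flag. */
int flags_to_state(const char *flags)
{
	int state = ASSIGN_NONE;

	for (; *flags; flags++) {
		switch (*flags) {
		case 't': state |= ASSIGN_TOTAL; break;
		case 'l': state |= ASSIGN_LOCAL; break;
		case '_': state = ASSIGN_NONE; break;
		default: return -1;
		}
	}
	return state;
}

/* Split "<domain_id><opcode><flags>"; *dom_id is -1 for '*'. */
int parse_token(const char *tok, long *dom_id, char *op, const char **flags)
{
	const char *p = strpbrk(tok, "=+-");
	char *end;

	if (!p || p == tok)		/* no opcode, or no domain ID */
		return -1;

	if (*tok == '*' && p == tok + 1) {
		*dom_id = -1;
	} else {
		*dom_id = strtol(tok, &end, 10);
		if (end != p || *dom_id < 0)
			return -1;
	}

	*op = *p;
	*flags = p + 1;
	return 0;
}

/* Return the new state after applying the opcode, or -1 on error. */
int apply_opcode(int cur, char op, const char *flags)
{
	int want = flags_to_state(flags);

	if (want < 0)
		return -1;

	switch (op) {
	case '=': return want;
	case '+': return want == ASSIGN_NONE ? -1 : (cur | want);
	case '-': return want == ASSIGN_NONE ? -1 : (cur & ~want);
	default:  return -1;
	}
}
```

In this model "0=t" replaces the state with total only, "1-t" clears total and
leaves local untouched, and "+_" or "-_" are rejected because '_' is only
meaningful with '='.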

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v7: Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
    Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
    Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
    Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
    Simplified the calls to rdtgroup_assign_cntr by merging a few functions earlier.
    Removed ABMC reference in FS code.
    Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
    Not sure if we need to change the behaviour here. They are processed sequentially for now.
    Users have the liberty to pass the flags. Restricting it might be a problem later.

v6: Added support assign all if domain id is '*'
    Fixed the allocation of counter id if it is not assigned already.

v5: Interface name changed from mbm_assign_control to mbm_control.
    Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.
    Minor message update.
    Renamed the function with prefix - rdtgroup_.
    Corrected few documentation mistakes.
    Rebase related changes after SNC support.

v4: Added domain specific assignments. Fixed the opcode parsing.

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
    https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
 Documentation/arch/x86/resctrl.rst     |  94 +++++++++-
 arch/x86/kernel/cpu/resctrl/internal.h |  10 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 234 ++++++++++++++++++++++++-
 3 files changed, 336 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a72cb3a6b07a..e46ec63d920e 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -334,7 +334,7 @@ with the following files:
 	 t  MBM total event is assigned.
 	 l  MBM local event is assigned.
 	 tl Both total and local MBM events are assigned.
-	 _  None of the MBM events are assigned.
+	 _  None of the MBM events are assigned. Only works with opcode '=' for write.
 
 	Examples:
 	::
@@ -352,6 +352,98 @@ with the following files:
 	 There are four resctrl groups. All the groups have total and local MBM events
 	 assigned on domain 0 and 1.
 
+	Assignment state can be updated by writing to the interface.
+
+	Format is similar to the list format with addition of opcode for the
+	assignment operation.
+
+		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+	Format for each type of groups:
+
+        * Default CTRL_MON group:
+                "//<domain_id><opcode><flags>"
+
+        * Non-default CTRL_MON group:
+                "<CTRL_MON group>//<domain_id><opcode><flags>"
+
+        * Child MON group of default CTRL_MON group:
+                "/<MON group>/<domain_id><opcode><flags>"
+
+        * Child MON group of non-default CTRL_MON group:
+                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+	Domain_id '*' will apply the flags on all the domains.
+
+	Opcode can be one of the following:
+	::
+
+	 = Update the assignment to match the flags.
+	 + Assign a new MBM event without impacting existing assignments.
+	 - Unassign an MBM event from currently assigned events.
+
+	Examples:
+	::
+
+	  Initial group status:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=tl;
+
+	  To update the default group to assign only total MBM event on domain 0:
+	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=tl;
+
+	  To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
+	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	  To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
+	  unassign both local and total MBM events on domain 1:
+	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
+			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	  Assignment status after the update:
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	  To update the default group to add a local MBM event on domain 0:
+	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	  To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
+	  the MBM events on all the domains.
+	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=_;1=_;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3250561f0187..799f36eef2b6 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -67,6 +67,16 @@
 
 #define MON_CNTR_UNSET			U32_MAX
 
+/*
+ * Assignment flags for mbm_cntr_assign feature
+ */
+enum {
+	ASSIGN_NONE	= 0,
+	ASSIGN_TOTAL	= BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
+	ASSIGN_LOCAL	= BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
+	ASSIGN_INVALID,
+};
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ffa0ed98efbe..56ecdf7406ae 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1047,6 +1047,237 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_str_to_mon_state(char *flag)
+{
+	int i, mon_state = ASSIGN_NONE;
+
+	for (i = 0; i < strlen(flag); i++) {
+		switch (*(flag + i)) {
+		case 't':
+			mon_state |= ASSIGN_TOTAL;
+			break;
+		case 'l':
+			mon_state |= ASSIGN_LOCAL;
+			break;
+		case '_':
+			mon_state = ASSIGN_NONE;
+			break;
+		default:
+			return ASSIGN_INVALID;
+		}
+	}
+
+	return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
+						  char *p_grp, char *c_grp)
+{
+	struct rdtgroup *rdtg, *crg;
+
+	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+		return &rdtgroup_default;
+	} else if (rtype == RDTCTRL_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+			if (!strcmp(p_grp, rdtg->kn->name))
+				return rdtg;
+	} else if (rtype == RDTMON_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+			if (!strcmp(p_grp, rdtg->kn->name)) {
+				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+						    mon.crdtgrp_list) {
+					if (!strcmp(c_grp, crg->kn->name))
+						return crg;
+				}
+			}
+		}
+	}
+
+	return NULL;
+}
+
+static int rdtgroup_process_flags(struct rdt_resource *r,
+				  enum rdt_group_type rtype,
+				  char *p_grp, char *c_grp, char *tok)
+{
+	int op, mon_state, assign_state, unassign_state;
+	char *dom_str, *id_str, *op_str;
+	struct rdt_mon_domain *d;
+	struct rdtgroup *rdtgrp;
+	unsigned long dom_id;
+	int ret, found = 0;
+
+	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
+
+	if (!rdtgrp) {
+		rdt_last_cmd_puts("Not a valid resctrl group\n");
+		return -EINVAL;
+	}
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+
+	op_str = strpbrk(dom_str, "=+-");
+
+	if (op_str) {
+		op = *op_str;
+	} else {
+		rdt_last_cmd_puts("Missing operation =, +, - character\n");
+		return -EINVAL;
+	}
+
+	id_str = strsep(&dom_str, "=+-");
+
+	/* Check for domain id '*' which means all domains */
+	if (id_str && *id_str == '*') {
+		d = NULL;
+		goto check_state;
+	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing domain id\n");
+		return -EINVAL;
+	}
+	/* Verify if the dom_id is valid; reset the match flag for each domain token */
+	found = 0;
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		if (d->hdr.id == dom_id) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (!found) {
+		rdt_last_cmd_printf("Invalid domain id %lu\n", dom_id);
+		return -EINVAL;
+	}
+
+check_state:
+	mon_state = rdtgroup_str_to_mon_state(dom_str);
+
+	if (mon_state == ASSIGN_INVALID) {
+		rdt_last_cmd_puts("Invalid assign flag\n");
+		goto out_fail;
+	}
+
+	assign_state = 0;
+	unassign_state = 0;
+
+	switch (op) {
+	case '+':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		assign_state = mon_state;
+		break;
+	case '-':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		unassign_state = mon_state;
+		break;
+	case '=':
+		assign_state = mon_state;
+		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+		break;
+	default:
+		break;
+	}
+
+	if (assign_state & ASSIGN_TOTAL) {
+		ret = rdtgroup_assign_cntr(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (assign_state & ASSIGN_LOCAL) {
+		ret = rdtgroup_assign_cntr(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (unassign_state & ASSIGN_TOTAL) {
+		ret = rdtgroup_unassign_cntr(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (unassign_state & ASSIGN_LOCAL) {
+		ret = rdtgroup_unassign_cntr(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	goto next;
+
+out_fail:
+
+	return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
+						 char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	char *token, *cmon_grp, *mon_grp;
+	enum rdt_group_type rtype;
+	int ret = 0;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		mutex_unlock(&rdtgroup_mutex);
+		cpus_read_unlock();
+		return -EINVAL;
+	}
+
+	rdt_last_cmd_clear();
+
+	while ((token = strsep(&buf, "\n")) != NULL) {
+		if (strstr(token, "/")) {
+			/*
+			 * The write command follows the following format:
+			 * "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+			 * Extract the CTRL_MON group.
+			 */
+			cmon_grp = strsep(&token, "/");
+
+			/*
+			 * Extract the MON_GROUP.
+			 * strsep returns empty string for contiguous delimiters.
+			 * Empty mon_grp here means it is a RDTCTRL_GROUP.
+			 */
+			mon_grp = strsep(&token, "/");
+
+			if (*mon_grp == '\0')
+				rtype = RDTCTRL_GROUP;
+			else
+				rtype = RDTMON_GROUP;
+
+			ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
+			if (ret)
+				break;
+		}
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2315,9 +2546,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_assign_control",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_mbm_assign_control_show,
+		.write		= rdtgroup_mbm_assign_control_write,
 	},
 	{
 		.name		= "cpus_list",
-- 
2.34.1
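
For readers following the parsing logic, the per-domain token handling in
rdtgroup_process_flags() above can be approximated in userspace. This is only
a sketch under stated assumptions: the sk_* names and flag values are
hypothetical (the patch derives its flags from the event IDs), and the sketch
is stricter than the patch in that it rejects anything but a lone '*' or a
decimal number before the opcode.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits for this sketch only. */
enum { SK_NONE = 0, SK_TOTAL = 1 << 0, SK_LOCAL = 1 << 1, SK_INVALID = 1 << 7 };

/* Map a flag string such as "tl", "l" or "_" to a bitmask. */
static int sk_flags_to_state(const char *flags)
{
	int state = SK_NONE;

	for (; *flags; flags++) {
		switch (*flags) {
		case 't': state |= SK_TOTAL; break;
		case 'l': state |= SK_LOCAL; break;
		case '_': state = SK_NONE; break;
		default:  return SK_INVALID;
		}
	}
	return state;
}

/*
 * Split one "<domain_id><opcode><flags>" token into the sets of events to
 * assign and unassign. *dom is set to -1 when the domain is '*'.
 * Returns 0 on success and -1 on malformed input.
 */
static int sk_parse_token(const char *tok, long *dom, int *assign, int *unassign)
{
	const char *op = strpbrk(tok, "=+-");
	char *end;
	int state;

	if (!op || op == tok)
		return -1;		/* missing opcode or missing domain id */

	if (tok[0] == '*' && op == tok + 1) {
		*dom = -1;		/* all domains */
	} else {
		*dom = strtol(tok, &end, 10);
		if (end != op || *dom < 0)
			return -1;
	}

	state = sk_flags_to_state(op + 1);
	if (state == SK_INVALID)
		return -1;

	*assign = SK_NONE;
	*unassign = SK_NONE;

	switch (*op) {
	case '+':
		if (state == SK_NONE)
			return -1;
		*assign = state;
		break;
	case '-':
		if (state == SK_NONE)
			return -1;
		*unassign = state;
		break;
	default:			/* '=': assign listed events, unassign the rest */
		*assign = state;
		*unassign = (SK_TOTAL | SK_LOCAL) & ~state;
		break;
	}
	return 0;
}
```

With this sketch, "*=_" yields an empty assign set and both events in the
unassign set, which matches the "unassign all the MBM events on all the
domains" example in the documentation hunk.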


* Re: [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options
  2024-09-04 22:21 ` [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-09-19 16:00   ` Reinette Chatre
  2024-09-23 14:21     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:00 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Add the command line option to enable or disable the new resctrl feature
> ABMC (Assignable Bandwidth Monitoring Counters).

This does not reflect the fs and arch separation that this version highlights
since ABMC is not a resctrl feature.

This can get confusing and I think this interface is indeed for the
architecture where hardware features are enabled/disabled (highlighted
by how the parameter is connected to the X86_FEATURE_ flag) ... so
perhaps something like:

	Add the command line option to enable or disable exposing
	the ABMC (Assignable Bandwidth Monitoring Counters) hardware
	feature to resctrl.

Patch looks good to me.

Reinette


* Re: [PATCH v7 03/24] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2024-09-04 22:21 ` [PATCH v7 03/24] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-09-19 16:03   ` Reinette Chatre
  0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:03 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> The cache allocation and memory bandwidth allocation feature properties
> are consolidated into struct resctrl_cache and struct resctrl_membw
> respectively.
> 
> In preparation for more monitoring properties that will clobber the
> existing resource struct more, re-organize the monitoring specific
> properties to also be in a separate structure.
> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette



* Re: [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-09-04 22:21 ` [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-09-19 16:16   ` Reinette Chatre
  2024-09-23 14:37     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:16 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 795fe91a8feb..6a792f06f5ce 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1229,6 +1229,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  			mbm_local_event.configurable = true;
>  			mbm_config_rftype_init("mbm_local_bytes_config");
>  		}
> +
> +		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> +			r->mon.mbm_cntr_assignable = true;
> +			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> +			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;

This should use GENMASK()
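
For illustration only, a minimal userspace sketch of what the GENMASK() form
could look like. SK_GENMASK32() is a local stand-in written for this sketch
(the kernel macro is GENMASK() from <linux/bits.h>), and sk_num_mbm_cntrs() is
a hypothetical helper, not a function from the patch:

```c
#include <stdint.h>

/* Local stand-in for the kernel's GENMASK(h, l), limited to 32-bit values. */
#define SK_GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* EBX[15:0] holds the counter count minus one, as in the quoted hunk. */
static uint32_t sk_num_mbm_cntrs(uint32_t ebx)
{
	return (ebx & SK_GENMASK32(15, 0)) + 1;
}
```

Naming the bit range this way documents which field of EBX is being read,
which the bare 0xFFFF constant does not.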

Reinette


* Re: [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-09-04 22:21 ` [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-09-19 16:22   ` Reinette Chatre
  2024-09-23 15:30     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:22 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Add the functionality to enable/disable AMD ABMC feature.
> 
> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
> to be updated on all the logical processors in the QOS Domain.
> 
> Hardware counters will reset when ABMC state is changed. Reset the
> architectural state maintained by resctrl so that reading of a hardware
> counter is not considered as an overflow in next update.

Above mentions that architectural state is also reset, but that does
not seem to form part of this patch? 

> 
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

>  static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 7e76f8d839fc..0178555bf3f6 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2402,6 +2402,41 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
>  	return 0;
>  }
>  
> +/*
> + * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the resource.

This comment is not accurate since the function below only sets MSR on current CPU.

> + */
> +static void resctrl_abmc_set_one_amd(void *arg)
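
To make the distinction concrete, here is a userspace sketch that models each
CPU's L3_QOS_EXT_CFG value as an array element. The sk_* names are
hypothetical; only the bit position (bit 0) comes from the commit message. The
"one" helper touches a single CPU's value, and a separate loop is what covers
the whole domain:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ABMC_ENABLE_BIT 0	/* enable bit position per the commit message */

/* Update the modelled MSR value of a single CPU, preserving other bits. */
static void sk_abmc_set_one(uint64_t *msr, bool enable)
{
	if (enable)
		*msr |= UINT64_C(1) << SK_ABMC_ENABLE_BIT;
	else
		*msr &= ~(UINT64_C(1) << SK_ABMC_ENABLE_BIT);
}

/* Apply the same update to every CPU in a modelled domain. */
static void sk_abmc_set_domain(uint64_t *msrs, size_t ncpus, bool enable)
{
	for (size_t i = 0; i < ncpus; i++)
		sk_abmc_set_one(&msrs[i], enable);
}
```

In the kernel the per-domain step would be a cross-CPU call rather than a
plain loop; the sketch only shows which helper has which scope.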

Reinette

* Re: [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode
  2024-09-04 22:21 ` [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-09-19 16:28   ` Reinette Chatre
  2024-09-23 16:01     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:28 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Introduce the interface file "mbm_assign_mode" to list monitor modes
> supported.
> 
> The "mbm_cntr_assign" mode provides the option to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> 
> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature. "mbm_cntr_assign" mode
> is enabled by default when supported.

As I understand this series changed this behavior to let the architecture
dictate whether "mbm_cntr_assign" is enabled by default.

> 
> The "default" mode is the existing monitoring mode that works without the
> explicit counter assignment, instead relying on dynamic counter assignment
> by hardware that may result in hardware not dedicating a counter resulting
> in monitoring data reads returning "Unavailable".
> 
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
> 
> Switching the mbm_assign_mode will reset all the MBM counters of all
> resctrl groups.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Updated the descriptions/commit log in resctrl.rst to generic text.
>     Thanks to James and Reinette.
>     Rename mbm_mode to mbm_assign_mode.
>     Introduced mutex lock in rdtgroup_mbm_mode_show().
> 
> v6: Added documentation for mbm_cntr_assign and legacy mode.
>     Moved mbm_mode fflags initialization to static initialization.
> 
> v5: Changed interface name to mbm_mode.
>     It will be always available even if ABMC feature is not supported.
>     Added description in resctrl.rst about ABMC mode.
>     Fixed display abmc and legacy consistently.
> 
> v4: Fixed the checks for legacy and abmc mode. Default it ABMC.
> 
> v3: New patch to display ABMC capability.
> ---
>  Documentation/arch/x86/resctrl.rst     | 33 ++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
>  2 files changed, 64 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..a7b17ad8acb9 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,39 @@ with the following files:
>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>  	    0=0x30;1=0x30;3=0x15;4=0x15
>  
> +"mbm_assign_mode":
> +	Reports the list of monitoring modes supported. The enclosed brackets
> +	indicate which feature is enabled.

"which feature is enabled" -> "which mode is enabled"?

> +	::
> +
> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +	  [mbm_cntr_assign]
> +	  default
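
The bracket convention in the example output can be sketched in userspace as
follows. The mode names come from the quoted documentation; the sk_* table and
helper are hypothetical and do not reflect how the patch builds the string:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mode table for this sketch. */
static const char *const sk_modes[] = { "mbm_cntr_assign", "default" };

/*
 * Write one mode per line into buf, bracketing the enabled one.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int sk_format_modes(char *buf, size_t len, size_t enabled)
{
	size_t off = 0;

	for (size_t i = 0; i < sizeof(sk_modes) / sizeof(sk_modes[0]); i++) {
		int n = snprintf(buf + off, len - off,
				 i == enabled ? "[%s]\n" : "%s\n", sk_modes[i]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

Calling sk_format_modes() with enabled set to 0 reproduces the two lines shown
in the documentation example.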
> +
> +	"mbm_cntr_assign":
> +
> +	In mbm_cntr_assign mode user-space is able to specify which control
> +	or monitor groups in resctrl should have a hardware counter assigned

This documentation should ideally also be appropriate for when the "soft-ABMC"
support lands. Considering that, should all the "hardware counter" instances perhaps be
changed to just be "counter"?

> +	using the 'mbm_control' file. The number of hardware counters available
> +	is described in the 'num_mbm_cntrs' file. Changing to this mode will
> +	cause all counters on a resource to reset.

Should resctrl commit to this? Resetting of the counters as implemented here
does seem to be an architecture specific action so this text could be
made more generic by stating "may cause all counters on a resource to reset".

> +
> +	The feature is needed on platforms which support more control and monitor

"The feature" -> "The mode"?

> +	groups than hardware counters, meaning 'unassigned' control or monitor
> +	groups will report 'Unavailable' or not count all the traffic in an
> +	unpredictable way.

"or not count all the traffic in an unpredictable way" is a bit hard to parse ... how
about "or count traffic in an unpredictable way"?


> +
> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
> +	enable this mode by default so that counters remain assigned even when the
> +	corresponding RMID is not in use by any processor.
> +
> +	"default":
> +
> +	By default resctrl assumes each control and monitor group has a hardware counter.
> +	Hardware without this property will still allow more control or monitor groups
> +	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'

Please be specific about what is meant by "the mbm files".

> +	if there is no hardware resource assigned.

"no hardware resource" -> "no counter"?

> +
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 0178555bf3f6..dbc8c5e63213 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  

Reinette

* Re: [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-09-04 22:21 ` [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-09-19 16:32   ` Reinette Chatre
  2024-09-23 16:23     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:32 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> The mbm_assign_cntr mode provides an option to the user to assign a
> hardware counter to an RMID, event pair and monitor the bandwidth as

Could you please be consistent in this series in how you refer to
an RMID, event pair ? For example later it becomes RMID-event pair.


> long as the counter is assigned. Number of assignments depend on number
> of monitoring counters available.
> 
> Provide the interface to display the number of monitoring counters
> supported.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Minor commit log text changes.
> 
> v6: No changes.
> 
> v5: Changed the display name from num_cntrs to num_mbm_cntrs.
>     Updated the commit message.
>     Moved the patch after mbm_mode is introduced.
> 
> v4: Changed the counter name to num_cntrs. And few text changes.
> 
> v3: Changed the field name to mbm_assign_cntrs.
> 
> v2: Changed the field name to mbm_assignable_counters from abmc_counte
> ---
>  Documentation/arch/x86/resctrl.rst     |  3 +++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index a7b17ad8acb9..3e9302971faf 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -290,6 +290,9 @@ with the following files:
>  	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
>  	if there is no hardware resource assigned.
>  
> +"num_mbm_cntrs":
> +	The number of monitoring counters available for assignment.
> +

I think it will be helpful if the changelog and the above doc notes when this file can
be expected to be visible since its visibility is not connected to visibility of
"mbm_assign_mode" that refers to it. There also seems to be a conflict here where
"mbm_assign_mode" documentation contains section about "default" that refers to
"num_mbm_cntrs", but "num_mbm_cntrs" may not be visible if "default" is the only mode.

Reinette

* Re: [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-04 22:21 ` [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
@ 2024-09-19 16:42   ` Reinette Chatre
  2024-09-23 18:33     ` Moger, Babu
  2024-09-24 16:25   ` Peter Newman
  1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:42 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Hardware provides a set of counters when mbm_cntr_assignable feature is
> supported. These counters are used for assigning the events in resctrl
> a group when the feature is enabled. The kernel must manage and track the

The second sentence ("These counters ...") is difficult to parse.

> number of available counters.

"The kernel must manage and track the number of available counters." ->
"The kernel must manage and track the available counters." ?

> 
> Introduce mbm_cntr_free_map bitmap to track available counters and set
> of routines to allocate and free the counters.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index e3e71843401a..f98cc5b9bebc 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1175,6 +1175,30 @@ static __init int snc_get_config(void)
>  	return ret;
>  }
>  
> +/*
> + * Counter bitmap for tracking the available counters.
> + * 'mbm_cntr_assign' mode provides set of hardware counters for assigning
> + * RMID, event pair. Each RMID and event pair takes one hardware counter.

(soft-ABMC may need to edit this comment)

> + * Kernel needs to keep track of the number of available counters.

Last sentence seems to be duplicate of the first?

> + */
> +static int mbm_cntrs_init(struct rdt_resource *r)

Needs __init?

> +{
> +	if (r->mon.mbm_cntr_assignable) {
> +		r->mon.mbm_cntr_free_map = bitmap_zalloc(r->mon.num_mbm_cntrs,
> +							 GFP_KERNEL);
> +		if (!r->mon.mbm_cntr_free_map)
> +			return -ENOMEM;
> +		bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
> +	}
> +	return 0;
> +}
> +
> +static void __exit mbm_cntrs_exit(struct rdt_resource *r)
> +{
> +	if (r->mon.mbm_cntr_assignable)
> +		bitmap_free(r->mon.mbm_cntr_free_map);
> +}
> +
>  int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  {
>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
> @@ -1240,6 +1264,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  		}
>  	}
>  
> +	ret = mbm_cntrs_init(r);
> +	if (ret)
> +		return ret;

Missing cleanup of earlier allocation on error path here. Even so, this does not
seem to integrate with existing dom_data_init() wrt ordering and locking. Could
this be more fitting when merged with dom_data_init() (after moving it)?

> +
>  	l3_mon_evt_init(r);
>  
>  	r->mon_capable = true;
> @@ -1247,9 +1275,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	return 0;
>  }
>  
> -void __exit rdt_put_mon_l3_config(void)
> +void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
>  {
>  	dom_data_exit();
> +	mbm_cntrs_exit(r);

There is a mismatch wrt locking used in dom_data_exit() and mbm_cntrs_exit() that is
sure to cause confusion and difficulty in the MPAM transition.

>  }
>  
>  void __init intel_rdt_mbm_apply_quirk(void)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ba737890d5c2..a51992984832 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -185,6 +185,25 @@ bool closid_allocated(unsigned int closid)
>  	return !test_bit(closid, &closid_free_map);
>  }
>  
> +int mbm_cntr_alloc(struct rdt_resource *r)
> +{
> +	int cntr_id;
> +
> +	cntr_id = find_first_bit(r->mon.mbm_cntr_free_map,
> +				 r->mon.num_mbm_cntrs);
> +	if (cntr_id >= r->mon.num_mbm_cntrs)
> +		return -ENOSPC;
> +
> +	__clear_bit(cntr_id, r->mon.mbm_cntr_free_map);
> +
> +	return cntr_id;
> +}
> +
> +void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id)
> +{
> +	__set_bit(cntr_id, r->mon.mbm_cntr_free_map);
> +}
> +
>  /**
>   * rdtgroup_mode_by_closid - Return mode of resource group with closid
>   * @closid: closid if the resource group
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f11d6fdfd977..aab22ff8e0c1 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -187,12 +187,14 @@ enum resctrl_scope {
>   * @num_rmid:		Number of RMIDs available
>   * @num_mbm_cntrs:	Number of assignable monitoring counters
>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> + * @mbm_cntr_free_map:	bitmap of number of assignable MBM counters

The "number of" is not clear ... it seems to indicate tracking a count? How about
just "bitmap of free MBM counters"
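
As a reading aid, the free-map idea (a set bit means the counter is free) can
be modelled in userspace like this. It is a sketch only: the sk_* names are
hypothetical, a single 64-bit word stands in for the kernel bitmap (so at most
64 counters), and -1 stands in for the -ENOSPC returned by the quoted
mbm_cntr_alloc():

```c
#include <stdint.h>

/* Free map: bit N set means counter N is available. */
struct sk_cntr_map {
	uint64_t free;
	int num;		/* number of counters, 1..64 in this sketch */
};

/* Mark all counters free, like bitmap_fill() in the quoted init code. */
static void sk_cntr_map_init(struct sk_cntr_map *m, int num)
{
	m->num = num;
	m->free = (num >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << num) - 1);
}

/* Take the lowest free counter, or return -1 when none is left. */
static int sk_cntr_alloc(struct sk_cntr_map *m)
{
	for (int id = 0; id < m->num; id++) {
		if (m->free & (UINT64_C(1) << id)) {
			m->free &= ~(UINT64_C(1) << id);
			return id;
		}
	}
	return -1;
}

/* Return a counter to the free map; out-of-range ids are ignored. */
static void sk_cntr_free(struct sk_cntr_map *m, int id)
{
	if (id >= 0 && id < m->num)
		m->free |= UINT64_C(1) << id;
}
```

The map tracks which counters are free, not how many, which is the point of
the wording suggestion above.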

>   * @evt_list:		List of monitoring events
>   */
>  struct resctrl_mon {
>  	int			num_rmid;
>  	int			num_mbm_cntrs;
>  	bool			mbm_cntr_assignable;
> +	unsigned long		*mbm_cntr_free_map;
>  	struct list_head	evt_list;
>  };
>  

Reinette

* Re: [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain
  2024-09-04 22:21 ` [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
@ 2024-09-19 16:51   ` Reinette Chatre
  2024-09-23 18:43     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:51 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
> supported, the bandwidth events can be configured to track specific
> events. The event configuration is domain specific. ABMC (Assignable
> Bandwidth Monitoring Counters) feature needs event configuration
> information to assign hardware counter to an RMID. Event configurations

"to assign hardware counter" -> "to assign a hardware counter"?

> are not stored in resctrl but instead always read from or written to
> hardware directly when prompted by user space.
> 
> Read the event configuration from the hardware during the domain
> initialization. Save the configuration value in rdt_hw_mon_domain,

"rdt_hw_mon_domain" -> "struct rdt_hw_mon_domain"

> so it can be used for counter assignment.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Fixed initializing INVALID_CONFIG_VALUE to mbm_local_cfg in case of error.
> 
> v6: Renamed resctrl_arch_mbm_evt_config -> resctrl_mbm_evt_config_init
>     Initialized value to INVALID_CONFIG_VALUE if it is not configurable.
>     Minor commit message update.
> 
> v5: Exported mon_event_config_index_get.
>     Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.
> 
> v4: Read the configuration information from the hardware to initialize.
>     Added few commit messages.
>     Fixed the tab spaces.
> 
> v3: Minor changes related to rebase in mbm_config_write_domain.
> 
> v2: No changes.
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>  arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 26 ++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 +---
>  4 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 00ad00258df2..2a4be004a2df 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -632,6 +632,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>  
>  	arch_mon_domain_online(r, d);
>  
> +	resctrl_mbm_evt_config_init(hw_dom);

Now that the arch and fs separation becomes clear I wonder if it may help to understand
this work if we start using clear namespaces to help this distinction. Surely the
arch code is very inconsistent in this regard (thus this function fits in), but
resctrl_ has to be the prefix for fs code.

Reinette

* Re: [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value
  2024-09-04 22:21 ` [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-09-19 16:55   ` Reinette Chatre
  2024-09-23 18:45     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 16:55 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> The event configuration is domain specific and initialized during domain
> initialization. The values are stored in struct rdt_hw_mon_domain.
> 
> It is not required to read the configuration register every time user asks
> for it. Use the value stored in struct rdt_hw_mon_domain instead.
> 
> Introduce resctrl_arch_event_config_get() and
> resctrl_arch_event_config_set() to get/set architecture domain specific
> mbm_total_cfg/mbm_local_cfg values.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

This change looks fine, but could the function names be more specific? For example,
 resctrl_arch_mon_event_config_get()/resctrl_arch_mon_event_config_set()?
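
A userspace sketch of the read-once, serve-from-cache pattern the changelog
describes. Everything here is an assumption made for illustration: the sk_*
names, the simulated register read, and the placeholder configuration values
do not come from the patch.

```c
#include <stdint.h>

enum sk_event { SK_EVT_TOTAL, SK_EVT_LOCAL };

/* Modelled per-domain state with cached event configurations. */
struct sk_mon_domain {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
	unsigned int hw_reads;	/* counts simulated register reads */
};

/* Simulated hardware read; the returned values are placeholders. */
static uint32_t sk_read_hw_cfg(struct sk_mon_domain *d, enum sk_event evt)
{
	d->hw_reads++;
	return evt == SK_EVT_TOTAL ? 0x7f : 0x15;
}

/* Read both configurations once, at domain initialization. */
static void sk_mon_domain_init(struct sk_mon_domain *d)
{
	d->hw_reads = 0;
	d->mbm_total_cfg = sk_read_hw_cfg(d, SK_EVT_TOTAL);
	d->mbm_local_cfg = sk_read_hw_cfg(d, SK_EVT_LOCAL);
}

/* Later queries are served from the cached copy. */
static uint32_t sk_mon_event_config_get(const struct sk_mon_domain *d,
					enum sk_event evt)
{
	return evt == SK_EVT_TOTAL ? d->mbm_total_cfg : d->mbm_local_cfg;
}

/* Updates refresh the cache; a real version would also write the register. */
static void sk_mon_event_config_set(struct sk_mon_domain *d,
				    enum sk_event evt, uint32_t cfg)
{
	if (evt == SK_EVT_TOTAL)
		d->mbm_total_cfg = cfg;
	else
		d->mbm_local_cfg = cfg;
}
```

The getter takes a const pointer, which makes it visible at the call site that
a query no longer touches hardware.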

Reinette




* Re: [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-09-04 22:21 ` [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-09-19 17:08   ` Reinette Chatre
  2024-09-23 20:21     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:08 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> +/*
> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
> + * @bw_type		: Bandwidth configuration(supported by BMEC)
> + *			  tracked by the @cntr_id.
> + * @bw_src		: Bandwidth source (RMID or CLOSID).
> + * @reserved1		: Reserved.
> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
> + * @cntr_id		: Counter identifier.
> + * @reserved		: Reserved.
> + * @cntr_en		: Tracking enable bit.
> + * @cfg_en		: Configuration enable bit.
> + *
> + * Configuration and tracking:
> + * CfgEn=1,CtrEn=0 : Configure CtrID and but no tracking the events yet.
> + * CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.

Thanks for moving the text ... could it now be made to match its new destination
(outside the AMD arch document)? For example, "CfgEn" becomes "@cfg_en",
"CtrID" becomes "@cntr_id" etc. Also please fix the language, for example
what does "and but no tracking the events yet" mean? So far this work
has focused on "counting" vs "not counting" events and it is not
clear how this "tracking" fits in ... this seems to be the hardware
view that means "tracking the RMID to which @cntr_id is assigned"?
Please help readers to understand how the implementation is supported
by the hardware.
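
As a rough illustration of the two states the comment describes, here is a
user-space sketch that builds the 64-bit register value with explicit shifts.
The bit positions and field widths are assumptions derived from the field order
in the quoted comment and need to be checked against the APM; the helper name
is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, least significant bit first; verify against the APM. */
#define ABMC_BW_TYPE_SHIFT	0	/* assumed 32 bits wide */
#define ABMC_BW_SRC_SHIFT	32	/* assumed 12 bits wide */
#define ABMC_BW_SRC_MASK	0xfffULL
#define ABMC_CNTR_ID_SHIFT	48	/* assumed 5 bits wide */
#define ABMC_CNTR_ID_MASK	0x1fULL
#define ABMC_CNTR_EN_SHIFT	62
#define ABMC_CFG_EN_SHIFT	63

/*
 * Build a configuration value for @cntr_id that monitors @rmid.
 * @cfg_en is always set. @cntr_en selects between "configure only"
 * (false) and "configure and start counting" (true).
 */
uint64_t abmc_cfg_encode(uint32_t bw_type, uint32_t rmid, uint32_t cntr_id,
			 bool cntr_en)
{
	uint64_t val = 0;

	val |= (uint64_t)bw_type << ABMC_BW_TYPE_SHIFT;
	val |= ((uint64_t)rmid & ABMC_BW_SRC_MASK) << ABMC_BW_SRC_SHIFT;
	val |= ((uint64_t)cntr_id & ABMC_CNTR_ID_MASK) << ABMC_CNTR_ID_SHIFT;
	val |= (uint64_t)cntr_en << ABMC_CNTR_EN_SHIFT;
	val |= 1ULL << ABMC_CFG_EN_SHIFT;

	return val;
}
```

Under these assumptions the two cases from the comment differ only in bit 62.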

Reinette



* Re: [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC
  2024-09-04 22:21 ` [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC Babu Moger
@ 2024-09-19 17:13   ` Reinette Chatre
  2024-09-23 21:03     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:13 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

In subject, please use "()" for a function.

On 9/4/24 3:21 PM, Babu Moger wrote:
> +/*
> + * Send an IPI to the domain to assign the counter to RMID, event pair.
> + */
> +int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
> +			     u32 cntr_id, bool assign)

Looking ahead, this is also called when the config of an existing assigned counter is
changed. Should this thus perhaps be resctrl_arch_config_cntr()?

> +{
> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +	struct arch_mbm_state *arch_mbm;
> +
> +	abmc_cfg.split.cfg_en = 1;

Just to confirm ... a counter remains "configured" from the hardware side whether it
is assigned from resctrl perspective or not? It seems to me that once a counter is
"unassigned" from resctrl perspective, resctrl needs no more context about that
counter, yet it remains configured from the hardware side?

Reinette


* Re: [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment
  2024-09-04 22:21 ` [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
@ 2024-09-19 17:20   ` Reinette Chatre
  2024-09-26 16:28     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:20 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 7ad653b4e768..1d45120ff2b5 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -864,6 +864,13 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  
> +/*
> + * Get the counter index for the assignable counter
> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> + */
> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
> +
>  static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>  					 struct seq_file *s, void *v)
>  {
> @@ -1898,6 +1905,45 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>  	return 0;
>  }
>  
> +/*
> + * Assign a hardware counter to the group.
> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
> + * else the counter will be allocated to specific domain.
> + */
> +int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +			 struct rdt_mon_domain *d, enum resctrl_event_id evtid)

Could we please review the naming of functions as this series progresses? Using such a generic
name for this specific function seems to result in its callers later in the series having even more
generic names that are hard to decipher. For example, the later (very generic) "rdtgroup_assign_grp()"
calls this function, and I find rdtgroup_assign_grp() very vague, which makes the code more difficult
to follow. For example, rdtgroup_assign_cntr() could be rdtgroup_assign_cntr_event() and
rdtgroup_assign_grp() could instead be rdtgroup_assign_cntr()? Please feel free to improve.

> +{
> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
> +	int cntr_id = rdtgrp->mon.cntr_id[index];
> +
> +	/*
> +	 * Allocate a new counter id to the group if the counter id is not
> +	 * is not assigned already.

"is not is not" -> "is not"

Reinette


* Re: [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter
  2024-09-04 22:21 ` [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
@ 2024-09-19 17:26   ` Reinette Chatre
  2024-09-26 16:56     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:26 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> The mbm_cntr_assign mode provides a limited number of hardware counters
> that can be assigned to an RMID-event pair to monitor bandwidth while
> assigned. If all counters are in use, the kernel will show an error
> message: "Out of MBM assignable counters" when a new assignment is
> requested. To make space for a new assignment, users must unassign an
> already assigned counter.
> 
> Introduce an interface that allows for the unassignment of counter IDs
> from both the group and the domain. Additionally, ensure that the global
> counter is released if it is no longer assigned to any domains.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
>     Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
>     Reworded the commit log little bit.
> 
> v6: Removed mbm_cntr_free from this patch.
>     Added counter test in all the domains and free if it is not assigned to
>     any domains.
> 
> v5: Few name changes to match cntr_id.
>     Changed the function names to rdtgroup_unassign_cntr
>     More comments on commit log.
> 
> v4: Added domain specific unassign feature.
>     Few name changes.
> 
> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
>     The function is not called directly from user anymore. These
>     changes are related to global assignment interface.
> 
> v2: No changes.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 6a90fc20be5b..9a65a13ccbe9 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -707,6 +707,8 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>  			     u32 cntr_id, bool assign);
>  int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>  			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
> +int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +			   struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>  void rdt_staged_configs_clear(void);
>  bool closid_allocated(unsigned int closid);
>  int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 1d45120ff2b5..21b9ca4ce493 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1944,6 +1944,55 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>  	return 0;
>  }
>  
> +static int rdtgroup_mbm_cntr_is_assigned(struct rdt_resource *r, u32 cntr_id)

Should this return bool?

With a function prefix of "rdtgroup" I would expect an rdtgroup to be one of its
parameters, but that is not the case ... this has nothing to do with an rdtgroup.
Maybe something like "mbm_cntr_assigned_to_domain()"?
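
A minimal user-space sketch of what I have in mind, assuming one bitmap per
domain with a set bit meaning "counter in use". The array of domains replaces
the kernel list and the single-word bitmap replaces the real one; both are
simplifications for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rdt_mon_domain. */
struct mon_domain {
	uint32_t mbm_cntr_map;	/* bit n set: counter n is assigned here */
};

/* Return true if @cntr_id is assigned in at least one of the @ndoms domains. */
bool mbm_cntr_assigned_to_domain(const struct mon_domain *doms, size_t ndoms,
				 uint32_t cntr_id)
{
	if (cntr_id >= 32)
		return false;	/* outside the single-word sketch bitmap */

	for (size_t i = 0; i < ndoms; i++) {
		if (doms[i].mbm_cntr_map & (UINT32_C(1) << cntr_id))
			return true;
	}

	return false;
}
```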

> +{
> +	struct rdt_mon_domain *d;
> +
> +	list_for_each_entry(d, &r->mon_domains, hdr.list)

Based on the "rdtgroup" prefix it is unexpected that this walks every domain's
mbm_cntr_map and never looks at a resource group. The function really needs a
more appropriate name to reflect what it actually does.

> +		if (test_bit(cntr_id, d->mbm_cntr_map))
> +			return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * Unassign a hardware counter from the domain and the group. Global
> + * counter will be freed once it is unassigned from all the domains.

Could this also get a comment, similar to the one on its partner function, about
the special meaning of a NULL domain?

> + */
> +int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +			   struct rdt_mon_domain *d,
> +			   enum resctrl_event_id evtid)
> +{
> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
> +	int cntr_id = rdtgrp->mon.cntr_id[index];
> +
> +	if (cntr_id != MON_CNTR_UNSET) {

Function can exit early after the MON_CNTR_UNSET check to reduce level of
indentation in rest of function.

> +		if (!d) {
> +			list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +				resctrl_arch_assign_cntr(r, d, evtid,
> +							 rdtgrp->mon.rmid,
> +							 rdtgrp->closid,
> +							 cntr_id, false);
> +				clear_bit(cntr_id, d->mbm_cntr_map);
> +			}
> +		} else {
> +			resctrl_arch_assign_cntr(r, d, evtid,
> +						 rdtgrp->mon.rmid,
> +						 rdtgrp->closid,
> +						 cntr_id, false);
> +			clear_bit(cntr_id, d->mbm_cntr_map);
> +		}
> +
> +		/* Update the counter bitmap */
> +		if (!rdtgroup_mbm_cntr_is_assigned(r, cntr_id)) {
> +			mbm_cntr_free(r, cntr_id);
> +			rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
> +		}
> +	}
> +
> +	return 0;

This function is called many times and its callers always carry extra paths to
handle an error from it ... yet it always returns 0. I expect that it should
actually check the return value of the arch callback, which could fail on other
archs. That should feed into this function's return value and make the need for
error handling apparent.

> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{


Reinette


* Re: [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled
  2024-09-04 22:21 ` [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2024-09-19 17:29   ` Reinette Chatre
  2024-09-26 18:48     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:29 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

Subject: "Assign" -> "assign"

On 9/4/24 3:21 PM, Babu Moger wrote:

>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 ++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 21b9ca4ce493..bf94e4e05540 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2866,6 +2866,52 @@ static void schemata_list_destroy(void)
>  	}
>  }
>  
> +/*
> + * Called when a new group is created. If `mbm_cntr_assign` mode is enabled,
> + * counters are automatically assigned. Each group requires two counters:
> + * one for the total event and one for the local event. Due to the limited
> + * number of counters, assignments may fail in some cases. However, it is
> + * not necessary to fail the group creation. Users have the option to
> + * modify the assignments after the group has been created.
> + */
> +static int rdtgroup_assign_grp(struct rdtgroup *rdtgrp)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	int ret = 0;
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> +		return 0;
> +
> +	if (is_mbm_total_enabled())
> +		ret = rdtgroup_assign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	if (!ret && is_mbm_local_enabled())
> +		ret = rdtgroup_assign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	return ret;
> +}
> +
> +/*
> + * Called when a group is deleted. Counters are unassigned if it was in
> + * assigned state.
> + */
> +static int rdtgroup_unassign_grp(struct rdtgroup *rdtgrp)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	int ret = 0;
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> +		return 0;
> +
> +	if (is_mbm_total_enabled())
> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	if (!ret && is_mbm_local_enabled())
> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	return ret;
> +}
> +
>  static int rdt_get_tree(struct fs_context *fc)
>  {
>  	struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -2925,6 +2971,8 @@ static int rdt_get_tree(struct fs_context *fc)
>  		if (ret < 0)
>  			goto out_mongrp;
>  		rdtgroup_default.mon.mon_data_kn = kn_mondata;
> +
> +		rdtgroup_assign_grp(&rdtgroup_default);
>  	}
>  
>  	ret = rdt_pseudo_lock_init();
> @@ -2955,6 +3003,7 @@ static int rdt_get_tree(struct fs_context *fc)
>  out_psl:
>  	rdt_pseudo_lock_release();
>  out_mondata:
> +	rdtgroup_unassign_grp(&rdtgroup_default);
>  	if (resctrl_arch_mon_capable())
>  		kernfs_remove(kn_mondata);
>  out_mongrp:
> @@ -3214,6 +3263,8 @@ static void rdt_kill_sb(struct super_block *sb)
>  		resctrl_arch_disable_alloc();
>  	if (resctrl_arch_mon_capable())
>  		resctrl_arch_disable_mon();
> +
> +	rdtgroup_unassign_grp(&rdtgroup_default);
>  	resctrl_mounted = false;
>  	kernfs_kill_sb(sb);
>  	mutex_unlock(&rdtgroup_mutex);
> @@ -3805,6 +3856,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>  		goto out_unlock;
>  	}
>  
> +	rdtgroup_assign_grp(rdtgrp);
> +
>  	kernfs_activate(rdtgrp->kn);
>  
>  	/*
> @@ -3849,6 +3902,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>  	if (ret)
>  		goto out_closid_free;
>  
> +	rdtgroup_assign_grp(rdtgrp);
> +
>  	kernfs_activate(rdtgrp->kn);
>  
>  	ret = rdtgroup_init_alloc(rdtgrp);
> @@ -3874,6 +3929,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>  out_del_list:
>  	list_del(&rdtgrp->rdtgroup_list);
>  out_rmid_free:
> +	rdtgroup_unassign_grp(rdtgrp);
>  	mkdir_rdt_prepare_rmid_free(rdtgrp);
>  out_closid_free:
>  	closid_free(closid);
> @@ -3944,6 +4000,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>  	update_closid_rmid(tmpmask, NULL);
>  
>  	rdtgrp->flags = RDT_DELETED;
> +
> +	rdtgroup_unassign_grp(rdtgrp);
> +
>  	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>  
>  	/*
> @@ -3990,6 +4049,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>  	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>  	update_closid_rmid(tmpmask, NULL);
>  
> +	rdtgroup_unassign_grp(rdtgrp);
> +
>  	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>  	closid_free(rdtgrp->closid);
>  

Apart from the earlier comment about rdtgroup_assign_grp()/rdtgroup_unassign_grp() naming, please also
take care with how these functions are integrated since it seems to be inconsistent wrt whether they are
called on a mon capable resource. Also, I can see how the counter is removed when a CTRL_MON group or a
MON group is explicitly removed, but it is not clear to me how the counters assigned to the child MON
groups are unassigned when a user removes their parent CTRL_MON group.
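
For the CTRL_MON removal case, the behavior I would expect can be sketched as
below: the counters of every child MON group are returned before those of the
parent. The types, the array of children and the single-word free map are
hypothetical simplifications, not how the series stores this state.

```c
#include <stddef.h>
#include <stdint.h>

#define MON_CNTR_UNSET	UINT32_MAX	/* assumed sentinel value */
#define NUM_MBM_EVENTS	2

/* Hypothetical stand-in for a resource group and its child MON groups. */
struct mon_group {
	uint32_t cntr_id[NUM_MBM_EVENTS];	/* counter IDs below 32 assumed */
	struct mon_group *children;		/* NULL for a plain MON group */
	size_t nchildren;
};

/* Return the group's counters to @free_map (bit set: counter is free). */
static void group_release_cntrs(struct mon_group *g, uint32_t *free_map)
{
	for (int i = 0; i < NUM_MBM_EVENTS; i++) {
		if (g->cntr_id[i] == MON_CNTR_UNSET)
			continue;
		*free_map |= UINT32_C(1) << g->cntr_id[i];
		g->cntr_id[i] = MON_CNTR_UNSET;
	}
}

/* Removing a CTRL_MON group also releases the counters of its children. */
void ctrl_mon_group_release_cntrs(struct mon_group *parent, uint32_t *free_map)
{
	for (size_t i = 0; i < parent->nchildren; i++)
		group_release_cntrs(&parent->children[i], free_map);

	group_release_cntrs(parent, free_map);
}
```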

Reinette



* Re: [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-09-04 22:21 ` [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
@ 2024-09-19 17:31   ` Reinette Chatre
  2024-09-26 19:16     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:31 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> In mbm_cntr_assign mode, the hardware counter should be assigned to read
> the MBM events.
> 
> Report "Unassigned" in case the user attempts to read the events without
> assigning the counter.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Moved the documentation under "mon_data".
>     Updated the text little bit.
> 
> v6: Added more explanation in the resctrl.rst
>     Added checks to detect "Unassigned" before reading RMID.
> 
> v5: New patch.
> ---
>  Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 3e9302971faf..ff5397d19704 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -417,6 +417,16 @@ When monitoring is enabled all MON groups will also contain:
>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>  	where "YY" is the node number.
>  
> +	The mbm_cntr_assign mode allows users to assign a hardware counter
> +	to an RMID-event pair, enabling bandwidth monitoring for as long
> +	as the counter remains assigned. The hardware will continue tracking
> +	the assigned RMID until the user manually unassigns it, ensuring
> +	that counters are not reset during this period. With a limited number
> +	of counters, the system may run out of assignable resources. In
> +	mbm_cntr_assign mode, MBM event counters will return "Unassigned"
> +	if the counter is not allocated to the event when read. Users must
> +	manually assign a counter to read the events.
> +

Please consider how this text could also be relevant to soft-ABMC.

>  "mon_hw_id":
>  	Available only with debug option. The identifier used by hardware
>  	for the monitor group. On x86 this is the RMID.
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 50fa1fe9a073..fc19b1d131b2 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  	struct rdtgroup *rdtgrp;
>  	struct rdt_resource *r;
>  	union mon_data_bits md;
> -	int ret = 0;
> +	int ret = 0, index;
>  
>  	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>  	if (!rdtgrp) {
> @@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  	evtid = md.u.evtid;
>  	r = &rdt_resources_all[resid].r_resctrl;
>  
> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
> +		index = mon_event_config_index_get(evtid);

This should use MBM_EVENT_ARRAY_INDEX, not the arch index.
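
A small user-space sketch of the check using the fs-side index macro. The
event ID values, the MON_CNTR_UNSET value, the struct and the helper name are
assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed event IDs; the macro relies on TOTAL == 2 and LOCAL == 3. */
enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID		= 1,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 2,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 3,
};

#define MBM_EVENT_ARRAY_INDEX(_event)	((_event) - 2)
#define MON_CNTR_UNSET			UINT32_MAX	/* assumed sentinel value */

/* Hypothetical stand-in for the per-group monitoring state. */
struct mongroup {
	uint32_t cntr_id[2];
};

/* True if a counter is assigned to @evtid; false for non-MBM events. */
bool mbm_event_has_cntr(const struct mongroup *mon, enum resctrl_event_id evtid)
{
	int index = MBM_EVENT_ARRAY_INDEX((int)evtid);

	if (index < 0 || index > 1)
		return false;

	return mon->cntr_id[index] != MON_CNTR_UNSET;
}
```

The range check replaces the INVALID_CONFIG_INDEX comparison, so no arch
helper is needed in the fs path.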

> +		if (index != INVALID_CONFIG_INDEX &&
> +		    rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
> +			rr.err = -ENOENT;
> +			goto checkresult;
> +		}
> +	}
> +
>  	if (md.u.sum) {
>  		/*
>  		 * This file requires summing across all domains that share
> @@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		seq_puts(m, "Error\n");
>  	else if (rr.err == -EINVAL)
>  		seq_puts(m, "Unavailable\n");
> +	else if (rr.err == -ENOENT)
> +		seq_puts(m, "Unassigned\n");
>  	else
>  		seq_printf(m, "%llu\n", rr.val);
>  
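
For reference, the resulting error-to-text mapping can be sketched in user
space as below. The helper name and buffer interface are hypothetical, and
treating -EIO as the code behind "Error" is an assumption since that condition
is outside the quoted hunk.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format what a read of an MBM event file would return for the given
 * error code or, if @err is 0, the counter value @val.
 */
int mondata_format(char *buf, size_t len, int err, uint64_t val)
{
	if (err == -EIO)	/* assumed error code for "Error" */
		return snprintf(buf, len, "Error\n");
	if (err == -EINVAL)
		return snprintf(buf, len, "Unavailable\n");
	if (err == -ENOENT)	/* no counter assigned to the event */
		return snprintf(buf, len, "Unassigned\n");

	return snprintf(buf, len, "%llu\n", (unsigned long long)val);
}
```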

Reinette


* Re: [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-09-04 22:21 ` [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-09-19 17:38   ` Reinette Chatre
  2024-09-26 19:39     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:38 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Introduce interface to switch between mbm_cntr_assign and default modes.
> 
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
> 
> To enable the "mbm_cntr_assign" mode:
> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 
> To enable the default monitoring mode:
> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 
> MBM event counters will reset when mbm_assign_mode is changed.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Changed the interface name to mbm_assign_mode.
>     Removed the references of ABMC.
>     Added the changes to reset global and domain bitmaps.
>     Added the changes to reset rmid.
> 
> v6: Changed the mode name to mbm_cntr_assign.
>     Moved all the FS related code here.
>     Added changes to reset mbm_cntr_map and resctrl group counters.
> 
> v5: Change log and mode description text correction.
> 
> v4: Minor commit text changes. Keep the default to ABMC when supported.
>     Fixed comments to reflect changed interface "mbm_mode".
> 
> v3: New patch to address the review comments from upstream.
> ---
>  Documentation/arch/x86/resctrl.rst     | 15 ++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 74 +++++++++++++++++++++++++-
>  2 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index ff5397d19704..743c0b64a330 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -290,6 +290,21 @@ with the following files:
>  	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
>  	if there is no hardware resource assigned.
>  
> +	* To enable ABMC feature:

The separation between fs and arch did not make it to this patch?

> +	  ::
> +
> +	    # echo  "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +
> +	* To enable the legacy monitoring feature:

"legacy" -> "default"?

> +	  ::
> +
> +	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +
> +	The MBM event counters will reset when mbm_assign_mode is changed. Moving to

"will reset" -> "may reset"? Please also be clear on what is meant with "MBM event counter".
Note that "counter" has a very specific meaning in this work and after considering that
it is not clear if "MBM event counter will reset" means that the counters are no longer
assigned or if it means that the counts associated with events will be reset.

> +	mbm_cntr_assign will require users to assign the counters to the events to
> +	read the events. Otherwise, the MBM event counters will return "Unassigned"
> +	when read.
> +
>  "num_mbm_cntrs":
>  	The number of monitoring counters available for assignment.
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index bf94e4e05540..7a8ece12d7da 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -895,6 +895,77 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void rdtgroup_mbm_cntr_reset(struct rdt_resource *r)

It is not clear why this has "rdtgroup" prefix since it is not specific to
a resource group but a global action that resets all counters.

> +{
> +	struct rdtgroup *prgrp, *crgrp;
> +	struct rdt_mon_domain *dom;
> +
> +	/*
> +	 * Hardware counters will reset after switching the monitor mode.
> +	 * Reset the architectural state so that reading of hardware
> +	 * counter is not considered as an overflow in the next update.
> +	 * Also reset the domain counter bitmap.
> +	 */
> +	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> +		bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
> +		resctrl_arch_reset_rmid_all(r, dom);
> +	}
> +
> +	/* Reset global MBM counter map */
> +	bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
> +
> +	/* Reset the cntr_id's for all the monitor groups */
> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> +		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
> +				    mon.crdtgrp_list) {
> +			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +		}
> +	}
> +}
> +
> +static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
> +					      char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret = 0;
> +	bool enable;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	if (!strcmp(buf, "default")) {
> +		enable = 0;
> +	} else if (!strcmp(buf, "mbm_cntr_assign")) {
> +		enable = 1;
> +	} else {
> +		ret = -EINVAL;
> +		rdt_last_cmd_puts("Unsupported assign mode\n");
> +		goto write_exit;
> +	}
> +
> +	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		rdtgroup_mbm_cntr_reset(r);

Should this reset not happen only after the hardware state was changed
successfully? If the arch change failed then this may lead to inconsistent
state.
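
One possible ordering, sketched in user space: parse the mode, ask the arch
code to switch, and reset the software state only if that succeeded. The state
struct, the pretend arch call and the reset counter are hypothetical; the
trailing newline handling of the real write handler is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the state touched by the mode switch. */
struct mon_state {
	bool assign_enabled;	/* current mode */
	unsigned int resets;	/* how often the software state was reset */
	int arch_err;		/* error the pretend arch call returns, 0 if none */
};

/* Pretend arch call: switches the mode unless told to fail. */
static int arch_mode_set(struct mon_state *s, bool enable)
{
	if (s->arch_err)
		return s->arch_err;

	s->assign_enabled = enable;
	return 0;
}

/* Switch the mode named in @buf; returns 0 or a negative errno. */
int mbm_assign_mode_write(struct mon_state *s, const char *buf)
{
	bool enable;
	int ret;

	if (!strcmp(buf, "default"))
		enable = false;
	else if (!strcmp(buf, "mbm_cntr_assign"))
		enable = true;
	else
		return -EINVAL;

	if (enable == s->assign_enabled)
		return 0;	/* mode unchanged: nothing to reset */

	ret = arch_mode_set(s, enable);
	if (ret)
		return ret;	/* hardware unchanged: keep software state */

	s->resets++;		/* stands in for the counter/bitmap reset */
	return 0;
}
```

With this ordering a failed switch leaves both the mode and the assignment
bookkeeping as they were.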

> +		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
> +	}
> +
> +write_exit:
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>  				       struct seq_file *s, void *v)
>  {
> @@ -2107,9 +2178,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "mbm_assign_mode",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= rdtgroup_mbm_assign_mode_show,
> +		.write		= rdtgroup_mbm_assign_mode_write,
>  		.fflags		= RFTYPE_MON_INFO,
>  	},
>  	{

Reinette


* Re: [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported
  2024-09-04 22:21 ` [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
@ 2024-09-19 17:43   ` Reinette Chatre
  2024-09-27 14:37     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:43 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Configure mbm_cntr_assign on AMD.
> 
> 'mbm_cntr_assign' mode in AMD is ABMC (Assignable Bandwidth Monitoring
> Counters). When the ABMC is updated, it must be updated on all logical
> processors in the resctrl domain.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
>     Moved the default settings to rdt_get_mon_l3_config(). It should be
>     done before the hotplug handler is called. It cannot be done at
>     rdtgroup_init().
> 
> v6: Keeping the default enablement in arch init code for now.
>      This may need some discussion.
>      Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.
> 
> v5: New patch to enable ABMC by default.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>  arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 9a65a13ccbe9..3250561f0187 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -709,6 +709,7 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>  			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>  int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>  			   struct rdt_mon_domain *d, enum resctrl_event_id evtid);
> +void resctrl_arch_mbm_cntr_assign_configure(struct rdt_resource *r);
>  void rdt_staged_configs_clear(void);
>  bool closid_allocated(unsigned int closid);
>  int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 09b1d8bb0aa0..314c0b297470 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1261,6 +1261,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>  			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
>  			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
> +			hw_res->mbm_cntr_assign_enabled = true;

This is a major change to require architecture to set whether this is the default mode.
That seems fine but needs to be highlighted in the changelog and descriptions of this work.

>  		}
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 7a8ece12d7da..1054583bef9d 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2726,6 +2726,13 @@ int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>  	return 0;
>  }
>  
> +void resctrl_arch_mbm_cntr_assign_configure(struct rdt_resource *r)

How about resctrl_arch_mbm_cntr_assign_set_one() to match existing
resctrl_arch_mbm_cntr_assign_set()?

> +{
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +
> +	resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
> +}
> +
>  /*
>   * We don't allow rdtgroup directories to be created anywhere
>   * except the root directory. Thus when looking for the rdtgroup
> @@ -4510,9 +4517,13 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
>  
>  void resctrl_online_cpu(unsigned int cpu)
>  {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
>  	mutex_lock(&rdtgroup_mutex);
>  	/* The CPU is set in default rdtgroup after online. */
>  	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
> +	if (r->mon.mbm_cntr_assignable)

Needs a r->mon_capable check?

> +		resctrl_arch_mbm_cntr_assign_configure(r);
>  	mutex_unlock(&rdtgroup_mutex);
>  }
>  

Reinette


* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-09-04 22:21 ` [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
@ 2024-09-19 17:45   ` Reinette Chatre
  2024-09-27 16:22     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:45 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Users can modify the configuration of assignable events. Whenever the
> event configuration is updated, MBM assignments must be revised across
> all monitor groups within the impacted domains.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: New patch to update the assignments. Missed it earlier.
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 53 ++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 1054583bef9d..0b1490d71e77 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -871,6 +871,15 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>   */
>  #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>  
> +static bool resctrl_mbm_event_assigned(struct rdtgroup *rdtg,
> +				       struct rdt_mon_domain *d, u32 evtid)
> +{
> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
> +	int cntr_id = rdtg->mon.cntr_id[index];
> +
> +	return  (cntr_id != MON_CNTR_UNSET && test_bit(cntr_id, d->mbm_cntr_map));

(Please check spaces and paren use.)

> +}
> +
>  static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>  					 struct seq_file *s, void *v)
>  {
> @@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
> +					   struct rdt_mon_domain *d, u32 evtid)
> +{
> +	struct rdt_mon_domain *dom;
> +	struct rdtgroup *rdtg;
> +	int ret = 0;
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> +		return ret;
> +
> +	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> +		struct rdtgroup *crg;
> +
> +		list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> +			if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
> +				ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
> +				if (ret)
> +					goto out_done;
> +			}
> +		}
> +
> +		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
> +			list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> +				if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
> +					ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
> +					if (ret)
> +						goto out_done;
> +				}
> +			}
> +		}
> +	}
> +
> +out_done:
> +	return ret;
> +}
>  
>  static void mbm_config_write_domain(struct rdt_resource *r,
>  				    struct rdt_mon_domain *d, u32 evtid, u32 val)
>  {
>  	struct mon_config_info mon_info = {0};
>  	u32 config_val;
> +	int ret;
>  
>  	/*
>  	 * Check the current config value first. If both are the same then
> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>  			      resctrl_arch_event_config_set,
>  			      &mon_info, 1);
>  
> +	/*
> +	 * Counter assignments needs to be updated to match the event
> +	 * configuration.
> +	 */
> +	ret = resctrl_mbm_event_update_assign(r, d, evtid);
> +	if (ret)
> +		rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
> +

This does not look right. This function _just_ returned from an IPI on the appropriate CPU and then
starts a flow to do _another_ IPI to run code that could have run during the previous IPI.
The whole flow to call rdtgroup_assign_cntr() also seems like an obfuscated way to call resctrl_arch_assign_cntr()
to just reconfigure the counter (not actually assign it).
Could it perhaps call some resctrl fs code via a single IPI that in turn calls the appropriate arch code to set the new
mon event config and reconfigure the counter?

Reinette



* Re: [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups
  2024-09-04 22:21 ` [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
@ 2024-09-19 17:53   ` Reinette Chatre
  2024-09-27 17:06     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:53 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Provide the interface to list the assignment states of all the resctrl
> groups in mbm_cntr_assign mode.
> 
> Example:
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control

It is not clear what is intended with the above example. Was it intended to
have some output?

> 
> List follows the following format:
> 
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> Format for specific type of groups:
> 
> - Default CTRL_MON group:
>   "//<domain_id>=<flags>"
> 
> - Non-default CTRL_MON group:
>   "<CTRL_MON group>//<domain_id>=<flags>"
> 
> - Child MON group of default CTRL_MON group:
>   "/<MON group>/<domain_id>=<flags>"
> 
> - Child MON group of non-default CTRL_MON group:
>   "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> Flags can be one of the following:
> t  MBM total event is enabled
> l  MBM local event is enabled
> tl Both total and local MBM events are enabled
> _  None of the MBM events are enabled
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

> +"mbm_assign_control":
> +	Reports the resctrl group and monitor status of each group.
> +
> +	List follows the following format:
> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> +
> +	Format for specific type of groups:
> +
> +	* Default CTRL_MON group:
> +		"//<domain_id>=<flags>"
> +
> +	* Non-default CTRL_MON group:
> +		"<CTRL_MON group>//<domain_id>=<flags>"
> +
> +	* Child MON group of default CTRL_MON group:
> +		"/<MON group>/<domain_id>=<flags>"
> +
> +	* Child MON group of non-default CTRL_MON group:
> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> +
> +	Flags can be one of the following:
> +	::
> +
> +	 t  MBM total event is assigned.
> +	 l  MBM local event is assigned.
> +	 tl Both total and local MBM events are assigned.
> +	 _  None of the MBM events are assigned.
> +
> +	Examples:
> +	::
> +
> +	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> +
> +	 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	 non_default_ctrl_mon_grp//0=tl;1=tl;
> +	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	 //0=tl;1=tl;
> +	 /child_default_mon_grp/0=tl;1=tl;
> +
> +	 There are four resctrl groups. All the groups have total and local MBM events
> +	 assigned on domain 0 and 1.
> +

Please create the docs in chosen format, like htmldocs, and see how it ends up being formatted.
For example, above seems to be intended to be a code sample but the description ("There are
four resctrl ...") appears as part of the code sample.

>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy

...

> +static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
> +					    struct seq_file *s, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	struct rdt_mon_domain *dom;
> +	struct rdtgroup *rdtg;
> +	char str[10];
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> +		mutex_unlock(&rdtgroup_mutex);
> +		return -EINVAL;
> +	}
> +
> +	rdt_last_cmd_clear();

This should be done before any attempt to write to the buffer.


Reinette


* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-09-04 22:21 ` [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-09-19 17:59   ` Reinette Chatre
  2024-09-27 17:47     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 17:59 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> Introduce the interface to assign MBM events in mbm_cntr_assign mode.
> 
> Events can be enabled or disabled by writing to file
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> Format is similar to the list format with addition of opcode for the
> assignment operation.
>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Format for specific type of groups:
> 
>  * Default CTRL_MON group:
>          "//<domain_id><opcode><flags>"
> 
>  * Non-default CTRL_MON group:
>          "<CTRL_MON group>//<domain_id><opcode><flags>"
> 
>  * Child MON group of default CTRL_MON group:
>          "/<MON group>/<domain_id><opcode><flags>"
> 
>  * Child MON group of non-default CTRL_MON group:
>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Domain_id '*' will apply the flags on all the domains.
> 
> Opcode can be one of the following:
> 
>  = Update the assignment to match the flags
>  + Assign a new MBM event without impacting existing assignments.
>  - Unassign a MBM event from currently assigned events.
> 
> Assignment flags can be one of the following:
>  t  MBM total event
>  l  MBM local event
>  tl Both total and local MBM events
>  _  None of the MBM events. Valid only with '=' opcode.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>     Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>     Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>     Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>     Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>     Removed ABMC reference in FS code.
>     Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>     Not sure if we need to change the behaviour here. Processed them sequentially right now.
>     Users have the liberty to pass the flags. Restricting it might be a problem later.

Could you please give an example of what problem may be encountered later? An assignment
like "domain=_lt" seems like a contradiction to me since user space essentially asks
for "None of the MBM events" as well as "MBM total event" and "MBM local event".
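
To illustrate the concern, here is a minimal user-space sketch (not the kernel
implementation) of a flag parser that treats "_" as mutually exclusive with "t"
and "l". The ASSIGN_* names mirror the patch but their values are assumed, and
parse_assign_flags() is a hypothetical helper used only for this example.

```c
#include <stddef.h>

/* Values are illustrative; only the names mirror the patch. */
enum {
	ASSIGN_NONE    = 0,
	ASSIGN_TOTAL   = 1 << 0,
	ASSIGN_LOCAL   = 1 << 1,
	ASSIGN_INVALID = 1 << 2,
};

/*
 * Hypothetical helper: convert a flag string such as "t", "lt" or "_"
 * into a state mask. "_" is accepted only on its own, so contradictory
 * requests like "_lt" or "lt_" are rejected instead of being processed
 * sequentially.
 */
static int parse_assign_flags(const char *str)
{
	int state = ASSIGN_NONE;
	int saw_none = 0;
	size_t i;

	if (!str || str[0] == '\0')
		return ASSIGN_INVALID;

	for (i = 0; str[i] != '\0'; i++) {
		switch (str[i]) {
		case 't':
			state |= ASSIGN_TOTAL;
			break;
		case 'l':
			state |= ASSIGN_LOCAL;
			break;
		case '_':
			saw_none = 1;
			break;
		default:
			return ASSIGN_INVALID;
		}
	}

	/* "None" combined with any event flag is a contradiction. */
	if (saw_none && state != ASSIGN_NONE)
		return ASSIGN_INVALID;

	return state;
}
```

With this shape "tl" and "_" remain valid while "_lt" returns ASSIGN_INVALID,
which would let the write fail with a clear message in last_cmd_status.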


...

> @@ -352,6 +352,98 @@ with the following files:
>  	 There are four resctrl groups. All the groups have total and local MBM events
>  	 assigned on domain 0 and 1.
>  
> +	Assignment state can be updated by writing to the interface.
> +
> +	Format is similar to the list format with addition of opcode for the
> +	assignment operation.
> +
> +		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> +	Format for each type of groups:
> +
> +        * Default CTRL_MON group:
> +                "//<domain_id><opcode><flags>"
> +
> +        * Non-default CTRL_MON group:
> +                "<CTRL_MON group>//<domain_id><opcode><flags>"
> +
> +        * Child MON group of default CTRL_MON group:
> +                "/<MON group>/<domain_id><opcode><flags>"
> +
> +        * Child MON group of non-default CTRL_MON group:
> +                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> +	Domain_id '*' will apply the flags on all the domains.
> +
> +	Opcode can be one of the following:
> +	::
> +
> +	 = Update the assignment to match the MBM event.
> +	 + Assign a new MBM event without impacting existing assignments.
> +	 - Unassign a MBM event from currently assigned events.
> +
> +	Examples:
> +	::
> +
> +	  Initial group status:
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=tl;
> +

Similar to the previous patch, the generated doc does not seem to reflect
what is intended. Everything above and below is formatted as code, the
descriptions as well as the actual "code".

> +	  To update the default group to assign only total MBM event on domain 0:
> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	  Assignment status after the update:
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=tl;
> +
> +	  To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	  Assignment status after the update:
> +	  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	  To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> +	  unassign both local and total MBM events on domain 1:
> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> +			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	  Assignment status after the update:
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	  To update the default group to add a local MBM event domain 0.
> +	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	  Assignment status after the update:
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	  To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
> +	  the MBM events on all the domains.
> +	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	  Assignment status after the update:
> +	  #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=_;1=_;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy

...

> +static int rdtgroup_process_flags(struct rdt_resource *r,
> +				  enum rdt_group_type rtype,
> +				  char *p_grp, char *c_grp, char *tok)
> +{
> +	int op, mon_state, assign_state, unassign_state;
> +	char *dom_str, *id_str, *op_str;
> +	struct rdt_mon_domain *d;
> +	struct rdtgroup *rdtgrp;
> +	unsigned long dom_id;
> +	int ret, found = 0;
> +
> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
> +
> +	if (!rdtgrp) {
> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
> +		return -EINVAL;
> +	}
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +
> +	op_str = strpbrk(dom_str, "=+-");
> +
> +	if (op_str) {
> +		op = *op_str;
> +	} else {
> +		rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");

"_" is not an operation.

> +		return -EINVAL;
> +	}
> +
> +	id_str = strsep(&dom_str, "=+-");
> +
> +	/* Check for domain id '*' which means all domains */
> +	if (id_str && *id_str == '*') {
> +		d = NULL;
> +		goto check_state;
> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Verify if the dom_id is valid */
> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		if (d->hdr.id == dom_id) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> +		return -EINVAL;
> +	}
> +
> +check_state:
> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
> +
> +	if (mon_state == ASSIGN_INVALID) {
> +		rdt_last_cmd_puts("Invalid assign flag\n");
> +		goto out_fail;
> +	}
> +
> +	assign_state = 0;
> +	unassign_state = 0;
> +
> +	switch (op) {
> +	case '+':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		assign_state = mon_state;
> +		break;
> +	case '-':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		unassign_state = mon_state;
> +		break;
> +	case '=':
> +		assign_state = mon_state;
> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (assign_state & ASSIGN_TOTAL) {
> +		ret = rdtgroup_assign_cntr(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);

hmmm ... wasn't unassign going to happen first? That would potentially make counters
available to help subsequent assign succeed.
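
A small user-space model of why the ordering matters, assuming a single shared
pool of counters. The struct, the EVT_* values and the pool_*() helpers are all
hypothetical and only stand in for the counter bookkeeping done by
rdtgroup_assign_cntr()/rdtgroup_unassign_cntr().

```c
#include <stdbool.h>

/* Illustrative event bits, not the kernel's event IDs. */
enum { EVT_TOTAL = 1 << 0, EVT_LOCAL = 1 << 1 };

struct cntr_pool {
	int free;	/* counters not currently assigned */
	int assigned;	/* mask of events that hold a counter */
};

/* Take a counter for @evt; fails when the pool is exhausted. */
static bool pool_assign(struct cntr_pool *p, int evt)
{
	if (p->assigned & evt)
		return true;	/* already assigned, nothing to do */
	if (p->free == 0)
		return false;	/* out of counters */
	p->free--;
	p->assigned |= evt;
	return true;
}

/* Return the counter held by @evt, if any, to the pool. */
static void pool_unassign(struct cntr_pool *p, int evt)
{
	if (p->assigned & evt) {
		p->assigned &= ~evt;
		p->free++;
	}
}

/* Apply an '=' style update: release unwanted events first, then assign. */
static bool pool_update(struct cntr_pool *p, int want)
{
	int drop = (EVT_TOTAL | EVT_LOCAL) & ~want;

	if (drop & EVT_TOTAL)
		pool_unassign(p, EVT_TOTAL);
	if (drop & EVT_LOCAL)
		pool_unassign(p, EVT_LOCAL);

	if ((want & EVT_TOTAL) && !pool_assign(p, EVT_TOTAL))
		return false;
	if ((want & EVT_LOCAL) && !pool_assign(p, EVT_LOCAL))
		return false;

	return true;
}
```

With no free counters and only the total event assigned, a request for "=l"
succeeds here because the total counter is released before the local one is
requested. Assigning first would fail in the same situation.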

> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (assign_state & ASSIGN_LOCAL) {
> +		ret = rdtgroup_assign_cntr(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (unassign_state & ASSIGN_TOTAL) {
> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (unassign_state & ASSIGN_LOCAL) {
> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	goto next;
> +
> +out_fail:
> +
> +	return -EINVAL;
> +}
> +

Reinette



* Re: [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (23 preceding siblings ...)
  2024-09-04 22:21 ` [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-09-19 18:00 ` Reinette Chatre
  2024-09-27 18:11   ` Moger, Babu
  24 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-19 18:00 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/4/24 3:21 PM, Babu Moger wrote:
> # Linux Implementation
> 
> Create a generic interface aimed to support user space assignment
> of scarce counters used for monitoring. First usage of interface
> is by ABMC with option to expand usage to "soft-ABMC" and MPAM
> counters in future.
> 
> Feature adds following interface files:
> 
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.
> 
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control: Reports the resctrl group and monitor
> status of each group. Assignment state can be updated by writing to the
> interface.

At this point I think the architecture is settling with the remaining work focusing
on polishing the code and making it more robust. To get confidence in this big addition
it will be valuable to hear from Peter and James to confirm if soft-ABMC and
MPAM can build on this.

> 
> # Examples
> 
> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	[mbm_cntr_assign]
> 	default
> 
> 	ABMC feature is detected and it is enabled.
> 
> b. Check how many ABMC counters are available. 
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> 	32
> 
> c. Create few resctrl groups.
> 
> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> 
> 
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>    to list and modify the group's monitoring states. File provides single place

"modify the group's monitoring states" -> "modify any group's monitoring states"?

>    to list monitoring states of all the resctrl groups. It makes it easier for
>    user space to to learn about the used counters without needing to traverse

"to to learn" -> "to learn"

Reinette


* Re: [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options
  2024-09-19 16:00   ` Reinette Chatre
@ 2024-09-23 14:21     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 14:21 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:00, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Add the command line option to enable or disable the new resctrl feature
>> ABMC (Assignable Bandwidth Monitoring Counters).
> 
> This does not reflect the fs and arch separation that this version highlights
> since ABMC is not a resctrl feature.
> 
> This can get confusing and I think this interface is indeed for the
> architecture where hardware features are enabled/disabled (highlighted
> by how the parameter is connected to the X86_FEATURE_ flag) ... so
> perhaps something like:
> 
> 	Add the command line option to enable or disable exposing
> 	the ABMC (Assignable Bandwidth Monitoring Counters) hardware
> 	feature to resctrl.

Sure.

> 
> Patch looks good to me.

-- 
Thanks
Babu Moger


* Re: [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-09-19 16:16   ` Reinette Chatre
@ 2024-09-23 14:37     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 14:37 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:16, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 795fe91a8feb..6a792f06f5ce 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1229,6 +1229,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  			mbm_local_event.configurable = true;
>>  			mbm_config_rftype_init("mbm_local_bytes_config");
>>  		}
>> +
>> +		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>> +			r->mon.mbm_cntr_assignable = true;
>> +			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> +			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
> 
> This should use GENMASK()

Sure.
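
For reference, a user-space sketch of what the GENMASK() form of the 0xFFFF
mask could look like. GENMASK32() below is a simplified stand-in for the
kernel's GENMASK() from <linux/bits.h>, and ABMC_NUM_CNTRS_MASK and
abmc_num_cntrs() are hypothetical names chosen for this example only.

```c
#include <stdint.h>

/* Simplified 32-bit stand-in for the kernel's GENMASK(h, l). */
#define GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Hypothetical name for the counter field read from EBX in the hunk above. */
#define ABMC_NUM_CNTRS_MASK	GENMASK32(15, 0)

/*
 * The quoted hunk treats the field as "number of counters minus one",
 * hence the +1.
 */
static inline uint32_t abmc_num_cntrs(uint32_t ebx)
{
	return (ebx & ABMC_NUM_CNTRS_MASK) + 1;
}
```

A named mask documents which EBX bits are consumed, and bits above 15 are
ignored in the same way as with the open-coded 0xFFFF.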
-- 
Thanks
Babu Moger


* Re: [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-09-19 16:22   ` Reinette Chatre
@ 2024-09-23 15:30     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 15:30 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:22, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Add the functionality to enable/disable AMD ABMC feature.
>>
>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
>> to be updated on all the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when ABMC state is changed. Reset the
>> architectural state maintained by resctrl so that reading of a hardware
>> counter is not considered as an overflow in next update.
> 
> Above mentions that architectural state is also reset, but that does
> not seem to form part of this patch? 

Yes. Correct. Will remove this text.

> 
>>
>> The ABMC feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>>  static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7e76f8d839fc..0178555bf3f6 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2402,6 +2402,41 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the resource.
> 
> This comment is not accurate since the function below only sets MSR on current CPU.

Sure. Will move this comment to the caller where "on_each_cpu_mask" is called.

> 
>> + */
>> +static void resctrl_abmc_set_one_amd(void *arg)
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode
  2024-09-19 16:28   ` Reinette Chatre
@ 2024-09-23 16:01     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 16:01 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:28, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>> supported.
>>
>> The "mbm_cntr_assign" mode provides the option to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>>
>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
>> Bandwidth Monitoring Counters) hardware feature. "mbm_cntr_assign" mode
>> is enabled by default when supported.
> 
> As I understand this series changed this behavior to let the architecture
> dictate whether "mbm_cntr_assign" is enabled by default.

Yes. Correct. Will change the text to mention that.

> 
>>
>> The "default" mode is the existing monitoring mode that works without the
>> explicit counter assignment, instead relying on dynamic counter assignment
>> by hardware that may result in hardware not dedicating a counter resulting
>> in monitoring data reads returning "Unavailable".
>>
>> Provide an interface to display the monitor mode on the system.
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> Switching the mbm_assign_mode will reset all the MBM counters of all
>> resctrl groups.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Updated the descriptions/commit log in resctrl.rst to generic text.
>>     Thanks to James and Reinette.
>>     Rename mbm_mode to mbm_assign_mode.
>>     Introduced mutex lock in rdtgroup_mbm_mode_show().
>>
>> v6: Added documentation for mbm_cntr_assign and legacy mode.
>>     Moved mbm_mode fflags initialization to static initialization.
>>
>> v5: Changed interface name to mbm_mode.
>>     It will be always available even if ABMC feature is not supported.
>>     Added description in resctrl.rst about ABMC mode.
>>     Fixed display abmc and legacy consistantly.
>>
>> v4: Fixed the checks for legacy and abmc mode. Default it ABMC.
>>
>> v3: New patch to display ABMC capability.
>> ---
>>  Documentation/arch/x86/resctrl.rst     | 33 ++++++++++++++++++++++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
>>  2 files changed, 64 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 30586728a4cd..a7b17ad8acb9 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -257,6 +257,39 @@ with the following files:
>>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>  	    0=0x30;1=0x30;3=0x15;4=0x15
>>  
>> +"mbm_assign_mode":
>> +	Reports the list of monitoring modes supported. The enclosed brackets
>> +	indicate which feature is enabled.
> 
> "which feature is enabled" -> "which mode is enabled"?

Sure.

> 
>> +	::
>> +
>> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +	  [mbm_cntr_assign]
>> +	  default
>> +
>> +	"mbm_cntr_assign":
>> +
>> +	In mbm_cntr_assign mode user-space is able to specify which control
>> +	or monitor groups in resctrl should have a hardware counter assigned
> 
> This documentation should ideally also be appropriate for when the "soft-ABMC"
> support lands. Considering that, should all the "hardware counter" instances perhaps be
> changed to just be "counter"?

Sure.

> 
>> +	using the 'mbm_control' file. The number of hardware counters available
>> +	is described in the 'num_mbm_cntrs' file. Changing to this mode will
>> +	cause all counters on a resource to reset.
> 
> Should resctrl commit to this? Resetting of the counters as implemented here
> does seem to be an architecture specific action so this text could be
> made more generic by stating "may cause all counters on a resource to reset".

Ok. Sure.

> 
>> +
>> +	The feature is needed on platforms which support more control and monitor
> 
> "The feature" -> "The mode"?

Sure.
> 
>> +	groups than hardware counters, meaning 'unassigned' control or monitor
>> +	groups will report 'Unavailable' or not count all the traffic in an
>> +	unpredictable way.
> 
> "or not count all the traffic in an unpredictable way" is a bit hard to parse ... how
> about "or count traffic in an unpredictable way"?

ok. Sure.

> 
> 
>> +
>> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>> +	enable this mode by default so that counters remain assigned even when the
>> +	corresponding RMID is not in use by any processor.
>> +
>> +	"default":
>> +
>> +	By default resctrl assumes each control and monitor group has a hardware counter.
>> +	Hardware without this property will still allow more control or monitor groups
>> +	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
> Please be specific what is meant with "the mbm files"

Sure. Will change it to mbm_total_bytes and mbm_local_bytes.

> 
>> +	if there is no hardware resource assigned.
> 
> "no hardware resource" -> "no counter"?

Sure.

> 
>> +
>>  "max_threshold_occupancy":
>>  		Read/write file provides the largest value (in
>>  		bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 0178555bf3f6..dbc8c5e63213 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
> 
> Reinette
> 
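
For anyone consuming this file from user space, here is a minimal sketch of how
the bracketed entry in the "mbm_assign_mode" output quoted above could be picked
out. It only assumes the format shown in the documentation example (one mode per
line, the enabled one wrapped in "[...]"). The helper name enabled_mbm_mode() is
hypothetical and not part of resctrl.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the bracketed (enabled) mode found in the text of mbm_assign_mode
 * into out. Returns 0 on success, or -1 if there is no complete "[...]"
 * entry or it does not fit in out.
 * Hypothetical helper for illustration only.
 */
static int enabled_mbm_mode(const char *text, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(text != NULL && out != NULL);

	open = strchr(text, '[');
	close = open ? strchr(open, ']') : NULL;
	if (!close)
		return -1;

	/* Length of the name between the brackets */
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

Passing the example contents "[mbm_cntr_assign]\ndefault\n" yields
"mbm_cntr_assign". Text without any bracketed entry is reported as an error
rather than guessed at.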

-- 
Thanks
Babu Moger

* Re: [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-09-19 16:32   ` Reinette Chatre
@ 2024-09-23 16:23     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 16:23 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:32, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides an option to the user to assign a
>> hardware counter to an RMID, event pair and monitor the bandwidth as
> 
> Could you please be consistent in this series in how you refer to
> an RMID, event pair ? For example later it becomes RMID-event pair.

Will keep it as "an RMID, event pair" in all the references.

> 
> 
>> long as the counter is assigned. Number of assignments depend on number
>> of monitoring counters available.
>>
>> Provide the interface to display the number of monitoring counters
>> supported.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Minor commit log text changes.
>>
>> v6: No changes.
>>
>> v5: Changed the display name from num_cntrs to num_mbm_cntrs.
>>     Updated the commit message.
>>     Moved the patch after mbm_mode is introduced.
>>
>> v4: Changed the counter name to num_cntrs. And few text changes.
>>
>> v3: Changed the field name to mbm_assign_cntrs.
>>
>> v2: Changed the field name to mbm_assignable_counters from abmc_counte
>> ---
>>  Documentation/arch/x86/resctrl.rst     |  3 +++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
>>  3 files changed, 20 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index a7b17ad8acb9..3e9302971faf 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -290,6 +290,9 @@ with the following files:
>>  	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
>>  	if there is no hardware resource assigned.
>>  
>> +"num_mbm_cntrs":
>> +	The number of monitoring counters available for assignment.
>> +
> 
> I think it will be helpful if the changelog and the above doc notes when this file can
> be expected to be visible since its visibility is not connected to visibility of

Sure.

> "mbm_assign_mode" that refers to it. There also seems to be a conflict here where
> "mbm_assign_mode" documentation contains section about "default" that refers to
> "num_mbm_cntrs", but "num_mbm_cntrs" may not be visible if "default" is the only mode.
> 

Yes. The "default" section needs to refer to "num_rmids" instead.

-- 
Thanks
Babu Moger

* Re: [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-19 16:42   ` Reinette Chatre
@ 2024-09-23 18:33     ` Moger, Babu
  2024-09-23 22:28       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 18:33 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:42, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Hardware provides a set of counters when mbm_cntr_assignable feature is
>> supported. These counters are used for assigning the events in resctrl
>> a group when the feature is enabled. The kernel must manage and track the
> 
> The second sentence ("These counters ...") is difficult to parse.

How about?

Counters are used for assigning the events in resctrl group.

> 
>> number of available counters.
> 
> "The kernel must manage and track the number of available counters." ->
> "The kernel must manage and track the available counters." ?

Sure.

> 
>>
>> Introduce mbm_cntr_free_map bitmap to track available counters and set
>> of routines to allocate and free the counters.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index e3e71843401a..f98cc5b9bebc 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1175,6 +1175,30 @@ static __init int snc_get_config(void)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Counter bitmap for tracking the available counters.
>> + * 'mbm_cntr_assign' mode provides set of hardware counters for assigning
>> + * RMID, event pair. Each RMID and event pair takes one hardware counter.
> 
> (soft-ABMC may need to edit this comment)

Agreed.

> 
>> + * Kernel needs to keep track of the number of available counters.
> 
> Last sentence seems to be duplicate of the first?

Will remove it.

> 
>> + */
>> +static int mbm_cntrs_init(struct rdt_resource *r)
> 
> Needs __init?

Did you mean that this should be merged into dom_data_init() so that a
separate function is not needed? Please clarify.


> 
>> +{
>> +	if (r->mon.mbm_cntr_assignable) {
>> +		r->mon.mbm_cntr_free_map = bitmap_zalloc(r->mon.num_mbm_cntrs,
>> +							 GFP_KERNEL);
>> +		if (!r->mon.mbm_cntr_free_map)
>> +			return -ENOMEM;
>> +		bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void __exit mbm_cntrs_exit(struct rdt_resource *r)
>> +{
>> +	if (r->mon.mbm_cntr_assignable)
>> +		bitmap_free(r->mon.mbm_cntr_free_map);
>> +}
>> +
>>  int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  {
>>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>> @@ -1240,6 +1264,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  		}
>>  	}
>>  
>> +	ret = mbm_cntrs_init(r);
>> +	if (ret)
>> +		return ret;
> 
> Missing cleanup of earlier allocation on error path here. Even so, this does not
> seem to integrate with existing dom_data_init() wrt ordering and locking. Could
> this be more fitting when merged with dom_data_init() (after moving it)?

Sure. Will do.
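
To make the error-path point concrete, here is a minimal user-space sketch of
the unwinding pattern being asked for: if a later allocation fails, release
what was allocated earlier before returning. The struct and function names are
hypothetical stand-ins, and the fail_second parameter exists only so the
failure path can be exercised. It is not the planned dom_data_init() change.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mon_state {
	void *dom_data;			/* stands in for the earlier allocation */
	unsigned long *cntr_free_map;	/* stands in for the counter bitmap */
};

/* fail_second forces the second allocation to fail (demo only). */
static int mon_state_init(struct mon_state *s, int fail_second)
{
	assert(s != NULL);

	s->dom_data = malloc(64);
	if (!s->dom_data)
		return -ENOMEM;

	s->cntr_free_map = fail_second ? NULL : calloc(1, sizeof(unsigned long));
	if (!s->cntr_free_map)
		goto out_free_dom;

	return 0;

out_free_dom:
	/* Undo the earlier allocation so nothing leaks on failure */
	free(s->dom_data);
	s->dom_data = NULL;
	return -ENOMEM;
}

static void mon_state_exit(struct mon_state *s)
{
	free(s->cntr_free_map);
	free(s->dom_data);
	s->cntr_free_map = NULL;
	s->dom_data = NULL;
}
```

Keeping both allocations behind one init/exit pair is also what makes the
ordering and locking easier to reason about, which is the motivation for
folding this into dom_data_init().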

> 
>> +
>>  	l3_mon_evt_init(r);
>>  
>>  	r->mon_capable = true;
>> @@ -1247,9 +1275,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	return 0;
>>  }
>>  
>> -void __exit rdt_put_mon_l3_config(void)
>> +void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
>>  {
>>  	dom_data_exit();
>> +	mbm_cntrs_exit(r);
> 
> There is a mismatch wrt locking used in dom_data_exit() and mbm_cntrs_exit() that is
> sure to cause confusion and difficulty in the MPAM transition.

Will merge this with dom_data_exit.

> 
>>  }
>>  
>>  void __init intel_rdt_mbm_apply_quirk(void)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index ba737890d5c2..a51992984832 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,25 @@ bool closid_allocated(unsigned int closid)
>>  	return !test_bit(closid, &closid_free_map);
>>  }
>>  
>> +int mbm_cntr_alloc(struct rdt_resource *r)
>> +{
>> +	int cntr_id;
>> +
>> +	cntr_id = find_first_bit(r->mon.mbm_cntr_free_map,
>> +				 r->mon.num_mbm_cntrs);
>> +	if (cntr_id >= r->mon.num_mbm_cntrs)
>> +		return -ENOSPC;
>> +
>> +	__clear_bit(cntr_id, r->mon.mbm_cntr_free_map);
>> +
>> +	return cntr_id;
>> +}
>> +
>> +void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id)
>> +{
>> +	__set_bit(cntr_id, r->mon.mbm_cntr_free_map);
>> +}
>> +
>>  /**
>>   * rdtgroup_mode_by_closid - Return mode of resource group with closid
>>   * @closid: closid if the resource group
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f11d6fdfd977..aab22ff8e0c1 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -187,12 +187,14 @@ enum resctrl_scope {
>>   * @num_rmid:		Number of RMIDs available
>>   * @num_mbm_cntrs:	Number of assignable monitoring counters
>>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>> + * @mbm_cntr_free_map:	bitmap of number of assignable MBM counters
> 
> The "number of" is not clear ... it seems to indicate tracking a count? How about
> just "bitmap of free MBM counters"

Sure.

> 
>>   * @evt_list:		List of monitoring events
>>   */
>>  struct resctrl_mon {
>>  	int			num_rmid;
>>  	int			num_mbm_cntrs;
>>  	bool			mbm_cntr_assignable;
>> +	unsigned long		*mbm_cntr_free_map;
>>  	struct list_head	evt_list;
>>  };
>>  
> 
> Reinette
> 
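
As a reading aid, here is a minimal user-space sketch of the free-map idea from
the patch quoted above: a set bit means the counter is free, allocation takes
the lowest set bit, and freeing sets the bit again. The cntr_pool type and
helper names are hypothetical, and plain loops stand in for the kernel's
bitmap_fill()/find_first_bit() helpers. Locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct cntr_pool {
	unsigned long *free_map;	/* bit set = counter is free */
	unsigned int num_cntrs;
};

/* Allocate the map and mark every counter as free. */
static int cntr_pool_init(struct cntr_pool *p, unsigned int num_cntrs)
{
	size_t words = (num_cntrs + BITS_PER_LONG - 1) / BITS_PER_LONG;
	unsigned int i;

	p->free_map = calloc(words ? words : 1, sizeof(unsigned long));
	if (!p->free_map)
		return -ENOMEM;

	p->num_cntrs = num_cntrs;
	for (i = 0; i < num_cntrs; i++)
		p->free_map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
	return 0;
}

/* Return the lowest free counter ID and mark it used, or -ENOSPC. */
static int cntr_alloc(struct cntr_pool *p)
{
	unsigned int i;

	for (i = 0; i < p->num_cntrs; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_LONG);

		if (p->free_map[i / BITS_PER_LONG] & mask) {
			p->free_map[i / BITS_PER_LONG] &= ~mask;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Mark a counter as free again. */
static void cntr_free(struct cntr_pool *p, unsigned int id)
{
	assert(id < p->num_cntrs);
	p->free_map[id / BITS_PER_LONG] |= 1UL << (id % BITS_PER_LONG);
}

static void cntr_pool_exit(struct cntr_pool *p)
{
	free(p->free_map);
	p->free_map = NULL;
}
```

With three counters, three allocations return 0, 1 and 2, a fourth returns
-ENOSPC, and freeing counter 1 makes it the next ID handed out.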

-- 
Thanks
Babu Moger

* Re: [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain
  2024-09-19 16:51   ` Reinette Chatre
@ 2024-09-23 18:43     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 18:43 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:51, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
>> supported, the bandwidth events can be configured to track specific
>> events. The event configuration is domain specific. ABMC (Assignable
>> Bandwidth Monitoring Counters) feature needs event configuration
>> information to assign hardware counter to an RMID. Event configurations
> 
> "to assign hardware counter" -> "to assign a hardware counter"?

Sure.

> 
>> are not stored in resctrl but instead always read from or written to
>> hardware directly when prompted by user space.
>>
>> Read the event configuration from the hardware during the domain
>> initialization. Save the configuration value in rdt_hw_mon_domain,
> 
> "rdt_hw_mon_domain" -> "struct rdt_hw_mon_domain"

Sure.

> 
>> so it can be used for counter assignment.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Fixed initializing INVALID_CONFIG_VALUE to mbm_local_cfg in case of error.
>>
>> v6: Renamed resctrl_arch_mbm_evt_config -> resctrl_mbm_evt_config_init
>>     Initialized value to INVALID_CONFIG_VALUE if it is not configurable.
>>     Minor commit message update.
>>
>> v5: Exported mon_event_config_index_get.
>>     Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.
>>
>> v4: Read the configuration information from the hardware to initialize.
>>     Added few commit messages.
>>     Fixed the tab spaces.
>>
>> v3: Minor changes related to rebase in mbm_config_write_domain.
>>
>> v2: No changes.
>> ---
>>  arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>>  arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++++++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  | 26 ++++++++++++++++++++++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 +---
>>  4 files changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 00ad00258df2..2a4be004a2df 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -632,6 +632,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>>  
>>  	arch_mon_domain_online(r, d);
>>  
>> +	resctrl_mbm_evt_config_init(hw_dom);
> 
> Now that the arch and fs separate becomes clear I wonder if it may help to understand
> this work if we start using clear namespaces to help this distinction. Surely the
> arch code is very inconsistent in this regard (thus this function fits in), but
> resctrl_ has to be the prefix for fs code.
> 

Will rename the function to arch_mbm_evt_config_init().
-- 
Thanks
Babu Moger

* Re: [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value
  2024-09-19 16:55   ` Reinette Chatre
@ 2024-09-23 18:45     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 18:45 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 11:55, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> The event configuration is domain specific and initialized during domain
>> initialization. The values are stored in struct rdt_hw_mon_domain.
>>
>> It is not required to read the configuration register every time user asks
>> for it. Use the value stored in struct rdt_hw_mon_domain instead.
>>
>> Introduce resctrl_arch_event_config_get() and
>> resctrl_arch_event_config_set() to get/set architecture domain specific
>> mbm_total_cfg/mbm_local_cfg values.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> This change looks fine, but could the function names be more specific? For example,
>  resctrl_arch_mon_event_config_get()/resctrl_arch_mon_event_config_set()?

Sure. Will do.
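
For illustration, a minimal user-space sketch of the idea in this patch: read
the event configuration once when the domain is set up, then serve later
queries from the stored value. All names here are hypothetical,
fake_hw_read_cfg() merely stands in for the register read, and the values it
returns are arbitrary placeholders. A real setter would also write the
register, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

enum mon_event { EVT_MBM_TOTAL, EVT_MBM_LOCAL };

struct mon_domain_cfg {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
};

/* Counts simulated register reads so the caching effect is visible */
static unsigned int hw_reads;

/* Stand-in for reading the event configuration from hardware */
static uint32_t fake_hw_read_cfg(enum mon_event evt)
{
	hw_reads++;
	return evt == EVT_MBM_TOTAL ? 0x7f : 0x15;	/* placeholder values */
}

/* Domain initialization: the only place the "hardware" is read */
static void mon_domain_cfg_init(struct mon_domain_cfg *d)
{
	d->mbm_total_cfg = fake_hw_read_cfg(EVT_MBM_TOTAL);
	d->mbm_local_cfg = fake_hw_read_cfg(EVT_MBM_LOCAL);
}

/* Return the stored configuration without touching the "hardware" */
static uint32_t mon_event_config_get(const struct mon_domain_cfg *d,
				     enum mon_event evt)
{
	return evt == EVT_MBM_TOTAL ? d->mbm_total_cfg : d->mbm_local_cfg;
}

/* Update the stored configuration (register write omitted in this sketch) */
static void mon_event_config_set(struct mon_domain_cfg *d,
				 enum mon_event evt, uint32_t val)
{
	if (evt == EVT_MBM_TOTAL)
		d->mbm_total_cfg = val;
	else
		d->mbm_local_cfg = val;
}
```

After initialization the read counter stays at two no matter how often the
getter is called, which is the saving the commit message describes.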

-- 
Thanks
Babu Moger

* Re: [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-09-19 17:08   ` Reinette Chatre
@ 2024-09-23 20:21     ` Moger, Babu
  2024-09-23 22:30       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 20:21 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 12:08, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> +/*
>> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
>> + * @bw_type		: Bandwidth configuration(supported by BMEC)
>> + *			  tracked by the @cntr_id.
>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>> + * @reserved1		: Reserved.
>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>> + * @cntr_id		: Counter identifier.
>> + * @reserved		: Reserved.
>> + * @cntr_en		: Tracking enable bit.
>> + * @cfg_en		: Configuration enable bit.
>> + *
>> + * Configuration and tracking:
>> + * CfgEn=1,CtrEn=0 : Configure CtrID and but no tracking the events yet.
>> + * CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
> 
> Thanks for moving the text ... could it now be made to match the new (outside
> AMD arch document) destination? For example, "CfgEn" becomes "@cfg_en",

Sure. Will do.

> "CtrID" becomes "@cntr_id" etc. Also please fix language, for example
> what does "and but no tracking the events yet" mean? So far this work
> has focused on "counting" vs "not counting" events and it is not

I will change the text to "not counting". I hope that clarifies it.

> clear how this "tracking" fits it ... this seems to be the hardware
> view that means "tracking the RMID to which @cntr_id is assigned"?
> Please help readers to understand how the implementation is supported
> by the hardware.

I have checked with the hardware team on this.
CfgEn: This corresponds to counter assignment.
CtrEn: This is to start or stop counting.
       We always set this to 1 to start counting.
-- 
Thanks
Babu Moger

* Re: [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC
  2024-09-19 17:13   ` Reinette Chatre
@ 2024-09-23 21:03     ` Moger, Babu
  2024-09-23 22:29       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-23 21:03 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 12:13, Reinette Chatre wrote:
> Hi Babu,
> 
> In subject, please use "()" for a function.

Sure.

> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> +/*
>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>> + */
>> +int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +			     u32 cntr_id, bool assign)
> 
> Looking ahead this is also called when config of existing assigned counter is
> changed. Should this thus perhaps be resctrl_arch_config_cntr()? 

We have a matching resctrl_arch_assign_cntr() and
resctrl_arch_unassign_cntr() pair.

If we rename it to resctrl_arch_config_cntr(), then we also need to rename
resctrl_arch_unassign_cntr() to resctrl_arch_unconfig_cntr().

Should we change both?


> 
>> +{
>> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +	struct arch_mbm_state *arch_mbm;
>> +
>> +	abmc_cfg.split.cfg_en = 1;
> 
> Just to confirm ... a counter remains "configured" from the hardware side whether it
> is assigned from resctrl perspective or not? It seems to me that once a counter is
> "unassigned" from resctrl perspective it needs no more context about that
> counter, yet it remains configured from hardware side?

That is correct.
When a counter is unassigned, we set cntr_en = 0, so there is no counting.
But from the hardware perspective it is still configured.
-- 
Thanks
Babu Moger

* Re: [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-23 18:33     ` Moger, Babu
@ 2024-09-23 22:28       ` Reinette Chatre
  2024-09-24 13:58         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-23 22:28 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/23/24 11:33 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 9/19/24 11:42, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>> Hardware provides a set of counters when mbm_cntr_assignable feature is
>>> supported. These counters are used for assigning the events in resctrl
>>> a group when the feature is enabled. The kernel must manage and track the
>>
>> The second sentence ("These counters ...") is difficult to parse.
> 
> How about?
> 
> Counters are used for assigning the events in resctrl group.

Apologies but I am just not able to parse this. How about: "These counters
are assigned to the MBM monitoring events of a MON group that needs to
be tracked."

...

>>> + */
>>> +static int mbm_cntrs_init(struct rdt_resource *r)
>>
>> Needs __init?
> 
> Did you mean that this should be merged into dom_data_init() so that a
> separate function is not needed? Please clarify.

Here I was referring to the actual __init storage class attribute. Since
mbm_cntrs_init() is only called by __init code, it too should have the
__init storage class attribute.
I do expect that mbm_cntrs_init() will be called by dom_data_init() and
care should be taken when making this change since it seems that dom_data_init()
itself needs the __init storage class attribute. Looks like this was missed
by commit bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")

Reinette


* Re: [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC
  2024-09-23 21:03     ` Moger, Babu
@ 2024-09-23 22:29       ` Reinette Chatre
  2024-09-24 14:07         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-23 22:29 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/23/24 2:03 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 9/19/24 12:13, Reinette Chatre wrote:
>> Hi Babu,
>>
>> In subject, please use "()" for a function.
> 
> Sure.
> 
>>
>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>> +/*
>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>> + */
>>> +int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>> +			     u32 cntr_id, bool assign)
>>
>> Looking ahead this is also called when config of existing assigned counter is
>> changed. Should this thus perhaps be resctrl_arch_config_cntr()? 
> 
> We have a matching resctrl_arch_assign_cntr() and
> resctrl_arch_unassign_cntr() pair.

hmmm ... resctrl_arch_unassign_cntr() does not exist in this version of the series.

> 
> If we rename it to resctrl_arch_config_cntr(), then we also need to rename
> resctrl_arch_unassign_cntr() to resctrl_arch_unconfig_cntr().
> 
> Should we change both?
> 
> 
>>
>>> +{
>>> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
>>> +	struct arch_mbm_state *arch_mbm;
>>> +
>>> +	abmc_cfg.split.cfg_en = 1;
>>
>> Just to confirm ... a counter remains "configured" from the hardware side whether it
>> is assigned from resctrl perspective or not? It seems to me that once a counter is
>> "unassigned" from resctrl perspective it needs no more context about that
>> counter, yet it remains configured from hardware side?
> 
> That is correct.
> When a counter is unassigned, we set cntr_en = 0, so there is no counting.
> But from the hardware perspective it is still configured.

I think I misunderstood the "configured in hardware" to equate to "assigned by
OS" when in fact it is just a bit to indicate when hardware makes changes
requested by MSR write.

Reinette 


* Re: [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-09-23 20:21     ` Moger, Babu
@ 2024-09-23 22:30       ` Reinette Chatre
  2024-09-24 14:51         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-23 22:30 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/23/24 1:21 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 9/19/24 12:08, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>> +/*
>>> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
>>> + * @bw_type		: Bandwidth configuration(supported by BMEC)
>>> + *			  tracked by the @cntr_id.
>>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>>> + * @reserved1		: Reserved.
>>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>>> + * @cntr_id		: Counter identifier.
>>> + * @reserved		: Reserved.
>>> + * @cntr_en		: Tracking enable bit.
>>> + * @cfg_en		: Configuration enable bit.
>>> + *
>>> + * Configuration and tracking:
>>> + * CfgEn=1,CtrEn=0 : Configure CtrID and but no tracking the events yet.
>>> + * CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
>>
>> Thanks for moving the text ... could it now be made to match the new (outside
>> AMD arch document) destination? For example, "CfgEn" becomes "@cfg_en",
> 
> Sure. Will do.
> 
>> "CtrID" becomes "@cntr_id" etc. Also please fix language, for example
>> what does "and but no tracking the events yet" mean? So far this work
>> has focused on "counting" vs "not counting" events and it is not
> 
> I will change the text to "not counting". I hope that clarifies it.
> 
>> clear how this "tracking" fits it ... this seems to be the hardware
>> view that means "tracking the RMID to which @cntr_id is assigned"?
>> Please help readers to understand how the implementation is supported
>> by the hardware.
> 
> I have checked with the hardware team on this.
> CfgEn: This corresponds to counter assignment.

To be specific this corresponds to *hardware* counter assignment? This is
because software sets CfgEn to 1 whether it is assigned from kernel perspective
or not.

Actually ... when I look at the AMD spec it becomes more clear to me. If I
understand the spec correctly the CfgEn bit is used to coordinate changes
between OS and HW. Seems like OS can leisurely write to any fields of
L3_QOS_ABMC_CFG, but only when CfgEn bit is set will the actual hardware
configuration be performed.

> CtrEn: This is to start or stop counting.
>        We always set this to 1 to start counting.

Understood. Now that I read this portion of AMD spec it is more clear to me
and I understand why CfgEn is set in both counter assign and unassign.
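
To capture that understanding, here is a minimal sketch of the two enable bits
as discussed in this thread: cfg_en is set on every write so that hardware
applies the request, while cntr_en alone selects counting (assign) versus not
counting (unassign). The struct is a logical view with hypothetical names. It
deliberately does not reproduce the L3_QOS_ABMC_CFG bit layout, and it assumes
the bandwidth source is an RMID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Logical view of the fields only; not the MSR bit layout */
struct abmc_cfg_fields {
	uint32_t bw_type;	/* event configuration tracked by the counter */
	uint32_t bw_src;	/* RMID in this sketch */
	uint32_t cntr_id;	/* counter identifier */
	bool is_clos;		/* false: bw_src is an RMID */
	bool cntr_en;		/* true: count, false: do not count */
	bool cfg_en;		/* true: ask hardware to apply this write */
};

/* Build the field values for assigning or unassigning one counter. */
static struct abmc_cfg_fields abmc_cfg_build(uint32_t cntr_id, uint32_t rmid,
					     uint32_t evt_cfg, bool assign)
{
	struct abmc_cfg_fields cfg = { 0 };

	cfg.cfg_en = true;	/* set for both assign and unassign */
	cfg.cntr_en = assign;	/* the only bit that differs */
	cfg.cntr_id = cntr_id;
	cfg.bw_src = rmid;
	cfg.bw_type = evt_cfg;

	return cfg;
}
```

Building the values for assign and for unassign of the same counter gives two
results that differ only in cntr_en.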

Reinette

* Re: [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-23 22:28       ` Reinette Chatre
@ 2024-09-24 13:58         ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-24 13:58 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/23/24 17:28, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/23/24 11:33 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 9/19/24 11:42, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>> Hardware provides a set of counters when mbm_cntr_assignable feature is
>>>> supported. These counters are used for assigning the events in resctrl
>>>> a group when the feature is enabled. The kernel must manage and track the
>>>
>>> The second sentence ("These counters ...") is difficult to parse.
>>
>> How about?
>>
>> Counters are used for assigning the events in resctrl group.
> 
> Apologies but I am just not able to parse this. How about: "These counters
> are assigned to the MBM monitoring events of a MON group that needs to
> be tracked."

Sure.

> ...
> 
>>>> + */
>>>> +static int mbm_cntrs_init(struct rdt_resource *r)
>>>
>>> Needs __init?
>>
>> Did you mean to merge this with dom_data_init and don't have to have a
>> separate function. Please clarify.
> 
> Here I was referring to the actual __init storage class attribute. Since
> mbm_cntrs_init() is only called by __init code, it too should have the
> __init storage class attribute.
> I do expect that mbm_cntrs_init() will be called by dom_data_init() and
> care should be taken when making this change since it seems that dom_data_init()
> itself needs the __init storage class attribute. Looks like this was missed
> by commit bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")

Sure. I can take care of this.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC
  2024-09-23 22:29       ` Reinette Chatre
@ 2024-09-24 14:07         ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-24 14:07 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/23/24 17:29, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/23/24 2:03 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 9/19/24 12:13, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> In subject, please use "()" for a function.
>>
>> Sure.
>>
>>>
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>> +/*
>>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>>> + */
>>>> +int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>>> +			     u32 cntr_id, bool assign)
>>>
>>> Looking ahead this is also called when config of existing assigned counter is
>>> changed. Should this thus perhaps be resctrl_arch_config_cntr()? 
>>
>> We have a matching resctrl_arch_assign_cntr() and
>> resctrl_arch_unassign_cntr() pair.
> 
> hmmm ... resctrl_arch_unassign_cntr() does not exist in this version of the series.

My bad. Confused with different versions.

Sure. Will change it to resctrl_arch_config_cntr().


> 
>>
>> If we change resctrl_arch_config_cntr() then we need to change
>> resctrl_arch_unassign_cntr to resctrl_arch_unconfig_cntr().
>>
>> Should we change both?
>>
>>
>>>
>>>> +{
>>>> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>>> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
>>>> +	struct arch_mbm_state *arch_mbm;
>>>> +
>>>> +	abmc_cfg.split.cfg_en = 1;
>>>
>>> Just to confirm ... a counter remains "configured" from the hardware side whether it
>>> is assigned from resctrl perspective or not? It seems to me that once a counter is
>>> "unassigned" from resctrl perspective it needs no more context about that
>>> counter, yet it remains configured from hardware side?
>>
>> That is correct.
>> When unassigned, we are setting cntr_en = 0, so there is no counting. But
>> in hardware perspective it is still configured.
> 
> I think I misunderstood the "configured in hardware" to equate to "assigned by
> OS" when in fact it is just a bit to indicate when hardware makes changes
> requested by MSR write.
> 

That is correct. Hardware makes the changes only when cfg_en = 1.
Otherwise writing the MSR has no effect.
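
As an illustration only (not part of the posted patch), the cfg_en/cntr_en
behaviour described above can be modelled in userspace C. The struct and
function names below (abmc_cfg_model, hw_counter_model,
model_write_abmc_cfg(), model_build_cfg()) are hypothetical, and the real
field widths of L3_QOS_ABMC_CFG are deliberately not reproduced:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the value written to L3_QOS_ABMC_CFG. */
struct abmc_cfg_model {
	uint32_t bw_src;	/* RMID being monitored */
	uint32_t cntr_id;	/* counter to (re)configure */
	bool cntr_en;		/* true: count events, false: stop counting */
	bool cfg_en;		/* true: hardware applies this write */
};

/* Hypothetical model of one hardware counter's state. */
struct hw_counter_model {
	uint32_t bw_src;
	bool counting;
};

/* Model of the MSR write: nothing changes unless cfg_en is set. */
static void model_write_abmc_cfg(struct hw_counter_model *cntrs,
				 const struct abmc_cfg_model *cfg)
{
	if (!cfg->cfg_en)
		return;

	cntrs[cfg->cntr_id].bw_src = cfg->bw_src;
	cntrs[cfg->cntr_id].counting = cfg->cntr_en;
}

/* Both assign and unassign set cfg_en; only cntr_en differs. */
static struct abmc_cfg_model model_build_cfg(uint32_t rmid, uint32_t cntr_id,
					     bool assign)
{
	struct abmc_cfg_model cfg = {
		.bw_src = rmid,
		.cntr_id = cntr_id,
		.cntr_en = assign,
		.cfg_en = true,
	};

	return cfg;
}
```

In this model an unassign is simply a write with cfg_en = 1 and cntr_en = 0,
which is why cfg_en is set on both paths.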
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-09-23 22:30       ` Reinette Chatre
@ 2024-09-24 14:51         ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-24 14:51 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/23/24 17:30, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/23/24 1:21 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 9/19/24 12:08, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>> +/*
>>>> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
>>>> + * @bw_type		: Bandwidth configuration(supported by BMEC)
>>>> + *			  tracked by the @cntr_id.
>>>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>>>> + * @reserved1		: Reserved.
>>>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>>>> + * @cntr_id		: Counter identifier.
>>>> + * @reserved		: Reserved.
>>>> + * @cntr_en		: Tracking enable bit.
>>>> + * @cfg_en		: Configuration enable bit.
>>>> + *
>>>> + * Configuration and tracking:
>>>> + * CfgEn=1,CtrEn=0 : Configure CtrID and but no tracking the events yet.
>>>> + * CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
>>>
>>> Thanks for moving the text ... could it now be made to match the new (outside
>>> AMD arch document) destination? For example, "CfgEn" becomes "@cfg_en",
>>
>> Sure. Will do.
>>
>>> "CtrID" becomes "@cntr_id" etc. Also please fix language, for example
>>> what does "and but no tracking the events yet" mean? So far this work
>>> has focused on "counting" vs "not counting" events and it is not
>>
>> I will change the text to "not counting".  Hope this will clarify here.
>>
>>> clear how this "tracking" fits it ... this seems to be the hardware
>>> view that means "tracking the RMID to which @cntr_id is assigned"?
>>> Please help readers to understand how the implementation is supported
>>> by the hardware.
>>
>> I have checked with hw team on this.
>> CfgEn: This corresponds counter assignment.
> 
> To be specific this corresponds to *hardware* counter assignment? This is
> because software sets CfgEn to 1 whether it is assigned from kernel perspective
> or not.

Yes. We are setting CfgEn = 1 in both assign/unassign.

In case of unassign, we want the counter to stop counting so that software
does not get confused. Otherwise it is really not required.


> 
> Actually ... when I look at the AMD spec it becomes more clear to me. If I
> understand the spec correctly the CfgEn bit is used to coordinate changes
> between OS and HW. Seems like OS can leisurely write to any fields of
> L3_QOS_ABMC_CFG, but only when CfgEn bit is set will the actual hardware
> configuration be performed.
> 
>> CtrEn: This is to start or stop counting.
>>        We always set this to 1 to start counting.
> 
> Understood. Now that I read this portion of AMD spec it is more clear to me
> and I understand why CfgEn is set in both counter assign and unassign.
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-04 22:21 ` [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
  2024-09-19 16:42   ` Reinette Chatre
@ 2024-09-24 16:25   ` Peter Newman
  2024-09-24 17:01     ` Moger, Babu
  1 sibling, 1 reply; 96+ messages in thread
From: Peter Newman @ 2024-09-24 16:25 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu,

On Wed, Sep 4, 2024 at 3:23 PM Babu Moger <babu.moger@amd.com> wrote:

> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f11d6fdfd977..aab22ff8e0c1 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -187,12 +187,14 @@ enum resctrl_scope {
>   * @num_rmid:          Number of RMIDs available
>   * @num_mbm_cntrs:     Number of assignable monitoring counters
>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> + * @mbm_cntr_free_map: bitmap of number of assignable MBM counters
>   * @evt_list:          List of monitoring events
>   */
>  struct resctrl_mon {
>         int                     num_rmid;
>         int                     num_mbm_cntrs;
>         bool                    mbm_cntr_assignable;
> +       unsigned long           *mbm_cntr_free_map;
>         struct list_head        evt_list;
>  };

This looks global still. Will only all-domain (*=) operations be
supported initially?

Thanks,
-Peter

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-09-24 16:25   ` Peter Newman
@ 2024-09-24 17:01     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-24 17:01 UTC (permalink / raw)
  To: Peter Newman
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 9/24/24 11:25, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Sep 4, 2024 at 3:23 PM Babu Moger <babu.moger@amd.com> wrote:
> 
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f11d6fdfd977..aab22ff8e0c1 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -187,12 +187,14 @@ enum resctrl_scope {
>>   * @num_rmid:          Number of RMIDs available
>>   * @num_mbm_cntrs:     Number of assignable monitoring counters
>>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>> + * @mbm_cntr_free_map: bitmap of number of assignable MBM counters
>>   * @evt_list:          List of monitoring events
>>   */
>>  struct resctrl_mon {
>>         int                     num_rmid;
>>         int                     num_mbm_cntrs;
>>         bool                    mbm_cntr_assignable;
>> +       unsigned long           *mbm_cntr_free_map;
>>         struct list_head        evt_list;
>>  };
> 
> This looks global still. Will only all-domain (*=) operations be
> supported initially?

Yes. It is supported in this series.

We have one counter bitmap at the global level and another at the domain level.
https://lore.kernel.org/lkml/7a24bb182897acab3daaac1cadaabca3bcc73dc5.1725488488.git.babu.moger@amd.com/

The domain-level bitmap is used for tracking the status of the counters in
each domain.

The global counter is released once the counter is freed in all the domains.
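
A minimal userspace sketch of that two-level bookkeeping, for illustration
only. The names (cntr_maps_model, maps_unassign(), MAPS_NUM_DOMAINS) are
hypothetical, plain unsigned long words stand in for the kernel bitmaps, and
"bit set means free" in the global map is an assumption of this sketch:

```c
#include <stdbool.h>

#define MAPS_NUM_DOMAINS 2	/* hypothetical domain count */

/* Hypothetical model: one global free map plus one map per domain. */
struct cntr_maps_model {
	unsigned long free_map;				/* bit set: counter is free */
	unsigned long domain_map[MAPS_NUM_DOMAINS];	/* bit set: assigned in domain */
};

/* Is the counter still assigned in at least one domain? */
static bool maps_cntr_assigned_to_domain(const struct cntr_maps_model *m,
					 unsigned int cntr_id)
{
	for (int i = 0; i < MAPS_NUM_DOMAINS; i++) {
		if (m->domain_map[i] & (1UL << cntr_id))
			return true;
	}
	return false;
}

/* Clear the counter in one domain; free it globally once no domain uses it. */
static void maps_unassign(struct cntr_maps_model *m, int dom,
			  unsigned int cntr_id)
{
	m->domain_map[dom] &= ~(1UL << cntr_id);

	if (!maps_cntr_assigned_to_domain(m, cntr_id))
		m->free_map |= 1UL << cntr_id;
}
```

With a counter assigned in two domains, the first maps_unassign() call leaves
the global bit untouched and only the second one marks the counter free.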

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment
  2024-09-19 17:20   ` Reinette Chatre
@ 2024-09-26 16:28     ` Moger, Babu
  2024-09-26 16:46       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-26 16:28 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,


On 9/19/24 12:20, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7ad653b4e768..1d45120ff2b5 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -864,6 +864,13 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Get the counter index for the assignable counter
>> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>> + */
>> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>> +
>>  static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>  					 struct seq_file *s, void *v)
>>  {
>> @@ -1898,6 +1905,45 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Assign a hardware counter to the group.
>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>> + * else the counter will be allocated to specific domain.
>> + */
>> +int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
> 
> Could we please review the naming of function as this series progresses? Using such a generic
> name for this specific function seems to result in its callers later in series having even more
> generic names that are hard to decipher. For example, later (very generic) "rdtgroup_assign_grp()"
> calls this function and I find rdtgroup_assign_grp() to be very vague making the code more difficult
> to follow. For example, rdtgroup_assign_cntr() could be rdtgroup_assign_cntr_event() and
> rdtgroup_assign_grp() could instead be rdtgroup_assign_cntr()?  Please feel free to improve.

Sure.

How about rdtgroup_assign_cntr_event() and rdtgroup_assign_cntr_grp() ?

Added grp extension for the second one.

> 
>>  +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtgrp->mon.cntr_id[index];
>> +
>> +	/*
>> +	 * Allocate a new counter id to the group if the counter id is not
>> +	 * is not assigned already.
> 
> "is not is not" -> "is not"
> 

Sure.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment
  2024-09-26 16:28     ` Moger, Babu
@ 2024-09-26 16:46       ` Reinette Chatre
  2024-09-26 16:59         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-26 16:46 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/26/24 9:28 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> 
> On 9/19/24 12:20, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 7ad653b4e768..1d45120ff2b5 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -864,6 +864,13 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>>>  	return ret;
>>>  }
>>>  
>>> +/*
>>> + * Get the counter index for the assignable counter
>>> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>>> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>>> + */
>>> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>>> +
>>>  static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>  					 struct seq_file *s, void *v)
>>>  {
>>> @@ -1898,6 +1905,45 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>  	return 0;
>>>  }
>>>  
>>> +/*
>>> + * Assign a hardware counter to the group.
>>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>>> + * else the counter will be allocated to specific domain.
>>> + */
>>> +int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>> +			 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>
>> Could we please review the naming of function as this series progresses? Using such a generic
>> name for this specific function seems to result in its callers later in series having even more
>> generic names that are hard to decipher. For example, later (very generic) "rdtgroup_assign_grp()"
>> calls this function and I find rdtgroup_assign_grp() to be very vague making the code more difficult
>> to follow. For example, rdtgroup_assign_cntr() could be rdtgroup_assign_cntr_event() and
>> rdtgroup_assign_grp() could instead be rdtgroup_assign_cntr()?  Please feel free to improve.
> 
> Sure.
> 
> How about rdtgroup_assign_cntr_event() and rdtgroup_assign_cntr_grp() ?
> 
> Added grp extension for the second one.

Is the "grp" extension needed? The function already has "rdtgroup_" as prefix so
the "grp" extension does not seem necessary to me since I think "rdtgroup_" and "grp"
intend to refer to the same thing?

Reinette


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter
  2024-09-19 17:26   ` Reinette Chatre
@ 2024-09-26 16:56     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-26 16:56 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 12:26, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides a limited number of hardware counters
>> that can be assigned to an RMID-event pair to monitor bandwidth while
>> assigned. If all counters are in use, the kernel will show an error
>> message: "Out of MBM assignable counters" when a new assignment is
>> requested. To make space for a new assignment, users must unassign an
>> already assigned counter.
>>
>> Introduce an interface that allows for the unassignment of counter IDs
>> from both the group and the domain. Additionally, ensure that the global
>> counter is released if it is no longer assigned to any domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
>>     Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
>>     Reworded the commit log little bit.
>>
>> v6: Removed mbm_cntr_free from this patch.
>>     Added counter test in all the domains and free if it is not assigned to
>>     any domains.
>>
>> v5: Few name changes to match cntr_id.
>>     Changed the function names to rdtgroup_unassign_cntr
>>     More comments on commit log.
>>
>> v4: Added domain specific unassign feature.
>>     Few name changes.
>>
>> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
>>     The function is not called directly from user anymore. These
>>     changes are related to global assignment interface.
>>
>> v2: No changes.
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
>>  2 files changed, 51 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 6a90fc20be5b..9a65a13ccbe9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -707,6 +707,8 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  			     u32 cntr_id, bool assign);
>>  int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>  			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>> +int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			   struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>>  void rdt_staged_configs_clear(void);
>>  bool closid_allocated(unsigned int closid);
>>  int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1d45120ff2b5..21b9ca4ce493 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1944,6 +1944,55 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>  	return 0;
>>  }
>>  
>> +static int rdtgroup_mbm_cntr_is_assigned(struct rdt_resource *r, u32 cntr_id)
> 
> Should this return bool?

Sure.

> 
> With function prefix of "rdtgroup" I would expect that an rdtgroup would be one of its
> parameters but that is not the case ... this is nothing to do with a rdtgroup.
> Maybe something like "mbm_cntr_assigned_to_domain()"?

Sure.

> 
>> +{
>> +	struct rdt_mon_domain *d;
>> +
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list)
> 
> Based on function name it is unexpected that it checks the global bitmap and not the
> domain lists. The function really needs a more appropriate name to reflect what it
> actually does.

ok. The name mbm_cntr_assigned_to_domain() should be fine now.

> 
>> +		if (test_bit(cntr_id, d->mbm_cntr_map))
>> +			return 1;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Unassign a hardware counter from the domain and the group. Global
>> + * counter will be freed once it is unassigned from all the domains.
> 
> Could this also get a similar comment as partner function about special
> meaning of NULL domain?

Sure.

> 
>> + */
>> +int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			   struct rdt_mon_domain *d,
>> +			   enum resctrl_event_id evtid)
>> +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtgrp->mon.cntr_id[index];
>> +
>> +	if (cntr_id != MON_CNTR_UNSET) {
> 
> Function can exit early after the MON_CNTR_UNSET check to reduce level of
> indentation in rest of function.

Sure.

> 
>> +		if (!d) {
>> +			list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +				resctrl_arch_assign_cntr(r, d, evtid,
>> +							 rdtgrp->mon.rmid,
>> +							 rdtgrp->closid,
>> +							 cntr_id, false);
>> +				clear_bit(cntr_id, d->mbm_cntr_map);
>> +			}
>> +		} else {
>> +			resctrl_arch_assign_cntr(r, d, evtid,
>> +						 rdtgrp->mon.rmid,
>> +						 rdtgrp->closid,
>> +						 cntr_id, false);
>> +			clear_bit(cntr_id, d->mbm_cntr_map);
>> +		}
>> +
>> +		/* Update the counter bitmap */
>> +		if (!rdtgroup_mbm_cntr_is_assigned(r, cntr_id)) {
>> +			mbm_cntr_free(r, cntr_id);
>> +			rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
>> +		}
>> +	}
>> +
>> +	return 0;
> 
> This function is called many times and there are always paths adding complexity
> to handle error from this function ... yet it always returns 0. I expect that it should
> actually do error checking of the arch callback that could actually fail on other archs, that
> should impact this function's return value and make the need for error handling apparent.

Sure. Will do it.
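
For illustration only, the review points in this mail (bool-returning helper,
early exit on the unset check, NULL domain meaning "all domains", and
propagating the arch callback's error) could fit together roughly as in the
userspace sketch below. Every sketch_* name is hypothetical, the arch
callback is a stub whose failure is injected through a field, and event IDs 2
and 3 follow the MBM_EVENT_ARRAY_INDEX() comment quoted earlier:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_DOMAINS	2
#define SKETCH_CNTR_UNSET	(-1)
/* Same idea as MBM_EVENT_ARRAY_INDEX(): event IDs 2 and 3 map to 0 and 1. */
#define SKETCH_EVENT_INDEX(evt)	((evt) - 2)

struct sketch_domain {
	unsigned long cntr_map;	/* bit set: counter assigned in this domain */
	int arch_err;		/* error the stub arch callback returns */
};

struct sketch_group {
	int cntr_id[2];		/* counters for the total and local events */
};

static struct sketch_domain sketch_domains[SKETCH_NUM_DOMAINS];
static unsigned long sketch_free_map;	/* bit set: counter is free */

/* Stub for an arch callback that may fail on some architectures. */
static int sketch_arch_unassign(struct sketch_domain *d, int cntr_id)
{
	(void)cntr_id;
	return d->arch_err;
}

static bool sketch_cntr_assigned_to_domain(int cntr_id)
{
	for (int i = 0; i < SKETCH_NUM_DOMAINS; i++) {
		if (sketch_domains[i].cntr_map & (1UL << cntr_id))
			return true;
	}
	return false;
}

/* Unassign in one domain; leave the map untouched if the arch call fails. */
static int sketch_unassign_one(struct sketch_domain *d, int cntr_id)
{
	int ret = sketch_arch_unassign(d, cntr_id);

	if (ret)
		return ret;
	d->cntr_map &= ~(1UL << cntr_id);
	return 0;
}

/* d == NULL means "all domains", matching the assign-side convention. */
static int sketch_unassign_cntr(struct sketch_group *grp,
				struct sketch_domain *d, int evtid)
{
	int index = SKETCH_EVENT_INDEX(evtid);
	int cntr_id = grp->cntr_id[index];
	int ret = 0;

	if (cntr_id == SKETCH_CNTR_UNSET)
		return 0;

	if (d) {
		ret = sketch_unassign_one(d, cntr_id);
	} else {
		for (int i = 0; i < SKETCH_NUM_DOMAINS && !ret; i++)
			ret = sketch_unassign_one(&sketch_domains[i], cntr_id);
	}
	if (ret)
		return ret;

	/* Release the global counter only when no domain still uses it. */
	if (!sketch_cntr_assigned_to_domain(cntr_id)) {
		sketch_free_map |= 1UL << cntr_id;
		grp->cntr_id[index] = SKETCH_CNTR_UNSET;
	}
	return 0;
}
```

Stopping at the first failing domain is a choice made for this sketch; the
thread does not say whether the remaining domains should still be processed.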

> 
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
> 
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment
  2024-09-26 16:46       ` Reinette Chatre
@ 2024-09-26 16:59         ` Moger, Babu
  2024-09-27  1:48           ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-26 16:59 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/26/24 11:46, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/26/24 9:28 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>>
>> On 9/19/24 12:20, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 7ad653b4e768..1d45120ff2b5 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -864,6 +864,13 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +/*
>>>> + * Get the counter index for the assignable counter
>>>> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>>>> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>>>> + */
>>>> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>>>> +
>>>>  static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>>  					 struct seq_file *s, void *v)
>>>>  {
>>>> @@ -1898,6 +1905,45 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +/*
>>>> + * Assign a hardware counter to the group.
>>>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>>>> + * else the counter will be allocated to specific domain.
>>>> + */
>>>> +int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>>> +			 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>>
>>> Could we please review the naming of function as this series progresses? Using such a generic
>>> name for this specific function seems to result in its callers later in series having even more
>>> generic names that are hard to decipher. For example, later (very generic) "rdtgroup_assign_grp()"
>>> calls this function and I find rdtgroup_assign_grp() to be very vague making the code more difficult
>>> to follow. For example, rdtgroup_assign_cntr() could be rdtgroup_assign_cntr_event() and
>>> rdtgroup_assign_grp() could instead be rdtgroup_assign_cntr()?  Please feel free to improve.
>>
>> Sure.
>>
>> How about rdtgroup_assign_cntr_event() and rdtgroup_assign_cntr_grp() ?
>>
>> Added grp extension for the second one.
> 
> Is the "grp" extension needed? The function already has "rdtgroup_" as prefix so
> the "grp" extension does not seem necessary to me since I think "rdtgroup_" and "grp"
> intend to refer to the same?

How about rdtgroup_assign_cntrs() ?  Added 's' at the end.

We are assigning multiple counters here.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled
  2024-09-19 17:29   ` Reinette Chatre
@ 2024-09-26 18:48     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-26 18:48 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 12:29, Reinette Chatre wrote:
> Hi Babu,
> 
> Subject: "Assign" -> "assign"

Sure.
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
> 
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 ++++++++++++++++++++++++++
>>  1 file changed, 61 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 21b9ca4ce493..bf94e4e05540 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2866,6 +2866,52 @@ static void schemata_list_destroy(void)
>>  	}
>>  }
>>  
>> +/*
>> + * Called when a new group is created. If `mbm_cntr_assign` mode is enabled,
>> + * counters are automatically assigned. Each group requires two counters:
>> + * one for the total event and one for the local event. Due to the limited
>> + * number of counters, assignments may fail in some cases. However, it is
>> + * not necessary to fail the group creation. Users have the option to
>> + * modify the assignments after the group has been created.
>> + */
>> +static int rdtgroup_assign_grp(struct rdtgroup *rdtgrp)
>> +{
>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +	int ret = 0;
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +		return 0;
>> +
>> +	if (is_mbm_total_enabled())
>> +		ret = rdtgroup_assign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	if (!ret && is_mbm_local_enabled())
>> +		ret = rdtgroup_assign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Called when a group is deleted. Counters are unassigned if it was in
>> + * assigned state.
>> + */
>> +static int rdtgroup_unassign_grp(struct rdtgroup *rdtgrp)
>> +{
>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +	int ret = 0;
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +		return 0;
>> +
>> +	if (is_mbm_total_enabled())
>> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	if (!ret && is_mbm_local_enabled())
>> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +
>> +	return ret;
>> +}
>> +
>>  static int rdt_get_tree(struct fs_context *fc)
>>  {
>>  	struct rdt_fs_context *ctx = rdt_fc2context(fc);
>> @@ -2925,6 +2971,8 @@ static int rdt_get_tree(struct fs_context *fc)
>>  		if (ret < 0)
>>  			goto out_mongrp;
>>  		rdtgroup_default.mon.mon_data_kn = kn_mondata;
>> +
>> +		rdtgroup_assign_grp(&rdtgroup_default);
>>  	}
>>  
>>  	ret = rdt_pseudo_lock_init();
>> @@ -2955,6 +3003,7 @@ static int rdt_get_tree(struct fs_context *fc)
>>  out_psl:
>>  	rdt_pseudo_lock_release();
>>  out_mondata:
>> +	rdtgroup_unassign_grp(&rdtgroup_default);
>>  	if (resctrl_arch_mon_capable())
>>  		kernfs_remove(kn_mondata);
>>  out_mongrp:
>> @@ -3214,6 +3263,8 @@ static void rdt_kill_sb(struct super_block *sb)
>>  		resctrl_arch_disable_alloc();
>>  	if (resctrl_arch_mon_capable())
>>  		resctrl_arch_disable_mon();
>> +
>> +	rdtgroup_unassign_grp(&rdtgroup_default);
>>  	resctrl_mounted = false;
>>  	kernfs_kill_sb(sb);
>>  	mutex_unlock(&rdtgroup_mutex);
>> @@ -3805,6 +3856,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>>  		goto out_unlock;
>>  	}
>>  
>> +	rdtgroup_assign_grp(rdtgrp);
>> +
>>  	kernfs_activate(rdtgrp->kn);
>>  
>>  	/*
>> @@ -3849,6 +3902,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>  	if (ret)
>>  		goto out_closid_free;
>>  
>> +	rdtgroup_assign_grp(rdtgrp);
>> +
>>  	kernfs_activate(rdtgrp->kn);
>>  
>>  	ret = rdtgroup_init_alloc(rdtgrp);
>> @@ -3874,6 +3929,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>  out_del_list:
>>  	list_del(&rdtgrp->rdtgroup_list);
>>  out_rmid_free:
>> +	rdtgroup_unassign_grp(rdtgrp);
>>  	mkdir_rdt_prepare_rmid_free(rdtgrp);
>>  out_closid_free:
>>  	closid_free(closid);
>> @@ -3944,6 +4000,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>>  	update_closid_rmid(tmpmask, NULL);
>>  
>>  	rdtgrp->flags = RDT_DELETED;
>> +
>> +	rdtgroup_unassign_grp(rdtgrp);
>> +
>>  	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>  
>>  	/*
>> @@ -3990,6 +4049,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>>  	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>>  	update_closid_rmid(tmpmask, NULL);
>>  
>> +	rdtgroup_unassign_grp(rdtgrp);
>> +
>>  	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>  	closid_free(rdtgrp->closid);
>>  
> 
> Apart from earlier comment about rdtgroup_assign_grp()/rdtgroup_unassign_grp() naming, please also
> take care about how these functions are integrated since it seems to be inconsistent wrt whether it is called
> on mon capable resource. Also, I can see how the counter is removed when CTRL_MON group and MON group are
> explicitly removed but it is not clear to me how when a user removes a CTRL_MON group how the counters
> assigned to its child MON groups are unassigned.

I think we have a problem here on removal. The child MON group counters
may remain assigned when the CTRL_MON group is removed. Will fix it. Thanks
for pointing it out.
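
One possible shape of the fix, as a minimal user-space sketch rather than kernel code: when a CTRL_MON group is removed, walk its child MON groups and release their counters before releasing the parent's. The names struct mon_grp, grp_unassign_cntrs(), ctrl_grp_remove() and CNTR_UNSET are hypothetical stand-ins for illustration only; they are not the structures or helpers used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN   4
#define NUM_MBM_EVENTS 2	/* total and local */
#define CNTR_UNSET     (-1)	/* hypothetical "no counter" marker */

/* Hypothetical stand-in for a resctrl group; not struct rdtgroup. */
struct mon_grp {
	int cntr_id[NUM_MBM_EVENTS];
	struct mon_grp *child[MAX_CHILDREN];
	size_t nr_children;
};

/* Release every counter held by one group; returns how many were released. */
static int grp_unassign_cntrs(struct mon_grp *g)
{
	int released = 0;

	for (size_t i = 0; i < NUM_MBM_EVENTS; i++) {
		if (g->cntr_id[i] != CNTR_UNSET) {
			g->cntr_id[i] = CNTR_UNSET;
			released++;
		}
	}
	return released;
}

/* Removing a CTRL_MON group: unassign the children first, then the parent. */
static int ctrl_grp_remove(struct mon_grp *parent)
{
	int released = 0;

	assert(parent->nr_children <= MAX_CHILDREN);
	for (size_t i = 0; i < parent->nr_children; i++)
		released += grp_unassign_cntrs(parent->child[i]);

	return released + grp_unassign_cntrs(parent);
}
```

In the kernel the child walk would presumably use the existing mon.crdtgrp_list iteration rather than a fixed array; the array here only keeps the sketch self-contained.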
-- 
Thanks
Babu Moger

* Re: [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-09-19 17:31   ` Reinette Chatre
@ 2024-09-26 19:16     ` Moger, Babu
  2024-09-27  1:50       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-26 19:16 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,


On 9/19/24 12:31, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>> the MBM events.
>>
>> Report "Unassigned" in case the user attempts to read the events without
>> assigning the counter.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Moved the documentation under "mon_data".
>>     Updated the text little bit.
>>
>> v6: Added more explaination in the resctrl.rst
>>     Added checks to detect "Unassigned" before reading RMID.
>>
>> v5: New patch.
>> ---
>>  Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 3e9302971faf..ff5397d19704 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -417,6 +417,16 @@ When monitoring is enabled all MON groups will also contain:
>>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>>  	where "YY" is the node number.
>>  
>> +	The mbm_cntr_assign mode allows users to assign a hardware counter
>> +	to an RMID-event pair, enabling bandwidth monitoring for as long
>> +	as the counter remains assigned. The hardware will continue tracking
>> +	the assigned RMID until the user manually unassigns it, ensuring
>> +	that counters are not reset during this period. With a limited number
>> +	of counters, the system may run out of assignable resources. In
>> +	mbm_cntr_assign mode, MBM event counters will return "Unassigned"
>> +	if the counter is not allocated to the event when read. Users must
>> +	manually assign a counter to read the events.
>> +
> 
> Please consider how this text could also be relevant to soft-ABMC.

It mostly applies to soft-ABMC also. Minor tweaking may be required.
How about?

"When supported the 'mbm_cntr_assign' mode allows users to assign a
hardware counter to RMID, event pair, enabling bandwidth monitoring for as
long as the counter remains assigned. The hardware will continue tracking
the assigned RMID until the user manually unassigns it, ensuring
that counters are not reset during this period. With a limited number
of counters, the system may run out of assignable counters at some point.
In that case, MBM event counters will return "Unassigned" when the event
when read. Users must manually assign a counter to read the events."
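
To illustrate the "system may run out of assignable counters" case, here is a minimal user-space sketch of a counter pool backed by a free map in which a set bit means "free". NUM_CNTRS, struct cntr_pool and the helper names are hypothetical, and returning -ENOSPC on exhaustion is an assumption made for the example, not something the patch is confirmed to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CNTRS 4	/* illustrative; real hardware reports num_mbm_cntrs */

/* Bit i set means counter i is free. */
struct cntr_pool {
	uint32_t free_map;
};

/* Mark every counter as free. */
static void pool_init(struct cntr_pool *p)
{
	p->free_map = (1u << NUM_CNTRS) - 1;
}

/* Returns the lowest free counter ID, or -ENOSPC when all are in use. */
static int cntr_alloc(struct cntr_pool *p)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (p->free_map & (1u << i)) {
			p->free_map &= ~(1u << i);
			return i;
		}
	}
	return -ENOSPC;
}

/* Return a counter to the pool so that it can be assigned again. */
static void cntr_free(struct cntr_pool *p, int id)
{
	assert(id >= 0 && id < NUM_CNTRS);
	p->free_map |= 1u << id;
}
```

Once the pool is exhausted, a group that asks for a counter gets none until the user unassigns one elsewhere, which is the situation in which a read would report "Unassigned".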


> 
>>  "mon_hw_id":
>>  	Available only with debug option. The identifier used by hardware
>>  	for the monitor group. On x86 this is the RMID.
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 50fa1fe9a073..fc19b1d131b2 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>  	struct rdtgroup *rdtgrp;
>>  	struct rdt_resource *r;
>>  	union mon_data_bits md;
>> -	int ret = 0;
>> +	int ret = 0, index;
>>  
>>  	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>  	if (!rdtgrp) {
>> @@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>  	evtid = md.u.evtid;
>>  	r = &rdt_resources_all[resid].r_resctrl;
>>  
>> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
>> +		index = mon_event_config_index_get(evtid);
> 
> This should use MBM_EVENT_ARRAY_INDEX, not the arch index.

Sure.
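
Roughly, the check could then look like the user-space sketch below. The MBM_EVENT_ARRAY_INDEX() macro is the one from patch 16; the event ID values mirror the x86 resctrl enum, while struct mon_state, mbm_read_status() and the value chosen for MON_CNTR_UNSET are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event IDs mirroring the x86 resctrl values (occupancy, total, local). */
enum event_id {
	OCCUP_EVENT     = 1,
	MBM_TOTAL_EVENT = 2,
	MBM_LOCAL_EVENT = 3,
};

/* From patch 16: 0 for the total event, 1 for the local event. */
#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)

#define NUM_MBM_EVENTS 2
#define MON_CNTR_UNSET (-1)	/* value assumed for this sketch */

/* Hypothetical per-group state: one counter ID per MBM event. */
struct mon_state {
	int cntr_id[NUM_MBM_EVENTS];
};

/*
 * Returns "Unassigned" when the event has no counter in assign mode,
 * or NULL when the count should be read as usual.
 */
static const char *mbm_read_status(const struct mon_state *m, bool assign_mode,
				   enum event_id evt)
{
	int idx;

	/* Occupancy is not an MBM event and needs no counter. */
	if (!assign_mode || evt == OCCUP_EVENT)
		return NULL;

	idx = MBM_EVENT_ARRAY_INDEX(evt);
	assert(idx >= 0 && idx < NUM_MBM_EVENTS);

	return m->cntr_id[idx] == MON_CNTR_UNSET ? "Unassigned" : NULL;
}
```

Because the occupancy event is filtered out before indexing, the macro can never yield a negative index here, so the INVALID_CONFIG_INDEX test from the arch helper would no longer be needed.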

> 
>> +		if (index != INVALID_CONFIG_INDEX &&
>> +		    rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
>> +			rr.err = -ENOENT;
>> +			goto checkresult;
>> +		}
>> +	}
>> +
>>  	if (md.u.sum) {
>>  		/*
>>  		 * This file requires summing across all domains that share
>> @@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>  		seq_puts(m, "Error\n");
>>  	else if (rr.err == -EINVAL)
>>  		seq_puts(m, "Unavailable\n");
>> +	else if (rr.err == -ENOENT)
>> +		seq_puts(m, "Unassigned\n");
>>  	else
>>  		seq_printf(m, "%llu\n", rr.val);
>>  
> 
> Reinette
> 

-- 
Thanks
Babu Moger

* Re: [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-09-19 17:38   ` Reinette Chatre
@ 2024-09-26 19:39     ` Moger, Babu
  2024-09-27  1:51       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-26 19:39 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/24 12:38, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Introduce interface to switch between mbm_cntr_assign and default modes.
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> To enable the "mbm_cntr_assign" mode:
>> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> To enable the default monitoring mode:
>> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> MBM event counters will reset when mbm_assign_mode is changed.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Changed the interface name to mbm_assign_mode.
>>     Removed the references of ABMC.
>>     Added the changes to reset global and domain bitmaps.
>>     Added the changes to reset rmid.
>>
>> v6: Changed the mode name to mbm_cntr_assign.
>>     Moved all the FS related code here.
>>     Added changes to reset mbm_cntr_map and resctrl group counters.
>> ""
>> v5: Change log and mode description text correction.
>>
>> v4: Minor commit text changes. Keep the default to ABMC when supported.
>>     Fixed comments to reflect changed interface "mbm_mode".
>>
>> v3: New patch to address the review comments from upstream.
>> ---
>>  Documentation/arch/x86/resctrl.rst     | 15 ++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 74 +++++++++++++++++++++++++-
>>  2 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index ff5397d19704..743c0b64a330 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -290,6 +290,21 @@ with the following files:
>>  	than 'num_mbm_cntrs' to be created. Reading the mbm files may report 'Unavailable'
>>  	if there is no hardware resource assigned.
>>  
>> +	* To enable ABMC feature:
> 
> The separation between fs and arch did not make it to this patch?

My bad. Will change the text.

> 
>> +	  ::
>> +
>> +	    # echo  "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +
>> +	* To enable the legacy monitoring feature:
> 
> "legacy" -> "default"?

Sure.

> 
>> +	  ::
>> +
>> +	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +
>> +	The MBM event counters will reset when mbm_assign_mode is changed. Moving to
> 
> "will reset" -> "may reset"? Please also be clear on what is meant with "MBM event counter".

It "will reset".

> Note that "counter" has a very specific meaning in this work and after considering that
> it is not clear if "MBM event counter will reset" means that the counters are no longer
> assigned or if it means that the counts associated with events will be reset.

How about

"The MBM event counters(mbm_total_bytes and mbm_local_bytes) associated
with the event will reset when mbm_assign_mode is changed."

> 
>> +	mbm_cntr_assign will require users to assign the counters to the events to
>> +	read the events. Otherwise, the MBM event counters will return "Unassigned"
>> +	when read.
>> +
>>  "num_mbm_cntrs":
>>  	The number of monitoring counters available for assignment.
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index bf94e4e05540..7a8ece12d7da 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -895,6 +895,77 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +static void rdtgroup_mbm_cntr_reset(struct rdt_resource *r)
> 
> It is not clear why this has "rdtgroup" prefix since it is not specific to
> a resource group but a global action that resets all counters.

Sure. Will remove the prefix.

> 
>> +{
>> +	struct rdtgroup *prgrp, *crgrp;
>> +	struct rdt_mon_domain *dom;
>> +
>> +	/*
>> +	 * Hardware counters will reset after switching the monitor mode.
>> +	 * Reset the architectural state so that reading of hardware
>> +	 * counter is not considered as an overflow in the next update.
>> +	 * Also reset the domain counter bitmap.
>> +	 */
>> +	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> +		bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
>> +		resctrl_arch_reset_rmid_all(r, dom);
>> +	}
>> +
>> +	/* Reset global MBM counter map */
>> +	bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
>> +
>> +	/* Reset the cntr_id's for all the monitor groups */
>> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>> +		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
>> +				    mon.crdtgrp_list) {
>> +			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +		}
>> +	}
>> +}
>> +
>> +static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
>> +					      char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	int ret = 0;
>> +	bool enable;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	if (!strcmp(buf, "default")) {
>> +		enable = 0;
>> +	} else if (!strcmp(buf, "mbm_cntr_assign")) {
>> +		enable = 1;
>> +	} else {
>> +		ret = -EINVAL;
>> +		rdt_last_cmd_puts("Unsupported assign mode\n");
>> +		goto write_exit;
>> +	}
>> +
>> +	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdtgroup_mbm_cntr_reset(r);
> 
> Should this reset not happen only after the hardware state was changed
> successfully? If the arch change failed then this may lead to inconsistent
> state.

Sure. Will move after the mode is changed.
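
The reordering can be sketched in user space as follows: the software state is reset only after the arch call has succeeded, so a failed mode change leaves everything untouched. struct mode_state, arch_set_mode(), mode_write() and the arch_fails knob are hypothetical names for this sketch; they only model the control flow and are not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the mode state; not a kernel structure. */
struct mode_state {
	bool assign_enabled;	/* current mode */
	bool arch_fails;	/* knob: make the arch call fail */
	int resets;		/* how often software state was reset */
};

/* Stand-in for the arch call that switches the hardware mode. */
static int arch_set_mode(struct mode_state *s, bool enable)
{
	if (s->arch_fails)
		return -EIO;
	s->assign_enabled = enable;
	return 0;
}

/* Parse the requested mode, switch it, and reset only on success. */
static int mode_write(struct mode_state *s, const char *buf)
{
	bool enable;
	int ret;

	assert(s != NULL && buf != NULL);

	if (!strcmp(buf, "default"))
		enable = false;
	else if (!strcmp(buf, "mbm_cntr_assign"))
		enable = true;
	else
		return -EINVAL;

	/* Nothing to do when the requested mode is already active. */
	if (enable == s->assign_enabled)
		return 0;

	ret = arch_set_mode(s, enable);
	if (ret)
		return ret;

	/* The hardware changed, so the software state may now be reset. */
	s->resets++;
	return 0;
}
```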

> 
>> +		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
>> +	}
>> +
>> +write_exit:
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>>  				       struct seq_file *s, void *v)
>>  {
>> @@ -2107,9 +2178,10 @@ static struct rftype res_common_files[] = {
>>  	},
>>  	{
>>  		.name		= "mbm_assign_mode",
>> -		.mode		= 0444,
>> +		.mode		= 0644,
>>  		.kf_ops		= &rdtgroup_kf_single_ops,
>>  		.seq_show	= rdtgroup_mbm_assign_mode_show,
>> +		.write		= rdtgroup_mbm_assign_mode_write,
>>  		.fflags		= RFTYPE_MON_INFO,
>>  	},
>>  	{
> 
> Reinette
> 

-- 
Thanks
Babu Moger

* Re: [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment
  2024-09-26 16:59         ` Moger, Babu
@ 2024-09-27  1:48           ` Reinette Chatre
  0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-09-27  1:48 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/26/24 9:59 AM, Moger, Babu wrote:
> On 9/26/24 11:46, Reinette Chatre wrote:
>> On 9/26/24 9:28 AM, Moger, Babu wrote:
>>> On 9/19/24 12:20, Reinette Chatre wrote:
>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> index 7ad653b4e768..1d45120ff2b5 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> @@ -864,6 +864,13 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>>>>>  	return ret;
>>>>>  }
>>>>>  
>>>>> +/*
>>>>> + * Get the counter index for the assignable counter
>>>>> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>>>>> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>>>>> + */
>>>>> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>>>>> +
>>>>>  static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>>>  					 struct seq_file *s, void *v)
>>>>>  {
>>>>> @@ -1898,6 +1905,45 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> +/*
>>>>> + * Assign a hardware counter to the group.
>>>>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>>>>> + * else the counter will be allocated to specific domain.
>>>>> + */
>>>>> +int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>>>> +			 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>>>
>>>> Could we please review the naming of function as this series progresses? Using such a generic
>>>> name for this specific function seems to result in its callers later in series having even more
>>>> generic names that are hard to decipher. For example, later (very generic) "rdtgroup_assign_grp()"
>>>> calls this function and I find rdtgroup_assign_grp() to be very vague making the code more difficult
>>>> to follow. For example, rdtgroup_assign_cntr() could be rdtgroup_assign_cntr_event() and
>>>> rdtgroup_assign_grp() could instead be rdtgroup_assign_cntr()?  Please feel free to improve.
>>>
>>> Sure.
>>>
>>> How about rdtgroup_assign_cntr_event() and rdtgroup_assign_cntr_grp() ?
>>>
>>> Added grp extension for the second one.
>>
>> Is the "grp" extension needed? The function already has "rdtgroup_" as prefix so
>> the "grp" extension does not seem necessary to me since I think "rdtgroup_" and "grp"
>> intend to refer to the same?
> 
> How about rdtgroup_assign_cntrs() ?  Added 's' in the end.
> 
> We are assigning multiple counters here.
> 

rdtgroup_assign_cntrs() sounds good to me.

Thank you

Reinette


* Re: [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-09-26 19:16     ` Moger, Babu
@ 2024-09-27  1:50       ` Reinette Chatre
  2024-09-27 13:40         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-27  1:50 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/26/24 12:16 PM, Moger, Babu wrote:
> On 9/19/24 12:31, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>>> the MBM events.
>>>
>>> Report "Unassigned" in case the user attempts to read the events without
>>> assigning the counter.
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>> v7: Moved the documentation under "mon_data".
>>>     Updated the text little bit.
>>>
>>> v6: Added more explaination in the resctrl.rst
>>>     Added checks to detect "Unassigned" before reading RMID.
>>>
>>> v5: New patch.
>>> ---
>>>  Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>>> index 3e9302971faf..ff5397d19704 100644
>>> --- a/Documentation/arch/x86/resctrl.rst
>>> +++ b/Documentation/arch/x86/resctrl.rst
>>> @@ -417,6 +417,16 @@ When monitoring is enabled all MON groups will also contain:
>>>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>>>  	where "YY" is the node number.
>>>  
>>> +	The mbm_cntr_assign mode allows users to assign a hardware counter
>>> +	to an RMID-event pair, enabling bandwidth monitoring for as long
>>> +	as the counter remains assigned. The hardware will continue tracking
>>> +	the assigned RMID until the user manually unassigns it, ensuring
>>> +	that counters are not reset during this period. With a limited number
>>> +	of counters, the system may run out of assignable resources. In
>>> +	mbm_cntr_assign mode, MBM event counters will return "Unassigned"
>>> +	if the counter is not allocated to the event when read. Users must
>>> +	manually assign a counter to read the events.
>>> +
>>
>> Please consider how this text could also be relevant to soft-ABMC.
> 
> It mostly applies to soft-ABMC also. Minor tweaking may be required.

hmmm ... seems that I still have mostly the "soft-RMID" model in my head.

> How about?
> 
> "When supported the 'mbm_cntr_assign' mode allows users to assign a
> hardware counter to RMID, event pair, enabling bandwidth monitoring for as

hmmm ... so soft-ABMC also assigns hardware counters?

Also, we should aim for generic text that will cover how this may look on MPAM
also. Considering this, it may just mean to replace "RMID, event pair" with 
"mon_hw_id, event pair"?

> long as the counter remains assigned. The hardware will continue tracking
> the assigned RMID until the user manually unassigns it, ensuring

Please do double-check all usage of "RMID" in user facing interfaces/docs where
mon_hw_id may be more appropriate.

> that counters are not reset during this period. With a limited number
> of counters, the system may run out of assignable counters at some point.
> In that case, MBM event counters will return "Unassigned" when the event
> when read. Users must manually assign a counter to read the events."

"when the event when read" -> "when the event is read"?

Reinette

* Re: [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-09-26 19:39     ` Moger, Babu
@ 2024-09-27  1:51       ` Reinette Chatre
  2024-09-27 13:26         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-09-27  1:51 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/26/24 12:39 PM, Moger, Babu wrote:
> On 9/19/24 12:38, Reinette Chatre wrote:
>> On 9/4/24 3:21 PM, Babu Moger wrote:

>>> +	  ::
>>> +
>>> +	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> +
>>> +	The MBM event counters will reset when mbm_assign_mode is changed. Moving to
>>
>> "will reset" -> "may reset"? Please also be clear on what is meant with "MBM event counter".
> 
> It "will reset".
> 

I understand that this is true for the ABMC implementation. My goal with making this vague is
to not have this reset set in stone if some other implementation behaves differently. 

>> Note that "counter" has a very specific meaning in this work and after considering that
>> it is not clear if "MBM event counter will reset" means that the counters are no longer
>> assigned or if it means that the counts associated with events will be reset.
> 
> How about
> 
> "The MBM event counters(mbm_total_bytes and mbm_local_bytes) associated
> with the event will reset when mbm_assign_mode is changed."

In the docs "mbm_total_bytes" and "mbm_local_bytes" are termed "events" ... maybe
"The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated 
counters may reset when mbm_assign_mode is changed."?

Reinette

* Re: [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-09-27  1:51       ` Reinette Chatre
@ 2024-09-27 13:26         ` Moger, Babu
  2024-09-27 15:07           ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 13:26 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/26/2024 8:51 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/26/24 12:39 PM, Moger, Babu wrote:
>> On 9/19/24 12:38, Reinette Chatre wrote:
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
> 
>>>> +	  ::
>>>> +
>>>> +	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>> +
>>>> +	The MBM event counters will reset when mbm_assign_mode is changed. Moving to
>>>
>>> "will reset" -> "may reset"? Please also be clear on what is meant with "MBM event counter".
>>
>> It "will reset".
>>
> 
> I understand that this is true for the ABMC implementation. My goal with making this vague is
> to not have this reset set in stone if some other implementation behaves differently.

ok.
> 
>>> Note that "counter" has a very specific meaning in this work and after considering that
>>> it is not clear if "MBM event counter will reset" means that the counters are no longer
>>> assigned or if it means that the counts associated with events will be reset.
>>
>> How about
>>
>> "The MBM event counters(mbm_total_bytes and mbm_local_bytes) associated
>> with the event will reset when mbm_assign_mode is changed."
> 
> In the docs "mbm_total_bytes" and "mbm_local_bytes" are termed "events" ... maybe
> "The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated
> counters may reset when mbm_assign_mode is changed."?

Sure.

-- 
- Babu Moger

* Re: [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-09-27  1:50       ` Reinette Chatre
@ 2024-09-27 13:40         ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 13:40 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/26/2024 8:50 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/26/24 12:16 PM, Moger, Babu wrote:
>> On 9/19/24 12:31, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>>>> the MBM events.
>>>>
>>>> Report "Unassigned" in case the user attempts to read the events without
>>>> assigning the counter.
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> ---
>>>> v7: Moved the documentation under "mon_data".
>>>>      Updated the text little bit.
>>>>
>>>> v6: Added more explaination in the resctrl.rst
>>>>      Added checks to detect "Unassigned" before reading RMID.
>>>>
>>>> v5: New patch.
>>>> ---
>>>>   Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
>>>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>>>>   2 files changed, 22 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>>>> index 3e9302971faf..ff5397d19704 100644
>>>> --- a/Documentation/arch/x86/resctrl.rst
>>>> +++ b/Documentation/arch/x86/resctrl.rst
>>>> @@ -417,6 +417,16 @@ When monitoring is enabled all MON groups will also contain:
>>>>   	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>>>>   	where "YY" is the node number.
>>>>   
>>>> +	The mbm_cntr_assign mode allows users to assign a hardware counter
>>>> +	to an RMID-event pair, enabling bandwidth monitoring for as long
>>>> +	as the counter remains assigned. The hardware will continue tracking
>>>> +	the assigned RMID until the user manually unassigns it, ensuring
>>>> +	that counters are not reset during this period. With a limited number
>>>> +	of counters, the system may run out of assignable resources. In
>>>> +	mbm_cntr_assign mode, MBM event counters will return "Unassigned"
>>>> +	if the counter is not allocated to the event when read. Users must
>>>> +	manually assign a counter to read the events.
>>>> +
>>>
>>> Please consider how this text could also be relevant to soft-ABMC.
>>
>> It mostly applies to soft-ABMC also. Minor tweaking may be required.
> 
> hmmm ... seems that I still have mostly the "soft-RMID" model in my head.
> 
>> How about?
>>
>> "When supported the 'mbm_cntr_assign' mode allows users to assign a
>> hardware counter to RMID, event pair, enabling bandwidth monitoring for as
> 
> hmmm ... so soft-ABMC also assigns hardware counters?


Soft-ABMC does not have hardware counters. I need to change this text.

> Also, we should aim for generic text that will cover how this may look on MPAM
> also. Considering this, it may just mean to replace "RMID, event pair" with
> "mon_hw_id, event pair"?

ok.

> 
>> long as the counter remains assigned. The hardware will continue tracking
>> the assigned RMID until the user manually unassigns it, ensuring
> 
> Please do double-check all usage of "RMID" in user facing interfaces/docs where
> mon_hw_id may be more appropriate.

Sure.

> 
>> that counters are not reset during this period. With a limited number
>> of counters, the system may run out of assignable counters at some point.
>> In that case, MBM event counters will return "Unassigned" when the event
>> when read. Users must manually assign a counter to read the events."
> 
> "when the event when read" -> "when the event is read"?

Sure.

-- 
- Babu Moger

* Re: [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported
  2024-09-19 17:43   ` Reinette Chatre
@ 2024-09-27 14:37     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 14:37 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/2024 12:43 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Configure mbm_cntr_assign on AMD.
>>
>> 'mbm_cntr_assign' mode in AMD is ABMC (Assignable Bandwidth Monitoring
>> Counters). When the ABMC is updated, it must be updated on all logical
>> processors in the resctrl domain.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
>>      Moved the default settings to rdt_get_mon_l3_config(). It should be
>>      done before the hotplug handler is called. It cannot be done at
>>      rdtgroup_init().
>>
>> v6: Keeping the default enablement in arch init code for now.
>>       This may need some discussion.
>>       Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.
>>
>> v5: New patch to enable ABMC by default.
>> ---
>>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>>   arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
>>   3 files changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 9a65a13ccbe9..3250561f0187 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -709,6 +709,7 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>   			 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>>   int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>   			   struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>> +void resctrl_arch_mbm_cntr_assign_configure(struct rdt_resource *r);
>>   void rdt_staged_configs_clear(void);
>>   bool closid_allocated(unsigned int closid);
>>   int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 09b1d8bb0aa0..314c0b297470 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1261,6 +1261,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>   			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>>   			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
>>   			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>> +			hw_res->mbm_cntr_assign_enabled = true;
> 
> This is a major change to require architecture to set whether this is the default mode.
> That seems fine but needs to be highlighted in the changelog and descriptions of this work.

Sure. Will add the text about this.

> 
>>   		}
>>   	}
>>   
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7a8ece12d7da..1054583bef9d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2726,6 +2726,13 @@ int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>>   	return 0;
>>   }
>>   
>> +void resctrl_arch_mbm_cntr_assign_configure(struct rdt_resource *r)
> 
> How about resctrl_arch_mbm_cntr_assign_set_one() to match existing
> resctrl_arch_mbm_cntr_assign_set()?

Sure.

> 
>> +{
>> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +
>> +	resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
>> +}
>> +
>>   /*
>>    * We don't allow rdtgroup directories to be created anywhere
>>    * except the root directory. Thus when looking for the rdtgroup
>> @@ -4510,9 +4517,13 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
>>   
>>   void resctrl_online_cpu(unsigned int cpu)
>>   {
>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>>   	mutex_lock(&rdtgroup_mutex);
>>   	/* The CPU is set in default rdtgroup after online. */
>>   	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
>> +	if (r->mon.mbm_cntr_assignable)
> 
> Needs a r->mon_capable check?

Sure. Will add it.

> 
>> +		resctrl_arch_mbm_cntr_assign_configure(r);
>>   	mutex_unlock(&rdtgroup_mutex);
>>   }
>>   
> 
> Reinette
> 

Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-09-27 13:26         ` Moger, Babu
@ 2024-09-27 15:07           ` Reinette Chatre
  0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-09-27 15:07 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/27/24 6:26 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 9/26/2024 8:51 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/26/24 12:39 PM, Moger, Babu wrote:
>>> On 9/19/24 12:38, Reinette Chatre wrote:
>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>
>>>>> +      ::
>>>>> +
>>>>> +        # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>>> +
>>>>> +    The MBM event counters will reset when mbm_assign_mode is changed. Moving to
>>>>
>>>> "will reset" -> "may reset"? Please also be clear on what is meant with "MBM event counter".
>>>
>>> It "will reset".
>>>
>>
>> I understand that this is true for the ABMC implementation. My goal with making this vague is
>> to not have this reset set in stone if some other implementation behaves differently.
> 
> ok.
>>
>>>> Note that "counter" has a very specific meaning in this work and after considering that
>>>> it is not clear if "MBM event counter will reset" means that the counters are no longer
>>>> assigned or if it means that the counts associated with events will be reset.
>>>
>>> How about
>>>
>>> "The MBM event counters(mbm_total_bytes and mbm_local_bytes) associated
>>> with the event will reset when mbm_assign_mode is changed."
>>
>> In the docs "mbm_total_bytes" and "mbm_local_bytes" are termed "events" ... maybe
>> "The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated
>> counters may reset when mbm_assign_mode is changed."?
> 
> Sure.
> 

Please do not just copy the text because I made a mistake with the grammar.
"The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated with
counters may reset when mbm_assign_mode is changed."?

Reinette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-09-19 17:45   ` Reinette Chatre
@ 2024-09-27 16:22     ` Moger, Babu
  2024-10-02 18:20       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 16:22 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/2024 12:45 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Users can modify the configuration of assignable events. Whenever the
>> event configuration is updated, MBM assignments must be revised across
>> all monitor groups within the impacted domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: New patch to update the assignments. Missed it earlier.
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 53 ++++++++++++++++++++++++++
>>   1 file changed, 53 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1054583bef9d..0b1490d71e77 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -871,6 +871,15 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>>    */
>>   #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>>   
>> +static bool resctrl_mbm_event_assigned(struct rdtgroup *rdtg,
>> +				       struct rdt_mon_domain *d, u32 evtid)
>> +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtg->mon.cntr_id[index];
>> +
>> +	return  (cntr_id != MON_CNTR_UNSET && test_bit(cntr_id, d->mbm_cntr_map));
> 
> (Please check spaces and paren use.)

Sure.

> 
>> +}
>> +
>>   static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>   					 struct seq_file *s, void *v)
>>   {
>> @@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>   	return 0;
>>   }
>>   
>> +static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
>> +					   struct rdt_mon_domain *d, u32 evtid)
>> +{
>> +	struct rdt_mon_domain *dom;
>> +	struct rdtgroup *rdtg;
>> +	int ret = 0;
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +		return ret;
>> +
>> +	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +		struct rdtgroup *crg;
>> +
>> +		list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> +			if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
>> +				ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
>> +				if (ret)
>> +					goto out_done;
>> +			}
>> +		}
>> +
>> +		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
>> +			list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> +				if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
>> +					ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
>> +					if (ret)
>> +						goto out_done;
>> +				}
>> +			}
>> +		}
>> +	}
>> +
>> +out_done:
>> +	return ret;
>> +}
>>   
>>   static void mbm_config_write_domain(struct rdt_resource *r,
>>   				    struct rdt_mon_domain *d, u32 evtid, u32 val)
>>   {
>>   	struct mon_config_info mon_info = {0};
>>   	u32 config_val;
>> +	int ret;
>>   
>>   	/*
>>   	 * Check the current config value first. If both are the same then
>> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>>   			      resctrl_arch_event_config_set,
>>   			      &mon_info, 1);
>>   
>> +	/*
>> +	 * Counter assignments needs to be updated to match the event
>> +	 * configuration.
>> +	 */
>> +	ret = resctrl_mbm_event_update_assign(r, d, evtid);
>> +	if (ret)
>> +		rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
>> +
> 
> This does not look right. This function _just_ returned from an IPI on appropriate CPU and then
> starts flow to do _another_ IPI to run code that could have just been run during previous IPI.
> The whole flow to call rdgroup_assign_cntr() also seems like an obfuscated way to call resctrl_arch_assign_cntr()
> to just reconfigure the counter (not actually assign it).
> Could it perhaps call some resctrl fs code via single IPI that in turn calls the appropriate arch code to set the new
> mon event config and re-configures the counter?
> 

I think we can simplify this. We don't have to go through all the
rdtgroups to figure out whether a counter is assigned or not.

I can move the code into mon_config_write(), after the call to
mbm_config_write_domain().

Using the domain bitmap we can figure out which counters are assigned
in the domain. I can use hardware help to update the assignment for
each of those counters. This has to be done via IPI.
Something like this:

static void rdtgroup_abmc_dom_cfg(void *info)
{
	union l3_qos_abmc_cfg *abmc_cfg = info;
	u32 val = abmc_cfg->split.bw_type;

	/* Get the current counter configuration */
	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
	rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, abmc_cfg->full);

	/* Update the counter configuration if the event type changed */
	if (abmc_cfg->split.bw_type != val) {
		abmc_cfg->split.bw_type = val;
		abmc_cfg->split.cfg_en = 1;
		wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
	}
}
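
The first half of that idea, walking the domain bitmap to find the
assigned counters, can be sketched in user-space C. This is an
illustration only, not the proposed kernel code: NUM_CNTRS,
for_each_assigned_cntr() and record_cntr() are hypothetical names, and a
plain unsigned long stands in for the kernel bitmap d->mbm_cntr_map.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CNTRS 32	/* hypothetical; matches the num_mbm_cntrs example */

/* Hypothetical callback: stands in for the per-counter reconfiguration. */
typedef void (*cntr_fn)(unsigned int cntr_id, void *arg);

/*
 * Call fn once for every counter whose bit is set in map.
 * Returns the number of counters visited.
 */
static unsigned int for_each_assigned_cntr(unsigned long map, cntr_fn fn,
					   void *arg)
{
	unsigned int cntr_id, visited = 0;

	assert(fn != NULL);

	for (cntr_id = 0; cntr_id < NUM_CNTRS; cntr_id++) {
		if (map & (1UL << cntr_id)) {
			fn(cntr_id, arg);
			visited++;
		}
	}

	return visited;
}

/* Example callback: records the visited counter ids as a bitmask in *arg. */
static void record_cntr(unsigned int cntr_id, void *arg)
{
	unsigned long *seen = arg;

	*seen |= 1UL << cntr_id;
}
```

Only counters that are actually assigned in the domain are visited, so no
walk over the rdtgroup lists is needed to find them.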


-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups
  2024-09-19 17:53   ` Reinette Chatre
@ 2024-09-27 17:06     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 17:06 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/2024 12:53 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Provide the interface to list the assignment states of all the resctrl
>> groups in mbm_cntr_assign mode.
>>
>> Example:
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> It is not clear what is intended with above example, was it intended to
> have some output?

Yes. Will add output here.

> 
>>
>> List follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> - Default CTRL_MON group:
>>    "//<domain_id>=<flags>"
>>
>> - Non-default CTRL_MON group:
>>    "<CTRL_MON group>//<domain_id>=<flags>"
>>
>> - Child MON group of default CTRL_MON group:
>>    "/<MON group>/<domain_id>=<flags>"
>>
>> - Child MON group of non-default CTRL_MON group:
>>    "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Flags can be one of the following:
>> t  MBM total event is enabled
>> l  MBM local event is enabled
>> tl Both total and local MBM events are enabled
>> _  None of the MBM events are enabled
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
>> +"mbm_assign_control":
>> +	Reports the resctrl group and monitor status of each group.
>> +
>> +	List follows the following format:
>> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>> +
>> +	Format for specific type of groups:
>> +
>> +	* Default CTRL_MON group:
>> +		"//<domain_id>=<flags>"
>> +
>> +	* Non-default CTRL_MON group:
>> +		"<CTRL_MON group>//<domain_id>=<flags>"
>> +
>> +	* Child MON group of default CTRL_MON group:
>> +		"/<MON group>/<domain_id>=<flags>"
>> +
>> +	* Child MON group of non-default CTRL_MON group:
>> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>> +
>> +	Flags can be one of the following:
>> +	::
>> +
>> +	 t  MBM total event is assigned.
>> +	 l  MBM local event is assigned.
>> +	 tl Both total and local MBM events are assigned.
>> +	 _  None of the MBM events are assigned.
>> +
>> +	Examples:
>> +	::
>> +
>> +	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>> +
>> +	 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	 non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	 //0=tl;1=tl;
>> +	 /child_default_mon_grp/0=tl;1=tl;
>> +
>> +	 There are four resctrl groups. All the groups have total and local MBM events
>> +	 assigned on domain 0 and 1.
>> +
> 
> Please create the docs in chosen format, like htmldocs, and see how it ends up being formatted.
> For example, above seems to be intended to be a code sample but the description ("There are
> four resctrl ...") appears as part of the code sample.

Sure. Will check.

> 
>>   "max_threshold_occupancy":
>>   		Read/write file provides the largest value (in
>>   		bytes) at which a previously used LLC_occupancy
> 
> ...
> 
>> +static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
>> +					    struct seq_file *s, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	struct rdt_mon_domain *dom;
>> +	struct rdtgroup *rdtg;
>> +	char str[10];
>> +
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> +		mutex_unlock(&rdtgroup_mutex);
>> +		return -EINVAL;
>> +	}
>> +
>> +	rdt_last_cmd_clear();
> 
> This should be done before any attempt to write to the buffer.

Sure. Will move it up.

> 
> 
> Reinette
> 

Thanks
-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-09-19 17:59   ` Reinette Chatre
@ 2024-09-27 17:47     ` Moger, Babu
  2024-10-02 18:19       ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 17:47 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,


On 9/19/2024 12:59 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> Introduce the interface to assign MBM events in mbm_cntr_assign mode.
>>
>> Events can be enabled or disabled by writing to file
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Format is similar to the list format with addition of opcode for the
>> assignment operation.
>>   "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Format for specific type of groups:
>>
>>   * Default CTRL_MON group:
>>           "//<domain_id><opcode><flags>"
>>
>>   * Non-default CTRL_MON group:
>>           "<CTRL_MON group>//<domain_id><opcode><flags>"
>>
>>   * Child MON group of default CTRL_MON group:
>>           "/<MON group>/<domain_id><opcode><flags>"
>>
>>   * Child MON group of non-default CTRL_MON group:
>>           "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Domain_id '*' will apply the flags on all the domains.
>>
>> Opcode can be one of the following:
>>
>>   = Update the assignment to match the flags
>>   + Assign a new MBM event without impacting existing assignments.
>>   - Unassign a MBM event from currently assigned events.
>>
>> Assignment flags can be one of the following:
>>   t  MBM total event
>>   l  MBM local event
>>   tl Both total and local MBM events
>>   _  None of the MBM events. Valid only with '=' opcode.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>      Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>      Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>      Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>      Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>      Removed ABMC reference in FS code.
>>      Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>      Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>      Users have the liberty to pass the flags. Restricting it might be a problem later.
> 
> Could you please give an example of what problem may be encountered later? An assignment
> like "domain=_lt" seems like a contradiction to me since user space essentially asks
> for "None of the MBM events" as well as "MBM total event" and "MBM local event".

I agree it is a contradiction. But the user is the one who decides to do
that. I think we should allow it. Also, there is some value in it as well.

"domain=_lt" This will also reset the counters if the total and local 
events are assigned earlier this action.
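
To make the sequential handling concrete, here is a minimal user-space
sketch that assumes the flags are processed left to right, as the v7
changelog describes. The ASSIGN_* names come from the patch, but their
numeric values and the helper flags_to_mon_state() are assumptions made
for this illustration; it is not the code of rdtgroup_str_to_mon_state().

```c
#include <assert.h>
#include <stddef.h>

/* Names taken from the patch; the numeric values are assumptions. */
enum {
	ASSIGN_NONE    = 0,
	ASSIGN_TOTAL   = 1 << 0,
	ASSIGN_LOCAL   = 1 << 1,
	ASSIGN_INVALID = 1 << 2,
};

/* Hypothetical helper: fold a flag string into a state, left to right. */
static int flags_to_mon_state(const char *flags)
{
	int state = ASSIGN_NONE;

	assert(flags != NULL);

	/* An empty flag string carries no request at all. */
	if (*flags == '\0')
		return ASSIGN_INVALID;

	for (; *flags; flags++) {
		switch (*flags) {
		case 't':
			state |= ASSIGN_TOTAL;
			break;
		case 'l':
			state |= ASSIGN_LOCAL;
			break;
		case '_':
			/* '_' discards whatever was collected so far. */
			state = ASSIGN_NONE;
			break;
		default:
			return ASSIGN_INVALID;
		}
	}

	return state;
}
```

Under these assumptions "_lt" ends up as total plus local while "lt_"
ends up as none, which is the order dependence being discussed here.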


> 
> 
> ...
> 
>> @@ -352,6 +352,98 @@ with the following files:
>>   	 There are four resctrl groups. All the groups have total and local MBM events
>>   	 assigned on domain 0 and 1.
>>   
>> +	Assignment state can be updated by writing to the interface.
>> +
>> +	Format is similar to the list format with addition of opcode for the
>> +	assignment operation.
>> +
>> +		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> +	Format for each type of groups:
>> +
>> +        * Default CTRL_MON group:
>> +                "//<domain_id><opcode><flags>"
>> +
>> +        * Non-default CTRL_MON group:
>> +                "<CTRL_MON group>//<domain_id><opcode><flags>"
>> +
>> +        * Child MON group of default CTRL_MON group:
>> +                "/<MON group>/<domain_id><opcode><flags>"
>> +
>> +        * Child MON group of non-default CTRL_MON group:
>> +                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> +	Domain_id '*' will apply the flags on all the domains.
>> +
>> +	Opcode can be one of the following:
>> +	::
>> +
>> +	 = Update the assignment to match the MBM event.
>> +	 + Assign a new MBM event without impacting existing assignments.
>> +	 - Unassign a MBM event from currently assigned events.
>> +
>> +	Examples:
>> +	::
>> +
>> +	  Initial group status:
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	  //0=tl;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=tl;
>> +
> 
> Similar to previous patch, looking at this generated doc does not seem to reflect
> what is intended. Above and below are all formatted as code, the descriptions as
> well as the actual "code".

Sure. Will check again.

> 
>> +	  To update the default group to assign only total MBM event on domain 0:
>> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	  Assignment status after the update:
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	  //0=t;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=tl;
>> +
>> +	  To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
>> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	  Assignment status after the update:
>> +	  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	  //0=t;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>> +	  To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
>> +	  unassign both local and total MBM events on domain 1:
>> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
>> +			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	  Assignment status after the update:
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +	  //0=t;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>> +	  To update the default group to add a local MBM event domain 0.
>> +	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	  Assignment status after the update:
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +	  //0=tl;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>> +	  To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
>> +	  the MBM events on all the domains.
>> +	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	  Assignment status after the update:
>> +	  #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=_;1=_;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +	  //0=tl;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>>   "max_threshold_occupancy":
>>   		Read/write file provides the largest value (in
>>   		bytes) at which a previously used LLC_occupancy
> 
> ...
> 
>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>> +				  enum rdt_group_type rtype,
>> +				  char *p_grp, char *c_grp, char *tok)
>> +{
>> +	int op, mon_state, assign_state, unassign_state;
>> +	char *dom_str, *id_str, *op_str;
>> +	struct rdt_mon_domain *d;
>> +	struct rdtgroup *rdtgrp;
>> +	unsigned long dom_id;
>> +	int ret, found = 0;
>> +
>> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
>> +
>> +	if (!rdtgrp) {
>> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +next:
>> +	if (!tok || tok[0] == '\0')
>> +		return 0;
>> +
>> +	/* Start processing the strings for each domain */
>> +	dom_str = strim(strsep(&tok, ";"));
>> +
>> +	op_str = strpbrk(dom_str, "=+-");
>> +
>> +	if (op_str) {
>> +		op = *op_str;
>> +	} else {
>> +		rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
> 
> "_" is not an operation.

Sure. Will remove this character.
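
For reference, a small user-space sketch of splitting one
"<domain_id><opcode><flags>" token, where only '=', '+' and '-' count as
opcodes. struct dom_token and parse_dom_token() are hypothetical names
for this illustration, and the checks are somewhat stricter than the
quoted kernel code (for example, '*' must be the whole domain id).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for one "<domain_id><opcode><flags>" token. */
struct dom_token {
	int all_domains;	/* 1 when the domain id is '*' */
	unsigned long dom_id;	/* valid when all_domains == 0 */
	char op;		/* '=', '+' or '-' */
	const char *flags;	/* points into the input string */
};

/* Returns 0 on success, -1 if the token is malformed. */
static int parse_dom_token(const char *tok, struct dom_token *out)
{
	const char *op_str;
	char *end;

	assert(tok != NULL && out != NULL);

	/* The opcode is the first '=', '+' or '-'; '_' is a flag. */
	op_str = strpbrk(tok, "=+-");
	if (!op_str || op_str == tok)
		return -1;

	out->op = *op_str;
	out->flags = op_str + 1;
	out->all_domains = 0;
	out->dom_id = 0;

	/* Domain id '*' means all domains. */
	if (tok[0] == '*' && op_str == tok + 1) {
		out->all_domains = 1;
		return 0;
	}

	/* Otherwise the text before the opcode must be a decimal id. */
	out->dom_id = strtoul(tok, &end, 10);
	if (end != op_str)
		return -1;

	return 0;
}
```

With this split, "0_" is rejected for a missing opcode, while "0=_" is
accepted with opcode '=' and flags "_".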

> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	id_str = strsep(&dom_str, "=+-");
>> +
>> +	/* Check for domain id '*' which means all domains */
>> +	if (id_str && *id_str == '*') {
>> +		d = NULL;
>> +		goto check_state;
>> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> +		rdt_last_cmd_puts("Missing domain id\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Verify if the dom_id is valid */
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +		if (d->hdr.id == dom_id) {
>> +			found = 1;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found) {
>> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>> +		return -EINVAL;
>> +	}
>> +
>> +check_state:
>> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
>> +
>> +	if (mon_state == ASSIGN_INVALID) {
>> +		rdt_last_cmd_puts("Invalid assign flag\n");
>> +		goto out_fail;
>> +	}
>> +
>> +	assign_state = 0;
>> +	unassign_state = 0;
>> +
>> +	switch (op) {
>> +	case '+':
>> +		if (mon_state == ASSIGN_NONE) {
>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>> +			goto out_fail;
>> +		}
>> +		assign_state = mon_state;
>> +		break;
>> +	case '-':
>> +		if (mon_state == ASSIGN_NONE) {
>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>> +			goto out_fail;
>> +		}
>> +		unassign_state = mon_state;
>> +		break;
>> +	case '=':
>> +		assign_state = mon_state;
>> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (assign_state & ASSIGN_TOTAL) {
>> +		ret = rdtgroup_assign_cntr(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
> 
> hmmm ... wasn't unassign going to happen first? That would potentially make counters
> available to help subsequent assign succeed.

Good point. I will change the order.
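
A minimal user-space model of why the order matters, assuming a single
group drawing from a fixed pool of counters. struct cntr_pool, the
pool_*() helpers and the EVT_* values are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { EVT_TOTAL = 1 << 0, EVT_LOCAL = 1 << 1 };

/* Hypothetical model: one group drawing from a fixed pool of counters. */
struct cntr_pool {
	int free;	/* counters still available */
	int assigned;	/* bitmask of events currently holding a counter */
};

static int pool_assign(struct cntr_pool *p, int evt)
{
	if (p->assigned & evt)
		return 0;	/* already assigned */
	if (p->free == 0)
		return -1;	/* out of counters */
	p->free--;
	p->assigned |= evt;
	return 0;
}

static void pool_unassign(struct cntr_pool *p, int evt)
{
	if (p->assigned & evt) {
		p->assigned &= ~evt;
		p->free++;
	}
}

/* Apply an '=' style update: release unwanted events first, then assign. */
static int pool_update(struct cntr_pool *p, int want)
{
	int drop = (EVT_TOTAL | EVT_LOCAL) & ~want;
	int evt;

	assert(p != NULL);

	for (evt = EVT_TOTAL; evt <= EVT_LOCAL; evt <<= 1)
		if (drop & evt)
			pool_unassign(p, evt);

	for (evt = EVT_TOTAL; evt <= EVT_LOCAL; evt <<= 1)
		if ((want & evt) && pool_assign(p, evt))
			return -1;

	return 0;
}
```

Starting with no free counters and only the total event assigned, an
update to "local only" succeeds in this model because the total counter
is released before the local one is requested. Assigning first would
fail with the pool exhausted.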

> 
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (assign_state & ASSIGN_LOCAL) {
>> +		ret = rdtgroup_assign_cntr(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (unassign_state & ASSIGN_TOTAL) {
>> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (unassign_state & ASSIGN_LOCAL) {
>> +		ret = rdtgroup_unassign_cntr(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	goto next;
>> +
>> +out_fail:
>> +
>> +	return -EINVAL;
>> +}
>> +
> 
> Reinette
> 
> 

Thanks
-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-09-19 18:00 ` [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
@ 2024-09-27 18:11   ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-09-27 18:11 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 9/19/2024 1:00 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> # Linux Implementation
>>
>> Create a generic interface aimed to support user space assignment
>> of scarce counters used for monitoring. First usage of interface
>> is by ABMC with option to expand usage to "soft-ABMC" and MPAM
>> counters in future.
>>
>> Feature adds following interface files:
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
>> monitoring features supported. The enclosed brackets indicate which
>> feature is enabled.
>>
>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>> counters available for assignment.
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control: Reports the resctrl group and monitor
>> status of each group. Assignment state can be updated by writing to the
>> interface.
> 
> At this point I think the architecture is settling with the remaining work focusing
> on polishing the code and making it more robust. To get confidence in this big addition
> it will be valuable to hear from Peter and James to confirm if soft-ABMC and
> MPAM can build on this.

Agreed. Peter/James, please let me know if there are any concerns with
the interface.

> 
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> 	[mbm_cntr_assign]
>> 	default
>>
>> 	ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>> 	#cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> 	32
>>
>> c. Create few resctrl groups.
>>
>> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>>
>> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     to list and modify the group's monitoring states. File provides single place
> 
> "modify the group's monitoring states" -> "modify any group's monitoring states"?

Sure.

> 
>>     to list monitoring states of all the resctrl groups. It makes it easier for
>>     user space to to learn about the used counters without needing to traverse
> 
> "to to learn" -> "to learn"

Sure.

> 
> Reinette
> 

Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-09-27 17:47     ` Moger, Babu
@ 2024-10-02 18:19       ` Reinette Chatre
  2024-10-04  1:11         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-10-02 18:19 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/27/24 10:47 AM, Moger, Babu wrote:
> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>> On 9/4/24 3:21 PM, Babu Moger wrote:

>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>      Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>      Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>      Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>      Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>      Removed ABMC reference in FS code.
>>>      Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>      Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>      Users have the liberty to pass the flags. Restricting it might be a problem later.
>>
>> Could you please give an example of what problem may be encountered later? An assignment
>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
> 
> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
> 
> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.

The last sentence is not clear to me. Could you please elaborate what
you mean with "are assigned earlier this action"?

Reinette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-09-27 16:22     ` Moger, Babu
@ 2024-10-02 18:20       ` Reinette Chatre
  2024-10-04  0:51         ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-10-02 18:20 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 9/27/24 9:22 AM, Moger, Babu wrote:
> Hi Reinitte,
> 
> On 9/19/2024 12:45 PM, Reinette Chatre wrote:
>> On 9/4/24 3:21 PM, Babu Moger wrote:

...

>>> +}
>>> +
>>>   static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>                        struct seq_file *s, void *v)
>>>   {
>>> @@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>       return 0;
>>>   }
>>>   +static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
>>> +                       struct rdt_mon_domain *d, u32 evtid)
>>> +{
>>> +    struct rdt_mon_domain *dom;
>>> +    struct rdtgroup *rdtg;
>>> +    int ret = 0;
>>> +
>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>> +        return ret;
>>> +
>>> +    list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>> +        struct rdtgroup *crg;
>>> +
>>> +        list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>> +            if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
>>> +                ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
>>> +                if (ret)
>>> +                    goto out_done;
>>> +            }
>>> +        }
>>> +
>>> +        list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
>>> +            list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>> +                if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
>>> +                    ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
>>> +                    if (ret)
>>> +                        goto out_done;
>>> +                }
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +out_done:
>>> +    return ret;
>>> +}
>>>     static void mbm_config_write_domain(struct rdt_resource *r,
>>>                       struct rdt_mon_domain *d, u32 evtid, u32 val)
>>>   {
>>>       struct mon_config_info mon_info = {0};
>>>       u32 config_val;
>>> +    int ret;
>>>         /*
>>>        * Check the current config value first. If both are the same then
>>> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>>>                     resctrl_arch_event_config_set,
>>>                     &mon_info, 1);
>>>   +    /*
>>> +     * Counter assignments needs to be updated to match the event
>>> +     * configuration.
>>> +     */
>>> +    ret = resctrl_mbm_event_update_assign(r, d, evtid);
>>> +    if (ret)
>>> +        rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
>>> +
>>
>> This does not look right. This function _just_ returned from an IPI on appropriate CPU and then
>> starts flow to do _another_ IPI to run code that could have just been run during previous IPI.
>> The whole flow to call rdgroup_assign_cntr() also seems like an obfuscated way to call resctrl_arch_assign_cntr()
>> to just reconfigure the counter (not actually assign it).
>> Could it perhaps call some resctrl fs code via single IPI that in turn calls the appropriate arch code to set the new
>> mon event config and re-configures the counter?
>>
> 
> I think we can simplify this. We dont have to go thru all the rdtgroups to figure out if the counter is assigned or not.
> 
> I can move the code inside mon_config_write() after the call mbm_config_write_domain().

mbm_config_write_domain() already does an IPI so if I understand correctly this will still
result in a second IPI that seems unnecessary to me. Why can the reconfigure not be done
with a single IPI?

> 
> Using the domain bitmap we can figure out which of the counters are assigned in the domain. I can use the hardware help to update the assignment for each counter.  This has to be done via IPI.
> Something like this.
> 
> static void rdtgroup_abmc_dom_cfg(void *info)
> {
>     union l3_qos_abmc_cfg *abmc_cfg = info;
>         u32 val = abmc_cfg->bw_type;
> 
>         /* Get the counter configuration */
>     wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>     rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
> 

This is not clear to me. I expected MSR_IA32_L3_QOS_ABMC_DSC
to return the bw_type that was just written to 
MSR_IA32_L3_QOS_ABMC_CFG. 

It is also not clear to me how these registers can be
used without a valid counter ID. I seem to be missing
the context of this call.

>         /* update the counter configuration */
>         if (abmc_cfg->bw_type != val) {
>             abmc_cfg->bw_type = val;
>             abmc_cfg.split.cfg_en = 1;
>             wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>         }
> }
> 
> 

Reinette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-10-02 18:20       ` Reinette Chatre
@ 2024-10-04  0:51         ` Moger, Babu
  2024-10-04  2:17           ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-10-04  0:51 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/2/2024 1:20 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/27/24 9:22 AM, Moger, Babu wrote:
>> Hi Reinitte,
>>
>> On 9/19/2024 12:45 PM, Reinette Chatre wrote:
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
> 
> ...
> 
>>>> +}
>>>> +
>>>>    static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>>                         struct seq_file *s, void *v)
>>>>    {
>>>> @@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>>        return 0;
>>>>    }
>>>>    +static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
>>>> +                       struct rdt_mon_domain *d, u32 evtid)
>>>> +{
>>>> +    struct rdt_mon_domain *dom;
>>>> +    struct rdtgroup *rdtg;
>>>> +    int ret = 0;
>>>> +
>>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>>> +        return ret;
>>>> +
>>>> +    list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>>> +        struct rdtgroup *crg;
>>>> +
>>>> +        list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>> +            if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
>>>> +                ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
>>>> +                if (ret)
>>>> +                    goto out_done;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
>>>> +            list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>> +                if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
>>>> +                    ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
>>>> +                    if (ret)
>>>> +                        goto out_done;
>>>> +                }
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +out_done:
>>>> +    return ret;
>>>> +}
>>>>      static void mbm_config_write_domain(struct rdt_resource *r,
>>>>                        struct rdt_mon_domain *d, u32 evtid, u32 val)
>>>>    {
>>>>        struct mon_config_info mon_info = {0};
>>>>        u32 config_val;
>>>> +    int ret;
>>>>          /*
>>>>         * Check the current config value first. If both are the same then
>>>> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>>>>                      resctrl_arch_event_config_set,
>>>>                      &mon_info, 1);
>>>>    +    /*
>>>> +     * Counter assignments needs to be updated to match the event
>>>> +     * configuration.
>>>> +     */
>>>> +    ret = resctrl_mbm_event_update_assign(r, d, evtid);
>>>> +    if (ret)
>>>> +        rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
>>>> +
>>>
>>> This does not look right. This function _just_ returned from an IPI on appropriate CPU and then
>>> starts flow to do _another_ IPI to run code that could have just been run during previous IPI.
>>> The whole flow to call rdgroup_assign_cntr() also seems like an obfuscated way to call resctrl_arch_assign_cntr()
>>> to just reconfigure the counter (not actually assign it).
>>> Could it perhaps call some resctrl fs code via single IPI that in turn calls the appropriate arch code to set the new
>>> mon event config and re-configures the counter?
>>>
>>
>> I think we can simplify this. We dont have to go thru all the rdtgroups to figure out if the counter is assigned or not.
>>
>> I can move the code inside mon_config_write() after the call mbm_config_write_domain().
> 
> mbm_config_write_domain() already does an IPI so if I understand correctly this will still
> result in a second IPI that seems unnecessary to me. Why can the reconfigure not be done
> with a single IPI?

I think we can try updating the counter configuration in the same IPI. 
Let me try that.
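
To illustrate the single-IPI idea, here is a minimal user-space sketch, not
kernel code. It assumes a hypothetical run_on_domain_cpu() that stands in
for the cross-CPU call, and made-up structures (dom_sketch, cfg_info_sketch)
that track one event configuration per domain for brevity. The callback both
sets the event configuration and reconfigures every assigned counter, so the
whole update costs one simulated IPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CNTRS 32

/* Hypothetical stand-in for a monitoring domain; not the kernel's struct. */
struct dom_sketch {
	uint32_t event_config;             /* simulated event configuration */
	uint32_t cntr_bw_type[NUM_CNTRS];  /* simulated per-counter bw_type */
	uint32_t cntr_map;                 /* bit i set: counter i is assigned */
};

/* Everything the remote CPU needs, passed through one pointer. */
struct cfg_info_sketch {
	struct dom_sketch *d;
	uint32_t mon_config;
};

static int cross_calls;	/* counts simulated IPIs */

/* Simulates running fn on a CPU of the domain: one IPI per call. */
static void run_on_domain_cpu(void (*fn)(void *), void *info)
{
	cross_calls++;
	fn(info);
}

/* Single callback: set the event config, then update assigned counters. */
static void event_config_and_cntr_update(void *info)
{
	struct cfg_info_sketch *ci = info;
	struct dom_sketch *d = ci->d;
	int i;

	d->event_config = ci->mon_config;
	for (i = 0; i < NUM_CNTRS; i++) {
		if (d->cntr_map & (UINT32_C(1) << i))
			d->cntr_bw_type[i] = ci->mon_config;
	}
}

static void config_write_domain_sketch(struct dom_sketch *d, uint32_t val)
{
	struct cfg_info_sketch ci = { .d = d, .mon_config = val };

	assert(d != NULL);

	/* Same value as before: nothing to do, and no IPI is sent. */
	if (d->event_config == val)
		return;

	run_on_domain_cpu(event_config_and_cntr_update, &ci);
}
```

The point of the design is that the caller packs all the state the remote
CPU needs into one structure up front, instead of returning from the first
cross call and starting a second one.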

> 
>>
>> Using the domain bitmap we can figure out which of the counters are assigned in the domain. I can use the hardware help to update the assignment for each counter.  This has to be done via IPI.
>> Something like this.
>>
>> static void rdtgroup_abmc_dom_cfg(void *info)
>> {
>>      union l3_qos_abmc_cfg *abmc_cfg = info;
>>          u32 val = abmc_cfg->bw_type;
>>
>>          /* Get the counter configuration */
>>      wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>      rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
>>
> 
> This is not clear to me. I expected MSR_IA32_L3_QOS_ABMC_DSC
> to return the bw_type that was just written to
> MSR_IA32_L3_QOS_ABMC_CFG.
> 
> It is also not clear to me how these registers can be
> used without a valid counter ID. I seem to miss
> the context of this call.

Event configuration changes are domain specific. We have the domain data
structure, and rdt_mon_domain contains the bitmap (mbm_cntr_map). This
bitmap tells us which of the counters in the domain are configured, so
we can get the counter id from it. Using the counter id, we can query
the hardware for the current configuration with this sequence:

/* Get the counter configuration */
for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
	if (test_bit(i, d->mbm_cntr_map)) {
		abmc_cfg->split.cntr_id = i;
		abmc_cfg->split.cfg_en = 0;
		wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);

		/*
		 * Reading L3_QOS_ABMC_DSC returns the configuration of the
		 * counter id specified in L3_QOS_ABMC_CFG.cntr_id, with the
		 * RMID (bw_src) and the event configuration (bw_type).
		 */
		rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, abmc_cfg->full);

		/*
		 * We know the new bandwidth type to be updated.
		 * Update the counter by writing to QOS_ABMC_CFG with the
		 * new configuration.
		 */
		if (abmc_cfg->split.bw_type != val) {
			abmc_cfg->split.bw_type = val;
			abmc_cfg->split.cfg_en = 1;
			wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
		}
	}
}

I hope this helps. I need to pass some additional information to the IPI
handler to make this work. Let me know if this is not clear. I will code
this up tomorrow, which should make it much clearer.
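
The sequence above can be exercised in user space with simulated MSRs. The
sketch below is only a model under stated assumptions: sim_wrmsr_cfg() and
sim_rdmsr_dsc() are hypothetical stand-ins for wrmsrl()/rdmsrl() on
L3_QOS_ABMC_CFG and L3_QOS_ABMC_DSC, the simulated hardware changes a
counter only when cfg_en is set, and the counter bitmap is a plain 32-bit
word rather than the kernel bitmap API. The field layout follows the
l3_qos_abmc_cfg union from this series.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS 32

union abmc_cfg_sketch {
	struct {
		uint64_t bw_type   :32,
			 bw_src    :12,
			 reserved1 : 3,
			 is_clos   : 1,
			 cntr_id   : 5,
			 reserved  : 9,
			 cntr_en   : 1,
			 cfg_en    : 1;
	} split;
	uint64_t full;
};

static uint64_t sim_cntr[NUM_CNTRS];	/* simulated per-counter config */
static unsigned int sim_sel;		/* counter id latched by last CFG write */

/* Model of a CFG write: always selects a counter, updates it if cfg_en. */
static void sim_wrmsr_cfg(uint64_t val)
{
	union abmc_cfg_sketch c = { .full = val };

	sim_sel = c.split.cntr_id;
	if (c.split.cfg_en) {
		c.split.cfg_en = 0;	/* assumption: cfg_en is not stored */
		sim_cntr[sim_sel] = c.full;
	}
}

/* Model of a DSC read: returns the config of the selected counter. */
static uint64_t sim_rdmsr_dsc(void)
{
	return sim_cntr[sim_sel];
}

/* Test helper: assign counter id to rmid with the given bw_type. */
static void assign_cntr_sketch(unsigned int id, unsigned int rmid,
			       uint32_t bw_type)
{
	union abmc_cfg_sketch c = { .full = 0 };

	c.split.cfg_en = 1;
	c.split.cntr_en = 1;
	c.split.cntr_id = id;
	c.split.bw_src = rmid;
	c.split.bw_type = bw_type;
	sim_wrmsr_cfg(c.full);
}

/* Walk the assigned counters and rewrite bw_type; returns updates done. */
static int update_bw_type(uint32_t cntr_map, unsigned int num_cntrs,
			  uint32_t val)
{
	union abmc_cfg_sketch c;
	unsigned int i;
	int updated = 0;

	assert(num_cntrs <= NUM_CNTRS);
	for (i = 0; i < num_cntrs; i++) {
		if (!(cntr_map & (UINT32_C(1) << i)))
			continue;

		/* Select counter i without changing it (cfg_en = 0). */
		c.full = 0;
		c.split.cntr_id = i;
		sim_wrmsr_cfg(c.full);

		/* Read back the current configuration, including the RMID. */
		c.full = sim_rdmsr_dsc();

		if (c.split.bw_type != val) {
			c.split.bw_type = val;
			c.split.cntr_id = i;
			c.split.cfg_en = 1;
			sim_wrmsr_cfg(c.full);
			updated++;
		}
	}
	return updated;
}
```

In this model the read through DSC is what preserves bw_src: the caller only
knows the counter id from the bitmap, so it needs the hardware to hand back
the RMID before it can write a complete new configuration.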


> 
>>          /* update the counter configuration */
>>          if (abmc_cfg->bw_type != val) {
>>              abmc_cfg->bw_type = val;
>>              abmc_cfg.split.cfg_en = 1;
>>              wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>          }
>> }
>>
>>
> 
> Reinette
> 

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-02 18:19       ` Reinette Chatre
@ 2024-10-04  1:11         ` Moger, Babu
  2024-10-04  2:17           ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-10-04  1:11 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/2/2024 1:19 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 9/27/24 10:47 AM, Moger, Babu wrote:
>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>> On 9/4/24 3:21 PM, Babu Moger wrote:
> 
>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>       Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>       Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>       Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>       Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>       Removed ABMC reference in FS code.
>>>>       Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>       Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>       Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>
>>> Could you please give an example of what problem may be encountered later? An assignment
>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>
>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>
>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
> 
> The last sentence is not clear to me. Could you please elaborate what
> you mean with "are assigned earlier this action"?
>

I think I confused you here. "domain=_lt" is equivalent to "domain=lt".
My reasoning is that handling all the combinations in the code adds
complexity, so it is better to leave it to the user to decide what they
want to do.
- Babu Moger

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-10-04  0:51         ` Moger, Babu
@ 2024-10-04  2:17           ` Reinette Chatre
  2024-10-04 15:02             ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-10-04  2:17 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/3/24 5:51 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 10/2/2024 1:20 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/27/24 9:22 AM, Moger, Babu wrote:
>>> Hi Reinitte,
>>>
>>> On 9/19/2024 12:45 PM, Reinette Chatre wrote:
>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>
>> ...
>>
>>>>> +}
>>>>> +
>>>>>    static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>>>                         struct seq_file *s, void *v)
>>>>>    {
>>>>> @@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>>>        return 0;
>>>>>    }
>>>>>    +static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
>>>>> +                       struct rdt_mon_domain *d, u32 evtid)
>>>>> +{
>>>>> +    struct rdt_mon_domain *dom;
>>>>> +    struct rdtgroup *rdtg;
>>>>> +    int ret = 0;
>>>>> +
>>>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>>>> +        return ret;
>>>>> +
>>>>> +    list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>>>> +        struct rdtgroup *crg;
>>>>> +
>>>>> +        list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>>> +            if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
>>>>> +                ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
>>>>> +                if (ret)
>>>>> +                    goto out_done;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
>>>>> +            list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>>> +                if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
>>>>> +                    ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
>>>>> +                    if (ret)
>>>>> +                        goto out_done;
>>>>> +                }
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +out_done:
>>>>> +    return ret;
>>>>> +}
>>>>>      static void mbm_config_write_domain(struct rdt_resource *r,
>>>>>                        struct rdt_mon_domain *d, u32 evtid, u32 val)
>>>>>    {
>>>>>        struct mon_config_info mon_info = {0};
>>>>>        u32 config_val;
>>>>> +    int ret;
>>>>>          /*
>>>>>         * Check the current config value first. If both are the same then
>>>>> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>>>>>                      resctrl_arch_event_config_set,
>>>>>                      &mon_info, 1);
>>>>>    +    /*
>>>>> +     * Counter assignments needs to be updated to match the event
>>>>> +     * configuration.
>>>>> +     */
>>>>> +    ret = resctrl_mbm_event_update_assign(r, d, evtid);
>>>>> +    if (ret)
>>>>> +        rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
>>>>> +
>>>>
>>>> This does not look right. This function _just_ returned from an IPI on appropriate CPU and then
>>>> starts flow to do _another_ IPI to run code that could have just been run during previous IPI.
>>>> The whole flow to call rdgroup_assign_cntr() also seems like an obfuscated way to call resctrl_arch_assign_cntr()
>>>> to just reconfigure the counter (not actually assign it).
>>>> Could it perhaps call some resctrl fs code via single IPI that in turn calls the appropriate arch code to set the new
>>>> mon event config and re-configures the counter?
>>>>
>>>
>>> I think we can simplify this. We dont have to go thru all the rdtgroups to figure out if the counter is assigned or not.
>>>
>>> I can move the code inside mon_config_write() after the call mbm_config_write_domain().
>>
>> mbm_config_write_domain() already does an IPI so if I understand correctly this will still
>> result in a second IPI that seems unnecessary to me. Why can the reconfigure not be done
>> with a single IPI?
> 
> I think we can try updating the counter configuration in the same IPI. Let me try that.
> 

Thank you very much.

>>
>>>
>>> Using the domain bitmap we can figure out which of the counters are assigned in the domain. I can use the hardware help to update the assignment for each counter.  This has to be done via IPI.
>>> Something like this.
>>>
>>> static void rdtgroup_abmc_dom_cfg(void *info)
>>> {
>>>      union l3_qos_abmc_cfg *abmc_cfg = info;
>>>          u32 val = abmc_cfg->bw_type;
>>>
>>>          /* Get the counter configuration */
>>>      wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>>      rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
>>>
>>
>> This is not clear to me. I expected MSR_IA32_L3_QOS_ABMC_DSC
>> to return the bw_type that was just written to
>> MSR_IA32_L3_QOS_ABMC_CFG.
>>
>> It is also not clear to me how these registers can be
>> used without a valid counter ID. I seem to miss
>> the context of this call.
> 
> Event configuration changes are domain specific. We have the domain data structure and we have the bitmap(mbm_cntr_map) in rdt_mon_domain. This bitmap tells us which of the counters in the domain are configured. So, we can get the  counter id from this bitmap. Using the counter id, we can query the hardware to get the current configuration by this sequence.
> 
> /* Get the counter configuration */
> for (i=0; i< r->mon.num_mbm_cntrs; i++) {
>  if (test_bit(i, d->mbm_cntr_map)) {
>    abmc_cfg->cntr_id = i;
>    abmc_cfg.split.cfg_en = 0;
>    wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
> 
>    /* Reading L3_QOS_ABMC_DSC returns the configuration of the
>     * counter id specified in L3_QOS_ABMC_CFG.cntr_id with RMID(bw_src)
>     * and event configuration(bw_type)  Get the counter configuration
>     */
>    rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
> 

Apologies but I do still have the same question as before ... wouldn't
MSR_IA32_L3_QOS_ABMC_DSC return the value that was just written to 
MSR_IA32_L3_QOS_ABMC_CFG? If so, the previous wrmsrl() would set the
counter's bw_type to what is set in *abmc_cfg provided as parameter. It
thus still seems unclear why reading it back is necessary.

>  /*
>   * We know the new bandwidth to be updated.
>   * Update the counter by writing to QOS_ABMC_CFG with the new configuration
>   */
> 
>   if (abmc_cfg->bw_type != val) {
>       abmc_cfg->bw_type = val;
>       abmc_cfg.split.cfg_en = 1;
>       wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>      }
>  }
> }
> 
> Hope this helps. I need to pass few information to IPI to make this work. Let me know if this is not clear. I will code this tomorrow then it will be much more clear.
> 

ok, it does seem as though I am not able to follow these snippets and seeing the
full solution should solve that. Thank you.

> 
>>
>>>          /* update the counter configuration */
>>>          if (abmc_cfg->bw_type != val) {
>>>              abmc_cfg->bw_type = val;
>>>              abmc_cfg.split.cfg_en = 1;
>>>              wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>>          }
>>> }
>>>


Reinette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-04  1:11         ` Moger, Babu
@ 2024-10-04  2:17           ` Reinette Chatre
  2024-10-04 16:38             ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-10-04  2:17 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/3/24 6:11 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 10/2/2024 1:19 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/27/24 10:47 AM, Moger, Babu wrote:
>>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>
>>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>>       Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>>       Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>>       Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>>       Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>>       Removed ABMC reference in FS code.
>>>>>       Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>>       Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>>       Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>>
>>>> Could you please give an example of what problem may be encountered later? An assignment
>>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>>
>>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>>
>>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
>>
>> The last sentence is not clear to me. Could you please elaborate what
>> you mean with "are assigned earlier this action"?
>>
> 
> I think I confused you here. "domain=_lt" is equivalent to "domain=lt".  My reasoning is handling all the combination in the code adds code complexity and leave it the user what he wants to do.

hmmm ... and how about "domain=lt_"? Do you think this should also be equivalent to
"domain=lt", or is the expectation perhaps that counters should be assigned to the
two events and then immediately unassigned?

Giving the user such flexibility may be interpreted as the assignment acting
sequentially through the flags provided. Ideally the interface should behave in
a predictable way if the goal is to provide flexibility to the user.
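
For reference, sequential processing of the flags could look like the sketch
below. The function name and the ASSIGN_* values are assumptions made for
illustration, and the parser is a user-space model rather than the code from
the patch. Each flag is applied in order, so '_' clears whatever came before
it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ASSIGN_NONE	0
#define ASSIGN_TOTAL	(1 << 0)
#define ASSIGN_LOCAL	(1 << 1)

/* Hypothetical helper: walk the flags left to right, later flags win. */
static int flags_to_state_sketch(const char *flags)
{
	int state = ASSIGN_NONE;

	assert(flags != NULL);
	for (; *flags; flags++) {
		switch (*flags) {
		case 't':
			state |= ASSIGN_TOTAL;
			break;
		case 'l':
			state |= ASSIGN_LOCAL;
			break;
		case '_':
			state = ASSIGN_NONE;
			break;
		default:
			return -EINVAL;
		}
	}
	return state;
}
```

With this model "_lt" and "lt" both yield total plus local, while "lt_"
yields no assignment. The final state is computed before any counter is
touched, so "lt_" would not assign and then unassign anything; whether that
is the behavior users would expect is the open question.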

Reinette


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-10-04  2:17           ` Reinette Chatre
@ 2024-10-04 15:02             ` Moger, Babu
  2024-10-04 15:53               ` Reinette Chatre
  2024-10-08  0:01               ` Moger, Babu
  0 siblings, 2 replies; 96+ messages in thread
From: Moger, Babu @ 2024-10-04 15:02 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/3/2024 9:17 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/3/24 5:51 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 10/2/2024 1:20 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/27/24 9:22 AM, Moger, Babu wrote:
>>>> Hi Reinitte,
>>>>
>>>> On 9/19/2024 12:45 PM, Reinette Chatre wrote:
>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>
>>> ...
>>>
>>>>>> +}
>>>>>> +
>>>>>>     static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>>>>>                          struct seq_file *s, void *v)
>>>>>>     {
>>>>>> @@ -1793,12 +1802,48 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>>>>         return 0;
>>>>>>     }
>>>>>>     +static int resctrl_mbm_event_update_assign(struct rdt_resource *r,
>>>>>> +                       struct rdt_mon_domain *d, u32 evtid)
>>>>>> +{
>>>>>> +    struct rdt_mon_domain *dom;
>>>>>> +    struct rdtgroup *rdtg;
>>>>>> +    int ret = 0;
>>>>>> +
>>>>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>>>>> +        return ret;
>>>>>> +
>>>>>> +    list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>>>>> +        struct rdtgroup *crg;
>>>>>> +
>>>>>> +        list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>>>> +            if (d == dom && resctrl_mbm_event_assigned(rdtg, dom, evtid)) {
>>>>>> +                ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
>>>>>> +                if (ret)
>>>>>> +                    goto out_done;
>>>>>> +            }
>>>>>> +        }
>>>>>> +
>>>>>> +        list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
>>>>>> +            list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>>>> +                if (d == dom && resctrl_mbm_event_assigned(crg, dom, evtid)) {
>>>>>> +                    ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
>>>>>> +                    if (ret)
>>>>>> +                        goto out_done;
>>>>>> +                }
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +out_done:
>>>>>> +    return ret;
>>>>>> +}
>>>>>>       static void mbm_config_write_domain(struct rdt_resource *r,
>>>>>>                         struct rdt_mon_domain *d, u32 evtid, u32 val)
>>>>>>     {
>>>>>>         struct mon_config_info mon_info = {0};
>>>>>>         u32 config_val;
>>>>>> +    int ret;
>>>>>>           /*
>>>>>>          * Check the current config value first. If both are the same then
>>>>>> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>>>>>>                       resctrl_arch_event_config_set,
>>>>>>                       &mon_info, 1);
>>>>>>     +    /*
>>>>>> +     * Counter assignments needs to be updated to match the event
>>>>>> +     * configuration.
>>>>>> +     */
>>>>>> +    ret = resctrl_mbm_event_update_assign(r, d, evtid);
>>>>>> +    if (ret)
>>>>>> +        rdt_last_cmd_puts("Assign failed, event will be Unavailable\n");
>>>>>> +
>>>>>
>>>>> This does not look right. This function _just_ returned from an IPI on appropriate CPU and then
>>>>> starts flow to do _another_ IPI to run code that could have just been run during previous IPI.
>>>>> The whole flow to call rdgroup_assign_cntr() also seems like an obfuscated way to call resctrl_arch_assign_cntr()
>>>>> to just reconfigure the counter (not actually assign it).
>>>>> Could it perhaps call some resctrl fs code via single IPI that in turn calls the appropriate arch code to set the new
>>>>> mon event config and re-configures the counter?
>>>>>
>>>>
>>>> I think we can simplify this. We dont have to go thru all the rdtgroups to figure out if the counter is assigned or not.
>>>>
>>>> I can move the code inside mon_config_write() after the call mbm_config_write_domain().
>>>
>>> mbm_config_write_domain() already does an IPI so if I understand correctly this will still
>>> result in a second IPI that seems unnecessary to me. Why can the reconfigure not be done
>>> with a single IPI?
>>
>> I think we can try updating the counter configuration in the same IPI. Let me try that.
>>
> 
> Thank you very much.
> 
>>>
>>>>
>>>> Using the domain bitmap we can figure out which of the counters are assigned in the domain. I can use the hardware help to update the assignment for each counter.  This has to be done via IPI.
>>>> Something like this.
>>>>
>>>> static void rdtgroup_abmc_dom_cfg(void *info)
>>>> {
>>>>       union l3_qos_abmc_cfg *abmc_cfg = info;
>>>>           u32 val = abmc_cfg->bw_type;
>>>>
>>>>           /* Get the counter configuration */
>>>>       wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>>>       rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
>>>>
>>>
>>> This is not clear to me. I expected MSR_IA32_L3_QOS_ABMC_DSC
>>> to return the bw_type that was just written to
>>> MSR_IA32_L3_QOS_ABMC_CFG.
>>>
>>> It is also not clear to me how these registers can be
>>> used without a valid counter ID. I seem to miss
>>> the context of this call.
>>
>> Event configuration changes are domain specific. We have the domain data structure and we have the bitmap(mbm_cntr_map) in rdt_mon_domain. This bitmap tells us which of the counters in the domain are configured. So, we can get the  counter id from this bitmap. Using the counter id, we can query the hardware to get the current configuration by this sequence.
>>
>> /* Get the counter configuration */
>> for (i=0; i< r->mon.num_mbm_cntrs; i++) {
>>   if (test_bit(i, d->mbm_cntr_map)) {
>>     abmc_cfg->cntr_id = i;
>>     abmc_cfg.split.cfg_en = 0;
>>     wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>
>>     /* Reading L3_QOS_ABMC_DSC returns the configuration of the
>>      * counter id specified in L3_QOS_ABMC_CFG.cntr_id with RMID(bw_src)
>>      * and event configuration(bw_type)  Get the counter configuration
>>      */
>>     rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
>>
> 
> Apologies but I do still have the same question as before ... wouldn't
> MSR_IA32_L3_QOS_ABMC_DSC return the value that was just written to
> MSR_IA32_L3_QOS_ABMC_CFG? If so, the previous wrmsrl() would set the
> counter's bw_type to what is set in *abmc_cfg provided as parameter. It
> thus still seems unclear why reading it back is necessary.

Yes. It is not clear in the spec.

QOS_ABMC_DSC is a read-only MSR and is used only to read back the
configuration of a counter id.

The configuration is only updated when QOS_ABMC_CFG.cfg_en = 1.

When you write QOS_ABMC_CFG with cntr_id = n and abmc_cfg.split.cfg_en = 0,
reading QOS_ABMC_DSC back returns the current configuration of counter n.
Note that a write with abmc_cfg.split.cfg_en = 0 does not change the
counter's configuration. So, when we read QOS_ABMC_DSC back here, we
get bw_type (event config), bw_src (RMID), etc.

union l3_qos_abmc_cfg {
	struct {
		unsigned long bw_type  :32,
			      bw_src   :12,
			      reserved1: 3,
			      is_clos  : 1,
			      cntr_id  : 5,
			      reserved : 9,
			      cntr_en  : 1,
			      cfg_en   : 1;
	} split;
	unsigned long full;
};
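
For readers following the layout, the bit positions implied by the union above can be written out as explicit shifts and masks. This is only an illustrative sketch: it assumes the compiler allocates bit-fields starting from the least significant bit (as GCC does on x86-64), and the ABMC_* macro names and the abmc_*() helpers are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; positions follow the field order of the union above. */
#define ABMC_BW_TYPE_SHIFT  0   /* bits  0..31: event configuration */
#define ABMC_BW_SRC_SHIFT   32  /* bits 32..43: RMID */
#define ABMC_IS_CLOS_SHIFT  47  /* bit  47 (bits 44..46 are reserved) */
#define ABMC_CNTR_ID_SHIFT  48  /* bits 48..52 (bits 53..61 are reserved) */
#define ABMC_CNTR_EN_SHIFT  62  /* bit  62 */
#define ABMC_CFG_EN_SHIFT   63  /* bit  63 */

#define ABMC_BW_SRC_MASK    0xfffULL
#define ABMC_CNTR_ID_MASK   0x1fULL

/* Build the 64-bit register value from the individual fields. */
static inline uint64_t abmc_pack(uint32_t bw_type, uint32_t bw_src,
				 uint32_t cntr_id, int cntr_en, int cfg_en)
{
	return ((uint64_t)bw_type << ABMC_BW_TYPE_SHIFT) |
	       (((uint64_t)bw_src & ABMC_BW_SRC_MASK) << ABMC_BW_SRC_SHIFT) |
	       (((uint64_t)cntr_id & ABMC_CNTR_ID_MASK) << ABMC_CNTR_ID_SHIFT) |
	       ((uint64_t)(cntr_en != 0) << ABMC_CNTR_EN_SHIFT) |
	       ((uint64_t)(cfg_en != 0) << ABMC_CFG_EN_SHIFT);
}

/* Field accessors for a raw 64-bit value. */
static inline uint32_t abmc_bw_type(uint64_t v)
{
	return (uint32_t)(v >> ABMC_BW_TYPE_SHIFT);
}

static inline uint32_t abmc_bw_src(uint64_t v)
{
	return (uint32_t)((v >> ABMC_BW_SRC_SHIFT) & ABMC_BW_SRC_MASK);
}

static inline uint32_t abmc_cntr_id(uint64_t v)
{
	return (uint32_t)((v >> ABMC_CNTR_ID_SHIFT) & ABMC_CNTR_ID_MASK);
}

static inline int abmc_cfg_en(uint64_t v)
{
	return (int)((v >> ABMC_CFG_EN_SHIFT) & 1);
}
```

The eight field widths sum to 64 bits (32 + 12 + 3 + 1 + 5 + 9 + 1 + 1), so the whole configuration fits in a single MSR write.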

We need to update bw_type (event config). When we write QOS_ABMC_CFG
back with abmc_cfg.split.cfg_en = 1, the configuration will be updated.

if (abmc_cfg->split.bw_type != val) {
        abmc_cfg->split.bw_type = val;
        /* cfg_en = 1 makes this write update the counter configuration */
        abmc_cfg->split.cfg_en = 1;
        wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
}
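
The select, read back, and conditional update sequence described above can be tried out in user space with a small software model. This is a sketch under stated assumptions: the model_*() functions, struct cntr_cfg, and NUM_CNTRS are hypothetical stand-ins for the MSR accesses, and the modelled behavior (a write with cfg_en = 0 only selects the counter) follows the description in this thread rather than the specification text.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS 32

/* Hypothetical software model of one counter's configuration. */
struct cntr_cfg {
	uint32_t bw_type;	/* event configuration */
	uint32_t bw_src;	/* RMID */
};

/* Modelled hardware state: per-counter config plus the selected counter. */
static struct cntr_cfg hw_cntr[NUM_CNTRS];
static unsigned int hw_selected;

/* Model of a write to L3_QOS_ABMC_CFG. */
static void model_write_cfg(unsigned int cntr_id, struct cntr_cfg cfg, int cfg_en)
{
	hw_selected = cntr_id;
	if (cfg_en)		/* configuration changes only when cfg_en = 1 */
		hw_cntr[cntr_id] = cfg;
}

/* Model of a read of L3_QOS_ABMC_DSC: config of the selected counter. */
static struct cntr_cfg model_read_dsc(void)
{
	return hw_cntr[hw_selected];
}

/*
 * Set bw_type to val on every counter whose bit is set in cntr_map.
 * Returns the number of counters that actually had to be rewritten.
 */
static int update_bw_type(uint32_t cntr_map, uint32_t val)
{
	int updates = 0;

	for (unsigned int i = 0; i < NUM_CNTRS; i++) {
		struct cntr_cfg cfg = { 0 };

		if (!(cntr_map & (1u << i)))
			continue;

		model_write_cfg(i, cfg, 0);	/* cfg_en = 0: select only */
		cfg = model_read_dsc();		/* current bw_type and bw_src */

		if (cfg.bw_type != val) {
			cfg.bw_type = val;
			model_write_cfg(i, cfg, 1);	/* cfg_en = 1: update */
			updates++;
		}
	}
	return updates;
}
```

Because the read back preserves bw_src, only the event configuration changes and the RMID assigned to each counter stays intact.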

I will send you the code later today.



-- 
- Babu Moger


* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-10-04 15:02             ` Moger, Babu
@ 2024-10-04 15:53               ` Reinette Chatre
  2024-10-08  0:01               ` Moger, Babu
  1 sibling, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-10-04 15:53 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/4/24 8:02 AM, Moger, Babu wrote:
> On 10/3/2024 9:17 PM, Reinette Chatre wrote:
>> On 10/3/24 5:51 PM, Moger, Babu wrote:
>>> On 10/2/2024 1:20 PM, Reinette Chatre wrote:
>>>> On 9/27/24 9:22 AM, Moger, Babu wrote:
>>>>> On 9/19/2024 12:45 PM, Reinette Chatre wrote:
>>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:

>>>>> Using the domain bitmap we can figure out which of the counters are assigned in the domain. I can use the hardware help to update the assignment for each counter.  This has to be done via IPI.
>>>>> Something like this.
>>>>>
>>>>> static void rdtgroup_abmc_dom_cfg(void *info)
>>>>> {
>>>>>       union l3_qos_abmc_cfg *abmc_cfg = info;
>>>>>           u32 val = abmc_cfg->bw_type;
>>>>>
>>>>>           /* Get the counter configuration */
>>>>>       wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>>>>       rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
>>>>>
>>>>
>>>> This is not clear to me. I expected MSR_IA32_L3_QOS_ABMC_DSC
>>>> to return the bw_type that was just written to
>>>> MSR_IA32_L3_QOS_ABMC_CFG.
>>>>
>>>> It is also not clear to me how these registers can be
>>>> used without a valid counter ID. I seem to miss
>>>> the context of this call.
>>>
>>> Event configuration changes are domain specific. We have the domain data structure and we have the bitmap(mbm_cntr_map) in rdt_mon_domain. This bitmap tells us which of the counters in the domain are configured. So, we can get the  counter id from this bitmap. Using the counter id, we can query the hardware to get the current configuration by this sequence.
>>>
>>> /* Get the counter configuration */
>>> for (i=0; i< r->mon.num_mbm_cntrs; i++) {
>>>   if (test_bit(i, d->mbm_cntr_map)) {
>>>     abmc_cfg->cntr_id = i;
>>>     abmc_cfg.split.cfg_en = 0;
>>>     wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *abmc_cfg);
>>>
>>>     /* Reading L3_QOS_ABMC_DSC returns the configuration of the
>>>      * counter id specified in L3_QOS_ABMC_CFG.cntr_id with RMID(bw_src)
>>>      * and event configuration(bw_type)  Get the counter configuration
>>>      */
>>>     rdmsrl(MGR_IA32_L3_QOS_ABMC_DSC, *abmc_cfg);
>>>
>>
>> Apologies but I do still have the same question as before ... wouldn't
>> MSR_IA32_L3_QOS_ABMC_DSC return the value that was just written to
>> MSR_IA32_L3_QOS_ABMC_CFG? If so, the previous wrmsrl() would set the
>> counter's bw_type to what is set in *abmc_cfg provided as parameter. It
>> thus still seems unclear why reading it back is necessary.
> 
> Yes. It is not clear in the spec.
> 
> QOS_ABMC_DSC is read-only MSR and used only to get the configured counter id information.
> 
> The configuration is only updated when QOS_ABMC_CFG.cfg_en = 1.
> 
> When you write QOS_ABMC_CFG with cntr_id = n, abmc_cfg.split.cfg_en
> = 0 and reading the QOS_ABMC_DSC back will return the configuration
> of cntr_id. Note that when abmc_cfg.split.cfg_en = 0, it will not
> change the counter id configuration. when you read QOS_ABMC_DSC back
> here, we will get bw_type (event config), bw_src (RMID) etc.

Ah, I see now. Thank you very much for clarifying.

Reinette



* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-04  2:17           ` Reinette Chatre
@ 2024-10-04 16:38             ` Moger, Babu
  2024-10-04 16:52               ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-10-04 16:38 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/3/2024 9:17 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/3/24 6:11 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 10/2/2024 1:19 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/27/24 10:47 AM, Moger, Babu wrote:
>>>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>
>>>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>>>        Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>>>        Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>>>        Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>>>        Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>>>        Removed ABMC reference in FS code.
>>>>>>        Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>>>        Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>>>        Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>>>
>>>>> Could you please give an example of what problem may be encountered later? An assignment
>>>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>>>
>>>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>>>
>>>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
>>>
>>> The last sentence is not clear to me. Could you please elaborate what
>>> you mean with "are assigned earlier this action"?
>>>
>>
>> I think I confused you here. "domain=_lt" is equivalent to "domain=lt".  My reasoning is handling all the combination in the code adds code complexity and leave it the user what he wants to do.
> 
> hmmm ... and how about "domain=lt_"? Do you think this should also be equivalent to
> "domain=lt" or perhaps an expectation that counters should be assigned to the two events
> and then immediately unassigned?

Yes. "domain=lt_" should be "domain=lt".

> 
> Giving user such flexibility may be interpreted as the assignment seen as acting
> sequentially through the flags provided. Ideally the interface should behave in
> a predictable way if the goal is to provide flexibility to the user.
> 

My only concern is adding the check now and then having to revert it later.
Basically, we process the flags sequentially and do not differentiate
between them. I feel that fits the predictable behavior, no?

-- 
- Babu Moger


* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-04 16:38             ` Moger, Babu
@ 2024-10-04 16:52               ` Reinette Chatre
  2024-10-04 19:36                 ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-10-04 16:52 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/4/24 9:38 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 10/3/2024 9:17 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/3/24 6:11 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 10/2/2024 1:19 PM, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 9/27/24 10:47 AM, Moger, Babu wrote:
>>>>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>>
>>>>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>>>>        Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>>>>        Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>>>>        Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>>>>        Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>>>>        Removed ABMC reference in FS code.
>>>>>>>        Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>>>>        Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>>>>        Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>>>>
>>>>>> Could you please give an example of what problem may be encountered later? An assignment
>>>>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>>>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>>>>
>>>>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>>>>
>>>>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
>>>>
>>>> The last sentence is not clear to me. Could you please elaborate what
>>>> you mean with "are assigned earlier this action"?
>>>>
>>>
>>> I think I confused you here. "domain=_lt" is equivalent to "domain=lt".  My reasoning is handling all the combination in the code adds code complexity and leave it the user what he wants to do.
>>
>> hmmm ... and how about "domain=lt_"? Do you think this should also be equivalent to
>> "domain=lt" or perhaps an expectation that counters should be assigned to the two events
>> and then immediately unassigned?
> 
> Yes. "domain=lt_" should be "domain=lt".
> 
>>
>> Giving user such flexibility may be interpreted as the assignment seen as acting
>> sequentially through the flags provided. Ideally the interface should behave in
>> a predictable way if the goal is to provide flexibility to the user.
>>
> 
> My only concern is adding the check now and reverting it back later.
> Basically process the flags sequentially and don't differentiate between the flags. I feel it fits the predictable behavior. No?

This is the point I was trying to make. If flags are processed sequentially then it would be
predictable behavior and if that is documented expectation then that should be ok. The problem
that I want to highlight is that if predictable sequential processing is the goal then
"domain=_lt" cannot be interpreted the same as "domain=lt_". When flags in "domain=lt_"
are processed sequentially then final state should be "domain=_", no?

If sequential processing is done then "domain=_lt" means "unassign all counters followed
by assign of counter to local MBM monitoring, followed by assign of counter to total MBM
monitoring". Similarly, "domain=lt_" means "assign a counter to local MBM monitoring, then
assign a counter to total MBM monitoring, then unassign all counters".
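
A minimal sketch of strictly sequential flag processing makes the two outcomes concrete. It assumes the starting state has no counters assigned; apply_flags_seq() and the MON_* bit values are hypothetical names used only for illustration.

```c
#include <assert.h>

#define MON_LOCAL 0x1	/* 'l': local MBM event */
#define MON_TOTAL 0x2	/* 't': total MBM event */

/*
 * Apply the flags strictly left to right, starting with nothing assigned.
 * Returns the final assignment mask, or -1 on an unknown flag.
 */
static int apply_flags_seq(const char *flags)
{
	int state = 0;

	for (; *flags; flags++) {
		switch (*flags) {
		case '_':	/* unassign all counters */
			state = 0;
			break;
		case 'l':
			state |= MON_LOCAL;
			break;
		case 't':
			state |= MON_TOTAL;
			break;
		default:
			return -1;
		}
	}
	return state;
}
```

Under this model "_lt" ends with both events assigned while "lt_" ends with none, so the two strings cannot be treated as equivalent.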

If this sequential processing is the goal then the implementation would still need to be
adapted. Consider, for example, "domain=lt" ... with sequential processing the user
indicates/expects that "local MBM monitoring" has priority if there is only one counter
available, but the current implementation does not process it sequentially and would end up
assigning counter to "total MBM monitoring" first.
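
The priority argument can be illustrated with a small sketch in which counters are handed out in the order the flags appear until the free counters run out. The function assign_in_order(), its free_cntrs parameter, and the MON_* values are hypothetical and do not reflect the current implementation.

```c
#include <assert.h>

#define MON_LOCAL 0x1	/* 'l': local MBM event */
#define MON_TOTAL 0x2	/* 't': total MBM event */

/*
 * Hand out counters in flag order while free counters remain.
 * Returns the mask of events that received a counter.
 */
static int assign_in_order(const char *flags, int free_cntrs)
{
	int assigned = 0;

	for (; *flags; flags++) {
		int ev = (*flags == 'l') ? MON_LOCAL :
			 (*flags == 't') ? MON_TOTAL : 0;

		if (!ev || (assigned & ev))
			continue;	/* unknown or already assigned */
		if (free_cntrs <= 0)
			break;		/* no counters left */

		assigned |= ev;
		free_cntrs--;
	}
	return assigned;
}
```

With a single free counter, "lt" gives the counter to local MBM monitoring and "tl" gives it to total MBM monitoring, which is the ordering a user would expect from sequential processing.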

Reinette
 



* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-04 16:52               ` Reinette Chatre
@ 2024-10-04 19:36                 ` Moger, Babu
  2024-10-04 21:09                   ` Reinette Chatre
  0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-10-04 19:36 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/4/2024 11:52 AM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/4/24 9:38 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 10/3/2024 9:17 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 10/3/24 6:11 PM, Moger, Babu wrote:
>>>> Hi Reinette,
>>>>
>>>> On 10/2/2024 1:19 PM, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 9/27/24 10:47 AM, Moger, Babu wrote:
>>>>>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>>>
>>>>>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>>>>>         Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>>>>>         Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>>>>>         Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>>>>>         Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>>>>>         Removed ABMC reference in FS code.
>>>>>>>>         Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>>>>>         Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>>>>>         Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>>>>>
>>>>>>> Could you please give an example of what problem may be encountered later? An assignment
>>>>>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>>>>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>>>>>
>>>>>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>>>>>
>>>>>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
>>>>>
>>>>> The last sentence is not clear to me. Could you please elaborate what
>>>>> you mean with "are assigned earlier this action"?
>>>>>
>>>>
>>>> I think I confused you here. "domain=_lt" is equivalent to "domain=lt".  My reasoning is handling all the combination in the code adds code complexity and leave it the user what he wants to do.
>>>
>>> hmmm ... and how about "domain=lt_"? Do you think this should also be equivalent to
>>> "domain=lt" or perhaps an expectation that counters should be assigned to the two events
>>> and then immediately unassigned?
>>
>> Yes. "domain=lt_" should be "domain=lt".
>>
>>>
>>> Giving user such flexibility may be interpreted as the assignment seen as acting
>>> sequentially through the flags provided. Ideally the interface should behave in
>>> a predictable way if the goal is to provide flexibility to the user.
>>>
>>
>> My only concern is adding the check now and reverting it back later.
>> Basically process the flags sequentially and don't differentiate between the flags. I feel it fits the predictable behavior. No?
> 
> This is the point I was trying to make. If flags are processed sequentially then it would be
> predictable behavior and if that is documented expectation then that should be ok. The problem
> that I want to highlight is that if predictable sequential processing is the goal then
> "domain=_lt" cannot be interpreted the same as "domain="lt_". When flags in "domain=lt_"
> are processed sequentially then final state should be "domain=_", no?

Yes. That is correct.
> 
> If sequential processing is done then "domain=_lt" means "unassign all counters followed
> by assign of counter to local MBM monitoring, followed by assign of counter to total MBM
> monitoring". Similarly, "domain=lt_" means "assign a counter to local MBM monitoring, then
> assign a counter to total MBM monitoring, then unassign all counters".

Yes. That is correct.
> 
> If this sequential processing is the goal then the implementation would still need to be
> adapted. Consider, for example, "domain=lt" ... with sequential processing the user
> indicates/expects that "local MBM monitoring" has priority if there is only one counter
> available, but the current implementation does not process it sequentially and would end up
> assigning counter to "total MBM monitoring" first.

Sure. Let's accommodate sequential processing and process the flags in
the order they are provided. I need to make a few changes to
rdtgroup_process_flags() to address it. Hopefully, it can be done
without much complexity. Thanks
-- 
- Babu Moger


* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-04 19:36                 ` Moger, Babu
@ 2024-10-04 21:09                   ` Reinette Chatre
  2024-10-05  0:23                     ` Moger, Babu
  0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-10-04 21:09 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/4/24 12:36 PM, Moger, Babu wrote:
> On 10/4/2024 11:52 AM, Reinette Chatre wrote:
>> On 10/4/24 9:38 AM, Moger, Babu wrote:
>>> On 10/3/2024 9:17 PM, Reinette Chatre wrote:
>>>> On 10/3/24 6:11 PM, Moger, Babu wrote:
>>>>> On 10/2/2024 1:19 PM, Reinette Chatre wrote:
>>>>>> On 9/27/24 10:47 AM, Moger, Babu wrote:
>>>>>>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>>>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>>>>
>>>>>>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>>>>>>         Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>>>>>>         Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>>>>>>         Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>>>>>>         Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>>>>>>         Removed ABMC reference in FS code.
>>>>>>>>>         Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>>>>>>         Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>>>>>>         Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>>>>>>
>>>>>>>> Could you please give an example of what problem may be encountered later? An assignment
>>>>>>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>>>>>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>>>>>>
>>>>>>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>>>>>>
>>>>>>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
>>>>>>
>>>>>> The last sentence is not clear to me. Could you please elaborate what
>>>>>> you mean with "are assigned earlier this action"?
>>>>>>
>>>>>
>>>>> I think I confused you here. "domain=_lt" is equivalent to "domain=lt".  My reasoning is handling all the combination in the code adds code complexity and leave it the user what he wants to do.
>>>>
>>>> hmmm ... and how about "domain=lt_"? Do you think this should also be equivalent to
>>>> "domain=lt" or perhaps an expectation that counters should be assigned to the two events
>>>> and then immediately unassigned?
>>>
>>> Yes. "domain=lt_" should be "domain=lt".
>>>
>>>>
>>>> Giving user such flexibility may be interpreted as the assignment seen as acting
>>>> sequentially through the flags provided. Ideally the interface should behave in
>>>> a predictable way if the goal is to provide flexibility to the user.
>>>>
>>>
>>> My only concern is adding the check now and reverting it back later.
>>> Basically process the flags sequentially and don't differentiate between the flags. I feel it fits the predictable behavior. No?
>>
>> This is the point I was trying to make. If flags are processed sequentially then it would be
>> predictable behavior and if that is documented expectation then that should be ok. The problem
>> that I want to highlight is that if predictable sequential processing is the goal then
>> "domain=_lt" cannot be interpreted the same as "domain="lt_". When flags in "domain=lt_"
>> are processed sequentially then final state should be "domain=_", no?
> 
> Yes. that is correct.
>>
>> If sequential processing is done then "domain=_lt" means "unassign all counters followed
>> by assign of counter to local MBM monitoring, followed by assign of counter to total MBM
>> monitoring". Similarly, "domain=lt_" means "assign a counter to local MBM monitoring, then
>> assign a counter to total MBM monitoring, then unassign all counters".
> 
> Yes. That is correct.
>>
>> If this sequential processing is the goal then the implementation would still need to be
>> adapted. Consider, for example, "domain=lt" ... with sequential processing the user
>> indicates/expects that "local MBM monitoring" has priority if there is only one counter
>> available, but the current implementation does not process it sequentially and would end up
>> assigning counter to "total MBM monitoring" first.
> 
> Sure. Lets accommodate the sequential processing. Process the  flags
> in the order it is provided. I need to make few changes to
> rdtgroup_process_flags() to address it. Hopefully, it can be done
> without much complexity. Thanks

I doubt that the implementation would be complex but it may take some effort for it
to be efficient ... taking actions that involve changing kernel and HW state for each
flag as it is encountered vs. parsing all flags and changing kernel and HW state once.

The risk is that a simple request like "domain=lt" may take twice as long when
doing sequential processing. When users provide flags like "domain=_lt" to take advantage
of sequential processing then there may be an argument like "user gets what is being asked
for" when things are slower, but I am not sure the same can be true for a user
that just wants to run "domain=lt".

To me it seems simpler to require that "_" always appears by itself and that
any flags set by the user using "=" are combined during parsing so that they can be
acted on in a single flow. If indeed users want to do something sequentially
like "unassign all flags and then assign local MBM" then instead of "domain=_l"
I think "domain=_;domain=l" could be used?
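
The proposal above could be sketched roughly as follows. This is a simplification under stated assumptions: the domain id and the "=" are left out so that each ';'-separated entry is just a flag string for one domain, and parse_flags(), apply_assignments(), and the MON_* values are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MON_LOCAL 0x1	/* 'l': local MBM event */
#define MON_TOTAL 0x2	/* 't': total MBM event */

/*
 * Combine one flag string into a mask. "_" is valid only by itself.
 * Returns the mask (0 means unassign all) or -1 if the string is invalid.
 */
static int parse_flags(const char *flags, size_t len)
{
	int mask = 0;

	if (len == 0)
		return -1;
	if (len == 1 && flags[0] == '_')
		return 0;

	for (size_t i = 0; i < len; i++) {
		switch (flags[i]) {
		case 'l':
			mask |= MON_LOCAL;
			break;
		case 't':
			mask |= MON_TOTAL;
			break;
		default:	/* includes '_' mixed with other flags */
			return -1;
		}
	}
	return mask;
}

/*
 * Apply "<flags>;<flags>;..." left to right. Each entry is parsed into a
 * mask first and then replaces the state in one step.
 * Returns 0 on success; on error returns -1 and leaves *state unchanged.
 */
static int apply_assignments(const char *s, int *state)
{
	int cur = *state;

	while (*s) {
		size_t len = strcspn(s, ";");
		int mask = parse_flags(s, len);

		if (mask < 0)
			return -1;
		cur = mask;
		s += len;
		if (*s == ';')
			s++;
	}
	*state = cur;
	return 0;
}
```

Here "_;l" is accepted and ends with only the local event assigned, while "_l" and "lt_" are rejected during parsing before any state is touched.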

Reinette



* Re: [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-04 21:09                   ` Reinette Chatre
@ 2024-10-05  0:23                     ` Moger, Babu
  0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-10-05  0:23 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/4/2024 4:09 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/4/24 12:36 PM, Moger, Babu wrote:
>> On 10/4/2024 11:52 AM, Reinette Chatre wrote:
>>> On 10/4/24 9:38 AM, Moger, Babu wrote:
>>>> On 10/3/2024 9:17 PM, Reinette Chatre wrote:
>>>>> On 10/3/24 6:11 PM, Moger, Babu wrote:
>>>>>> On 10/2/2024 1:19 PM, Reinette Chatre wrote:
>>>>>>> On 9/27/24 10:47 AM, Moger, Babu wrote:
>>>>>>>> On 9/19/2024 12:59 PM, Reinette Chatre wrote:
>>>>>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>>>>>
>>>>>>>>>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>>>>>>>>>          Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>>>>>>>>>          Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>>>>>>>>>          Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>>>>>>>>>          Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>>>>>>>>>          Removed ABMC reference in FS code.
>>>>>>>>>>          Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>>>>>>>>>          Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>>>>>>>>>          Users have the liberty to pass the flags. Restricting it might be a problem later.
>>>>>>>>>
>>>>>>>>> Could you please give an example of what problem may be encountered later? An assignment
>>>>>>>>> like "domain=_lt" seems like a contradiction to me since user space essentially asks
>>>>>>>>> for "None of the MBM events" as well as "MBM total event" and "MBM local event".
>>>>>>>>
>>>>>>>> I agree it is contradiction. But user is the one who decides to do that. I think we should allow it. Also, there is some value to it as well.
>>>>>>>>
>>>>>>>> "domain=_lt" This will also reset the counters if the total and local events are assigned earlier this action.
>>>>>>>
>>>>>>> The last sentence is not clear to me. Could you please elaborate what
>>>>>>> you mean with "are assigned earlier this action"?
>>>>>>>
>>>>>>
>>>>>> I think I confused you here. "domain=_lt" is equivalent to "domain=lt".  My reasoning is handling all the combination in the code adds code complexity and leave it the user what he wants to do.
>>>>>
>>>>> hmmm ... and how about "domain=lt_"? Do you think this should also be equivalent to
>>>>> "domain=lt" or perhaps an expectation that counters should be assigned to the two events
>>>>> and then immediately unassigned?
>>>>
>>>> Yes. "domain=lt_" should be "domain=lt".
>>>>
>>>>>
>>>>> Giving user such flexibility may be interpreted as the assignment seen as acting
>>>>> sequentially through the flags provided. Ideally the interface should behave in
>>>>> a predictable way if the goal is to provide flexibility to the user.
>>>>>
>>>>
>>>> My only concern is adding the check now and reverting it back later.
>>>> Basically process the flags sequentially and don't differentiate between the flags. I feel it fits the predictable behavior. No?
>>>
>>> This is the point I was trying to make. If flags are processed sequentially then it would be
>>> predictable behavior and if that is documented expectation then that should be ok. The problem
>>> that I want to highlight is that if predictable sequential processing is the goal then
>>> "domain=_lt" cannot be interpreted the same as "domain="lt_". When flags in "domain=lt_"
>>> are processed sequentially then final state should be "domain=_", no?
>>
>> Yes. that is correct.
>>>
>>> If sequential processing is done then "domain=_lt" means "unassign all counters followed
>>> by assign of counter to local MBM monitoring, followed by assign of counter to total MBM
>>> monitoring". Similarly, "domain=lt_" means "assign a counter to local MBM monitoring, then
>>> assign a counter to total MBM monitoring, then unassign all counters".
>>
>> Yes. That is correct.
>>>
>>> If this sequential processing is the goal then the implementation would still need to be
>>> adapted. Consider, for example, "domain=lt" ... with sequential processing the user
>>> indicates/expects that "local MBM monitoring" has priority if there is only one counter
>>> available, but the current implementation does not process it sequentially and would end up
>>> assigning counter to "total MBM monitoring" first.
>>
>> Sure. Lets accommodate the sequential processing. Process the  flags
>> in the order it is provided. I need to make few changes to
>> rdtgroup_process_flags() to address it. Hopefully, it can be done
>> without much complexity. Thanks
> 
> I doubt that the implementation would be complex but it may take some effort for it
> to be efficient ... taking actions that involve changing kernel and HW state for each
> flag as it is encountered vs. parsing all flags and changing kernel and HW state once.
> 
> The risk is that a simple request like "domain=lt" may take twice as long when
> doing sequential processing. When users provide flags like "domain=_lt" to take advantage
> of sequential processing then there may be an argument like "user gets what is being asked
> for" when things are slower, but I am not sure the same can be true for a user
> that just wants to run "domain=lt".
> 
> To me it seems simpler to require that "_" always appears by itself and that

Ok. Let's go with this approach: treat "_" as special so that it cannot be
combined with other flags. That seems simple to implement.

> any flags set by the user using "=" are combined during parsing so that they can be
> acted on in a single flow. If indeed users want to do something sequentially
> like "unassign all flags and then assign local MBM" then instead of "domain=_l"
> I think "domain=_;domain=l" could be used?

Yes. It can be done.
thanks
Babu Moger
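
For reference, a minimal userspace sketch of the parsing rule agreed on
above: "_" is accepted only when it appears by itself, and any other flags
are combined into one mask so they can be acted on in a single flow. The
names parse_assign_flags(), ASSIGN_TOTAL and ASSIGN_LOCAL are hypothetical
and only used for illustration; this is not the rdtgroup_process_flags()
code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; the names are illustrative, not from the patch. */
#define ASSIGN_NONE   0x0u
#define ASSIGN_TOTAL  0x1u
#define ASSIGN_LOCAL  0x2u

/*
 * Parse one flag string such as "lt", "l", "t" or "_" into a bitmask.
 * "_" (unassign all) is only accepted when it is the whole string.
 * Returns true on success and stores the combined mask in *mask.
 */
bool parse_assign_flags(const char *flags, unsigned int *mask)
{
	unsigned int m = ASSIGN_NONE;
	size_t i, len;

	assert(flags && mask);

	len = strlen(flags);
	if (len == 0)
		return false;

	/* "_" on its own means "unassign everything". */
	if (strcmp(flags, "_") == 0) {
		*mask = ASSIGN_NONE;
		return true;
	}

	/* Combine all remaining flags; the order does not matter. */
	for (i = 0; i < len; i++) {
		switch (flags[i]) {
		case 't':
			m |= ASSIGN_TOTAL;
			break;
		case 'l':
			m |= ASSIGN_LOCAL;
			break;
		default:
			/* Unknown flag, or "_" mixed with other flags. */
			return false;
		}
	}

	*mask = m;
	return true;
}
```

With this rule "lt" and "tl" give the same mask, "_l" is rejected, and a
request such as "domain=_;domain=l" would simply be two calls, one per
assignment.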


* Re: [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes
  2024-10-04 15:02             ` Moger, Babu
  2024-10-04 15:53               ` Reinette Chatre
@ 2024-10-08  0:01               ` Moger, Babu
  1 sibling, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-10-08  0:01 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/4/2024 10:02 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 10/3/2024 9:17 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/3/24 5:51 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 10/2/2024 1:20 PM, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 9/27/24 9:22 AM, Moger, Babu wrote:
>>>>> Hi Reinette,
>>>>>
>>>>> On 9/19/2024 12:45 PM, Reinette Chatre wrote:
>>>>>> On 9/4/24 3:21 PM, Babu Moger wrote:
>>>>
>>>> ...
>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>>     static int rdtgroup_mbm_assign_mode_show(struct 
>>>>>>> kernfs_open_file *of,
>>>>>>>                          struct seq_file *s, void *v)
>>>>>>>     {
>>>>>>> @@ -1793,12 +1802,48 @@ static int 
>>>>>>> mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>>>>>         return 0;
>>>>>>>     }
>>>>>>>     +static int resctrl_mbm_event_update_assign(struct 
>>>>>>> rdt_resource *r,
>>>>>>> +                       struct rdt_mon_domain *d, u32 evtid)
>>>>>>> +{
>>>>>>> +    struct rdt_mon_domain *dom;
>>>>>>> +    struct rdtgroup *rdtg;
>>>>>>> +    int ret = 0;
>>>>>>> +
>>>>>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>>>>>> +        return ret;
>>>>>>> +
>>>>>>> +    list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>>>>>> +        struct rdtgroup *crg;
>>>>>>> +
>>>>>>> +        list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>>>>> +            if (d == dom && resctrl_mbm_event_assigned(rdtg, 
>>>>>>> dom, evtid)) {
>>>>>>> +                ret = rdtgroup_assign_cntr(r, rdtg, dom, evtid);
>>>>>>> +                if (ret)
>>>>>>> +                    goto out_done;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, 
>>>>>>> mon.crdtgrp_list) {
>>>>>>> +            list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>>>>>> +                if (d == dom && resctrl_mbm_event_assigned(crg, 
>>>>>>> dom, evtid)) {
>>>>>>> +                    ret = rdtgroup_assign_cntr(r, crg, dom, evtid);
>>>>>>> +                    if (ret)
>>>>>>> +                        goto out_done;
>>>>>>> +                }
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +out_done:
>>>>>>> +    return ret;
>>>>>>> +}
>>>>>>>       static void mbm_config_write_domain(struct rdt_resource *r,
>>>>>>>                         struct rdt_mon_domain *d, u32 evtid, u32 
>>>>>>> val)
>>>>>>>     {
>>>>>>>         struct mon_config_info mon_info = {0};
>>>>>>>         u32 config_val;
>>>>>>> +    int ret;
>>>>>>>           /*
>>>>>>>          * Check the current config value first. If both are the 
>>>>>>> same then
>>>>>>> @@ -1822,6 +1867,14 @@ static void mbm_config_write_domain(struct 
>>>>>>> rdt_resource *r,
>>>>>>>                       resctrl_arch_event_config_set,
>>>>>>>                       &mon_info, 1);
>>>>>>>     +    /*
>>>>>>> +     * Counter assignments needs to be updated to match the event
>>>>>>> +     * configuration.
>>>>>>> +     */
>>>>>>> +    ret = resctrl_mbm_event_update_assign(r, d, evtid);
>>>>>>> +    if (ret)
>>>>>>> +        rdt_last_cmd_puts("Assign failed, event will be 
>>>>>>> Unavailable\n");
>>>>>>> +
>>>>>>
>>>>>> This does not look right. This function _just_ returned from an 
>>>>>> IPI on appropriate CPU and then
>>>>>> starts flow to do _another_ IPI to run code that could have just 
>>>>>> been run during previous IPI.
>>>>>> The whole flow to call rdtgroup_assign_cntr() also seems like an 
>>>>>> obfuscated way to call resctrl_arch_assign_cntr()
>>>>>> to just reconfigure the counter (not actually assign it).
>>>>>> Could it perhaps call some resctrl fs code via single IPI that in 
>>>>>> turn calls the appropriate arch code to set the new
>>>>>> mon event config and re-configures the counter?
>>>>>>
>>>>>
>>>>> I think we can simplify this. We don't have to go through all the 
>>>>> rdtgroups to figure out if the counter is assigned or not.
>>>>>
>>>>> I can move the code inside mon_config_write() after the call to 
>>>>> mbm_config_write_domain().
>>>>
>>>> mbm_config_write_domain() already does an IPI so if I understand 
>>>> correctly this will still
>>>> result in a second IPI that seems unnecessary to me. Why can the 
>>>> reconfigure not be done
>>>> with a single IPI?
>>>
>>> I think we can try updating the counter configuration in the same 
>>> IPI. Let me try that.
>>>
>>
>> Thank you very much.
>>
>>>>
>>>>>
>>>>> Using the domain bitmap we can figure out which of the counters are 
>>>>> assigned in the domain. I can use the hardware help to update the 
>>>>> assignment for each counter.  This has to be done via IPI.
>>>>> Something like this.
>>>>>
>>>>> static void rdtgroup_abmc_dom_cfg(void *info)
>>>>> {
>>>>>     union l3_qos_abmc_cfg *abmc_cfg = info;
>>>>>     u32 val = abmc_cfg->split.bw_type;
>>>>>
>>>>>     /* Get the counter configuration */
>>>>>     wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>>>>>     rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, abmc_cfg->full);
>>>>>
>>>>
>>>> This is not clear to me. I expected MSR_IA32_L3_QOS_ABMC_DSC
>>>> to return the bw_type that was just written to
>>>> MSR_IA32_L3_QOS_ABMC_CFG.
>>>>
>>>> It is also not clear to me how these registers can be
>>>> used without a valid counter ID. I seem to miss
>>>> the context of this call.
>>>
>>> Event configuration changes are domain specific. We have the domain 
>>> data structure and we have the bitmap(mbm_cntr_map) in 
>>> rdt_mon_domain. This bitmap tells us which of the counters in the 
>>> domain are configured. So, we can get the  counter id from this 
>>> bitmap. Using the counter id, we can query the hardware to get the 
>>> current configuration by this sequence.
>>>
>>> /* Get the counter configuration */
>>> for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
>>>   if (test_bit(i, d->mbm_cntr_map)) {
>>>     abmc_cfg->split.cntr_id = i;
>>>     abmc_cfg->split.cfg_en = 0;
>>>     wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>>>
>>>     /* Reading L3_QOS_ABMC_DSC returns the configuration of the
>>>      * counter id specified in L3_QOS_ABMC_CFG.cntr_id, with RMID (bw_src)
>>>      * and event configuration (bw_type).
>>>      */
>>>     rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, abmc_cfg->full);
>>>
>>
>> Apologies but I do still have the same question as before ... wouldn't
>> MSR_IA32_L3_QOS_ABMC_DSC return the value that was just written to
>> MSR_IA32_L3_QOS_ABMC_CFG? If so, the previous wrmsrl() would set the
>> counter's bw_type to what is set in *abmc_cfg provided as parameter. It
>> thus still seems unclear why reading it back is necessary.
> 
> Yes. It is not clear in the spec.
> 
> QOS_ABMC_DSC is a read-only MSR and is used only to get the configured 
> counter id information.
> 
> The configuration is only updated when QOS_ABMC_CFG.cfg_en = 1.
> 
> When you write QOS_ABMC_CFG with cntr_id = n and abmc_cfg.split.cfg_en = 0, 
> reading QOS_ABMC_DSC back will return the configuration of cntr_id n. 
> Note that when abmc_cfg.split.cfg_en = 0, the write will not change the 
> counter configuration. When you read QOS_ABMC_DSC back here, we will get 
> bw_type (event config), bw_src (RMID), etc.
> 
> union l3_qos_abmc_cfg {
>      struct {
>          unsigned long bw_type  :32,
>                    bw_src   :12,
>                    reserved1: 3,
>                    is_clos  : 1,
>                    cntr_id  : 5,
>                    reserved : 9,
>                    cntr_en  : 1,
>                    cfg_en   : 1;
>      } split;
>      unsigned long full;
> };
> 
> We need to update bw_type (event config). When we write QOS_ABMC_CFG 
> back with abmc_cfg.split.cfg_en = 1, the configuration will be updated.
> 
> if (abmc_cfg->split.bw_type != val) {
>         abmc_cfg->split.bw_type = val;
>         abmc_cfg->split.cfg_en = 1;
>         wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
> }
> 
> I will send you the code later today.
> 
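
The select / read back / update sequence described in the quoted message
can be modelled in userspace roughly as below. This is only a sketch under
stated assumptions: mock_write_abmc_cfg() and mock_read_abmc_dsc() are
hypothetical stand-ins for wrmsrl()/rdmsrl() on the two MSRs, the mock
"hardware" behaves the way the message describes (a write with cfg_en = 0
only selects the counter, a write with cfg_en = 1 stores the configuration),
and NUM_CNTRS is taken from the 5-bit width of cntr_id. The union layout is
the one quoted above, with fixed-width types.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS 32	/* assumption: cntr_id is 5 bits wide */

/* Same field layout as the union quoted above, with fixed-width types. */
union l3_qos_abmc_cfg {
	struct {
		uint64_t bw_type  :32,
			 bw_src   :12,
			 reserved1: 3,
			 is_clos  : 1,
			 cntr_id  : 5,
			 reserved : 9,
			 cntr_en  : 1,
			 cfg_en   : 1;
	} split;
	uint64_t full;
};

/* Mock hardware state (hypothetical, for illustration only). */
static uint64_t hw_cntr_cfg[NUM_CNTRS];	/* stored per-counter configuration */
static unsigned int hw_selected;	/* counter selected by last CFG write */

/* Stand-in for a write to L3_QOS_ABMC_CFG. */
void mock_write_abmc_cfg(union l3_qos_abmc_cfg cfg)
{
	hw_selected = cfg.split.cntr_id;
	if (cfg.split.cfg_en) {
		/* Assumption of this mock: cfg_en itself is not stored. */
		cfg.split.cfg_en = 0;
		hw_cntr_cfg[hw_selected] = cfg.full;
	}
}

/* Stand-in for a read of L3_QOS_ABMC_DSC. */
union l3_qos_abmc_cfg mock_read_abmc_dsc(void)
{
	union l3_qos_abmc_cfg cfg;

	cfg.full = hw_cntr_cfg[hw_selected];
	return cfg;
}

/*
 * Update bw_type (event config) of one counter.
 * Returns 1 if the configuration was rewritten, 0 if it already matched.
 */
int update_cntr_bw_type(unsigned int cntr_id, uint32_t val)
{
	union l3_qos_abmc_cfg cfg = { .full = 0 };

	assert(cntr_id < NUM_CNTRS);

	/* Select the counter without changing its configuration. */
	cfg.split.cntr_id = cntr_id;
	cfg.split.cfg_en = 0;
	mock_write_abmc_cfg(cfg);

	/* Read back bw_type, bw_src (RMID), etc. for that counter. */
	cfg = mock_read_abmc_dsc();
	if (cfg.split.bw_type == val)
		return 0;

	/* Write the new event configuration with cfg_en = 1. */
	cfg.split.bw_type = val;
	cfg.split.cntr_id = cntr_id;
	cfg.split.cfg_en = 1;
	mock_write_abmc_cfg(cfg);
	return 1;
}
```

The read-back step is what keeps bw_src (the RMID) intact across the
update; only bw_type changes.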

Found out that we cannot do it the way we discussed above.
An event update can be for either the local event or the total event.
We need to update only the counters that are assigned to that event 
type (total or local). That information is not available in the domain or 
by querying the hardware. We need to search the resctrl groups for that 
information.

Updated the patch for that. All the updates are done in the same IPI.
Will send the series later this week.

Thanks
Babu
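
To illustrate the idea (not the actual patch): the groups are searched
first for counters assigned to the event type being reconfigured, and the
collected counter ids can then all be updated in one go. The structure and
names below (struct mon_group, collect_cntrs_for_event(), MON_CNTR_UNSET)
are hypothetical, and the per-domain dimension is left out to keep the
sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define MON_CNTR_UNSET	(-1)	/* hypothetical "no counter assigned" marker */

enum mbm_evt { EVT_TOTAL = 0, EVT_LOCAL = 1, EVT_COUNT };

/* Hypothetical monitor group: one counter id per MBM event type. */
struct mon_group {
	const char *name;
	int cntr_id[EVT_COUNT];
};

/*
 * Collect the counter ids assigned to event 'evt' across all groups.
 * Stores at most out_len ids in 'out' and returns how many were stored.
 */
size_t collect_cntrs_for_event(const struct mon_group *groups, size_t ngroups,
			       enum mbm_evt evt, int *out, size_t out_len)
{
	size_t i, n = 0;

	assert(groups && out);
	assert(evt == EVT_TOTAL || evt == EVT_LOCAL);

	for (i = 0; i < ngroups && n < out_len; i++) {
		/* Only counters assigned to this event type are affected. */
		if (groups[i].cntr_id[evt] != MON_CNTR_UNSET)
			out[n++] = groups[i].cntr_id[evt];
	}

	return n;
}
```

The resulting list is what would be handed to the single IPI handler, so
that counters assigned to the other event type are left alone.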






Thread overview: 96+ messages
2024-09-04 22:21 [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-09-04 22:21 ` [PATCH v7 01/24] x86/cpufeatures: Add support for " Babu Moger
2024-09-04 22:21 ` [PATCH v7 02/24] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2024-09-19 16:00   ` Reinette Chatre
2024-09-23 14:21     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 03/24] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2024-09-19 16:03   ` Reinette Chatre
2024-09-04 22:21 ` [PATCH v7 04/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2024-09-19 16:16   ` Reinette Chatre
2024-09-23 14:37     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 05/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
2024-09-04 22:21 ` [PATCH v7 06/24] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2024-09-19 16:22   ` Reinette Chatre
2024-09-23 15:30     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 07/24] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2024-09-19 16:28   ` Reinette Chatre
2024-09-23 16:01     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 08/24] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2024-09-19 16:32   ` Reinette Chatre
2024-09-23 16:23     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 09/24] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
2024-09-19 16:42   ` Reinette Chatre
2024-09-23 18:33     ` Moger, Babu
2024-09-23 22:28       ` Reinette Chatre
2024-09-24 13:58         ` Moger, Babu
2024-09-24 16:25   ` Peter Newman
2024-09-24 17:01     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
2024-09-19 16:51   ` Reinette Chatre
2024-09-23 18:43     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 11/24] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
2024-09-19 16:55   ` Reinette Chatre
2024-09-23 18:45     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 12/24] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
2024-09-04 22:21 ` [PATCH v7 13/24] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2024-09-19 17:08   ` Reinette Chatre
2024-09-23 20:21     ` Moger, Babu
2024-09-23 22:30       ` Reinette Chatre
2024-09-24 14:51         ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 14/24] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
2024-09-04 22:21 ` [PATCH v7 15/24] x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter with ABMC Babu Moger
2024-09-19 17:13   ` Reinette Chatre
2024-09-23 21:03     ` Moger, Babu
2024-09-23 22:29       ` Reinette Chatre
2024-09-24 14:07         ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 16/24] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
2024-09-19 17:20   ` Reinette Chatre
2024-09-26 16:28     ` Moger, Babu
2024-09-26 16:46       ` Reinette Chatre
2024-09-26 16:59         ` Moger, Babu
2024-09-27  1:48           ` Reinette Chatre
2024-09-04 22:21 ` [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
2024-09-19 17:26   ` Reinette Chatre
2024-09-26 16:56     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 18/24] x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
2024-09-19 17:29   ` Reinette Chatre
2024-09-26 18:48     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
2024-09-19 17:31   ` Reinette Chatre
2024-09-26 19:16     ` Moger, Babu
2024-09-27  1:50       ` Reinette Chatre
2024-09-27 13:40         ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
2024-09-19 17:38   ` Reinette Chatre
2024-09-26 19:39     ` Moger, Babu
2024-09-27  1:51       ` Reinette Chatre
2024-09-27 13:26         ` Moger, Babu
2024-09-27 15:07           ` Reinette Chatre
2024-09-04 22:21 ` [PATCH v7 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
2024-09-19 17:43   ` Reinette Chatre
2024-09-27 14:37     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
2024-09-19 17:45   ` Reinette Chatre
2024-09-27 16:22     ` Moger, Babu
2024-10-02 18:20       ` Reinette Chatre
2024-10-04  0:51         ` Moger, Babu
2024-10-04  2:17           ` Reinette Chatre
2024-10-04 15:02             ` Moger, Babu
2024-10-04 15:53               ` Reinette Chatre
2024-10-08  0:01               ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
2024-09-19 17:53   ` Reinette Chatre
2024-09-27 17:06     ` Moger, Babu
2024-09-04 22:21 ` [PATCH v7 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2024-09-19 17:59   ` Reinette Chatre
2024-09-27 17:47     ` Moger, Babu
2024-10-02 18:19       ` Reinette Chatre
2024-10-04  1:11         ` Moger, Babu
2024-10-04  2:17           ` Reinette Chatre
2024-10-04 16:38             ` Moger, Babu
2024-10-04 16:52               ` Reinette Chatre
2024-10-04 19:36                 ` Moger, Babu
2024-10-04 21:09                   ` Reinette Chatre
2024-10-05  0:23                     ` Moger, Babu
2024-09-19 18:00 ` [PATCH v7 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
2024-09-27 18:11   ` Moger, Babu
