* [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior
@ 2024-07-03 2:09 Yan Zhao
2024-07-03 2:10 ` [PATCH v2 1/4] KVM: x86/mmu: " Yan Zhao
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Yan Zhao @ 2024-07-03 2:09 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Yan Zhao
Today "zapping only leaf SPTEs with memslot range" ("zap-slot-leafs-only"
for short) on moving/deleting a memslot is not done. Instead, KVM opts to
invalidate all page tables and generate fresh new ones based on the new
memslot layout ("zap-all" for short). This "zap-all" behavior is of low
overhead for most use cases, and is adopted primarily due to a bug which
caused VM instability when a VM is with Nvidia Geforce GPU assigned (see
link in patch 1).
However, the "zap-all" behavior is not desired for certain specific
scenarios. e.g.
- It's not viable for TDX,
a) TDX requires root page of private page table remains unaltered
throughout the TD life cycle.
b) TDX mandates that leaf entries in private page table must be zapped
prior to non-leaf entries.
c) TDX requires re-accepting of private pages after page dropping.
- It's not performant for scenarios involving frequent deletion and
re-adding of numerous small memslots.
This series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk,
enabling users to control the behavior of memslot zapping when a memslot is
moved/deleted for VMs of type KVM_X86_DEFAULT_VM.
The quirk is turned on by default for VMs of type KVM_X86_DEFAULT_VM,
leading to "zap-all" behavior.
Users have the option to turn off the quirk. Doing so will have KVM go
"zap-slot-leafs-only" on memslot moving/deleting.
KVM will always select "zap-slot-leafs-only" as if the quirk is disabled
for non-KVM_X86_DEFAULT_VM VMs for reasons explained in patch 1.
This series has been tested with
- Normal VMs
w/ and w/o device assignment, and kvm selftests
- TDX guests.
Tested with shared device assignment and guest memory hot-plug/unplug.
It can be applied to both kvm/queue and kvm-coco-queue.
Patch 1: KVM changes.
Patch 2-4: Selftests updates. Verify memslot move/deletion functionality
with the quirk enabled/disabled.
Changelog:
v1 --> v2:
- Make KVM behave as if the quirk is always disabled on
non-KVM_X86_DEFAULT_VM VMs. (Sean, Rick)
- Removed the patch for selftest private_mem_kvm_exits_test, since that
selftest is for VM type KVM_X86_SW_PROTECTED_VM.
v1: https://lore.kernel.org/all/20240613060708.11761-1-yan.y.zhao@intel.com
Yan Zhao (4):
KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
KVM: selftests: Test slot move/delete with slot zap quirk
enabled/disabled
KVM: selftests: Allow slot modification stress test with quirk
disabled
KVM: selftests: Test memslot move in memslot_perf_test with quirk
disabled
Documentation/virt/kvm/api.rst | 8 ++++
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/mmu/mmu.c | 42 ++++++++++++++++++-
.../kvm/memslot_modification_stress_test.c | 19 ++++++++-
.../testing/selftests/kvm/memslot_perf_test.c | 12 +++++-
.../selftests/kvm/set_memory_region_test.c | 29 +++++++++----
7 files changed, 101 insertions(+), 13 deletions(-)
--
2.43.2
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-07-03 2:09 [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior Yan Zhao
@ 2024-07-03 2:10 ` Yan Zhao
2024-09-23 18:37 ` Sean Christopherson
2024-07-03 2:11 ` [PATCH v2 2/4] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Yan Zhao @ 2024-07-03 2:10 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Yan Zhao
Introduce the quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to select
KVM's behavior when a memslot is moved or deleted for KVM_X86_DEFAULT_VM
VMs. Make sure KVM behave as if the quirk is always disabled for
non-KVM_X86_DEFAULT_VM VMs.
The KVM_X86_QUIRK_SLOT_ZAP_ALL quirk offers two behavior options:
- when enabled: Invalidate/zap all SPTEs ("zap-all"),
- when disabled: Precisely zap only the leaf SPTEs within the range of the
moving/deleting memory slot ("zap-slot-leafs-only").
"zap-all" is today's KVM behavior to work around a bug [1] where the
changing the zapping behavior of memslot move/deletion would cause VM
instability for VMs with an Nvidia GPU assigned; while
"zap-slot-leafs-only" allows for more precise zapping of SPTEs within the
memory slot range, improving performance in certain scenarios [2], and
meeting the functional requirements for TDX.
Previous attempts to select "zap-slot-leafs-only" include a per-VM
capability approach [3] (which was not preferred because the root cause of
the bug remained unidentified) and a per-memslot flag approach [4]. Sean
and Paolo finally recommended the implementation of this quirk and
explained that it's the least bad option [5].
By default, the quirk is enabled on KVM_X86_DEFAULT_VM VMs to use
"zap-all". Users have the option to disable the quirk to select
"zap-slot-leafs-only" for specific KVM_X86_DEFAULT_VM VMs that are
unaffected by this bug.
For non-KVM_X86_DEFAULT_VM VMs, the "zap-slot-leafs-only" behavior is
always selected without user's opt-in, regardless of if the user opts for
"zap-all".
This is because it is assumed until proven otherwise that non-
KVM_X86_DEFAULT_VM VMs will not be exposed to the bug [1], and most
importantly, it's because TDX must have "zap-slot-leafs-only" always
selected. In TDX's case a memslot's GPA range can be a mixture of "private"
or "shared" memory. Shared is roughly analogous to how EPT is handled for
normal VMs, but private GPAs need lots of special treatment:
1) "zap-all" would require to zap private root page or non-leaf entries or
at least leaf-entries beyond the deleting memslot scope. However, TDX
demands that the root page of the private page table remains unchanged,
with leaf entries being zapped before non-leaf entries, and any dropped
private guest pages must be re-accepted by the guest.
2) if "zap-all" zaps only shared page tables, it would result in private
pages still being mapped when the memslot is gone. This may affect even
other processes if later the gmem fd was whole punched, causing the
pages being freed on the host while still mapped in the TD, because
there's no pgoff to the gfn information to zap the private page table
after memslot is gone.
So, simply go "zap-slot-leafs-only" as if the quirk is always disabled for
non-KVM_X86_DEFAULT_VM VMs to avoid manual opt-in for every VM type [6] or
complicating quirk disabling interface (current quirk disabling interface
is limited, no way to query quirks, or force them to be disabled).
Add a new function kvm_mmu_zap_memslot_leafs() to implement
"zap-slot-leafs-only". This function does not call kvm_unmap_gfn_range(),
bypassing special handling to APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, as
1) The APIC_ACCESS_PAGE_PRIVATE_MEMSLOT cannot be created by users, nor can
it be moved. It is only deleted by KVM when APICv is permanently
inhibited.
2) kvm_vcpu_reload_apic_access_page() effectively does nothing when
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT is deleted.
3) Avoid making all cpus request of KVM_REQ_APIC_PAGE_RELOAD can save on
costly IPIs.
Suggested-by: Kai Huang <kai.huang@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com [1]
Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com/#25054908 [2]
Link: https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#mabc0119583dacf621025e9d873c85f4fbaa66d5c [3]
Link: https://lore.kernel.org/all/20240515005952.3410568-3-rick.p.edgecombe@intel.com [4]
Link: https://lore.kernel.org/all/7df9032d-83e4-46a1-ab29-6c7973a2ab0b@redhat.com [5]
Link: https://lore.kernel.org/all/ZnGa550k46ow2N3L@google.com [6]
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
Documentation/virt/kvm/api.rst | 8 +++++++
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/mmu/mmu.c | 42 ++++++++++++++++++++++++++++++++-
4 files changed, 52 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a71d91978d9e..7c1248142a5a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7997,6 +7997,14 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
guest CPUID on writes to MISC_ENABLE if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
+
+KVM_X86_QUIRK_SLOT_ZAP_ALL By default, KVM invalidates all SPTEs in
+ fast way for memslot deletion when VM type
+ is KVM_X86_DEFAULT_VM.
+ When this quirk is disabled or when VM type
+ is other than KVM_X86_DEFAULT_VM, KVM zaps
+ only leaf SPTEs that are within the range of
+ the memslot being deleted.
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9bb2e164c523..a40577911744 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2327,7 +2327,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_OUT_7E_INC_RIP | \
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
- KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)
+ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
+ KVM_X86_QUIRK_SLOT_ZAP_ALL)
/*
* KVM previously used a u32 field in kvm_run to indicate the hypercall was
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 988b5204d636..6a399859c42d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -438,6 +438,7 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
+#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1432deb75cbb..be3a82a32295 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6915,10 +6915,50 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
kvm_mmu_zap_all(kvm);
}
+/*
+ * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
+ *
+ * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
+ * case scenario we'll have unused shadow pages lying around until they
+ * are recycled due to age or when the VM is destroyed.
+ */
+static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ struct kvm_gfn_range range = {
+ .slot = slot,
+ .start = slot->base_gfn,
+ .end = slot->base_gfn + slot->npages,
+ .may_block = true,
+ };
+ bool flush = false;
+
+ write_lock(&kvm->mmu_lock);
+
+ if (kvm_memslots_have_rmaps(kvm))
+ flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
+
+ if (tdp_mmu_enabled)
+ flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
+
+ if (flush)
+ kvm_flush_remote_tlbs_memslot(kvm, slot);
+
+ write_unlock(&kvm->mmu_lock);
+}
+
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+ kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
+
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
- kvm_mmu_zap_all_fast(kvm);
+ if (kvm_memslot_flush_zap_all(kvm))
+ kvm_mmu_zap_all_fast(kvm);
+ else
+ kvm_mmu_zap_memslot_leafs(kvm, slot);
}
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
--
2.43.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 2/4] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled
2024-07-03 2:09 [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-07-03 2:10 ` [PATCH v2 1/4] KVM: x86/mmu: " Yan Zhao
@ 2024-07-03 2:11 ` Yan Zhao
2024-07-03 2:12 ` [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
2024-07-03 2:12 ` [PATCH v2 4/4] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao
3 siblings, 0 replies; 13+ messages in thread
From: Yan Zhao @ 2024-07-03 2:11 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Yan Zhao
Update set_memory_region_test to make sure memslot move and deletion
function correctly both when slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL is
enabled and disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
.../selftests/kvm/set_memory_region_test.c | 29 ++++++++++++++-----
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index bb8002084f52..a8267628e9ed 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -175,7 +175,7 @@ static void guest_code_move_memory_region(void)
GUEST_DONE();
}
-static void test_move_memory_region(void)
+static void test_move_memory_region(bool disable_slot_zap_quirk)
{
pthread_t vcpu_thread;
struct kvm_vcpu *vcpu;
@@ -184,6 +184,9 @@ static void test_move_memory_region(void)
vm = spawn_vm(&vcpu, &vcpu_thread, guest_code_move_memory_region);
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
hva = addr_gpa2hva(vm, MEM_REGION_GPA);
/*
@@ -266,7 +269,7 @@ static void guest_code_delete_memory_region(void)
GUEST_ASSERT(0);
}
-static void test_delete_memory_region(void)
+static void test_delete_memory_region(bool disable_slot_zap_quirk)
{
pthread_t vcpu_thread;
struct kvm_vcpu *vcpu;
@@ -276,6 +279,9 @@ static void test_delete_memory_region(void)
vm = spawn_vm(&vcpu, &vcpu_thread, guest_code_delete_memory_region);
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
/* Delete the memory region, the guest should not die. */
vm_mem_region_delete(vm, MEM_REGION_SLOT);
wait_for_vcpu();
@@ -553,7 +559,10 @@ int main(int argc, char *argv[])
{
#ifdef __x86_64__
int i, loops;
+ int j, disable_slot_zap_quirk = 0;
+ if (kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) & KVM_X86_QUIRK_SLOT_ZAP_ALL)
+ disable_slot_zap_quirk = 1;
/*
* FIXME: the zero-memslot test fails on aarch64 and s390x because
* KVM_RUN fails with ENOEXEC or EFAULT.
@@ -579,13 +588,17 @@ int main(int argc, char *argv[])
else
loops = 10;
- pr_info("Testing MOVE of in-use region, %d loops\n", loops);
- for (i = 0; i < loops; i++)
- test_move_memory_region();
+ for (j = 0; j <= disable_slot_zap_quirk; j++) {
+ pr_info("Testing MOVE of in-use region, %d loops, slot zap quirk %s\n",
+ loops, j ? "disabled" : "enabled");
+ for (i = 0; i < loops; i++)
+ test_move_memory_region(!!j);
- pr_info("Testing DELETE of in-use region, %d loops\n", loops);
- for (i = 0; i < loops; i++)
- test_delete_memory_region();
+ pr_info("Testing DELETE of in-use region, %d loops, slot zap quirk %s\n",
+ loops, j ? "disabled" : "enabled");
+ for (i = 0; i < loops; i++)
+ test_delete_memory_region(!!j);
+ }
#endif
return 0;
--
2.43.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled
2024-07-03 2:09 [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-07-03 2:10 ` [PATCH v2 1/4] KVM: x86/mmu: " Yan Zhao
2024-07-03 2:11 ` [PATCH v2 2/4] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
@ 2024-07-03 2:12 ` Yan Zhao
2024-09-24 12:26 ` Aishwarya TCV
2024-07-03 2:12 ` [PATCH v2 4/4] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao
3 siblings, 1 reply; 13+ messages in thread
From: Yan Zhao @ 2024-07-03 2:12 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Yan Zhao
Add a new user option to memslot_modification_stress_test to allow testing
with slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
.../kvm/memslot_modification_stress_test.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index 05fcf902e067..c6f22ded4c96 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -85,6 +85,7 @@ struct test_params {
useconds_t delay;
uint64_t nr_iterations;
bool partition_vcpu_memory_access;
+ bool disable_slot_zap_quirk;
};
static void run_test(enum vm_guest_mode mode, void *arg)
@@ -95,6 +96,13 @@ static void run_test(enum vm_guest_mode mode, void *arg)
vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
VM_MEM_SRC_ANONYMOUS,
p->partition_vcpu_memory_access);
+#ifdef __x86_64__
+ if (p->disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
+ pr_info("Memslot zap quirk %s\n", p->disable_slot_zap_quirk ?
+ "disabled" : "enabled");
+#endif
pr_info("Finished creating vCPUs\n");
@@ -113,11 +121,12 @@ static void run_test(enum vm_guest_mode mode, void *arg)
static void help(char *name)
{
puts("");
- printf("usage: %s [-h] [-m mode] [-d delay_usec]\n"
+ printf("usage: %s [-h] [-m mode] [-d delay_usec] [-q]\n"
" [-b memory] [-v vcpus] [-o] [-i iterations]\n", name);
guest_modes_help();
printf(" -d: add a delay between each iteration of adding and\n"
" deleting a memslot in usec.\n");
+ printf(" -q: Disable memslot zap quirk.\n");
printf(" -b: specify the size of the memory region which should be\n"
" accessed by each vCPU. e.g. 10M or 3G.\n"
" Default: 1G\n");
@@ -143,7 +152,7 @@ int main(int argc, char *argv[])
guest_modes_append_default();
- while ((opt = getopt(argc, argv, "hm:d:b:v:oi:")) != -1) {
+ while ((opt = getopt(argc, argv, "hm:d:qb:v:oi:")) != -1) {
switch (opt) {
case 'm':
guest_modes_cmdline(optarg);
@@ -166,6 +175,12 @@ int main(int argc, char *argv[])
case 'i':
p.nr_iterations = atoi_positive("Number of iterations", optarg);
break;
+ case 'q':
+ p.disable_slot_zap_quirk = true;
+
+ TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
+ KVM_X86_QUIRK_SLOT_ZAP_ALL);
+ break;
case 'h':
default:
help(argv[0]);
--
2.43.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 4/4] KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled
2024-07-03 2:09 [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior Yan Zhao
` (2 preceding siblings ...)
2024-07-03 2:12 ` [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
@ 2024-07-03 2:12 ` Yan Zhao
3 siblings, 0 replies; 13+ messages in thread
From: Yan Zhao @ 2024-07-03 2:12 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Yan Zhao
Add a new user option to memslot_perf_test to allow testing memslot move
with quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
tools/testing/selftests/kvm/memslot_perf_test.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/memslot_perf_test.c b/tools/testing/selftests/kvm/memslot_perf_test.c
index 579a64f97333..893366982f77 100644
--- a/tools/testing/selftests/kvm/memslot_perf_test.c
+++ b/tools/testing/selftests/kvm/memslot_perf_test.c
@@ -113,6 +113,7 @@ static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "atomic bool is not lockless");
static sem_t vcpu_ready;
static bool map_unmap_verify;
+static bool disable_slot_zap_quirk;
static bool verbose;
#define pr_info_v(...) \
@@ -578,6 +579,9 @@ static bool test_memslot_move_prepare(struct vm_data *data,
uint32_t guest_page_size = data->vm->page_size;
uint64_t movesrcgpa, movetestgpa;
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(data->vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
movesrcgpa = vm_slot2gpa(data, data->nslots - 1);
if (isactive) {
@@ -896,6 +900,7 @@ static void help(char *name, struct test_args *targs)
pr_info(" -h: print this help screen.\n");
pr_info(" -v: enable verbose mode (not for benchmarking).\n");
pr_info(" -d: enable extra debug checks.\n");
+ pr_info(" -q: Disable memslot zap quirk during memslot move.\n");
pr_info(" -s: specify memslot count cap (-1 means no cap; currently: %i)\n",
targs->nslots);
pr_info(" -f: specify the first test to run (currently: %i; max %zu)\n",
@@ -954,7 +959,7 @@ static bool parse_args(int argc, char *argv[],
uint32_t max_mem_slots;
int opt;
- while ((opt = getopt(argc, argv, "hvds:f:e:l:r:")) != -1) {
+ while ((opt = getopt(argc, argv, "hvdqs:f:e:l:r:")) != -1) {
switch (opt) {
case 'h':
default:
@@ -966,6 +971,11 @@ static bool parse_args(int argc, char *argv[],
case 'd':
map_unmap_verify = true;
break;
+ case 'q':
+ disable_slot_zap_quirk = true;
+ TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
+ KVM_X86_QUIRK_SLOT_ZAP_ALL);
+ break;
case 's':
targs->nslots = atoi_paranoid(optarg);
if (targs->nslots <= 1 && targs->nslots != -1) {
--
2.43.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-07-03 2:10 ` [PATCH v2 1/4] KVM: x86/mmu: " Yan Zhao
@ 2024-09-23 18:37 ` Sean Christopherson
2024-09-24 0:58 ` Yan Zhao
0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-09-23 18:37 UTC (permalink / raw)
To: Yan Zhao
Cc: pbonzini, rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack,
sagis, erdemaktas, graf, linux-kernel, kvm
On Wed, Jul 03, 2024, Yan Zhao wrote:
> Introduce the quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to select
> KVM's behavior when a memslot is moved or deleted for KVM_X86_DEFAULT_VM
> VMs. Make sure KVM behave as if the quirk is always disabled for
> non-KVM_X86_DEFAULT_VM VMs.
...
> Suggested-by: Kai Huang <kai.huang@intel.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
Bad Sean, bad.
> +/*
> + * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
> + *
> + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> + * case scenario we'll have unused shadow pages lying around until they
> + * are recycled due to age or when the VM is destroyed.
> + */
> +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> + struct kvm_gfn_range range = {
> + .slot = slot,
> + .start = slot->base_gfn,
> + .end = slot->base_gfn + slot->npages,
> + .may_block = true,
> + };
> + bool flush = false;
> +
> + write_lock(&kvm->mmu_lock);
> +
> + if (kvm_memslots_have_rmaps(kvm))
> + flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
This, and Paolo's merged variant, break shadow paging. As was tried in commit
4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting
for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
to the memslot, for all intents and purposes. Deleting the memslot without removing
all relevant shadow pages results in NULL pointer derefs when tearing down the VM.
Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.
https://lore.kernel.org/all/20190820200318.GA15808@linux.intel.com
Rather than trying to get this functional for shadow paging (which includes nested
TDP), I think we should scrap the quirk idea and simply make this the behavior for
S-EPT and nothing else.
BUG: kernel NULL pointer dereference, address: 00000000000000b0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 6085f43067 P4D 608c080067 PUD 608c081067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 79 UID: 0 PID: 187063 Comm: set_memory_regi Tainted: G W 6.11.0-smp--24867312d167-cpl #395
Tainted: [W]=WARN
Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
RIP: 0010:__kvm_mmu_prepare_zap_page+0x3a9/0x7b0 [kvm]
Code: <48> 8b 8e b0 00 00 00 48 8b 96 e0 00 00 00 48 c1 e9 09 48 29 c8 8b
RSP: 0018:ff314a25b19f7c28 EFLAGS: 00010212
Call Trace:
<TASK>
kvm_arch_flush_shadow_all+0x7a/0xf0 [kvm]
kvm_mmu_notifier_release+0x6c/0xb0 [kvm]
mmu_notifier_unregister+0x85/0x140
kvm_put_kvm+0x263/0x410 [kvm]
kvm_vm_release+0x21/0x30 [kvm]
__fput+0x8d/0x2c0
__se_sys_close+0x71/0xc0
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-09-23 18:37 ` Sean Christopherson
@ 2024-09-24 0:58 ` Yan Zhao
2024-09-24 7:45 ` Sean Christopherson
2024-11-26 9:15 ` Yan Zhao
0 siblings, 2 replies; 13+ messages in thread
From: Yan Zhao @ 2024-09-24 0:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: pbonzini, rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack,
sagis, erdemaktas, graf, linux-kernel, kvm
On Mon, Sep 23, 2024 at 11:37:14AM -0700, Sean Christopherson wrote:
> On Wed, Jul 03, 2024, Yan Zhao wrote:
> > Introduce the quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to select
> > KVM's behavior when a memslot is moved or deleted for KVM_X86_DEFAULT_VM
> > VMs. Make sure KVM behave as if the quirk is always disabled for
> > non-KVM_X86_DEFAULT_VM VMs.
>
> ...
>
> > Suggested-by: Kai Huang <kai.huang@intel.com>
> > Suggested-by: Sean Christopherson <seanjc@google.com>
>
> Bad Sean, bad.
>
> > +/*
> > + * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
> > + *
> > + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > + * case scenario we'll have unused shadow pages lying around until they
> > + * are recycled due to age or when the VM is destroyed.
> > + */
> > +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > + struct kvm_gfn_range range = {
> > + .slot = slot,
> > + .start = slot->base_gfn,
> > + .end = slot->base_gfn + slot->npages,
> > + .may_block = true,
> > + };
> > + bool flush = false;
> > +
> > + write_lock(&kvm->mmu_lock);
> > +
> > + if (kvm_memslots_have_rmaps(kvm))
> > + flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
>
> This, and Paolo's merged variant, break shadow paging. As was tried in commit
> 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
> all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting
> for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
> to the memslot, for all intents and purposes. Deleting the memslot without removing
> all relevant shadow pages results in NULL pointer derefs when tearing down the VM.
>
> Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.
> https://lore.kernel.org/all/20190820200318.GA15808@linux.intel.com
>
> Rather than trying to get this functional for shadow paging (which includes nested
> TDP), I think we should scrap the quirk idea and simply make this the behavior for
> S-EPT and nothing else.
Ok. Thanks for identifying this error. Will change code to this way.
BTW: update some findings regarding to the previous bug with Nvidia GPU
assignment:
I found that after v5.19-rc1+, even with nx_huge_pages=N, the bug is not
reproducible when only leaf entries of memslot are zapped.
(no more detailed info due to limited time to debug).
>
> BUG: kernel NULL pointer dereference, address: 00000000000000b0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 6085f43067 P4D 608c080067 PUD 608c081067 PMD 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 79 UID: 0 PID: 187063 Comm: set_memory_regi Tainted: G W 6.11.0-smp--24867312d167-cpl #395
> Tainted: [W]=WARN
> Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
> RIP: 0010:__kvm_mmu_prepare_zap_page+0x3a9/0x7b0 [kvm]
> Code: <48> 8b 8e b0 00 00 00 48 8b 96 e0 00 00 00 48 c1 e9 09 48 29 c8 8b
> RSP: 0018:ff314a25b19f7c28 EFLAGS: 00010212
> Call Trace:
> <TASK>
> kvm_arch_flush_shadow_all+0x7a/0xf0 [kvm]
> kvm_mmu_notifier_release+0x6c/0xb0 [kvm]
> mmu_notifier_unregister+0x85/0x140
> kvm_put_kvm+0x263/0x410 [kvm]
> kvm_vm_release+0x21/0x30 [kvm]
> __fput+0x8d/0x2c0
> __se_sys_close+0x71/0xc0
> do_syscall_64+0x83/0x160
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-09-24 0:58 ` Yan Zhao
@ 2024-09-24 7:45 ` Sean Christopherson
2024-09-24 8:37 ` Yan Zhao
2024-11-26 9:15 ` Yan Zhao
1 sibling, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-09-24 7:45 UTC (permalink / raw)
To: Yan Zhao
Cc: pbonzini, rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack,
sagis, erdemaktas, graf, linux-kernel, kvm
On Tue, Sep 24, 2024, Yan Zhao wrote:
> On Mon, Sep 23, 2024 at 11:37:14AM -0700, Sean Christopherson wrote:
> > > +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> > > +{
> > > + struct kvm_gfn_range range = {
> > > + .slot = slot,
> > > + .start = slot->base_gfn,
> > > + .end = slot->base_gfn + slot->npages,
> > > + .may_block = true,
> > > + };
> > > + bool flush = false;
> > > +
> > > + write_lock(&kvm->mmu_lock);
> > > +
> > > + if (kvm_memslots_have_rmaps(kvm))
> > > + flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
> >
> > This, and Paolo's merged variant, break shadow paging. As was tried in commit
> > 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
> > all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting
> > for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
> > to the memslot, for all intents and purposes. Deleting the memslot without removing
> > all relevant shadow pages results in NULL pointer derefs when tearing down the VM.
> >
> > Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.
> > https://lore.kernel.org/all/20190820200318.GA15808@linux.intel.com
> >
> > Rather than trying to get this functional for shadow paging (which includes nested
> > TDP), I think we should scrap the quirk idea and simply make this the behavior for
> > S-EPT and nothing else.
> Ok. Thanks for identifying this error. Will change code to this way.
For now, I think a full revert of the entire series makes sense. Irrespective of
this bug, I don't think KVM should commit to specific implementation behavior,
i.e. KVM shouldn't explicitly say only leaf SPTEs are zapped. The quirk docs
should instead say that if the quirk is disabled, KVM will only guarantee that
the delete memslot will be inaccessible. That way, KVM can still do a fast zap
when it makes sense, e.g. for large memslots, do a complete fast zap, and for
small memslots, do a targeted zap of the TDP MMU but a fast zap of any shadow
MMUs.
> BTW: update some findings regarding to the previous bug with Nvidia GPU
> assignment:
> I found that after v5.19-rc1+, even with nx_huge_pages=N, the bug is not
> reproducible when only leaf entries of memslot are zapped.
> (no more detailed info due to limited time to debug).
Heh, I've given up hope on ever finding a root cause for that issue.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-09-24 7:45 ` Sean Christopherson
@ 2024-09-24 8:37 ` Yan Zhao
0 siblings, 0 replies; 13+ messages in thread
From: Yan Zhao @ 2024-09-24 8:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: pbonzini, rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack,
sagis, erdemaktas, graf, linux-kernel, kvm
On Tue, Sep 24, 2024 at 12:45:52AM -0700, Sean Christopherson wrote:
> On Tue, Sep 24, 2024, Yan Zhao wrote:
> > On Mon, Sep 23, 2024 at 11:37:14AM -0700, Sean Christopherson wrote:
> > > > +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> > > > +{
> > > > + struct kvm_gfn_range range = {
> > > > + .slot = slot,
> > > > + .start = slot->base_gfn,
> > > > + .end = slot->base_gfn + slot->npages,
> > > > + .may_block = true,
> > > > + };
> > > > + bool flush = false;
> > > > +
> > > > + write_lock(&kvm->mmu_lock);
> > > > +
> > > > + if (kvm_memslots_have_rmaps(kvm))
> > > > + flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
> > >
> > > This, and Paolo's merged variant, break shadow paging. As was tried in commit
> > > 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
> > > all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting
> > > for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
> > > to the memslot, for all intents and purposes. Deleting the memslot without removing
> > > all relevant shadow pages results in NULL pointer derefs when tearing down the VM.
> > >
> > > Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.
> > > https://lore.kernel.org/all/20190820200318.GA15808@linux.intel.com
> > >
> > > Rather than trying to get this functional for shadow paging (which includes nested
> > > TDP), I think we should scrap the quirk idea and simply make this the behavior for
> > > S-EPT and nothing else.
> > Ok. Thanks for identifying this error. Will change code to this way.
>
> For now, I think a full revert of the entire series makes sense. Irrespective of
> this bug, I don't think KVM should commit to specific implementation behavior,
> i.e. KVM shouldn't explicitly say only leaf SPTEs are zapped. The quirk docs
> should instead say that if the quirk is disabled, KVM will only guarantee that
> the delete memslot will be inaccessible. That way, KVM can still do a fast zap
> when it makes sense, e.g. for large memslots, do a complete fast zap, and for
> small memslots, do a targeted zap of the TDP MMU but a fast zap of any shadow
> MMUs.
For TDX, could we do as below after the full revert of this series?
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
kvm_mmu_zap_all_fast(kvm); ==> this will skip mirror root
kvm_mmu_zap_memslot_mirror_leafs(kvm, slot); ==> zap memslot leaf entries
in mirror root
}
>
> > BTW: update some findings regarding to the previous bug with Nvidia GPU
> > assignment:
> > I found that after v5.19-rc1+, even with nx_huge_pages=N, the bug is not
> > reproducible when only leaf entries of memslot are zapped.
> > (no more detailed info due to limited time to debug).
>
> Heh, I've given up hope on ever finding a root cause for that issue.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled
2024-07-03 2:12 ` [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
@ 2024-09-24 12:26 ` Aishwarya TCV
2024-09-25 0:42 ` Yan Zhao
0 siblings, 1 reply; 13+ messages in thread
From: Aishwarya TCV @ 2024-09-24 12:26 UTC (permalink / raw)
To: Yan Zhao
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Paolo Bonzini, seanjc,
Mark Brown, Naresh Kamboju
On 03/07/2024 03:12, Yan Zhao wrote:
> Add a new user option to memslot_modification_stress_test to allow testing
> with slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> .../kvm/memslot_modification_stress_test.c | 19 +++++++++++++++++--
Hi Yan,
When building kselftest-kvm config against next-20240924 kernel with
Arm64 an error "'KVM_X86_QUIRK_SLOT_ZAP_ALL' undeclared" is observed.
A bisect identified 218f6415004a881d116e254eeb837358aced55ab as the
first bad commit. Bisected it on the tag "next-20240923" at repo
"https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
Reverting the change seems to fix it.
This works fine on Linux version 6.11
Failure log
------------
https://storage.kernelci.org/next/master/next-20240924/arm64/defconfig+kselftest/gcc-12/logs/kselftest.log
In file included from include/kvm_util.h:8,
from include/memstress.h:13,
from memslot_modification_stress_test.c:21:
memslot_modification_stress_test.c: In function ‘main’:
memslot_modification_stress_test.c:176:38: error:
‘KVM_X86_QUIRK_SLOT_ZAP_ALL’ undeclared (first use in this function)
176 | KVM_X86_QUIRK_SLOT_ZAP_ALL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
include/test_util.h:41:15: note: in definition of macro ‘__TEST_REQUIRE’
41 | if (!(f)) \
| ^
memslot_modification_stress_test.c:175:25: note: in expansion of macro
‘TEST_REQUIRE’
175 |
TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
| ^~~~~~~~~~~~
memslot_modification_stress_test.c:176:38: note: each undeclared
identifier is reported only once for each function it appears in
176 | KVM_X86_QUIRK_SLOT_ZAP_ALL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
include/test_util.h:41:15: note: in definition of macro ‘__TEST_REQUIRE’
41 | if (!(f)) \
| ^
memslot_modification_stress_test.c:175:25: note: in expansion of macro
‘TEST_REQUIRE’
175 |
TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
| ^~~~~~~~~~~~
At top level:
cc1: note: unrecognized command-line option
‘-Wno-gnu-variable-sized-type-not-at-end’ may have been intended to
silence earlier diagnostics
make[4]: *** [Makefile:300:
/tmp/kci/linux/build/kselftest/kvm/memslot_modification_stress_test.o]
Error 1
make[4]: Leaving directory '/tmp/kci/linux/tools/testing/selftests/kvm'
Bisect log:
----------
git bisect start
# good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
# bad: [ef545bc03a65438cabe87beb1b9a15b0ffcb6ace] Add linux-next
specific files for 20240923
git bisect bad ef545bc03a65438cabe87beb1b9a15b0ffcb6ace
# good: [176000734ee2978121fde22a954eb1eabb204329] Merge tag
'ata-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
git bisect good 176000734ee2978121fde22a954eb1eabb204329
# good: [f55bf3fb11d7fe32a37b8d625744d22891c02e5e] Merge branch
'at91-next' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux.git
git bisect good f55bf3fb11d7fe32a37b8d625744d22891c02e5e
# good: [1340ff0aa9e6dcb9c8ac5f86472eb78ba524b14a] Merge branch
'for-next' of git://git.kernel.dk/linux-block.git
git bisect good 1340ff0aa9e6dcb9c8ac5f86472eb78ba524b14a
# bad: [51d98f15885e036a06fef35c396c987e80c47a27] Merge branch
'char-misc-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
git bisect bad 51d98f15885e036a06fef35c396c987e80c47a27
# bad: [4f216a17ef0dc3bf99c28902abbc6c70fb7798a0] Merge branch
'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
git bisect bad 4f216a17ef0dc3bf99c28902abbc6c70fb7798a0
# bad: [b11ba58b0ef5c932303dac5ce96e17d96c127870] Merge branch 'next' of
git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad b11ba58b0ef5c932303dac5ce96e17d96c127870
# good: [b7ba28772e5709196e3efffb9341c7fd698b2497] Merge branch
'for-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
git bisect good b7ba28772e5709196e3efffb9341c7fd698b2497
# bad: [c345344e8317176944be33f46e18812c0343dc63] Merge tag
'kvm-x86-selftests-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad c345344e8317176944be33f46e18812c0343dc63
# bad: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag
'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b
# bad: [590b09b1d88e18ae57f89930a6f7b89795d2e9f3] KVM: x86: Register
"emergency disable" callbacks when virt is enabled
git bisect bad 590b09b1d88e18ae57f89930a6f7b89795d2e9f3
# bad: [70c0194337d38dd29533e63e3cb07620f8c5eae1] KVM: Rename symbols
related to enabling virtualization hardware
git bisect bad 70c0194337d38dd29533e63e3cb07620f8c5eae1
# bad: [218f6415004a881d116e254eeb837358aced55ab] KVM: selftests: Allow
slot modification stress test with quirk disabled
git bisect bad 218f6415004a881d116e254eeb837358aced55ab
# good: [b4ed2c67d275b85b2ab07d54f88bebd5998d61d8] KVM: selftests: Test
slot move/delete with slot zap quirk enabled/disabled
git bisect good b4ed2c67d275b85b2ab07d54f88bebd5998d61d8
# first bad commit: [218f6415004a881d116e254eeb837358aced55ab] KVM:
selftests: Allow slot modification stress test with quirk disabled
Thanks,
Aishwarya
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled
2024-09-24 12:26 ` Aishwarya TCV
@ 2024-09-25 0:42 ` Yan Zhao
2024-09-30 18:15 ` Mark Brown
0 siblings, 1 reply; 13+ messages in thread
From: Yan Zhao @ 2024-09-25 0:42 UTC (permalink / raw)
To: Aishwarya TCV
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, graf, linux-kernel, kvm, Paolo Bonzini, seanjc,
Mark Brown, Naresh Kamboju
On Tue, Sep 24, 2024 at 01:26:20PM +0100, Aishwarya TCV wrote:
>
>
> On 03/07/2024 03:12, Yan Zhao wrote:
> > Add a new user option to memslot_modification_stress_test to allow testing
> > with slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
> >
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > .../kvm/memslot_modification_stress_test.c | 19 +++++++++++++++++--
> Hi Yan,
>
> When building kselftest-kvm config against next-20240924 kernel with
> Arm64 an error "'KVM_X86_QUIRK_SLOT_ZAP_ALL' undeclared" is observed.
Ah, I forgot to hide
"TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
KVM_X86_QUIRK_SLOT_ZAP_ALL)"
inside "#ifdef __x86_64__" when parsing opts though it's done in run_test().
>
> A bisect identified 218f6415004a881d116e254eeb837358aced55ab as the
> first bad commit. Bisected it on the tag "next-20240923" at repo
> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
> Reverting the change seems to fix it.
>
> This works fine on Linux version 6.11
>
> Failure log
> ------------
> https://storage.kernelci.org/next/master/next-20240924/arm64/defconfig+kselftest/gcc-12/logs/kselftest.log
>
> In file included from include/kvm_util.h:8,
> from include/memstress.h:13,
> from memslot_modification_stress_test.c:21:
> memslot_modification_stress_test.c: In function ‘main’:
> memslot_modification_stress_test.c:176:38: error:
> ‘KVM_X86_QUIRK_SLOT_ZAP_ALL’ undeclared (first use in this function)
> 176 | KVM_X86_QUIRK_SLOT_ZAP_ALL);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~
> include/test_util.h:41:15: note: in definition of macro ‘__TEST_REQUIRE’
> 41 | if (!(f)) \
> | ^
> memslot_modification_stress_test.c:175:25: note: in expansion of macro
> ‘TEST_REQUIRE’
> 175 |
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
> | ^~~~~~~~~~~~
> memslot_modification_stress_test.c:176:38: note: each undeclared
> identifier is reported only once for each function it appears in
> 176 | KVM_X86_QUIRK_SLOT_ZAP_ALL);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~
> include/test_util.h:41:15: note: in definition of macro ‘__TEST_REQUIRE’
> 41 | if (!(f)) \
> | ^
> memslot_modification_stress_test.c:175:25: note: in expansion of macro
> ‘TEST_REQUIRE’
> 175 |
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
> | ^~~~~~~~~~~~
> At top level:
> cc1: note: unrecognized command-line option
> ‘-Wno-gnu-variable-sized-type-not-at-end’ may have been intended to
> silence earlier diagnostics
> make[4]: *** [Makefile:300:
> /tmp/kci/linux/build/kselftest/kvm/memslot_modification_stress_test.o]
> Error 1
> make[4]: Leaving directory '/tmp/kci/linux/tools/testing/selftests/kvm'
>
>
> Bisect log:
> ----------
>
> git bisect start
> # good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
> git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
> # bad: [ef545bc03a65438cabe87beb1b9a15b0ffcb6ace] Add linux-next
> specific files for 20240923
> git bisect bad ef545bc03a65438cabe87beb1b9a15b0ffcb6ace
> # good: [176000734ee2978121fde22a954eb1eabb204329] Merge tag
> 'ata-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
> git bisect good 176000734ee2978121fde22a954eb1eabb204329
> # good: [f55bf3fb11d7fe32a37b8d625744d22891c02e5e] Merge branch
> 'at91-next' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux.git
> git bisect good f55bf3fb11d7fe32a37b8d625744d22891c02e5e
> # good: [1340ff0aa9e6dcb9c8ac5f86472eb78ba524b14a] Merge branch
> 'for-next' of git://git.kernel.dk/linux-block.git
> git bisect good 1340ff0aa9e6dcb9c8ac5f86472eb78ba524b14a
> # bad: [51d98f15885e036a06fef35c396c987e80c47a27] Merge branch
> 'char-misc-next' of
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
> git bisect bad 51d98f15885e036a06fef35c396c987e80c47a27
> # bad: [4f216a17ef0dc3bf99c28902abbc6c70fb7798a0] Merge branch
> 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
> git bisect bad 4f216a17ef0dc3bf99c28902abbc6c70fb7798a0
> # bad: [b11ba58b0ef5c932303dac5ce96e17d96c127870] Merge branch 'next' of
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> git bisect bad b11ba58b0ef5c932303dac5ce96e17d96c127870
> # good: [b7ba28772e5709196e3efffb9341c7fd698b2497] Merge branch
> 'for-next' of
> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
> git bisect good b7ba28772e5709196e3efffb9341c7fd698b2497
> # bad: [c345344e8317176944be33f46e18812c0343dc63] Merge tag
> 'kvm-x86-selftests-6.12' of https://github.com/kvm-x86/linux into HEAD
> git bisect bad c345344e8317176944be33f46e18812c0343dc63
> # bad: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag
> 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
> git bisect bad 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b
> # bad: [590b09b1d88e18ae57f89930a6f7b89795d2e9f3] KVM: x86: Register
> "emergency disable" callbacks when virt is enabled
> git bisect bad 590b09b1d88e18ae57f89930a6f7b89795d2e9f3
> # bad: [70c0194337d38dd29533e63e3cb07620f8c5eae1] KVM: Rename symbols
> related to enabling virtualization hardware
> git bisect bad 70c0194337d38dd29533e63e3cb07620f8c5eae1
> # bad: [218f6415004a881d116e254eeb837358aced55ab] KVM: selftests: Allow
> slot modification stress test with quirk disabled
> git bisect bad 218f6415004a881d116e254eeb837358aced55ab
> # good: [b4ed2c67d275b85b2ab07d54f88bebd5998d61d8] KVM: selftests: Test
> slot move/delete with slot zap quirk enabled/disabled
> git bisect good b4ed2c67d275b85b2ab07d54f88bebd5998d61d8
> # first bad commit: [218f6415004a881d116e254eeb837358aced55ab] KVM:
> selftests: Allow slot modification stress test with quirk disabled
>
> Thanks,
> Aishwarya
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled
2024-09-25 0:42 ` Yan Zhao
@ 2024-09-30 18:15 ` Mark Brown
0 siblings, 0 replies; 13+ messages in thread
From: Mark Brown @ 2024-09-30 18:15 UTC (permalink / raw)
To: Yan Zhao
Cc: Aishwarya TCV, rick.p.edgecombe, kai.huang, isaku.yamahata,
dmatlack, sagis, erdemaktas, graf, linux-kernel, kvm,
Paolo Bonzini, seanjc, Naresh Kamboju, torvalds
[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]
On Wed, Sep 25, 2024 at 08:42:56AM +0800, Yan Zhao wrote:
> On Tue, Sep 24, 2024 at 01:26:20PM +0100, Aishwarya TCV wrote:
> > On 03/07/2024 03:12, Yan Zhao wrote:
> > > Add a new user option to memslot_modification_stress_test to allow testing
> > > with slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > When building kselftest-kvm config against next-20240924 kernel with
> > Arm64 an error "'KVM_X86_QUIRK_SLOT_ZAP_ALL' undeclared" is observed.
> Ah, I forgot to hide
> "TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
> KVM_X86_QUIRK_SLOT_ZAP_ALL)"
> inside "#ifdef __x86_64__" when parsing opts though it's done in run_test().
This bug, which Aishwarya originally reported against -next, is now
present in mainline:
https://storage.kernelci.org/mainline/master/v6.12-rc1/arm64/defconfig+kselftest/gcc-12/logs/kselftest.log
I couldn't find a fix being posted so I sent:
https://lore.kernel.org/r/20240930-kvm-build-breakage-v1-1-866fad3cc164@kernel.org
which also fixes the same issue in memslot_perf_test.c.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-09-24 0:58 ` Yan Zhao
2024-09-24 7:45 ` Sean Christopherson
@ 2024-11-26 9:15 ` Yan Zhao
1 sibling, 0 replies; 13+ messages in thread
From: Yan Zhao @ 2024-11-26 9:15 UTC (permalink / raw)
To: Sean Christopherson, pbonzini, rick.p.edgecombe, kai.huang,
isaku.yamahata, dmatlack, sagis, erdemaktas, graf, linux-kernel,
kvm, alex.williamson, kevin.tian, weijiang.yang
> BTW: update some findings regarding to the previous bug with Nvidia GPU
> assignment:
> I found that after v5.19-rc1+, even with nx_huge_pages=N, the bug is not
> reproducible when only leaf entries of memslot are zapped.
> (no more detailed info due to limited time to debug).
+Alex, Weijiang, and Kevin
Some updates on the Nvidia GPU assignment issue.
Good news is that I may have identified the root cause of this issue.
However, given the root cause, I'm not 100% sure that the issue I observed is
the same one reported by Alex. So it still needs Alex's confirmation and help to
verify it in the original environment.
== My Environment ==
With the help from Weijiang, I'm able to reproduce the issue using
GeForce GT 640, on a KBL desktop.
Besides the GeForce GT 640 assigned to the guest, this KBL desktop has an Intel
IGD device, which is used by host OS.
The guest OS is win10. Guest workloads: a video player + furmark + passmark.
I can observe error patterns that are very similar to those described by Alex
at [1] on kernel tags before v5.19-rc1.
- I can observe the error patterns on kernel tag v5.3-rc4.
(It uses the zap-only-memslot logic and Alex reported that this version was
with this issue at [2]).
- From tag 5.3-rc6 to v5.19-rc1, zap-only-memslot was reverted.
From tag v5.4-rc8, commit b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
was introduced. (though if I directly checkout this commit, the kernel version
is 5.4.0-rc6).
I can reproduce the issue on those kernel versions by adding back and forcing
the zap-only-memslot logic, and setting kvm.nx_huge_pages=N.
(Previously Weijiang found out that with kvm.nx_huge_pages=Y, the issue was
not reproducible [3]).
- If I switched back to zap-all in all those versions, the error pattens were
not observable.
== Root Cause ==
It's found out that with commit fc0051cb9590 ("iommu/vt-d: Check domain
force_snooping against attached devices"), the issue was not reproducible.
(I only bisected kernel tags. This commit first appeared in tag v5.19-rc1.)
Further analysis (with Kevin's help) shows that after the commit fc0051cb9590
("iommu/vt-d: Check domain force_snooping against attached devices"), VFIO
always detected the NVidia GPU device as a coherent DMA device. Prior to that
commit, VFIO detected the NVidia GPU device as a non-coherent DMA device by
querying cache coherency from Intel IOMMU driver, which, however, incorrectly
returned fail if any IOMMU lacked snoop control support.
As a result, if the machine had an Intel IGD device,
- on the Intel IOMMU driver side, it would not enforce snoop for the assigned
NVidia GPU device in the IOMMU SLPT.
- on the KVM's side, KVM also found that kvm_arch_has_noncoherent_dma() was true
and would emulate guest WBINVD.
In KVM's vmx_get_mt_mask(), with non-coherent DMA devices attached,
(using the code in tag v5.3-rc4 as an example):
- when guest CD=1 && kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED),
the EPT memtype is MTRR_TYPE_WRBACK | VMX_EPT_IPAT_BIT;
- when CD=0, the EPT memtype is guest MTRR type (without VMX_EPT_IPAT_BIT).
static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
u8 cache;
u64 ipat = 0;
if (is_mmio) {
cache = MTRR_TYPE_UNCACHABLE;
goto exit;
}
if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
ipat = VMX_EPT_IPAT_BIT;
cache = MTRR_TYPE_WRBACK;
goto exit;
}
if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
ipat = VMX_EPT_IPAT_BIT;
if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
cache = MTRR_TYPE_WRBACK;
else
cache = MTRR_TYPE_UNCACHABLE;
goto exit;
}
cache = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
exit:
return (cache << VMX_EPT_MT_EPTE_SHIFT) | ipat;
}
However, with this vmx_get_mt_mask() implementation, KVM did not zap EPT on CD
toggling.
So if I applied patch[4], the error pattens previously observed were immediately
gone and the guest OS appeared quite stable.
Or if I changed vmx_get_mt_mask() as shown below, the issue was not reproducible
even if KVM did not zap EPT for CD toggling and update_mtrr().
static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
if (is_mmio)
return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
}
So, my conclusion is that the Nvidia GPU assignment issue was caused by the lack
of EPT zapping when the guest toggles CD. (The CD toggling occurs per-vCPU
during guest bootup for enabling guest MTRRs.)
The lack of EPT zapping was previously masked by the zap-all operations for
memslot deletions during guest bootup. However, the error became outstanding
when only memslot EPT entries were zapped. (The guest may have accessed a GPA
during CD=1 to create an EPT entry with a memtype no longer correct after CD=0).
The ITLB_MULTIHIT mitigation [3] splits non-executable huge pages in EPT to
create executable 4k pages. e.g., I can observe GFNs 0xa00, 0xc00 were mapped as
2M initially with EPT memtype=WB. They were then mapped as 2M + EPT
memtype=WB+IPAT when guest CD=1. After some seconds during guest boot, they were
split to 4K + EPT memtype=WB. The split may also mitigate the lack of zapping
for CD toggling to a great extent.
In my environment, the guest appeared quite stable with
"zap-only-memslot + kvm.nx_huge_pages=Y". However, the benchmarks sometimes
still showed around 10 errors in that case, compared to 1000+ errors with
"zap-only-memslot + kvm.nx_huge_pages=N".
== Request Help ==
So, Alex, do you recall if there was an IGD device in your original environment?
If so and if that environment is still available, could you please help verify
if patch [4] resolves the issue?
Thank you and your help is greatly appreciated!
[1] https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#mc45b9f909731d70551b4e10cff5a58d34a155e71
[2] https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com/
[3] https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#m1839c85392a7a022df9e507876bb241c022c4f06
[4]
From e41a78c95ea3478be04dcb35f374084231a08a5f Mon Sep 17 00:00:00 2001
From: Yan Zhao <yan.y.zhao@intel.com>
Date: Sat, 23 Nov 2024 21:06:42 -0800
Subject: [PATCH] KVM: x86: Zap EPT on CD changes when KVM has non-coherent DMA
Always zap EPT on CD changes when a VM has non-coherent DMA devices
attached, no matter quirk KVM_X86_QUIRK_CD_NW_CLEARED is turned on or not.
Previously when kvm_arch_has_noncoherent_dma() is true, EPT is zapped when
CD is toggled only if quirk KVM_X86_QUIRK_CD_NW_CLEARED is off.
However, EPT should also be zapped when quirk KVM_X86_QUIRK_CD_NW_CLEARED
is on because the EPT memtype would switch bewteen
- "MTRR_TYPE_WRBACK | VMX_EPT_IPAT_BIT", and
- "guest MTRR type (without VMX_EPT_IPAT_BIT)".
Fixes: 879ae1880449 ("KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
arch/x86/kvm/x86.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93b0bd45ac73..3e874cfaf059 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -792,8 +792,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
kvm_mmu_reset_context(vcpu);
if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
- kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
- !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+ kvm_arch_has_noncoherent_dma(vcpu->kvm))
kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
return 0;
base-commit: d45331b00ddb179e291766617259261c112db872
--
2.27.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-11-26 9:18 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-03 2:09 [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-07-03 2:10 ` [PATCH v2 1/4] KVM: x86/mmu: " Yan Zhao
2024-09-23 18:37 ` Sean Christopherson
2024-09-24 0:58 ` Yan Zhao
2024-09-24 7:45 ` Sean Christopherson
2024-09-24 8:37 ` Yan Zhao
2024-11-26 9:15 ` Yan Zhao
2024-07-03 2:11 ` [PATCH v2 2/4] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
2024-07-03 2:12 ` [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
2024-09-24 12:26 ` Aishwarya TCV
2024-09-25 0:42 ` Yan Zhao
2024-09-30 18:15 ` Mark Brown
2024-07-03 2:12 ` [PATCH v2 4/4] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).