* [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM
@ 2015-12-09 17:26 Catalin Marinas
2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-09 17:26 UTC (permalink / raw
To: linux-arm-kernel
This series addresses a potentially racy default implementation of
ptep_set_access_flags() when hardware update of the access or dirty
states is enabled. The first patch is some clean-up in set_pte_at() to
improve the information reporting and replace BUG with WARN. The second
patch contains the arm64-specific ptep_set_access_flags()
implementation.
Possible racy scenarios are described in patch 2. I think this series
could be simplified on the following assumptions:
a) if the CPUs do not support HW AF/DBM or it is disabled, no other
agent in the system will perform such updates
b) if one CPU supports HW AF/DBM, all of them must do (don't mix such
features)
Point (a) means that the current code works fine and BUG_ON() is not
necessary.
Point (b) however requires a ptep_set_access_flags() similar to the x86
one, i.e. only do the setting if (changed && dirty), otherwise let the
hardware handle the updates.
Anyway, while patch 2 is still debatable, I'd like to merge the first
patch in 4.4 to avoid an unnecessary BUG_ON on hardware that doesn't
even do DBM.
Catalin
Catalin Marinas (2):
arm64: Improve error reporting on set_pte_at() checks
arm64: Implement ptep_set_access_flags() for hardware AF/DBM
arch/arm64/include/asm/pgtable.h | 16 ++++++++---
arch/arm64/mm/fault.c | 57 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 69 insertions(+), 4 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks
2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
@ 2015-12-09 17:26 ` Catalin Marinas
2015-12-09 17:26 ` [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
2015-12-10 3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
2 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-09 17:26 UTC (permalink / raw
To: linux-arm-kernel
Currently the BUG_ON() checks do not give enough information about the
PTEs being set. This patch changes BUG_ON to WARN_ONCE and dumps the
values of the old and new PTEs.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/pgtable.h | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e074f93f383..002dc61a4dff 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -276,10 +276,13 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
* hardware updates of the pte (ptep_set_access_flags safely changes
* valid ptes without going through an invalid entry).
*/
- if (IS_ENABLED(CONFIG_DEBUG_VM) && IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
- pte_valid(*ptep)) {
- BUG_ON(!pte_young(pte));
- BUG_ON(pte_write(*ptep) && !pte_dirty(pte));
+ if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
+ VM_WARN_ONCE(!pte_young(pte),
+ "%s: racy access flag clearing: %016llx -> %016llx",
+ __func__, pte_val(*ptep), pte_val(pte));
+ VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(pte),
+ "%s: racy dirty state clearing: %016llx -> %016llx",
+ __func__, pte_val(*ptep), pte_val(pte));
}
set_pte(ptep, pte);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
@ 2015-12-09 17:26 ` Catalin Marinas
2015-12-10 3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
2 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-09 17:26 UTC (permalink / raw
To: linux-arm-kernel
When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.
There are two main scenarios for this situation:
1. The CPU getting an access fault does not support hardware updates of
the access/dirty flags. However, a different agent in the system
(e.g. SMMU) can do this, therefore overriding a writable entry with a
clean one could potentially lose the automatically updated dirty
status
2. A more complex situation is possible when all CPUs support hardware
AF/DBM:
a) Initial state: shareable + writable vma and pte_none(pte)
b) Read fault taken by two threads of the same process on different
CPUs
c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
eventually reaches do_set_pte() which sets a writable + clean pte.
CPU0 releases the mmap_sem
d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
pte entry it reads is present, writable and clean and it continues
to pte_mkyoung()
e) CPU1 calls ptep_set_access_flags()
If between (d) and (e) the hardware (another CPU) updates the dirty
state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
marking the entry clean again.
This patch implements an arm64-specific ptep_set_access_flags() function
which performs an atomic update of the AF bit. For the cases where the
new entry is already dirty or read-only, there is no race with the
hardware update, therefore it performs a set_pte() directly.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/pgtable.h | 5 ++++
arch/arm64/mm/fault.c | 57 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 002dc61a4dff..77bc5d6ed4e9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -536,6 +536,11 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
}
#ifdef CONFIG_ARM64_HW_AFDBM
+#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+extern int ptep_set_access_flags(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep,
+ pte_t entry, int dirty);
+
/*
* Atomic pte/pmd modifications.
*/
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 92ddac1e8ca2..b393ee2eeda2 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -81,6 +81,63 @@ void show_pte(struct mm_struct *mm, unsigned long addr)
printk("\n");
}
+#ifdef CONFIG_ARM64_HW_AFDBM
+/*
+ * It only sets the access flags (dirty, accessed), as well as write
+ * permission, and only to a more permissive setting. This function needs to
+ * cope with hardware update of the accessed/dirty state by other agents in
+ * the system. It can safely skip the __sync_icache_dcache() call as in
+ * set_pte_at() since the PTE is never changed from no-exec to exec by this
+ * function.
+ * It returns whether the PTE actually changed.
+ */
+int ptep_set_access_flags(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep,
+ pte_t entry, int dirty)
+{
+ unsigned int tmp;
+
+ if (pte_same(*ptep, entry))
+ return 0;
+
+ /*
+ * If the PTE is read-only, the hardware cannot update the dirty state
+ * (clear the PTE_RDONLY bit). If we write a dirty entry, we can
+ * ignore the race with the hardware update since we set both accessed
+ * and dirty states anyway. The caller guarantees that the new PTE is
+ * writable when dirty != 0.
+ */
+ if (dirty || !pte_write(entry)) {
+ /* avoid a subsequent fault on read-only entries */
+ if (dirty)
+ pte_val(entry) &= ~PTE_RDONLY;
+ set_pte(ptep, entry);
+ goto flush;
+ }
+
+ VM_WARN_ONCE(!pte_write(*ptep) && pte_write(entry),
+ "%s: making pte writable without dirty: %016llx -> %016llx\n",
+ __func__, pte_val(*ptep), pte_val(entry));
+
+ /*
+ * Setting the access flag on a writable entry must be done atomically
+ * to avoid racing with the hardware update of the dirty state.
+ */
+ asm volatile("// ptep_set_access_flags\n"
+ " prfm pstl1strm, %2\n"
+ "1: ldxr %0, %2\n"
+ " orr %0, %0, %3 // set PTE_AF\n"
+ " stxr %w1, %0, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (entry), "=&r" (tmp), "+Q" (pte_val(*ptep))
+ : "L" (PTE_AF));
+
+flush:
+ flush_tlb_fix_spurious_fault(vma, address);
+ return 1;
+}
+#endif
+
/*
* The kernel tried to access some page that wasn't present.
*/
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM
2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
2015-12-09 17:26 ` [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
@ 2015-12-10 3:11 ` Ming Lei
2015-12-10 11:34 ` Catalin Marinas
2 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2015-12-10 3:11 UTC (permalink / raw
To: linux-arm-kernel
On Thu, Dec 10, 2015 at 1:26 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> This series addresses a potentially racy default implementation of
> ptep_set_access_flags() when hardware update of the access or dirty
> states is enabled. The first patch is some clean-up in set_pte_at() to
> improve the information reporting and replace BUG with WARN. The second
> patch contains the arm64-specific ptep_set_access_flags()
> implementation.
>
> Possible racy scenarios are described in patch 2. I think this series
> could be simplified on the following assumptions:
>
> a) if the CPUs do not support HW AF/DBM or it is disabled, no other
> agent in the system will perform such updates
>
> b) if one CPU supports HW AF/DBM, all of them must do (don't mix such
> features)
>
> Point (a) means that the current code works fine and BUG_ON() is not
> necessary.
>
> Point (b) however requires a ptep_set_access_flags() similar to the x86
> one, i.e. only do the setting if (changed && dirty), otherwise let the
> hardware handle the updates.
>
> Anyway, while patch 2 is still debatable, I'd like to merge the first
> patch in 4.4 to avoid an unnecessary BUG_ON on hardware that doesn't
> even do DBM.
>
> Catalin
>
>
> Catalin Marinas (2):
> arm64: Improve error reporting on set_pte_at() checks
> arm64: Implement ptep_set_access_flags() for hardware AF/DBM
>
> arch/arm64/include/asm/pgtable.h | 16 ++++++++---
> arch/arm64/mm/fault.c | 57 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 69 insertions(+), 4 deletions(-)
>
With the two patches, looks no BUG is triggered any more, and just
with the following warning:
[ 98.303645] set_pte_at: racy access flag clearing: 00e80040db000b51
-> 00e80040db000b50
[ 98.303660] ------------[ cut here ]------------
[ 98.303666] WARNING: at ./arch/arm64/include/asm/pgtable.h:282
[ 98.303669] Modules linked in:
[ 98.303679] CPU: 2 PID: 2445 Comm: stress-ng-minco Tainted: G
W 4.4.0-rc3-next-20151203+ #65
[ 98.303683] Hardware name: AppliedMicro Mustang/Mustang, BIOS 2.0.0
Oct 23 2015
[ 98.303686] task: ffffffc7c0caad00 ti: ffffffc7c0d10000 task.ti:
ffffffc7c0d10000
[ 98.303696] PC is at pmdp_invalidate+0x104/0x11c
[ 98.303700] LR is at pmdp_invalidate+0x104/0x11c
[ 98.303703] pc : [<ffffffc0001e2de4>] lr : [<ffffffc0001e2de4>]
pstate: 80000145
[ 98.303705] sp : ffffffc7c0d13720
[ 98.303708] x29: ffffffc7c0d13720 x28: 0000000000000ff8
[ 98.303713] x27: ffffffc445c1a000 x26: 0000000000000001
[ 98.303718] x25: ffffffc7c01e8000 x24: ffffffc000d43000
[ 98.303722] x23: ffffffc7c01e8000 x22: 00e80040db000b50
[ 98.303727] x21: 0000000036e00000 x20: ffffffc7ccecadb0
[ 98.303731] x19: 00e80040db000b51 x18: 0000000000000000
[ 98.303735] x17: 0000007f410bf260 x16: ffffffc000140eb4
[ 98.303740] x15: 003223715c000000 x14: 3038653030203e2d
[ 98.303744] x13: 2031356230303062 x12: 6430343030386530
[ 98.303749] x11: 30203a676e697261 x10: 656c632067616c66
[ 98.303753] x9 : 00000000000001ad x8 : ffffffc7c0d13470
[ 98.303758] x7 : ffffffc7c0caad00 x6 : ffffffc000115424
[ 98.303762] x5 : 0000000000000000 x4 : ffffffc000c54000
[ 98.303766] x3 : 0000000000000000 x2 : ffffffc7c0d10000
[ 98.303771] x1 : 0000000000000000 x0 : 000000000000004b
[ 98.303778] ---[ end trace a4326fc4ab084b3e ]---
[ 98.303780] Call trace:
[ 98.306219] [<ffffffc0001e2de4>] pmdp_invalidate+0x104/0x11c
[ 98.306227] [<ffffffc000202844>] __split_huge_pmd_locked+0x2b8/0x718
[ 98.306232] [<ffffffc000208b54>] split_huge_page_to_list+0x734/0x99c
[ 98.306236] [<ffffffc0001e67d0>] add_to_swap+0xf4/0x138
[ 98.306241] [<ffffffc0001b5794>] shrink_page_list+0x3a8/0xe3c
[ 98.306245] [<ffffffc0001b6974>] shrink_inactive_list+0x228/0x538
[ 98.306249] [<ffffffc0001b748c>] shrink_lruvec+0x484/0x590
[ 98.306253] [<ffffffc0001b7600>] shrink_zone+0x68/0x19c
[ 98.306256] [<ffffffc0001b79f4>] try_to_free_pages+0x2c0/0x6f8
[ 98.306263] [<ffffffc0001a8c84>] __alloc_pages_nodemask+0x558/0xa4c
[ 98.306267] [<ffffffc0001a930c>] __get_free_pages+0x1c/0x50
[ 98.306271] [<ffffffc0001d4394>] SyS_mincore+0x80/0x218
[ 98.306276] [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28
--
Ming Lei
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM
2015-12-10 3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
@ 2015-12-10 11:34 ` Catalin Marinas
0 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-10 11:34 UTC (permalink / raw
To: linux-arm-kernel
On Thu, Dec 10, 2015 at 11:11:26AM +0800, Ming Lei wrote:
> On Thu, Dec 10, 2015 at 1:26 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > This series addresses a potentially racy default implementation of
> > ptep_set_access_flags() when hardware update of the access or dirty
> > states is enabled. The first patch is some clean-up in set_pte_at() to
> > improve the information reporting and replace BUG with WARN. The second
> > patch contains the arm64-specific ptep_set_access_flags()
> > implementation.
> >
> > Possible racy scenarios are described in patch 2. I think this series
> > could be simplified on the following assumptions:
> >
> > a) if the CPUs do not support HW AF/DBM or it is disabled, no other
> > agent in the system will perform such updates
> >
> > b) if one CPU supports HW AF/DBM, all of them must do (don't mix such
> > features)
> >
> > Point (a) means that the current code works fine and BUG_ON() is not
> > necessary.
> >
> > Point (b) however requires a ptep_set_access_flags() similar to the x86
> > one, i.e. only do the setting if (changed && dirty), otherwise let the
> > hardware handle the updates.
> >
> > Anyway, while patch 2 is still debatable, I'd like to merge the first
> > patch in 4.4 to avoid an unnecessary BUG_ON on hardware that doesn't
> > even do DBM.
> >
> > Catalin
> >
> >
> > Catalin Marinas (2):
> > arm64: Improve error reporting on set_pte_at() checks
> > arm64: Implement ptep_set_access_flags() for hardware AF/DBM
> >
> > arch/arm64/include/asm/pgtable.h | 16 ++++++++---
> > arch/arm64/mm/fault.c | 57 ++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 69 insertions(+), 4 deletions(-)
> >
>
> With the two patches, looks no BUG is triggered any more, and just
> with the following warning:
>
> [ 98.303645] set_pte_at: racy access flag clearing: 00e80040db000b51
> -> 00e80040db000b50
> [ 98.303660] ------------[ cut here ]------------
> [ 98.303666] WARNING: at ./arch/arm64/include/asm/pgtable.h:282
> [ 98.303669] Modules linked in:
>
> [ 98.303679] CPU: 2 PID: 2445 Comm: stress-ng-minco Tainted: G
> W 4.4.0-rc3-next-20151203+ #65
> [ 98.303683] Hardware name: AppliedMicro Mustang/Mustang, BIOS 2.0.0
> Oct 23 2015
> [ 98.303686] task: ffffffc7c0caad00 ti: ffffffc7c0d10000 task.ti:
> ffffffc7c0d10000
> [ 98.303696] PC is at pmdp_invalidate+0x104/0x11c
I triggered the pmdp_invalidate() warning as well last night. The
change here is safe, it's just some assumptions that
split_huge_page_to_list() makes. It always marks the resulting PTEs as
dirty, so we don't care about the race. I added this fix-up to the first
patch:
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 002dc61a4dff..88c21c6a40fd 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -276,7 +276,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
* hardware updates of the pte (ptep_set_access_flags safely changes
* valid ptes without going through an invalid entry).
*/
- if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
+ if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
+ pte_valid(*ptep) && pte_valid(pte)) {
VM_WARN_ONCE(!pte_young(pte),
"%s: racy access flag clearing: %016llx -> %016llx",
__func__, pte_val(*ptep), pte_val(pte));
--
Catalin
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-12-10 11:34 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
2015-12-09 17:26 ` [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
2015-12-10 3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
2015-12-10 11:34 ` Catalin Marinas
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.