All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM
@ 2015-12-09 17:26 Catalin Marinas
  2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-09 17:26 UTC (permalink / raw
  To: linux-arm-kernel

This series addresses a potentially racy default implementation of
ptep_set_access_flags() when hardware update of the access or dirty
states is enabled. The first patch is some clean-up in set_pte_at() to
improve the information reporting and replace BUG with WARN. The second
patch contains the arm64-specific ptep_set_access_flags()
implementation.

Possible racy scenarios are described in patch 2. I think this series
could be simplified on the following assumptions:

a) if the CPUs do not support HW AF/DBM or it is disabled, no other
   agent in the system will perform such updates

b) if one CPU supports HW AF/DBM, all of them must do (don't mix such
   features)

Point (a) means that the current code works fine and BUG_ON() is not
necessary.

Point (b) however requires a ptep_set_access_flags() similar to the x86
one, i.e. only do the setting if (changed && dirty), otherwise let the
hardware handle the updates.

Anyway, while patch 2 is still debatable, I'd like to merge the first
patch in 4.4 to avoid an unnecessary BUG_ON on hardware that doesn't
even do DBM.

Catalin


Catalin Marinas (2):
  arm64: Improve error reporting on set_pte_at() checks
  arm64: Implement ptep_set_access_flags() for hardware AF/DBM

 arch/arm64/include/asm/pgtable.h | 16 ++++++++---
 arch/arm64/mm/fault.c            | 57 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 4 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks
  2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
@ 2015-12-09 17:26 ` Catalin Marinas
  2015-12-09 17:26 ` [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
  2015-12-10  3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
  2 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-09 17:26 UTC (permalink / raw
  To: linux-arm-kernel

Currently the BUG_ON() checks do not give enough information about the
PTEs being set. This patch changes BUG_ON to WARN_ONCE and dumps the
values of the old and new PTEs.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e074f93f383..002dc61a4dff 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -276,10 +276,13 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	 * hardware updates of the pte (ptep_set_access_flags safely changes
 	 * valid ptes without going through an invalid entry).
 	 */
-	if (IS_ENABLED(CONFIG_DEBUG_VM) && IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
-	    pte_valid(*ptep)) {
-		BUG_ON(!pte_young(pte));
-		BUG_ON(pte_write(*ptep) && !pte_dirty(pte));
+	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
+		VM_WARN_ONCE(!pte_young(pte),
+			     "%s: racy access flag clearing: %016llx -> %016llx",
+			     __func__, pte_val(*ptep), pte_val(pte));
+		VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(pte),
+			     "%s: racy dirty state clearing: %016llx -> %016llx",
+			     __func__, pte_val(*ptep), pte_val(pte));
 	}
 
 	set_pte(ptep, pte);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
  2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
@ 2015-12-09 17:26 ` Catalin Marinas
  2015-12-10  3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
  2 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-09 17:26 UTC (permalink / raw
  To: linux-arm-kernel

When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.

There are two main scenarios for this situation:

1. The CPU getting an access fault does not support hardware updates of
   the access/dirty flags. However, a different agent in the system
   (e.g. SMMU) can do this, therefore overriding a writable entry with a
   clean one could potentially lose the automatically updated dirty
   status

2. A more complex situation is possible when all CPUs support hardware
   AF/DBM:

   a) Initial state: shareable + writable vma and pte_none(pte)
   b) Read fault taken by two threads of the same process on different
      CPUs
   c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
      eventually reaches do_set_pte() which sets a writable + clean pte.
      CPU0 releases the mmap_sem
   d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
      pte entry it reads is present, writable and clean and it continues
      to pte_mkyoung()
   e) CPU1 calls ptep_set_access_flags()

   If between (d) and (e) the hardware (another CPU) updates the dirty
   state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
   marking the entry clean again.

This patch implements an arm64-specific ptep_set_access_flags() function
which performs an atomic update of the AF bit. For the cases where the
new entry is already dirty or read-only, there is no race with the
hardware update, therefore it performs a set_pte() directly.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h |  5 ++++
 arch/arm64/mm/fault.c            | 57 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 002dc61a4dff..77bc5d6ed4e9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -536,6 +536,11 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 }
 
 #ifdef CONFIG_ARM64_HW_AFDBM
+#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+extern int ptep_set_access_flags(struct vm_area_struct *vma,
+				 unsigned long address, pte_t *ptep,
+				 pte_t entry, int dirty);
+
 /*
  * Atomic pte/pmd modifications.
  */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 92ddac1e8ca2..b393ee2eeda2 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -81,6 +81,63 @@ void show_pte(struct mm_struct *mm, unsigned long addr)
 	printk("\n");
 }
 
+#ifdef CONFIG_ARM64_HW_AFDBM
+/*
+ * It only sets the access flags (dirty, accessed), as well as write
+ * permission, and only to a more permissive setting. This function needs to
+ * cope with hardware update of the accessed/dirty state by other agents in
+ * the system. It can safely skip the __sync_icache_dcache() call as in
+ * set_pte_at() since the PTE is never changed from no-exec to exec by this
+ * function.
+ * It returns whether the PTE actually changed.
+ */
+int ptep_set_access_flags(struct vm_area_struct *vma,
+			  unsigned long address, pte_t *ptep,
+			  pte_t entry, int dirty)
+{
+	unsigned int tmp;
+
+	if (pte_same(*ptep, entry))
+		return 0;
+
+	/*
+	 * If the PTE is read-only, the hardware cannot update the dirty state
+	 * (clear the PTE_RDONLY bit). If we write a dirty entry, we can
+	 * ignore the race with the hardware update since we set both accessed
+	 * and dirty states anyway. The caller guarantees that the new PTE is
+	 * writable when dirty != 0.
+	 */
+	if (dirty || !pte_write(entry)) {
+		/* avoid a subsequent fault on read-only entries */
+		if (dirty)
+			pte_val(entry) &= ~PTE_RDONLY;
+		set_pte(ptep, entry);
+		goto flush;
+	}
+
+	VM_WARN_ONCE(!pte_write(*ptep) && pte_write(entry),
+		     "%s: making pte writable without dirty: %016llx -> %016llx\n",
+		     __func__, pte_val(*ptep), pte_val(entry));
+
+	/*
+	 * Setting the access flag on a writable entry must be done atomically
+	 * to avoid racing with the hardware update of the dirty state.
+	 */
+	asm volatile("//	ptep_set_access_flags\n"
+	"	prfm	pstl1strm, %2\n"
+	"1:	ldxr	%0, %2\n"
+	"	orr	%0, %0, %3		// set PTE_AF\n"
+	"	stxr	%w1, %0, %2\n"
+	"	cbnz	%w1, 1b\n"
+	: "=&r" (entry), "=&r" (tmp), "+Q" (pte_val(*ptep))
+	: "L" (PTE_AF));
+
+flush:
+	flush_tlb_fix_spurious_fault(vma, address);
+	return 1;
+}
+#endif
+
 /*
  * The kernel tried to access some page that wasn't present.
  */

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM
  2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
  2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
  2015-12-09 17:26 ` [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
@ 2015-12-10  3:11 ` Ming Lei
  2015-12-10 11:34   ` Catalin Marinas
  2 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2015-12-10  3:11 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, Dec 10, 2015 at 1:26 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> This series addresses a potentially racy default implementation of
> ptep_set_access_flags() when hardware update of the access or dirty
> states is enabled. The first patch is some clean-up in set_pte_at() to
> improve the information reporting and replace BUG with WARN. The second
> patch contains the arm64-specific ptep_set_access_flags()
> implementation.
>
> Possible racy scenarios are described in patch 2. I think this series
> could be simplified on the following assumptions:
>
> a) if the CPUs do not support HW AF/DBM or it is disabled, no other
>    agent in the system will perform such updates
>
> b) if one CPU supports HW AF/DBM, all of them must do (don't mix such
>    features)
>
> Point (a) means that the current code works fine and BUG_ON() is not
> necessary.
>
> Point (b) however requires a ptep_set_access_flags() similar to the x86
> one, i.e. only do the setting if (changed && dirty), otherwise let the
> hardware handle the updates.
>
> Anyway, while patch 2 is still debatable, I'd like to merge the first
> patch in 4.4 to avoid an unnecessary BUG_ON on hardware that doesn't
> even do DBM.
>
> Catalin
>
>
> Catalin Marinas (2):
>   arm64: Improve error reporting on set_pte_at() checks
>   arm64: Implement ptep_set_access_flags() for hardware AF/DBM
>
>  arch/arm64/include/asm/pgtable.h | 16 ++++++++---
>  arch/arm64/mm/fault.c            | 57 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+), 4 deletions(-)
>

With the two patches, looks no BUG is triggered any more, and just
with the following warning:

[   98.303645] set_pte_at: racy access flag clearing: 00e80040db000b51
-> 00e80040db000b50
[   98.303660] ------------[ cut here ]------------
[   98.303666] WARNING: at ./arch/arm64/include/asm/pgtable.h:282
[   98.303669] Modules linked in:

[   98.303679] CPU: 2 PID: 2445 Comm: stress-ng-minco Tainted: G
 W       4.4.0-rc3-next-20151203+ #65
[   98.303683] Hardware name: AppliedMicro Mustang/Mustang, BIOS 2.0.0
Oct 23 2015
[   98.303686] task: ffffffc7c0caad00 ti: ffffffc7c0d10000 task.ti:
ffffffc7c0d10000
[   98.303696] PC is at pmdp_invalidate+0x104/0x11c
[   98.303700] LR is at pmdp_invalidate+0x104/0x11c
[   98.303703] pc : [<ffffffc0001e2de4>] lr : [<ffffffc0001e2de4>]
pstate: 80000145
[   98.303705] sp : ffffffc7c0d13720
[   98.303708] x29: ffffffc7c0d13720 x28: 0000000000000ff8
[   98.303713] x27: ffffffc445c1a000 x26: 0000000000000001
[   98.303718] x25: ffffffc7c01e8000 x24: ffffffc000d43000
[   98.303722] x23: ffffffc7c01e8000 x22: 00e80040db000b50
[   98.303727] x21: 0000000036e00000 x20: ffffffc7ccecadb0
[   98.303731] x19: 00e80040db000b51 x18: 0000000000000000
[   98.303735] x17: 0000007f410bf260 x16: ffffffc000140eb4
[   98.303740] x15: 003223715c000000 x14: 3038653030203e2d
[   98.303744] x13: 2031356230303062 x12: 6430343030386530
[   98.303749] x11: 30203a676e697261 x10: 656c632067616c66
[   98.303753] x9 : 00000000000001ad x8 : ffffffc7c0d13470
[   98.303758] x7 : ffffffc7c0caad00 x6 : ffffffc000115424
[   98.303762] x5 : 0000000000000000 x4 : ffffffc000c54000
[   98.303766] x3 : 0000000000000000 x2 : ffffffc7c0d10000
[   98.303771] x1 : 0000000000000000 x0 : 000000000000004b

[   98.303778] ---[ end trace a4326fc4ab084b3e ]---
[   98.303780] Call trace:
[   98.306219] [<ffffffc0001e2de4>] pmdp_invalidate+0x104/0x11c
[   98.306227] [<ffffffc000202844>] __split_huge_pmd_locked+0x2b8/0x718
[   98.306232] [<ffffffc000208b54>] split_huge_page_to_list+0x734/0x99c
[   98.306236] [<ffffffc0001e67d0>] add_to_swap+0xf4/0x138
[   98.306241] [<ffffffc0001b5794>] shrink_page_list+0x3a8/0xe3c
[   98.306245] [<ffffffc0001b6974>] shrink_inactive_list+0x228/0x538
[   98.306249] [<ffffffc0001b748c>] shrink_lruvec+0x484/0x590
[   98.306253] [<ffffffc0001b7600>] shrink_zone+0x68/0x19c
[   98.306256] [<ffffffc0001b79f4>] try_to_free_pages+0x2c0/0x6f8
[   98.306263] [<ffffffc0001a8c84>] __alloc_pages_nodemask+0x558/0xa4c
[   98.306267] [<ffffffc0001a930c>] __get_free_pages+0x1c/0x50
[   98.306271] [<ffffffc0001d4394>] SyS_mincore+0x80/0x218
[   98.306276] [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28


-- 
Ming Lei

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM
  2015-12-10  3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
@ 2015-12-10 11:34   ` Catalin Marinas
  0 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2015-12-10 11:34 UTC (permalink / raw
  To: linux-arm-kernel

On Thu, Dec 10, 2015 at 11:11:26AM +0800, Ming Lei wrote:
> On Thu, Dec 10, 2015 at 1:26 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > This series addresses a potentially racy default implementation of
> > ptep_set_access_flags() when hardware update of the access or dirty
> > states is enabled. The first patch is some clean-up in set_pte_at() to
> > improve the information reporting and replace BUG with WARN. The second
> > patch contains the arm64-specific ptep_set_access_flags()
> > implementation.
> >
> > Possible racy scenarios are described in patch 2. I think this series
> > could be simplified on the following assumptions:
> >
> > a) if the CPUs do not support HW AF/DBM or it is disabled, no other
> >    agent in the system will perform such updates
> >
> > b) if one CPU supports HW AF/DBM, all of them must do (don't mix such
> >    features)
> >
> > Point (a) means that the current code works fine and BUG_ON() is not
> > necessary.
> >
> > Point (b) however requires a ptep_set_access_flags() similar to the x86
> > one, i.e. only do the setting if (changed && dirty), otherwise let the
> > hardware handle the updates.
> >
> > Anyway, while patch 2 is still debatable, I'd like to merge the first
> > patch in 4.4 to avoid an unnecessary BUG_ON on hardware that doesn't
> > even do DBM.
> >
> > Catalin
> >
> >
> > Catalin Marinas (2):
> >   arm64: Improve error reporting on set_pte_at() checks
> >   arm64: Implement ptep_set_access_flags() for hardware AF/DBM
> >
> >  arch/arm64/include/asm/pgtable.h | 16 ++++++++---
> >  arch/arm64/mm/fault.c            | 57 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+), 4 deletions(-)
> >
> 
> With the two patches, looks no BUG is triggered any more, and just
> with the following warning:
> 
> [   98.303645] set_pte_at: racy access flag clearing: 00e80040db000b51
> -> 00e80040db000b50
> [   98.303660] ------------[ cut here ]------------
> [   98.303666] WARNING: at ./arch/arm64/include/asm/pgtable.h:282
> [   98.303669] Modules linked in:
> 
> [   98.303679] CPU: 2 PID: 2445 Comm: stress-ng-minco Tainted: G
>  W       4.4.0-rc3-next-20151203+ #65
> [   98.303683] Hardware name: AppliedMicro Mustang/Mustang, BIOS 2.0.0
> Oct 23 2015
> [   98.303686] task: ffffffc7c0caad00 ti: ffffffc7c0d10000 task.ti:
> ffffffc7c0d10000
> [   98.303696] PC is at pmdp_invalidate+0x104/0x11c

I triggered the pmdp_invalidate() warning as well last night. The
change here is safe, it's just some assumptions that
split_huge_page_to_list() makes. It always marks the resulting PTEs as
dirty, so we don't care about the race. I added this fix-up to the first
patch:

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 002dc61a4dff..88c21c6a40fd 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -276,7 +276,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	 * hardware updates of the pte (ptep_set_access_flags safely changes
 	 * valid ptes without going through an invalid entry).
 	 */
-	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
+	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
+	    pte_valid(*ptep) && pte_valid(pte)) {
 		VM_WARN_ONCE(!pte_young(pte),
 			     "%s: racy access flag clearing: %016llx -> %016llx",
 			     __func__, pte_val(*ptep), pte_val(pte));

-- 
Catalin

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-12-10 11:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-09 17:26 [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Catalin Marinas
2015-12-09 17:26 ` [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks Catalin Marinas
2015-12-09 17:26 ` [PATCH 2/2] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
2015-12-10  3:11 ` [PATCH 0/2] arm64: Non-racy PTE setting in the presence of HW AF/DBM Ming Lei
2015-12-10 11:34   ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.