Linux-arch Archive mirror
 help / color / mirror / Atom feed
* [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64
@ 2021-12-17  8:56 Xiongfeng Wang
  2021-12-17 11:59 ` Catalin Marinas
  2021-12-22  9:08 ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Xiongfeng Wang @ 2021-12-17  8:56 UTC (permalink / raw)
  To: will, catalin.marinas, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, moyufeng, wangxiongfeng2,
	linux-arch

For memory accesses with Normal-Non Cacheable attributes, the CPU may do
write combining. But in some situation, this is bad for the performance
because the prior access may wait too long just to be merged.

We introduce io_stop_wc() to prevent the Normal-NC memory accesses before
this instruction to be merged with memory accesses after this
instruction.

We add implementation for ARM64 using DGH instruction and provide NOP
implementation for other architectures.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
---
 Documentation/memory-barriers.txt |  9 +++++++++
 arch/arm64/include/asm/barrier.h  |  9 +++++++++
 include/asm-generic/barrier.h     | 11 +++++++++++
 3 files changed, 29 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 7367ada13208..b868b51b1801 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1950,6 +1950,15 @@ There are some more advanced barrier functions:
      For load from persistent memory, existing read memory barriers are sufficient
      to ensure read ordering.
 
+ (*) io_stop_wc();
+
+     For memory accesses with Normal-Non Cacheable attributes (e.g. those
+     returned by ioremap_wc()), the CPU may do write combining. But in some
+     situation, this is bad for the performance because the prior access may
+     wait too long just to be merged. io_stop_wc() can be used to prevent
+     merging memory accesses with Normal-Non Cacheable attributes before this
+     instruction with any memory accesses appearing after this instruction.
+
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
 ===============================
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 1c5a00598458..62217be36217 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -26,6 +26,14 @@
 #define __tsb_csync()	asm volatile("hint #18" : : : "memory")
 #define csdb()		asm volatile("hint #20" : : : "memory")
 
+/*
+ * Data Gathering Hint:
+ * This instruction prevents merging memory accesses with Normal-NC or
+ * Device-GRE attributes before the hint instruction with any memory accesses
+ * appearing after the hint instruction.
+ */
+#define dgh()		asm volatile("hint #6" : : : "memory")
+
 #ifdef CONFIG_ARM64_PSEUDO_NMI
 #define pmr_sync()						\
 	do {							\
@@ -46,6 +54,7 @@
 #define dma_rmb()	dmb(oshld)
 #define dma_wmb()	dmb(oshst)
 
+#define io_stop_wc()	dgh()
 
 #define tsb_csync()								\
 	do {									\
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 640f09479bdf..083be6d34cb9 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -251,5 +251,16 @@ do {									\
 #define pmem_wmb()	wmb()
 #endif
 
+/*
+ * ioremap_wc() maps I/O memory as memory with Normal-Non Cacheable attributes.
+ * The CPU may do write combining for this kind of memory access. io_stop_wc()
+ * prevents merging memory accesses with Normal-Non Cacheable attributes
+ * before this instruction with any memory accesses appearing after this
+ * instruction.
+ */
+#ifndef io_stop_wc
+#define io_stop_wc do { } while (0)
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64
  2021-12-17  8:56 [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64 Xiongfeng Wang
@ 2021-12-17 11:59 ` Catalin Marinas
  2021-12-20 12:23   ` Xiongfeng Wang
  2021-12-22  9:08 ` Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2021-12-17 11:59 UTC (permalink / raw)
  To: Xiongfeng Wang
  Cc: will, mark.rutland, linux-arm-kernel, linux-kernel, moyufeng,
	linux-arch

On Fri, Dec 17, 2021 at 04:56:11PM +0800, Xiongfeng Wang wrote:
> For memory accesses with Normal-Non Cacheable attributes, the CPU may do

You may want to mention "arm64 Normal Non-Cacheable" as other
architectures have a different meaning of NC.

> write combining. But in some situation, this is bad for the performance
> because the prior access may wait too long just to be merged.
> 
> We introduce io_stop_wc() to prevent the Normal-NC memory accesses before
> this instruction to be merged with memory accesses after this
> instruction.
> 
> We add implementation for ARM64 using DGH instruction and provide NOP
> implementation for other architectures.
> 
> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> ---
>  Documentation/memory-barriers.txt |  9 +++++++++
>  arch/arm64/include/asm/barrier.h  |  9 +++++++++
>  include/asm-generic/barrier.h     | 11 +++++++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 7367ada13208..b868b51b1801 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1950,6 +1950,15 @@ There are some more advanced barrier functions:
>       For load from persistent memory, existing read memory barriers are sufficient
>       to ensure read ordering.
>  
> + (*) io_stop_wc();
> +
> +     For memory accesses with Normal-Non Cacheable attributes (e.g. those
> +     returned by ioremap_wc()), the CPU may do write combining. But in some
> +     situation, this is bad for the performance because the prior access may
> +     wait too long just to be merged. io_stop_wc() can be used to prevent
> +     merging memory accesses with Normal-Non Cacheable attributes before this
> +     instruction with any memory accesses appearing after this instruction.

I'm fine with the patch in general but the comment here and in
asm-generic/barrier.h should avoid Normal Non-Cacheable as that's an
arm-specific term. Looking at Documentation/driver-api/device-io.rst, we
could simply say "write-combining". Something like:

     For memory accesses with write-combining attributes (e.g. those
     returned by ioremap_wc()), the CPU may wait for prior accesses to
     be merged with subsequent ones. io_stop_wc() can be used to prevent
     the merging of write-combining memory accesses before this macro
     with those after it when such wait has performance implications.

(feel free to rephrase it but avoid Normal NC here)

> +
>  ===============================
>  IMPLICIT KERNEL MEMORY BARRIERS
>  ===============================
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 1c5a00598458..62217be36217 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -26,6 +26,14 @@
>  #define __tsb_csync()	asm volatile("hint #18" : : : "memory")
>  #define csdb()		asm volatile("hint #20" : : : "memory")
>  
> +/*
> + * Data Gathering Hint:
> + * This instruction prevents merging memory accesses with Normal-NC or
> + * Device-GRE attributes before the hint instruction with any memory accesses
> + * appearing after the hint instruction.
> + */
> +#define dgh()		asm volatile("hint #6" : : : "memory")

This is fine, arm-specific code.

> +
>  #ifdef CONFIG_ARM64_PSEUDO_NMI
>  #define pmr_sync()						\
>  	do {							\
> @@ -46,6 +54,7 @@
>  #define dma_rmb()	dmb(oshld)
>  #define dma_wmb()	dmb(oshst)
>  
> +#define io_stop_wc()	dgh()
>  
>  #define tsb_csync()								\
>  	do {									\
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 640f09479bdf..083be6d34cb9 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -251,5 +251,16 @@ do {									\
>  #define pmem_wmb()	wmb()
>  #endif
>  
> +/*
> + * ioremap_wc() maps I/O memory as memory with Normal-Non Cacheable attributes.
> + * The CPU may do write combining for this kind of memory access. io_stop_wc()
> + * prevents merging memory accesses with Normal-Non Cacheable attributes
> + * before this instruction with any memory accesses appearing after this
> + * instruction.

Please change this as well along the lines of my comment above.

> + */
> +#ifndef io_stop_wc
> +#define io_stop_wc do { } while (0)
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64
  2021-12-17 11:59 ` Catalin Marinas
@ 2021-12-20 12:23   ` Xiongfeng Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Xiongfeng Wang @ 2021-12-20 12:23 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: will, mark.rutland, linux-arm-kernel, linux-kernel, moyufeng,
	linux-arch

Hi, Catalin

On 2021/12/17 19:59, Catalin Marinas wrote:
> On Fri, Dec 17, 2021 at 04:56:11PM +0800, Xiongfeng Wang wrote:
>> For memory accesses with Normal-Non Cacheable attributes, the CPU may do
> 
> You may want to mention "arm64 Normal Non-Cacheable" as other
> architectures have a different meaning of NC.
> 
>> write combining. But in some situation, this is bad for the performance
>> because the prior access may wait too long just to be merged.
>>
>> We introduce io_stop_wc() to prevent the Normal-NC memory accesses before
>> this instruction to be merged with memory accesses after this
>> instruction.
>>
>> We add implementation for ARM64 using DGH instruction and provide NOP
>> implementation for other architectures.
>>
>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> ---
>>  Documentation/memory-barriers.txt |  9 +++++++++
>>  arch/arm64/include/asm/barrier.h  |  9 +++++++++
>>  include/asm-generic/barrier.h     | 11 +++++++++++
>>  3 files changed, 29 insertions(+)
>>
>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>> index 7367ada13208..b868b51b1801 100644
>> --- a/Documentation/memory-barriers.txt
>> +++ b/Documentation/memory-barriers.txt
>> @@ -1950,6 +1950,15 @@ There are some more advanced barrier functions:
>>       For load from persistent memory, existing read memory barriers are sufficient
>>       to ensure read ordering.
>>  
>> + (*) io_stop_wc();
>> +
>> +     For memory accesses with Normal-Non Cacheable attributes (e.g. those
>> +     returned by ioremap_wc()), the CPU may do write combining. But in some
>> +     situation, this is bad for the performance because the prior access may
>> +     wait too long just to be merged. io_stop_wc() can be used to prevent
>> +     merging memory accesses with Normal-Non Cacheable attributes before this
>> +     instruction with any memory accesses appearing after this instruction.
> 
> I'm fine with the patch in general but the comment here and in
> asm-generic/barrier.h should avoid Normal Non-Cacheable as that's an
> arm-specific term. Looking at Documentation/driver-api/device-io.rst, we
> could simply say "write-combining". Something like:
> 
>      For memory accesses with write-combining attributes (e.g. those
>      returned by ioremap_wc()), the CPU may wait for prior accesses to
>      be merged with subsequent ones. io_stop_wc() can be used to prevent
>      the merging of write-combining memory accesses before this macro
>      with those after it when such wait has performance implications.
> 
> (feel free to rephrase it but avoid Normal NC here)

Thanks for your advice! I will send another version soon.

Thanks,
Xiongfeng

> 
>> +
>>  ===============================
>>  IMPLICIT KERNEL MEMORY BARRIERS
>>  ===============================
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 1c5a00598458..62217be36217 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -26,6 +26,14 @@
>>  #define __tsb_csync()	asm volatile("hint #18" : : : "memory")
>>  #define csdb()		asm volatile("hint #20" : : : "memory")
>>  
>> +/*
>> + * Data Gathering Hint:
>> + * This instruction prevents merging memory accesses with Normal-NC or
>> + * Device-GRE attributes before the hint instruction with any memory accesses
>> + * appearing after the hint instruction.
>> + */
>> +#define dgh()		asm volatile("hint #6" : : : "memory")
> 
> This is fine, arm-specific code.
> 
>> +
>>  #ifdef CONFIG_ARM64_PSEUDO_NMI
>>  #define pmr_sync()						\
>>  	do {							\
>> @@ -46,6 +54,7 @@
>>  #define dma_rmb()	dmb(oshld)
>>  #define dma_wmb()	dmb(oshst)
>>  
>> +#define io_stop_wc()	dgh()
>>  
>>  #define tsb_csync()								\
>>  	do {									\
>> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
>> index 640f09479bdf..083be6d34cb9 100644
>> --- a/include/asm-generic/barrier.h
>> +++ b/include/asm-generic/barrier.h
>> @@ -251,5 +251,16 @@ do {									\
>>  #define pmem_wmb()	wmb()
>>  #endif
>>  
>> +/*
>> + * ioremap_wc() maps I/O memory as memory with Normal-Non Cacheable attributes.
>> + * The CPU may do write combining for this kind of memory access. io_stop_wc()
>> + * prevents merging memory accesses with Normal-Non Cacheable attributes
>> + * before this instruction with any memory accesses appearing after this
>> + * instruction.
> 
> Please change this as well along the lines of my comment above.
> 
>> + */
>> +#ifndef io_stop_wc
>> +#define io_stop_wc do { } while (0)
>> +#endif
>> +
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* __ASM_GENERIC_BARRIER_H */
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64
  2021-12-17  8:56 [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64 Xiongfeng Wang
  2021-12-17 11:59 ` Catalin Marinas
@ 2021-12-22  9:08 ` Christoph Hellwig
  2021-12-22 10:43   ` Catalin Marinas
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2021-12-22  9:08 UTC (permalink / raw)
  To: Xiongfeng Wang
  Cc: will, catalin.marinas, mark.rutland, linux-arm-kernel,
	linux-kernel, moyufeng, linux-arch

So what is going to use this?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64
  2021-12-22  9:08 ` Christoph Hellwig
@ 2021-12-22 10:43   ` Catalin Marinas
  0 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2021-12-22 10:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xiongfeng Wang, will, mark.rutland, linux-arm-kernel,
	linux-kernel, moyufeng, linux-arch

On Wed, Dec 22, 2021 at 01:08:15AM -0800, Christoph Hellwig wrote:
> So what is going to use this?

There was a series about 6 months ago making use of the new DGH arm64
instruction in a network driver:

https://lore.kernel.org/r/1627614864-50824-1-git-send-email-huangguangbin2@huawei.com

We bike-shedded on the name of the macro since (now settled on
io_stop_wc()) and there wasn't much point in carrying the rest of the
patches. I assume the driver changes will go in via their normal route
once the macro above is merged.

-- 
Catalin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-22 10:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-17  8:56 [PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64 Xiongfeng Wang
2021-12-17 11:59 ` Catalin Marinas
2021-12-20 12:23   ` Xiongfeng Wang
2021-12-22  9:08 ` Christoph Hellwig
2021-12-22 10:43   ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).