LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH] asm-generic: force inlining of some atomic_long operations
@ 2016-02-04 19:45 Denys Vlasenko
  2016-02-04 19:45 ` [PATCH] force inlining of some byteswap operations Denys Vlasenko
  2016-02-04 19:45 ` [PATCH] force inlining of unaligned byteswap operations Denys Vlasenko
  0 siblings, 2 replies; 45+ messages in thread
From: Denys Vlasenko @ 2016-02-04 19:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel

Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
atomic_long_inc(), atomic_long_dec() and atomic_long_add()
functions get deinlined about 40 times. Examples of disassembly:

<atomic_long_inc> (21 copies, 147 calls):
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       f0 48 ff 07             lock incq (%rdi)
       5d                      pop    %rbp
       c3                      retq

<atomic_long_dec> (4 copies, 14 calls) is similar to inc.

<atomic_long_add> (11 copies, 41 calls):
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       f0 48 01 3e             lock add %rdi,(%rsi)
       5d                      pop    %rbp
       c3                      retq

This patch fixes this via s/inline/__always_inline/.
Code size decrease after the patch is ~1.3k:

    text     data      bss       dec     hex filename
92203657 20826112 36417536 149447305 8e86289 vmlinux
92202377 20826112 36417536 149446025 8e85d89 vmlinux4_atomiclong_after

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
---
 include/asm-generic/atomic-long.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index eb1973b..5e1f345 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -98,14 +98,14 @@ ATOMIC_LONG_ADD_SUB_OP(sub, _release)
 #define atomic_long_xchg(v, new) \
 	(ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
 
-static inline void atomic_long_inc(atomic_long_t *l)
+static __always_inline void atomic_long_inc(atomic_long_t *l)
 {
 	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
 	ATOMIC_LONG_PFX(_inc)(v);
 }
 
-static inline void atomic_long_dec(atomic_long_t *l)
+static __always_inline void atomic_long_dec(atomic_long_t *l)
 {
 	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
@@ -113,7 +113,7 @@ static inline void atomic_long_dec(atomic_long_t *l)
 }
 
 #define ATOMIC_LONG_OP(op)						\
-static inline void							\
+static __always_inline void						\
 atomic_long_##op(long i, atomic_long_t *l)				\
 {									\
 	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;		\
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] force inlining of some byteswap operations
  2016-02-04 19:45 [PATCH] asm-generic: force inlining of some atomic_long operations Denys Vlasenko
@ 2016-02-04 19:45 ` Denys Vlasenko
  2016-02-05  7:28   ` Ingo Molnar
  2016-04-13  3:36   ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Josh Poimboeuf
  2016-02-04 19:45 ` [PATCH] force inlining of unaligned byteswap operations Denys Vlasenko
  1 sibling, 2 replies; 45+ messages in thread
From: Denys Vlasenko @ 2016-02-04 19:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel

Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.
Examples of disassembly:

<get_unaligned_be16> (12 copies, 51 calls):
       66 8b 07                mov    (%rdi),%ax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       86 e0                   xchg   %ah,%al
       5d                      pop    %rbp
       c3                      retq

<get_unaligned_be32> (12 copies, 135 calls):
       8b 07                   mov    (%rdi),%eax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       0f c8                   bswap  %eax
       5d                      pop    %rbp
       c3                      retq

<get_unaligned_be64> (2 copies, 20 calls):
       48 8b 07                mov    (%rdi),%rax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       48 0f c8                bswap  %rax
       5d                      pop    %rbp
       c3                      retq

<__swab16p> (16 copies, 146 calls):
       55                      push   %rbp
       89 f8                   mov    %edi,%eax
       86 e0                   xchg   %ah,%al
       48 89 e5                mov    %rsp,%rbp
       5d                      pop    %rbp
       c3                      retq

<__swab32p> (43 copies, ~560 calls):
       55                      push   %rbp
       89 f8                   mov    %edi,%eax
       0f c8                   bswap  %eax
       48 89 e5                mov    %rsp,%rbp
       5d                      pop    %rbp
       c3                      retq

<__swab64p> (21 copies, 119 calls):
       55                      push   %rbp
       48 89 f8                mov    %rdi,%rax
       48 0f c8                bswap  %rax
       48 89 e5                mov    %rsp,%rbp
       5d                      pop    %rbp
       c3                      retq

<__swab32s> (6 copies, 47 calls):
       8b 07                   mov    (%rdi),%eax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       0f c8                   bswap  %eax
       89 07                   mov    %eax,(%rdi)
       5d                      pop    %rbp
       c3                      retq

This patch fixes this via s/inline/__always_inline/.
Code size decrease after the patch is ~4.5k:

    text     data      bss       dec     hex filename
92202377 20826112 36417536 149446025 8e85d89 vmlinux
92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
---
 include/uapi/linux/byteorder/big_endian.h    | 24 ++++++++++++------------
 include/uapi/linux/byteorder/little_endian.h | 24 ++++++++++++------------
 include/uapi/linux/swab.h                    | 10 +++++-----
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/byteorder/big_endian.h b/include/uapi/linux/byteorder/big_endian.h
index 6723744..cdab17a 100644
--- a/include/uapi/linux/byteorder/big_endian.h
+++ b/include/uapi/linux/byteorder/big_endian.h
@@ -40,51 +40,51 @@
 #define __cpu_to_be16(x) ((__force __be16)(__u16)(x))
 #define __be16_to_cpu(x) ((__force __u16)(__be16)(x))
 
-static inline __le64 __cpu_to_le64p(const __u64 *p)
+static __always_inline __le64 __cpu_to_le64p(const __u64 *p)
 {
 	return (__force __le64)__swab64p(p);
 }
-static inline __u64 __le64_to_cpup(const __le64 *p)
+static __always_inline __u64 __le64_to_cpup(const __le64 *p)
 {
 	return __swab64p((__u64 *)p);
 }
-static inline __le32 __cpu_to_le32p(const __u32 *p)
+static __always_inline __le32 __cpu_to_le32p(const __u32 *p)
 {
 	return (__force __le32)__swab32p(p);
 }
-static inline __u32 __le32_to_cpup(const __le32 *p)
+static __always_inline __u32 __le32_to_cpup(const __le32 *p)
 {
 	return __swab32p((__u32 *)p);
 }
-static inline __le16 __cpu_to_le16p(const __u16 *p)
+static __always_inline __le16 __cpu_to_le16p(const __u16 *p)
 {
 	return (__force __le16)__swab16p(p);
 }
-static inline __u16 __le16_to_cpup(const __le16 *p)
+static __always_inline __u16 __le16_to_cpup(const __le16 *p)
 {
 	return __swab16p((__u16 *)p);
 }
-static inline __be64 __cpu_to_be64p(const __u64 *p)
+static __always_inline __be64 __cpu_to_be64p(const __u64 *p)
 {
 	return (__force __be64)*p;
 }
-static inline __u64 __be64_to_cpup(const __be64 *p)
+static __always_inline __u64 __be64_to_cpup(const __be64 *p)
 {
 	return (__force __u64)*p;
 }
-static inline __be32 __cpu_to_be32p(const __u32 *p)
+static __always_inline __be32 __cpu_to_be32p(const __u32 *p)
 {
 	return (__force __be32)*p;
 }
-static inline __u32 __be32_to_cpup(const __be32 *p)
+static __always_inline __u32 __be32_to_cpup(const __be32 *p)
 {
 	return (__force __u32)*p;
 }
-static inline __be16 __cpu_to_be16p(const __u16 *p)
+static __always_inline __be16 __cpu_to_be16p(const __u16 *p)
 {
 	return (__force __be16)*p;
 }
-static inline __u16 __be16_to_cpup(const __be16 *p)
+static __always_inline __u16 __be16_to_cpup(const __be16 *p)
 {
 	return (__force __u16)*p;
 }
diff --git a/include/uapi/linux/byteorder/little_endian.h b/include/uapi/linux/byteorder/little_endian.h
index d876736..4b93f2b 100644
--- a/include/uapi/linux/byteorder/little_endian.h
+++ b/include/uapi/linux/byteorder/little_endian.h
@@ -40,51 +40,51 @@
 #define __cpu_to_be16(x) ((__force __be16)__swab16((x)))
 #define __be16_to_cpu(x) __swab16((__force __u16)(__be16)(x))
 
-static inline __le64 __cpu_to_le64p(const __u64 *p)
+static __always_inline __le64 __cpu_to_le64p(const __u64 *p)
 {
 	return (__force __le64)*p;
 }
-static inline __u64 __le64_to_cpup(const __le64 *p)
+static __always_inline __u64 __le64_to_cpup(const __le64 *p)
 {
 	return (__force __u64)*p;
 }
-static inline __le32 __cpu_to_le32p(const __u32 *p)
+static __always_inline __le32 __cpu_to_le32p(const __u32 *p)
 {
 	return (__force __le32)*p;
 }
-static inline __u32 __le32_to_cpup(const __le32 *p)
+static __always_inline __u32 __le32_to_cpup(const __le32 *p)
 {
 	return (__force __u32)*p;
 }
-static inline __le16 __cpu_to_le16p(const __u16 *p)
+static __always_inline __le16 __cpu_to_le16p(const __u16 *p)
 {
 	return (__force __le16)*p;
 }
-static inline __u16 __le16_to_cpup(const __le16 *p)
+static __always_inline __u16 __le16_to_cpup(const __le16 *p)
 {
 	return (__force __u16)*p;
 }
-static inline __be64 __cpu_to_be64p(const __u64 *p)
+static __always_inline __be64 __cpu_to_be64p(const __u64 *p)
 {
 	return (__force __be64)__swab64p(p);
 }
-static inline __u64 __be64_to_cpup(const __be64 *p)
+static __always_inline __u64 __be64_to_cpup(const __be64 *p)
 {
 	return __swab64p((__u64 *)p);
 }
-static inline __be32 __cpu_to_be32p(const __u32 *p)
+static __always_inline __be32 __cpu_to_be32p(const __u32 *p)
 {
 	return (__force __be32)__swab32p(p);
 }
-static inline __u32 __be32_to_cpup(const __be32 *p)
+static __always_inline __u32 __be32_to_cpup(const __be32 *p)
 {
 	return __swab32p((__u32 *)p);
 }
-static inline __be16 __cpu_to_be16p(const __u16 *p)
+static __always_inline __be16 __cpu_to_be16p(const __u16 *p)
 {
 	return (__force __be16)__swab16p(p);
 }
-static inline __u16 __be16_to_cpup(const __be16 *p)
+static __always_inline __u16 __be16_to_cpup(const __be16 *p)
 {
 	return __swab16p((__u16 *)p);
 }
diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
index 0e011eb..3f10e53 100644
--- a/include/uapi/linux/swab.h
+++ b/include/uapi/linux/swab.h
@@ -151,7 +151,7 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
  * __swab16p - return a byteswapped 16-bit value from a pointer
  * @p: pointer to a naturally-aligned 16-bit value
  */
-static inline __u16 __swab16p(const __u16 *p)
+static __always_inline __u16 __swab16p(const __u16 *p)
 {
 #ifdef __arch_swab16p
 	return __arch_swab16p(p);
@@ -164,7 +164,7 @@ static inline __u16 __swab16p(const __u16 *p)
  * __swab32p - return a byteswapped 32-bit value from a pointer
  * @p: pointer to a naturally-aligned 32-bit value
  */
-static inline __u32 __swab32p(const __u32 *p)
+static __always_inline __u32 __swab32p(const __u32 *p)
 {
 #ifdef __arch_swab32p
 	return __arch_swab32p(p);
@@ -177,7 +177,7 @@ static inline __u32 __swab32p(const __u32 *p)
  * __swab64p - return a byteswapped 64-bit value from a pointer
  * @p: pointer to a naturally-aligned 64-bit value
  */
-static inline __u64 __swab64p(const __u64 *p)
+static __always_inline __u64 __swab64p(const __u64 *p)
 {
 #ifdef __arch_swab64p
 	return __arch_swab64p(p);
@@ -232,7 +232,7 @@ static inline void __swab16s(__u16 *p)
  * __swab32s - byteswap a 32-bit value in-place
  * @p: pointer to a naturally-aligned 32-bit value
  */
-static inline void __swab32s(__u32 *p)
+static __always_inline void __swab32s(__u32 *p)
 {
 #ifdef __arch_swab32s
 	__arch_swab32s(p);
@@ -245,7 +245,7 @@ static inline void __swab32s(__u32 *p)
  * __swab64s - byteswap a 64-bit value in-place
  * @p: pointer to a naturally-aligned 64-bit value
  */
-static inline void __swab64s(__u64 *p)
+static __always_inline void __swab64s(__u64 *p)
 {
 #ifdef __arch_swab64s
 	__arch_swab64s(p);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] force inlining of unaligned byteswap operations
  2016-02-04 19:45 [PATCH] asm-generic: force inlining of some atomic_long operations Denys Vlasenko
  2016-02-04 19:45 ` [PATCH] force inlining of some byteswap operations Denys Vlasenko
@ 2016-02-04 19:45 ` Denys Vlasenko
  2016-02-05  7:28   ` Ingo Molnar
  1 sibling, 1 reply; 45+ messages in thread
From: Denys Vlasenko @ 2016-02-04 19:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel

Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.
Examples of disassembly:

<get_unaligned_be16> (24 copies, 108 calls):
       66 8b 07                mov    (%rdi),%ax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       86 e0                   xchg   %ah,%al
       5d                      pop    %rbp
       c3                      retq

<get_unaligned_be32> (25 copies, 181 calls):
       8b 07                   mov    (%rdi),%eax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       0f c8                   bswap  %eax
       5d                      pop    %rbp
       c3                      retq

<get_unaligned_be64> (23 copies, 94 calls):
       48 8b 07                mov    (%rdi),%rax
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       48 0f c8                bswap  %rax
       5d                      pop    %rbp
       c3                      retq

<put_unaligned_be16> (2 copies, 11 calls):
       89 f8                   mov    %edi,%eax
       55                      push   %rbp
       c1 ef 08                shr    $0x8,%edi
       c1 e0 08                shl    $0x8,%eax
       09 c7                   or     %eax,%edi
       48 89 e5                mov    %rsp,%rbp
       66 89 3e                mov    %di,(%rsi)

<put_unaligned_be32> (8 copies, 43 calls):
       55                      push   %rbp
       0f cf                   bswap  %edi
       89 3e                   mov    %edi,(%rsi)
       48 89 e5                mov    %rsp,%rbp
       5d                      pop    %rbp
       c3                      retq

<put_unaligned_be64> (26 copies, 157 calls):
       55                      push   %rbp
       48 0f cf                bswap  %rdi
       48 89 3e                mov    %rdi,(%rsi)
       48 89 e5                mov    %rsp,%rbp
       5d                      pop    %rbp
       c3                      retq

This patch fixes this via s/inline/__always_inline/.

It only affects arches with efficient unaligned access insns, such as x86.
(arched which lack such ops do not include linux/unaligned/access_ok.h)

Code size decrease after the patch is ~8.5k:

    text     data      bss       dec     hex filename
92197848 20826112 36417536 149441496 8e84bd8 vmlinux
92189231 20826144 36417536 149432911 8e82a4f vmlinux6_unaligned_be_after

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
---
 include/linux/unaligned/access_ok.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/unaligned/access_ok.h b/include/linux/unaligned/access_ok.h
index 99c1b4d..33383ca 100644
--- a/include/linux/unaligned/access_ok.h
+++ b/include/linux/unaligned/access_ok.h
@@ -4,62 +4,62 @@
 #include <linux/kernel.h>
 #include <asm/byteorder.h>
 
-static inline u16 get_unaligned_le16(const void *p)
+static __always_inline u16 get_unaligned_le16(const void *p)
 {
 	return le16_to_cpup((__le16 *)p);
 }
 
-static inline u32 get_unaligned_le32(const void *p)
+static __always_inline u32 get_unaligned_le32(const void *p)
 {
 	return le32_to_cpup((__le32 *)p);
 }
 
-static inline u64 get_unaligned_le64(const void *p)
+static __always_inline u64 get_unaligned_le64(const void *p)
 {
 	return le64_to_cpup((__le64 *)p);
 }
 
-static inline u16 get_unaligned_be16(const void *p)
+static __always_inline u16 get_unaligned_be16(const void *p)
 {
 	return be16_to_cpup((__be16 *)p);
 }
 
-static inline u32 get_unaligned_be32(const void *p)
+static __always_inline u32 get_unaligned_be32(const void *p)
 {
 	return be32_to_cpup((__be32 *)p);
 }
 
-static inline u64 get_unaligned_be64(const void *p)
+static __always_inline u64 get_unaligned_be64(const void *p)
 {
 	return be64_to_cpup((__be64 *)p);
 }
 
-static inline void put_unaligned_le16(u16 val, void *p)
+static __always_inline void put_unaligned_le16(u16 val, void *p)
 {
 	*((__le16 *)p) = cpu_to_le16(val);
 }
 
-static inline void put_unaligned_le32(u32 val, void *p)
+static __always_inline void put_unaligned_le32(u32 val, void *p)
 {
 	*((__le32 *)p) = cpu_to_le32(val);
 }
 
-static inline void put_unaligned_le64(u64 val, void *p)
+static __always_inline void put_unaligned_le64(u64 val, void *p)
 {
 	*((__le64 *)p) = cpu_to_le64(val);
 }
 
-static inline void put_unaligned_be16(u16 val, void *p)
+static __always_inline void put_unaligned_be16(u16 val, void *p)
 {
 	*((__be16 *)p) = cpu_to_be16(val);
 }
 
-static inline void put_unaligned_be32(u32 val, void *p)
+static __always_inline void put_unaligned_be32(u32 val, void *p)
 {
 	*((__be32 *)p) = cpu_to_be32(val);
 }
 
-static inline void put_unaligned_be64(u64 val, void *p)
+static __always_inline void put_unaligned_be64(u64 val, void *p)
 {
 	*((__be64 *)p) = cpu_to_be64(val);
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] force inlining of some byteswap operations
  2016-02-04 19:45 ` [PATCH] force inlining of some byteswap operations Denys Vlasenko
@ 2016-02-05  7:28   ` Ingo Molnar
  2016-04-13  3:36   ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Josh Poimboeuf
  1 sibling, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2016-02-05  7:28 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Thomas Graf, Peter Zijlstra, David Rientjes, Andrew Morton,
	linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Sometimes gcc mysteriously doesn't inline
> very small functions we expect to be inlined. See
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
> 
> With this .config:
> http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
> the following functions get deinlined many times.
> Examples of disassembly:
> 
> <get_unaligned_be16> (12 copies, 51 calls):
>        66 8b 07                mov    (%rdi),%ax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        86 e0                   xchg   %ah,%al
>        5d                      pop    %rbp
>        c3                      retq
> 
> <get_unaligned_be32> (12 copies, 135 calls):
>        8b 07                   mov    (%rdi),%eax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        0f c8                   bswap  %eax
>        5d                      pop    %rbp
>        c3                      retq
> 
> <get_unaligned_be64> (2 copies, 20 calls):
>        48 8b 07                mov    (%rdi),%rax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        48 0f c8                bswap  %rax
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab16p> (16 copies, 146 calls):
>        55                      push   %rbp
>        89 f8                   mov    %edi,%eax
>        86 e0                   xchg   %ah,%al
>        48 89 e5                mov    %rsp,%rbp
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab32p> (43 copies, ~560 calls):
>        55                      push   %rbp
>        89 f8                   mov    %edi,%eax
>        0f c8                   bswap  %eax
>        48 89 e5                mov    %rsp,%rbp
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab64p> (21 copies, 119 calls):
>        55                      push   %rbp
>        48 89 f8                mov    %rdi,%rax
>        48 0f c8                bswap  %rax
>        48 89 e5                mov    %rsp,%rbp
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab32s> (6 copies, 47 calls):
>        8b 07                   mov    (%rdi),%eax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        0f c8                   bswap  %eax
>        89 07                   mov    %eax,(%rdi)
>        5d                      pop    %rbp
>        c3                      retq
> 
> This patch fixes this via s/inline/__always_inline/.
> Code size decrease after the patch is ~4.5k:
> 
>     text     data      bss       dec     hex filename
> 92202377 20826112 36417536 149446025 8e85d89 vmlinux
> 92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] force inlining of unaligned byteswap operations
  2016-02-04 19:45 ` [PATCH] force inlining of unaligned byteswap operations Denys Vlasenko
@ 2016-02-05  7:28   ` Ingo Molnar
  0 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2016-02-05  7:28 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Thomas Graf, Peter Zijlstra, David Rientjes, Andrew Morton,
	linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Code size decrease after the patch is ~8.5k:
> 
>     text     data      bss       dec     hex filename
> 92197848 20826112 36417536 149441496 8e84bd8 vmlinux
> 92189231 20826144 36417536 149432911 8e82a4f vmlinux6_unaligned_be_after

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-02-04 19:45 ` [PATCH] force inlining of some byteswap operations Denys Vlasenko
  2016-02-05  7:28   ` Ingo Molnar
@ 2016-04-13  3:36   ` Josh Poimboeuf
  2016-04-13 12:12     ` Denys Vlasenko
  1 sibling, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-13  3:36 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel, Arnd Bergmann

On Thu, Feb 04, 2016 at 08:45:35PM +0100, Denys Vlasenko wrote:
> Sometimes gcc mysteriously doesn't inline
> very small functions we expect to be inlined. See
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
> 
> With this .config:
> http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
> the following functions get deinlined many times.
> Examples of disassembly:
> 
> <get_unaligned_be16> (12 copies, 51 calls):
>        66 8b 07                mov    (%rdi),%ax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        86 e0                   xchg   %ah,%al
>        5d                      pop    %rbp
>        c3                      retq
> 
> <get_unaligned_be32> (12 copies, 135 calls):
>        8b 07                   mov    (%rdi),%eax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        0f c8                   bswap  %eax
>        5d                      pop    %rbp
>        c3                      retq
> 
> <get_unaligned_be64> (2 copies, 20 calls):
>        48 8b 07                mov    (%rdi),%rax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        48 0f c8                bswap  %rax
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab16p> (16 copies, 146 calls):
>        55                      push   %rbp
>        89 f8                   mov    %edi,%eax
>        86 e0                   xchg   %ah,%al
>        48 89 e5                mov    %rsp,%rbp
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab32p> (43 copies, ~560 calls):
>        55                      push   %rbp
>        89 f8                   mov    %edi,%eax
>        0f c8                   bswap  %eax
>        48 89 e5                mov    %rsp,%rbp
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab64p> (21 copies, 119 calls):
>        55                      push   %rbp
>        48 89 f8                mov    %rdi,%rax
>        48 0f c8                bswap  %rax
>        48 89 e5                mov    %rsp,%rbp
>        5d                      pop    %rbp
>        c3                      retq
> 
> <__swab32s> (6 copies, 47 calls):
>        8b 07                   mov    (%rdi),%eax
>        55                      push   %rbp
>        48 89 e5                mov    %rsp,%rbp
>        0f c8                   bswap  %eax
>        89 07                   mov    %eax,(%rdi)
>        5d                      pop    %rbp
>        c3                      retq
> 
> This patch fixes this via s/inline/__always_inline/.
> Code size decrease after the patch is ~4.5k:
> 
>     text     data      bss       dec     hex filename
> 92202377 20826112 36417536 149446025 8e85d89 vmlinux
> 92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after
> 
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/uapi/linux/byteorder/big_endian.h    | 24 ++++++++++++------------
>  include/uapi/linux/byteorder/little_endian.h | 24 ++++++++++++------------
>  include/uapi/linux/swab.h                    | 10 +++++-----
>  3 files changed, 29 insertions(+), 29 deletions(-)

Hi,

This patch seems to trigger a gcc bug which can produce corrupt code.  I
discovered it when investigating an objtool warning reported by kbuild
bot:

  http://www.spinics.net/lists/linux-scsi/msg95481.html

>From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:

0000000000002f53 <qla2x00_get_host_fabric_name>:
    2f53:       55                      push   %rbp
    2f54:       48 89 e5                mov    %rsp,%rbp

0000000000002f57 <qla2x00_get_fc_host_stats>:
    2f57:       55                      push   %rbp
    2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
    2f5d:       48 89 e5                mov    %rsp,%rbp
...

Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
setting up the frame pointer.  It falls through to the next function, which is
very wrong.

I can recreate it with either gcc 5.3.1 or gcc 6.0 on linus/master with
the .config from the above link.

The call chain which appears to trigger the problem is:

qla2x00_get_host_fabric_name()
  wwn_to_u64()
    get_unaligned_be64()
      be64_to_cpup()
        __be64_to_cpup() <- changed to __always_inline by this patch

It occurs with the combination of the following two recent commits:

- bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
- ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

I can confirm that reverting either patch makes the problem go away.
I'm planning on opening a gcc bug tomorrow.

> -static inline __u64 __be64_to_cpup(const __be64 *p)
> +static __always_inline __u64 __be64_to_cpup(const __be64 *p)
>  {
>  	return __swab64p((__u64 *)p);
>  }

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-13  3:36   ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Josh Poimboeuf
@ 2016-04-13 12:12     ` Denys Vlasenko
  2016-04-13 12:36       ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Denys Vlasenko @ 2016-04-13 12:12 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel, Arnd Bergmann

On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> On Thu, Feb 04, 2016 at 08:45:35PM +0100, Denys Vlasenko wrote:
>> Sometimes gcc mysteriously doesn't inline
>> very small functions we expect to be inlined. See
>>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
>>
>> With this .config:
>> http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
>> the following functions get deinlined many times.
>> Examples of disassembly:
>>
>> <get_unaligned_be16> (12 copies, 51 calls):
>>        66 8b 07                mov    (%rdi),%ax
>>        55                      push   %rbp
>>        48 89 e5                mov    %rsp,%rbp
>>        86 e0                   xchg   %ah,%al
>>        5d                      pop    %rbp
>>        c3                      retq
...
>> This patch fixes this via s/inline/__always_inline/.
>> Code size decrease after the patch is ~4.5k:
>>
>>     text     data      bss       dec     hex filename
>> 92202377 20826112 36417536 149446025 8e85d89 vmlinux
>> 92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Thomas Graf <tgraf@suug.ch>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  include/uapi/linux/byteorder/big_endian.h    | 24 ++++++++++++------------
>>  include/uapi/linux/byteorder/little_endian.h | 24 ++++++++++++------------
>>  include/uapi/linux/swab.h                    | 10 +++++-----
>>  3 files changed, 29 insertions(+), 29 deletions(-)
> 
> Hi,
> 
> This patch seems to trigger a gcc bug which can produce corrupt code.  I
> discovered it when investigating an objtool warning reported by kbuild
> bot:
> 
>   http://www.spinics.net/lists/linux-scsi/msg95481.html
> 
> From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
> 
> 0000000000002f53 <qla2x00_get_host_fabric_name>:
>     2f53:       55                      push   %rbp
>     2f54:       48 89 e5                mov    %rsp,%rbp
> 
> 0000000000002f57 <qla2x00_get_fc_host_stats>:
>     2f57:       55                      push   %rbp
>     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
>     2f5d:       48 89 e5                mov    %rsp,%rbp
> ...
> 
> Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
> setting up the frame pointer.  It falls through to the next function, which is
> very wrong.

Wow, that's ... interesting.


> I can recreate it with either gcc 5.3.1 or gcc 6.0 on linus/master with
> the .config from the above link.
> 
> The call chain which appears to trigger the problem is:
> 
> qla2x00_get_host_fabric_name()
>   wwn_to_u64()
>     get_unaligned_be64()
>       be64_to_cpup()
>         __be64_to_cpup() <- changed to __always_inline by this patch
> 
> It occurs with the combination of the following two recent commits:
> 
> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> I can confirm that reverting either patch makes the problem go away.
> I'm planning on opening a gcc bug tomorrow.


Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
keywords are in fact __always_inline, so the bug must be triggering
even without the patch.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-13 12:12     ` Denys Vlasenko
@ 2016-04-13 12:36       ` Josh Poimboeuf
  2016-04-13 15:15         ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-13 12:36 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel, Arnd Bergmann

On Wed, Apr 13, 2016 at 02:12:25PM +0200, Denys Vlasenko wrote:
> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > On Thu, Feb 04, 2016 at 08:45:35PM +0100, Denys Vlasenko wrote:
> >> Sometimes gcc mysteriously doesn't inline
> >> very small functions we expect to be inlined. See
> >>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
> >>
> >> With this .config:
> >> http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
> >> the following functions get deinlined many times.
> >> Examples of disassembly:
> >>
> >> <get_unaligned_be16> (12 copies, 51 calls):
> >>        66 8b 07                mov    (%rdi),%ax
> >>        55                      push   %rbp
> >>        48 89 e5                mov    %rsp,%rbp
> >>        86 e0                   xchg   %ah,%al
> >>        5d                      pop    %rbp
> >>        c3                      retq
> ...
> >> This patch fixes this via s/inline/__always_inline/.
> >> Code size decrease after the patch is ~4.5k:
> >>
> >>     text     data      bss       dec     hex filename
> >> 92202377 20826112 36417536 149446025 8e85d89 vmlinux
> >> 92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after
> >>
> >> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> >> Cc: Ingo Molnar <mingo@kernel.org>
> >> Cc: Thomas Graf <tgraf@suug.ch>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: David Rientjes <rientjes@google.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: linux-kernel@vger.kernel.org
> >> ---
> >>  include/uapi/linux/byteorder/big_endian.h    | 24 ++++++++++++------------
> >>  include/uapi/linux/byteorder/little_endian.h | 24 ++++++++++++------------
> >>  include/uapi/linux/swab.h                    | 10 +++++-----
> >>  3 files changed, 29 insertions(+), 29 deletions(-)
> > 
> > Hi,
> > 
> > This patch seems to trigger a gcc bug which can produce corrupt code.  I
> > discovered it when investigating an objtool warning reported by kbuild
> > bot:
> > 
> >   http://www.spinics.net/lists/linux-scsi/msg95481.html
> > 
> > From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
> > 
> > 0000000000002f53 <qla2x00_get_host_fabric_name>:
> >     2f53:       55                      push   %rbp
> >     2f54:       48 89 e5                mov    %rsp,%rbp
> > 
> > 0000000000002f57 <qla2x00_get_fc_host_stats>:
> >     2f57:       55                      push   %rbp
> >     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
> >     2f5d:       48 89 e5                mov    %rsp,%rbp
> > ...
> > 
> > Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
> > setting up the frame pointer.  It falls through to the next function, which is
> > very wrong.
> 
> Wow, that's ... interesting.
> 
> 
> > I can recreate it with either gcc 5.3.1 or gcc 6.0 on linus/master with
> > the .config from the above link.
> > 
> > The call chain which appears to trigger the problem is:
> > 
> > qla2x00_get_host_fabric_name()
> >   wwn_to_u64()
> >     get_unaligned_be64()
> >       be64_to_cpup()
> >         __be64_to_cpup() <- changed to __always_inline by this patch
> > 
> > It occurs with the combination of the following two recent commits:
> > 
> > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> > 
> > I can confirm that reverting either patch makes the problem go away.
> > I'm planning on opening a gcc bug tomorrow.
> 
> 
> Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
> keywords are in fact __always_inline, so the bug must be triggering
> even without the patch.

Makes sense in theory, but the bug doesn't actually trigger when I
revert the patch and set CONFIG_OPTIMIZE_INLINING=n.

Perhaps even more surprising, it doesn't trigger *with* the patch and
CONFIG_OPTIMIZE_INLINING=n.

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-13 12:36       ` Josh Poimboeuf
@ 2016-04-13 15:15         ` Josh Poimboeuf
  2016-04-13 16:55           ` James Bottomley
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-13 15:15 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel, Arnd Bergmann, James Bottomley

On Wed, Apr 13, 2016 at 07:36:07AM -0500, Josh Poimboeuf wrote:
> On Wed, Apr 13, 2016 at 02:12:25PM +0200, Denys Vlasenko wrote:
> > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > > On Thu, Feb 04, 2016 at 08:45:35PM +0100, Denys Vlasenko wrote:
> > >> Sometimes gcc mysteriously doesn't inline
> > >> very small functions we expect to be inlined. See
> > >>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
> > >>
> > >> With this .config:
> > >> http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
> > >> the following functions get deinlined many times.
> > >> Examples of disassembly:
> > >>
> > >> <get_unaligned_be16> (12 copies, 51 calls):
> > >>        66 8b 07                mov    (%rdi),%ax
> > >>        55                      push   %rbp
> > >>        48 89 e5                mov    %rsp,%rbp
> > >>        86 e0                   xchg   %ah,%al
> > >>        5d                      pop    %rbp
> > >>        c3                      retq
> > ...
> > >> This patch fixes this via s/inline/__always_inline/.
> > >> Code size decrease after the patch is ~4.5k:
> > >>
> > >>     text     data      bss       dec     hex filename
> > >> 92202377 20826112 36417536 149446025 8e85d89 vmlinux
> > >> 92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after
> > >>
> > >> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> > >> Cc: Ingo Molnar <mingo@kernel.org>
> > >> Cc: Thomas Graf <tgraf@suug.ch>
> > >> Cc: Peter Zijlstra <peterz@infradead.org>
> > >> Cc: David Rientjes <rientjes@google.com>
> > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > >> Cc: linux-kernel@vger.kernel.org
> > >> ---
> > >>  include/uapi/linux/byteorder/big_endian.h    | 24 ++++++++++++------------
> > >>  include/uapi/linux/byteorder/little_endian.h | 24 ++++++++++++------------
> > >>  include/uapi/linux/swab.h                    | 10 +++++-----
> > >>  3 files changed, 29 insertions(+), 29 deletions(-)
> > > 
> > > Hi,
> > > 
> > > This patch seems to trigger a gcc bug which can produce corrupt code.  I
> > > discovered it when investigating an objtool warning reported by kbuild
> > > bot:
> > > 
> > >   http://www.spinics.net/lists/linux-scsi/msg95481.html
> > > 
> > > From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
> > > 
> > > 0000000000002f53 <qla2x00_get_host_fabric_name>:
> > >     2f53:       55                      push   %rbp
> > >     2f54:       48 89 e5                mov    %rsp,%rbp
> > > 
> > > 0000000000002f57 <qla2x00_get_fc_host_stats>:
> > >     2f57:       55                      push   %rbp
> > >     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
> > >     2f5d:       48 89 e5                mov    %rsp,%rbp
> > > ...
> > > 
> > > Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
> > > setting up the frame pointer.  It falls through to the next function, which is
> > > very wrong.
> > 
> > Wow, that's ... interesting.
> > 
> > 
> > > I can recreate it with either gcc 5.3.1 or gcc 6.0 on linus/master with
> > > the .config from the above link.
> > > 
> > > The call chain which appears to trigger the problem is:
> > > 
> > > qla2x00_get_host_fabric_name()
> > >   wwn_to_u64()
> > >     get_unaligned_be64()
> > >       be64_to_cpup()
> > >         __be64_to_cpup() <- changed to __always_inline by this patch
> > > 
> > > It occurs with the combination of the following two recent commits:
> > > 
> > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> > > 
> > > I can confirm that reverting either patch makes the problem go away.
> > > I'm planning on opening a gcc bug tomorrow.
> > 
> > 
> > Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
> > keywords are in fact __always_inline, so the bug must be triggering
> > even without the patch.
> 
> Makes sense in theory, but the bug doesn't actually trigger when I
> revert the patch and set CONFIG_OPTIMIZE_INLINING=n.
> 
> Perhaps even more surprising, it doesn't trigger *with* the patch and
> CONFIG_OPTIMIZE_INLINING=n.

[ Adding James to CC since this bug affects scsi. ]

Here's the gcc bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-13 15:15         ` Josh Poimboeuf
@ 2016-04-13 16:55           ` James Bottomley
  2016-04-13 17:10             ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: James Bottomley @ 2016-04-13 16:55 UTC (permalink / raw)
  To: Josh Poimboeuf, Denys Vlasenko
  Cc: Ingo Molnar, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel, Arnd Bergmann, linux-scsi

On Wed, 2016-04-13 at 10:15 -0500, Josh Poimboeuf wrote:
> On Wed, Apr 13, 2016 at 07:36:07AM -0500, Josh Poimboeuf wrote:
> > On Wed, Apr 13, 2016 at 02:12:25PM +0200, Denys Vlasenko wrote:
> > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > > > On Thu, Feb 04, 2016 at 08:45:35PM +0100, Denys Vlasenko wrote:
> > > > > Sometimes gcc mysteriously doesn't inline
> > > > > very small functions we expect to be inlined. See
> > > > >     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
> > > > > 
> > > > > With this .config:
> > > > > http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_O
> > > > > s,
> > > > > the following functions get deinlined many times.
> > > > > Examples of disassembly:
> > > > > 
> > > > > <get_unaligned_be16> (12 copies, 51 calls):
> > > > >        66 8b 07                mov    (%rdi),%ax
> > > > >        55                      push   %rbp
> > > > >        48 89 e5                mov    %rsp,%rbp
> > > > >        86 e0                   xchg   %ah,%al
> > > > >        5d                      pop    %rbp
> > > > >        c3                      retq
> > > ...
> > > > > This patch fixes this via s/inline/__always_inline/.
> > > > > Code size decrease after the patch is ~4.5k:
> > > > > 
> > > > >     text     data      bss       dec     hex filename
> > > > > 92202377 20826112 36417536 149446025 8e85d89 vmlinux
> > > > > 92197848 20826112 36417536 149441496 8e84bd8
> > > > > vmlinux5_swap_after
> > > > > 
> > > > > Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > Cc: Thomas Graf <tgraf@suug.ch>
> > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > Cc: David Rientjes <rientjes@google.com>
> > > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > > Cc: linux-kernel@vger.kernel.org
> > > > > ---
> > > > >  include/uapi/linux/byteorder/big_endian.h    | 24
> > > > > ++++++++++++------------
> > > > >  include/uapi/linux/byteorder/little_endian.h | 24
> > > > > ++++++++++++------------
> > > > >  include/uapi/linux/swab.h                    | 10 +++++-----
> > > > >  3 files changed, 29 insertions(+), 29 deletions(-)
> > > > 
> > > > Hi,
> > > > 
> > > > This patch seems to trigger a gcc bug which can produce corrupt
> > > > code.  I
> > > > discovered it when investigating an objtool warning reported by
> > > > kbuild
> > > > bot:
> > > > 
> > > >   http://www.spinics.net/lists/linux-scsi/msg95481.html
> > > > 
> > > > From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
> > > > 
> > > > 0000000000002f53 <qla2x00_get_host_fabric_name>:
> > > >     2f53:       55                      push   %rbp
> > > >     2f54:       48 89 e5                mov    %rsp,%rbp
> > > > 
> > > > 0000000000002f57 <qla2x00_get_fc_host_stats>:
> > > >     2f57:       55                      push   %rbp
> > > >     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
> > > >     2f5d:       48 89 e5                mov    %rsp,%rbp
> > > > ...
> > > > 
> > > > Note that qla2x00_get_host_fabric_name() is inexplicably
> > > > truncated after
> > > > setting up the frame pointer.  It falls through to the next
> > > > function, which is
> > > > very wrong.
> > > 
> > > Wow, that's ... interesting.
> > > 
> > > 
> > > > I can recreate it with either gcc 5.3.1 or gcc 6.0 on
> > > > linus/master with
> > > > the .config from the above link.
> > > > 
> > > > The call chain which appears to trigger the problem is:
> > > > 
> > > > qla2x00_get_host_fabric_name()
> > > >   wwn_to_u64()
> > > >     get_unaligned_be64()
> > > >       be64_to_cpup()
> > > >         __be64_to_cpup() <- changed to __always_inline by this
> > > > patch
> > > > 
> > > > It occurs with the combination of the following two recent
> > > > commits:
> > > > 
> > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
> > > > inlining of some byteswap operations")
> > > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
> > > > access")
> > > > 
> > > > I can confirm that reverting either patch makes the problem go
> > > > away.
> > > > I'm planning on opening a gcc bug tomorrow.
> > > 
> > > 
> > > Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
> > > keywords are in fact __always_inline, so the bug must be
> > > triggering
> > > even without the patch.
> > 
> > Makes sense in theory, but the bug doesn't actually trigger when I
> > revert the patch and set CONFIG_OPTIMIZE_INLINING=n.
> > 
> > Perhaps even more surprising, it doesn't trigger *with* the patch
> > and
> > CONFIG_OPTIMIZE_INLINING=n.
> 
> [ Adding James to CC since this bug affects scsi. ]
> 
> Here's the gcc bug:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> 


Actually, adding me doesn't help, I've added linux-scsi.  The summary
is that there's a but in gcc-5.3.1 which is miscompiling qla_attr.c ...
this means we're going to have to ask the compiler version of reported
crashes.

James

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-13 16:55           ` James Bottomley
@ 2016-04-13 17:10             ` Josh Poimboeuf
  2016-04-14 15:29               ` Denys Vlasenko
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-13 17:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Denys Vlasenko, Ingo Molnar, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi

On Wed, Apr 13, 2016 at 09:55:09AM -0700, James Bottomley wrote:
> On Wed, 2016-04-13 at 10:15 -0500, Josh Poimboeuf wrote:
> > On Wed, Apr 13, 2016 at 07:36:07AM -0500, Josh Poimboeuf wrote:
> > > On Wed, Apr 13, 2016 at 02:12:25PM +0200, Denys Vlasenko wrote:
> > > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > > > > On Thu, Feb 04, 2016 at 08:45:35PM +0100, Denys Vlasenko wrote:
> > > > > > Sometimes gcc mysteriously doesn't inline
> > > > > > very small functions we expect to be inlined. See
> > > > > >     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
> > > > > > 
> > > > > > With this .config:
> > > > > > http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_O
> > > > > > s,
> > > > > > the following functions get deinlined many times.
> > > > > > Examples of disassembly:
> > > > > > 
> > > > > > <get_unaligned_be16> (12 copies, 51 calls):
> > > > > >        66 8b 07                mov    (%rdi),%ax
> > > > > >        55                      push   %rbp
> > > > > >        48 89 e5                mov    %rsp,%rbp
> > > > > >        86 e0                   xchg   %ah,%al
> > > > > >        5d                      pop    %rbp
> > > > > >        c3                      retq
> > > > ...
> > > > > > This patch fixes this via s/inline/__always_inline/.
> > > > > > Code size decrease after the patch is ~4.5k:
> > > > > > 
> > > > > >     text     data      bss       dec     hex filename
> > > > > > 92202377 20826112 36417536 149446025 8e85d89 vmlinux
> > > > > > 92197848 20826112 36417536 149441496 8e84bd8
> > > > > > vmlinux5_swap_after
> > > > > > 
> > > > > > Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> > > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > > Cc: Thomas Graf <tgraf@suug.ch>
> > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > Cc: David Rientjes <rientjes@google.com>
> > > > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > > > Cc: linux-kernel@vger.kernel.org
> > > > > > ---
> > > > > >  include/uapi/linux/byteorder/big_endian.h    | 24
> > > > > > ++++++++++++------------
> > > > > >  include/uapi/linux/byteorder/little_endian.h | 24
> > > > > > ++++++++++++------------
> > > > > >  include/uapi/linux/swab.h                    | 10 +++++-----
> > > > > >  3 files changed, 29 insertions(+), 29 deletions(-)
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > This patch seems to trigger a gcc bug which can produce corrupt
> > > > > code.  I
> > > > > discovered it when investigating an objtool warning reported by
> > > > > kbuild
> > > > > bot:
> > > > > 
> > > > >   http://www.spinics.net/lists/linux-scsi/msg95481.html
> > > > > 
> > > > > From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
> > > > > 
> > > > > 0000000000002f53 <qla2x00_get_host_fabric_name>:
> > > > >     2f53:       55                      push   %rbp
> > > > >     2f54:       48 89 e5                mov    %rsp,%rbp
> > > > > 
> > > > > 0000000000002f57 <qla2x00_get_fc_host_stats>:
> > > > >     2f57:       55                      push   %rbp
> > > > >     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
> > > > >     2f5d:       48 89 e5                mov    %rsp,%rbp
> > > > > ...
> > > > > 
> > > > > Note that qla2x00_get_host_fabric_name() is inexplicably
> > > > > truncated after
> > > > > setting up the frame pointer.  It falls through to the next
> > > > > function, which is
> > > > > very wrong.
> > > > 
> > > > Wow, that's ... interesting.
> > > > 
> > > > 
> > > > > I can recreate it with either gcc 5.3.1 or gcc 6.0 on
> > > > > linus/master with
> > > > > the .config from the above link.
> > > > > 
> > > > > The call chain which appears to trigger the problem is:
> > > > > 
> > > > > qla2x00_get_host_fabric_name()
> > > > >   wwn_to_u64()
> > > > >     get_unaligned_be64()
> > > > >       be64_to_cpup()
> > > > >         __be64_to_cpup() <- changed to __always_inline by this
> > > > > patch
> > > > > 
> > > > > It occurs with the combination of the following two recent
> > > > > commits:
> > > > > 
> > > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
> > > > > inlining of some byteswap operations")
> > > > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
> > > > > access")
> > > > > 
> > > > > I can confirm that reverting either patch makes the problem go
> > > > > away.
> > > > > I'm planning on opening a gcc bug tomorrow.
> > > > 
> > > > 
> > > > Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
> > > > keywords are in fact __always_inline, so the bug must be
> > > > triggering
> > > > even without the patch.
> > > 
> > > Makes sense in theory, but the bug doesn't actually trigger when I
> > > revert the patch and set CONFIG_OPTIMIZE_INLINING=n.
> > > 
> > > Perhaps even more surprising, it doesn't trigger *with* the patch
> > > and
> > > CONFIG_OPTIMIZE_INLINING=n.
> > 
> > [ Adding James to CC since this bug affects scsi. ]
> > 
> > Here's the gcc bug:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > 
> 
> 
> Actually, adding me doesn't help, I've added linux-scsi.  The summary
> is that there's a but in gcc-5.3.1 which is miscompiling qla_attr.c ...
> this means we're going to have to ask the compiler version of reported
> crashes.

The bug isn't specific to a compiler version.  I've seen it with gcc
5.3.1 and gcc 6.0.  I haven't tried any older versions.  And the gcc bug
hasn't been resolved (or even investigated) yet.

The bug is triggered by a combination of the above two commits from the
4.6 merge window, so presumably we'd need to revert one of them to avoid
crashes in 4.6.

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-13 17:10             ` Josh Poimboeuf
@ 2016-04-14 15:29               ` Denys Vlasenko
  2016-04-14 15:57                 ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Denys Vlasenko @ 2016-04-14 15:29 UTC (permalink / raw)
  To: Josh Poimboeuf, James Bottomley
  Cc: Ingo Molnar, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, linux-kernel, Arnd Bergmann, linux-scsi

On 04/13/2016 07:10 PM, Josh Poimboeuf wrote:
>>>>>> From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
>>>>>>
>>>>>> 0000000000002f53 <qla2x00_get_host_fabric_name>:
>>>>>>     2f53:       55                      push   %rbp
>>>>>>     2f54:       48 89 e5                mov    %rsp,%rbp
>>>>>>
>>>>>> 0000000000002f57 <qla2x00_get_fc_host_stats>:
>>>>>>     2f57:       55                      push   %rbp
>>>>>>     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
>>>>>>     2f5d:       48 89 e5                mov    %rsp,%rbp
>>>>>> ...
>>>>>>
>>>>>> Note that qla2x00_get_host_fabric_name() is inexplicably
>>>>>> truncated after
>>>>>> setting up the frame pointer.  It falls through to the next
>>>>>> function, which is
>>>>>> very wrong.
>>>>>
>>>>> Wow, that's ... interesting.
>>>>>
>>>>>
>>>>>> I can recreate it with either gcc 5.3.1 or gcc 6.0 on
>>>>>> linus/master with
>>>>>> the .config from the above link.
>>>>>>
>>>>>> The call chain which appears to trigger the problem is:
>>>>>>
>>>>>> qla2x00_get_host_fabric_name()
>>>>>>   wwn_to_u64()
>>>>>>     get_unaligned_be64()
>>>>>>       be64_to_cpup()
>>>>>>         __be64_to_cpup() <- changed to __always_inline by this
>>>>>> patch
>>>>>>
>>>>>> It occurs with the combination of the following two recent
>>>>>> commits:
>>>>>>
>>>>>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
>>>>>> inlining of some byteswap operations")
>>>>>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
>>>>>> access")
>>>>>>
>>>>>> I can confirm that reverting either patch makes the problem go
>>>>>> away.
>>>>>> I'm planning on opening a gcc bug tomorrow.
>>>>>
>>>>>
>>>>> Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
>>>>> keywords are in fact __always_inline, so the bug must be
>>>>> triggering
>>>>> even without the patch.
>>>>
>>>> Makes sense in theory, but the bug doesn't actually trigger when I
>>>> revert the patch and set CONFIG_OPTIMIZE_INLINING=n.
>>>>
>>>> Perhaps even more surprising, it doesn't trigger *with* the patch
>>>> and
>>>> CONFIG_OPTIMIZE_INLINING=n.
>>>
>>> [ Adding James to CC since this bug affects scsi. ]
>>>
>>> Here's the gcc bug:
>>>
>>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>>>
>>
>>
>> Actually, adding me doesn't help, I've added linux-scsi.  The summary
>> is that there's a but in gcc-5.3.1 which is miscompiling qla_attr.c ...
>> this means we're going to have to ask the compiler version of reported
>> crashes.
> 
> The bug isn't specific to a compiler version.  I've seen it with gcc
> 5.3.1 and gcc 6.0.  I haven't tried any older versions.  And the gcc bug
> hasn't been resolved (or even investigated) yet.
> 
> The bug is triggered by a combination of the above two commits from the
> 4.6 merge window, so presumably we'd need to revert one of them to avoid
> crashes in 4.6.

The bug is indeed in the compiler. 4.9 and all later versions are affected.
gcc bugzilla now has a reproducer. In abridged form:


static inline __attribute__((always_inline)) u64 __swab64p(const u64 *p)
{
 return (__builtin_constant_p((u64)(*p)) ? ((u64)( (((u64)(*p) & (u64)0x00000000000000ffULL) << 56) | (((u64)(*p) & (u64)0x000000000000ff00ULL) << 40) | (((u64)(*p) & (u64)0x0000000000ff0000ULL) << 24) | (((u64)(*p) & (u64)0x00000000ff000000ULL) << 8) | (((u64)(*p) & (u64)0x000000ff00000000ULL) >> 8) | (((u64)(*p) & (u64)0x0000ff0000000000ULL) >> 24) | (((u64)(*p) & (u64)0x00ff000000000000ULL) >> 40) | (((u64)(*p) & (u64)0xff00000000000000ULL) >> 56))) : __builtin_bswap64(*p));
}
static inline u64 wwn_to_u64(void *wwn)
{
 return __swab64p(wwn);
}
static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
{
 scsi_qla_host_t *vha = shost_priv(shost);
 u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
 u64 fabric_name = wwn_to_u64(node_name);
 if (vha->device_flags & 0x1)
  fabric_name = wwn_to_u64(vha->fabric_node_name);
 (((struct fc_host_attrs *)(shost)->shost_data)->fabric_name) = fabric_name;
}


Two (or more, there were more before simplification) levels of inlining
are necessary for bug to trigger in this example (folding to one level
makes it go away). "__attribute__((always_inline))" is necessary too.


Since we have lots of __always_inline anyway, this bug has a potential
to miscompile kernels regardless of CONFIG_OPTIMIZE_INLINING setting,
and with or without the patches mentioned above (they just happen
to create a reliable reproducer).

Since it was not detected for two years since gcc 4.9 release,
it must be triggering quite rarely.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-14 15:29               ` Denys Vlasenko
@ 2016-04-14 15:57                 ` Josh Poimboeuf
  2016-04-14 17:09                   ` Denys Vlasenko
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-14 15:57 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: James Bottomley, Ingo Molnar, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi

On Thu, Apr 14, 2016 at 05:29:06PM +0200, Denys Vlasenko wrote:
> On 04/13/2016 07:10 PM, Josh Poimboeuf wrote:
> >>>>>> From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
> >>>>>>
> >>>>>> 0000000000002f53 <qla2x00_get_host_fabric_name>:
> >>>>>>     2f53:       55                      push   %rbp
> >>>>>>     2f54:       48 89 e5                mov    %rsp,%rbp
> >>>>>>
> >>>>>> 0000000000002f57 <qla2x00_get_fc_host_stats>:
> >>>>>>     2f57:       55                      push   %rbp
> >>>>>>     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
> >>>>>>     2f5d:       48 89 e5                mov    %rsp,%rbp
> >>>>>> ...
> >>>>>>
> >>>>>> Note that qla2x00_get_host_fabric_name() is inexplicably
> >>>>>> truncated after
> >>>>>> setting up the frame pointer.  It falls through to the next
> >>>>>> function, which is
> >>>>>> very wrong.
> >>>>>
> >>>>> Wow, that's ... interesting.
> >>>>>
> >>>>>
> >>>>>> I can recreate it with either gcc 5.3.1 or gcc 6.0 on
> >>>>>> linus/master with
> >>>>>> the .config from the above link.
> >>>>>>
> >>>>>> The call chain which appears to trigger the problem is:
> >>>>>>
> >>>>>> qla2x00_get_host_fabric_name()
> >>>>>>   wwn_to_u64()
> >>>>>>     get_unaligned_be64()
> >>>>>>       be64_to_cpup()
> >>>>>>         __be64_to_cpup() <- changed to __always_inline by this
> >>>>>> patch
> >>>>>>
> >>>>>> It occurs with the combination of the following two recent
> >>>>>> commits:
> >>>>>>
> >>>>>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
> >>>>>> inlining of some byteswap operations")
> >>>>>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
> >>>>>> access")
> >>>>>>
> >>>>>> I can confirm that reverting either patch makes the problem go
> >>>>>> away.
> >>>>>> I'm planning on opening a gcc bug tomorrow.
> >>>>>
> >>>>>
> >>>>> Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
> >>>>> keywords are in fact __always_inline, so the bug must be
> >>>>> triggering
> >>>>> even without the patch.
> >>>>
> >>>> Makes sense in theory, but the bug doesn't actually trigger when I
> >>>> revert the patch and set CONFIG_OPTIMIZE_INLINING=n.
> >>>>
> >>>> Perhaps even more surprising, it doesn't trigger *with* the patch
> >>>> and
> >>>> CONFIG_OPTIMIZE_INLINING=n.
> >>>
> >>> [ Adding James to CC since this bug affects scsi. ]
> >>>
> >>> Here's the gcc bug:
> >>>
> >>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> >>>
> >>
> >>
> >> Actually, adding me doesn't help, I've added linux-scsi.  The summary
> >> is that there's a but in gcc-5.3.1 which is miscompiling qla_attr.c ...
> >> this means we're going to have to ask the compiler version of reported
> >> crashes.
> > 
> > The bug isn't specific to a compiler version.  I've seen it with gcc
> > 5.3.1 and gcc 6.0.  I haven't tried any older versions.  And the gcc bug
> > hasn't been resolved (or even investigated) yet.
> > 
> > The bug is triggered by a combination of the above two commits from the
> > 4.6 merge window, so presumably we'd need to revert one of them to avoid
> > crashes in 4.6.
> 
> The bug is indeed in the compiler. 4.9 and all later versions are affected.
> gcc bugzilla now has a reproducer. In abridged form:
> 
> 
> static inline __attribute__((always_inline)) u64 __swab64p(const u64 *p)
> {
>  return (__builtin_constant_p((u64)(*p)) ? ((u64)( (((u64)(*p) & (u64)0x00000000000000ffULL) << 56) | (((u64)(*p) & (u64)0x000000000000ff00ULL) << 40) | (((u64)(*p) & (u64)0x0000000000ff0000ULL) << 24) | (((u64)(*p) & (u64)0x00000000ff000000ULL) << 8) | (((u64)(*p) & (u64)0x000000ff00000000ULL) >> 8) | (((u64)(*p) & (u64)0x0000ff0000000000ULL) >> 24) | (((u64)(*p) & (u64)0x00ff000000000000ULL) >> 40) | (((u64)(*p) & (u64)0xff00000000000000ULL) >> 56))) : __builtin_bswap64(*p));
> }
> static inline u64 wwn_to_u64(void *wwn)
> {
>  return __swab64p(wwn);
> }
> static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
> {
>  scsi_qla_host_t *vha = shost_priv(shost);
>  u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
>  u64 fabric_name = wwn_to_u64(node_name);
>  if (vha->device_flags & 0x1)
>   fabric_name = wwn_to_u64(vha->fabric_node_name);
>  (((struct fc_host_attrs *)(shost)->shost_data)->fabric_name) = fabric_name;
> }

Nice work with the reproducer!

> Two (or more, there were more before simplification) levels of inlining
> are necessary for bug to trigger in this example (folding to one level
> makes it go away). "__attribute__((always_inline))" is necessary too.
> 
> 
> Since we have lots of __always_inline anyway, this bug has a potential
> to miscompile kernels regardless of CONFIG_OPTIMIZE_INLINING setting,
> and with or without the patches mentioned above (they just happen
> to create a reliable reproducer).

Well, but setting !CONFIG_OPTIMIZE_INLINING makes the problem go away
for some reason.  It seems like the bug requires wwn_to_u64() being
out-of-line and __swab64p() being in-line.

In fact, the following patch seems to fix it:

diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index bf66ea6..56b9e81 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
 	return result;
 }
 
-static inline u64 wwn_to_u64(u8 *wwn)
+static __always_inline u64 wwn_to_u64(u8 *wwn)
 {
 	return get_unaligned_be64(wwn);
 }

> Since it was not detected for two years since gcc 4.9 release,
> it must be triggering quite rarely.

Yeah, and according to objtool this is the only occurrence of this bug
in the entire kernel tree.  Thanks to the kbuild robot randconfig
builds, I think we can be pretty confident that objtool will find any
more of them if they show up.

So what do you think about working around this bug by doing something
like the above patch?

-- 
Josh

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-14 15:57                 ` Josh Poimboeuf
@ 2016-04-14 17:09                   ` Denys Vlasenko
  2016-04-15  5:45                     ` Ingo Molnar
  0 siblings, 1 reply; 45+ messages in thread
From: Denys Vlasenko @ 2016-04-14 17:09 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: James Bottomley, Ingo Molnar, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi

On 04/14/2016 05:57 PM, Josh Poimboeuf wrote:
> On Thu, Apr 14, 2016 at 05:29:06PM +0200, Denys Vlasenko wrote:
>> On 04/13/2016 07:10 PM, Josh Poimboeuf wrote:
>>>>>>>> From the disassembly of drivers/scsi/qla2xxx/qla_attr.o:
>>>>>>>>
>>>>>>>> 0000000000002f53 <qla2x00_get_host_fabric_name>:
>>>>>>>>     2f53:       55                      push   %rbp
>>>>>>>>     2f54:       48 89 e5                mov    %rsp,%rbp
>>>>>>>>
>>>>>>>> 0000000000002f57 <qla2x00_get_fc_host_stats>:
>>>>>>>>     2f57:       55                      push   %rbp
>>>>>>>>     2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
>>>>>>>>     2f5d:       48 89 e5                mov    %rsp,%rbp
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Note that qla2x00_get_host_fabric_name() is inexplicably
>>>>>>>> truncated after
>>>>>>>> setting up the frame pointer.  It falls through to the next
>>>>>>>> function, which is
>>>>>>>> very wrong.
>>>>>>>
>>>>>>> Wow, that's ... interesting.
>>>>>>>
>>>>>>>
>>>>>>>> I can recreate it with either gcc 5.3.1 or gcc 6.0 on
>>>>>>>> linus/master with
>>>>>>>> the .config from the above link.
>>>>>>>>
>>>>>>>> The call chain which appears to trigger the problem is:
>>>>>>>>
>>>>>>>> qla2x00_get_host_fabric_name()
>>>>>>>>   wwn_to_u64()
>>>>>>>>     get_unaligned_be64()
>>>>>>>>       be64_to_cpup()
>>>>>>>>         __be64_to_cpup() <- changed to __always_inline by this
>>>>>>>> patch
>>>>>>>>
>>>>>>>> It occurs with the combination of the following two recent
>>>>>>>> commits:
>>>>>>>>
>>>>>>>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
>>>>>>>> inlining of some byteswap operations")
>>>>>>>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
>>>>>>>> access")
>>>>>>>>
>>>>>>>> I can confirm that reverting either patch makes the problem go
>>>>>>>> away.
>>>>>>>> I'm planning on opening a gcc bug tomorrow.
>>>>>>>
>>>>>>>
>>>>>>> Note that if CONFIG_OPTIMIZE_INLINING is not set, _all_ "inline"
>>>>>>> keywords are in fact __always_inline, so the bug must be
>>>>>>> triggering
>>>>>>> even without the patch.
>>>>>>
>>>>>> Makes sense in theory, but the bug doesn't actually trigger when I
>>>>>> revert the patch and set CONFIG_OPTIMIZE_INLINING=n.
>>>>>>
>>>>>> Perhaps even more surprising, it doesn't trigger *with* the patch
>>>>>> and
>>>>>> CONFIG_OPTIMIZE_INLINING=n.
>>>>>
>>>>> [ Adding James to CC since this bug affects scsi. ]
>>>>>
>>>>> Here's the gcc bug:
>>>>>
>>>>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>>>>>
>>>>
>>>>
>>>> Actually, adding me doesn't help, I've added linux-scsi.  The summary
>>>> is that there's a but in gcc-5.3.1 which is miscompiling qla_attr.c ...
>>>> this means we're going to have to ask the compiler version of reported
>>>> crashes.
>>>
>>> The bug isn't specific to a compiler version.  I've seen it with gcc
>>> 5.3.1 and gcc 6.0.  I haven't tried any older versions.  And the gcc bug
>>> hasn't been resolved (or even investigated) yet.
>>>
>>> The bug is triggered by a combination of the above two commits from the
>>> 4.6 merge window, so presumably we'd need to revert one of them to avoid
>>> crashes in 4.6.
>>
>> The bug is indeed in the compiler. 4.9 and all later versions are affected.
>> gcc bugzilla now has a reproducer. In abridged form:
>>
>>
>> static inline __attribute__((always_inline)) u64 __swab64p(const u64 *p)
>> {
>>  return (__builtin_constant_p((u64)(*p)) ? ((u64)( (((u64)(*p) & (u64)0x00000000000000ffULL) << 56) | (((u64)(*p) & (u64)0x000000000000ff00ULL) << 40) | (((u64)(*p) & (u64)0x0000000000ff0000ULL) << 24) | (((u64)(*p) & (u64)0x00000000ff000000ULL) << 8) | (((u64)(*p) & (u64)0x000000ff00000000ULL) >> 8) | (((u64)(*p) & (u64)0x0000ff0000000000ULL) >> 24) | (((u64)(*p) & (u64)0x00ff000000000000ULL) >> 40) | (((u64)(*p) & (u64)0xff00000000000000ULL) >> 56))) : __builtin_bswap64(*p));
>> }
>> static inline u64 wwn_to_u64(void *wwn)
>> {
>>  return __swab64p(wwn);
>> }
>> static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
>> {
>>  scsi_qla_host_t *vha = shost_priv(shost);
>>  u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
>>  u64 fabric_name = wwn_to_u64(node_name);
>>  if (vha->device_flags & 0x1)
>>   fabric_name = wwn_to_u64(vha->fabric_node_name);
>>  (((struct fc_host_attrs *)(shost)->shost_data)->fabric_name) = fabric_name;
>> }
> 
> Nice work with the reproducer!
> 
>> Two (or more, there were more before simplification) levels of inlining
>> are necessary for bug to trigger in this example (folding to one level
>> makes it go away). "__attribute__((always_inline))" is necessary too.
>>
>>
>> Since we have lots of __always_inline anyway, this bug has a potential
>> to miscompile kernels regardless of CONFIG_OPTIMIZE_INLINING setting,
>> and with or without the patches mentioned above (they just happen
>> to create a reliable reproducer).
> 
> Well, but setting !CONFIG_OPTIMIZE_INLINING makes the problem go away
> for some reason.  It seems like the bug requires wwn_to_u64() being
> out-of-line and __swab64p() being in-line.

By my reading of what gcc gurus are talking there,
gcc has some new-ish machinery for discarding unreachable code.
Akin to not continuing code generation after a call to never-returning
function like exit(), but smarter (it can detect much less obvious
cases when some code path is not possible).

And it has a subtle bug. In this case, it decided that both branches
of ternary op ?: in __swab64p() are impossible, and therefore
__swab64p() is impossible.
(Which is, of course, bogus, that's why it's a bug).

Then this bogus decision was propagated through inlining
and gcc decided that entire qla2x00_get_host_fabric_name()
function is an impossible (unreachable) code path, and...
eliminated it all. Good boy  :D

> In fact, the following patch seems to fix it:
> 
> diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> index bf66ea6..56b9e81 100644
> --- a/include/scsi/scsi_transport_fc.h
> +++ b/include/scsi/scsi_transport_fc.h
> @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
>  	return result;
>  }
>  
> -static inline u64 wwn_to_u64(u8 *wwn)
> +static __always_inline u64 wwn_to_u64(u8 *wwn)
>  {
>  	return get_unaligned_be64(wwn);
>  }

It is not a guarantee.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-14 17:09                   ` Denys Vlasenko
@ 2016-04-15  5:45                     ` Ingo Molnar
  2016-04-15 13:47                       ` Josh Poimboeuf
  2016-04-16  7:42                       ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Arnd Bergmann
  0 siblings, 2 replies; 45+ messages in thread
From: Ingo Molnar @ 2016-04-15  5:45 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Josh Poimboeuf, James Bottomley, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> > In fact, the following patch seems to fix it:
> > 
> > diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> > index bf66ea6..56b9e81 100644
> > --- a/include/scsi/scsi_transport_fc.h
> > +++ b/include/scsi/scsi_transport_fc.h
> > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
> >  	return result;
> >  }
> >  
> > -static inline u64 wwn_to_u64(u8 *wwn)
> > +static __always_inline u64 wwn_to_u64(u8 *wwn)
> >  {
> >  	return get_unaligned_be64(wwn);
> >  }
> 
> It is not a guarantee.

Of course it's a workaround - but is there any deterministic way to turn off this 
GCC bug (by activating some GCC command line switch), or do we have to live with 
objtool warning about this GCC?

Which, by the way, is pretty cool!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-15  5:45                     ` Ingo Molnar
@ 2016-04-15 13:47                       ` Josh Poimboeuf
  2016-04-15 22:20                         ` Josh Poimboeuf
  2016-04-16  7:42                       ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Arnd Bergmann
  1 sibling, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-15 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, James Bottomley, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi

On Fri, Apr 15, 2016 at 07:45:19AM +0200, Ingo Molnar wrote:
> 
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> 
> > > In fact, the following patch seems to fix it:
> > > 
> > > diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> > > index bf66ea6..56b9e81 100644
> > > --- a/include/scsi/scsi_transport_fc.h
> > > +++ b/include/scsi/scsi_transport_fc.h
> > > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
> > >  	return result;
> > >  }
> > >  
> > > -static inline u64 wwn_to_u64(u8 *wwn)
> > > +static __always_inline u64 wwn_to_u64(u8 *wwn)
> > >  {
> > >  	return get_unaligned_be64(wwn);
> > >  }
> > 
> > It is not a guarantee.
> 
> Of course it's a workaround - but is there any deterministic way to turn off this 
> GCC bug (by activating some GCC command line switch), or do we have to live with 
> objtool warning about this GCC?

I don't think we know yet if there's a reliable way to turn the bug off.

Also, according to the gcc guys, this bug won't always result in a
truncated function, and may sometimes just make some inline function
call sites disappear:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14

though I haven't been able to confirm that experimentally.  But if it's
true, that means that objtool won't be able to detect all cases of the
bug and some function calls may just silently disappear!

There's a lot of activity in the bug now, so hopefully they'll be able
to tell us soon if there's a reliable way to avoid it and/or detect it.

BTW, Denys posted a workaround patch for the qla2xxxx code:

  https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlasenk@redhat.com


-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-15 13:47                       ` Josh Poimboeuf
@ 2016-04-15 22:20                         ` Josh Poimboeuf
  2016-04-16  9:03                           ` Ingo Molnar
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-15 22:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, James Bottomley, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi, jamborm

On Fri, Apr 15, 2016 at 08:47:45AM -0500, Josh Poimboeuf wrote:
> On Fri, Apr 15, 2016 at 07:45:19AM +0200, Ingo Molnar wrote:
> > 
> > * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > 
> > > > In fact, the following patch seems to fix it:
> > > > 
> > > > diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> > > > index bf66ea6..56b9e81 100644
> > > > --- a/include/scsi/scsi_transport_fc.h
> > > > +++ b/include/scsi/scsi_transport_fc.h
> > > > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
> > > >  	return result;
> > > >  }
> > > >  
> > > > -static inline u64 wwn_to_u64(u8 *wwn)
> > > > +static __always_inline u64 wwn_to_u64(u8 *wwn)
> > > >  {
> > > >  	return get_unaligned_be64(wwn);
> > > >  }
> > > 
> > > It is not a guarantee.
> > 
> > Of course it's a workaround - but is there any deterministic way to turn off this 
> > GCC bug (by activating some GCC command line switch), or do we have to live with 
> > objtool warning about this GCC?
> 
> I don't think we know yet if there's a reliable way to turn the bug off.
> 
> Also, according to the gcc guys, this bug won't always result in a
> truncated function, and may sometimes just make some inline function
> call sites disappear:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14
> 
> though I haven't been able to confirm that experimentally.  But if it's
> true, that means that objtool won't be able to detect all cases of the
> bug and some function calls may just silently disappear!
> 
> There's a lot of activity in the bug now, so hopefully they'll be able
> to tell us soon if there's a reliable way to avoid it and/or detect it.
> 
> BTW, Denys posted a workaround patch for the qla2xxxx code:
> 
>   https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlasenk@redhat.com

Martin Jambor wrote a succinct summary of the conditions needed for this
bug:

  "This bug can occur when an inlineable function containing a call to
  __builtin_constant_p, which checks a parameter or a value it
  references and a (possibly indirect) caller of the function actually
  passes a constant, but stores it using a type of a different size."

So to prevent it from happening elsewhere in the kernel, it sounds like
we'd have to either remove all uses of __builtin_constant_p() or disable
inlining completely.

There's also no reliable way to detect the bug has occurred, though
objtool will detect it in cases when the function gets truncated.

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-15  5:45                     ` Ingo Molnar
  2016-04-15 13:47                       ` Josh Poimboeuf
@ 2016-04-16  7:42                       ` Arnd Bergmann
  2016-04-18 13:22                         ` Josh Poimboeuf
  1 sibling, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-16  7:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Josh Poimboeuf, James Bottomley, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, linux-kernel,
	linux-scsi

On Friday 15 April 2016 07:45:19 Ingo Molnar wrote:
> 
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> 
> > > In fact, the following patch seems to fix it:
> > > 
> > > diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> > > index bf66ea6..56b9e81 100644
> > > --- a/include/scsi/scsi_transport_fc.h
> > > +++ b/include/scsi/scsi_transport_fc.h
> > > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
> > >  	return result;
> > >  }
> > >  
> > > -static inline u64 wwn_to_u64(u8 *wwn)
> > > +static __always_inline u64 wwn_to_u64(u8 *wwn)
> > >  {
> > >  	return get_unaligned_be64(wwn);
> > >  }
> > 
> > It is not a guarantee.
> 
> Of course it's a workaround - but is there any deterministic way to turn off this 
> GCC bug (by activating some GCC command line switch), or do we have to live with 
> objtool warning about this GCC?
> 
> Which, by the way, is pretty cool!

I have done a patch for the asm-generic/unaligned handling recently that
reworks the implementation to avoid an ARM specific bug (gcc uses certain
CPU instructions that require aligned data when we tell it that unaligned
data is not).

It changes the code enough that the gcc bug might not be triggered any more,
aside from generating far superior code in some cases.

I thought I had submitted that patch before, but I can't find a version
with a proper changelog any more now, so I probably haven't. However, I did
all the research to show that it only makes things better on ARM and x86
except in cases where the gcc inliner happens to pick a different set of
functions to be inline (these have a 50:50 chance of better vs worse, the
result on average seems to be the same).

	Arnd

commit 752b719f6675be02a3dd29fe5d92b2f380b5743d
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Fri Mar 4 16:15:20 2016 +0100

    asm-generic: always use struct based unaligned access
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 1ac097279db1..e8f5523eeb0a 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -3,29 +3,19 @@
 
 /*
  * This is the most generic implementation of unaligned accesses
- * and should work almost anywhere.
+ * and should work almost anywhere, we trust that the compiler
+ * knows how to handle unaligned accesses.
  */
 #include <asm/byteorder.h>
 
-/* Set by the arch if it can handle unaligned accesses in hardware. */
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-# include <linux/unaligned/access_ok.h>
-#endif
+#include <linux/unaligned/le_struct.h>
+#include <linux/unaligned/be_struct.h>
+#include <linux/unaligned/generic.h>
 
 #if defined(__LITTLE_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-#  include <linux/unaligned/le_struct.h>
-#  include <linux/unaligned/be_byteshift.h>
-# endif
-# include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_le
 # define put_unaligned	__put_unaligned_le
 #elif defined(__BIG_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-#  include <linux/unaligned/be_struct.h>
-#  include <linux/unaligned/le_byteshift.h>
-# endif
-# include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_be
 # define put_unaligned	__put_unaligned_be
 #else
diff --git a/include/linux/unaligned/be_struct.h b/include/linux/unaligned/be_struct.h
index 132415836c50..9ab8c53bb3fe 100644
--- a/include/linux/unaligned/be_struct.h
+++ b/include/linux/unaligned/be_struct.h
@@ -2,35 +2,36 @@
 #define _LINUX_UNALIGNED_BE_STRUCT_H
 
 #include <linux/unaligned/packed_struct.h>
+#include <asm/byteorder.h>
 
 static inline u16 get_unaligned_be16(const void *p)
 {
-	return __get_unaligned_cpu16((const u8 *)p);
+	return be16_to_cpu((__le16 __force)__get_unaligned_cpu16((const u8 *)p));
 }
 
 static inline u32 get_unaligned_be32(const void *p)
 {
-	return __get_unaligned_cpu32((const u8 *)p);
+	return be32_to_cpu((__le32 __force)__get_unaligned_cpu32((const u8 *)p));
 }
 
 static inline u64 get_unaligned_be64(const void *p)
 {
-	return __get_unaligned_cpu64((const u8 *)p);
+	return be64_to_cpu((__le64 __force)__get_unaligned_cpu64((const u8 *)p));
 }
 
 static inline void put_unaligned_be16(u16 val, void *p)
 {
-	__put_unaligned_cpu16(val, p);
+	__put_unaligned_cpu16((u16 __force)cpu_to_be16(val), p);
 }
 
 static inline void put_unaligned_be32(u32 val, void *p)
 {
-	__put_unaligned_cpu32(val, p);
+	__put_unaligned_cpu32((u32 __force)cpu_to_be32(val), p);
 }
 
 static inline void put_unaligned_be64(u64 val, void *p)
 {
-	__put_unaligned_cpu64(val, p);
+	__put_unaligned_cpu64((u64 __force)cpu_to_be64(val), p);
 }
 
 #endif /* _LINUX_UNALIGNED_BE_STRUCT_H */
diff --git a/include/linux/unaligned/le_struct.h b/include/linux/unaligned/le_struct.h
index 088c4572faa8..64171ad0b100 100644
--- a/include/linux/unaligned/le_struct.h
+++ b/include/linux/unaligned/le_struct.h
@@ -2,35 +2,36 @@
 #define _LINUX_UNALIGNED_LE_STRUCT_H
 
 #include <linux/unaligned/packed_struct.h>
+#include <asm/byteorder.h>
 
 static inline u16 get_unaligned_le16(const void *p)
 {
-	return __get_unaligned_cpu16((const u8 *)p);
+	return le16_to_cpu((__le16 __force)__get_unaligned_cpu16((const u8 *)p));
 }
 
 static inline u32 get_unaligned_le32(const void *p)
 {
-	return __get_unaligned_cpu32((const u8 *)p);
+	return le32_to_cpu((__le32 __force)__get_unaligned_cpu32((const u8 *)p));
 }
 
 static inline u64 get_unaligned_le64(const void *p)
 {
-	return __get_unaligned_cpu64((const u8 *)p);
+	return le64_to_cpu((__le64 __force)__get_unaligned_cpu64((const u8 *)p));
 }
 
 static inline void put_unaligned_le16(u16 val, void *p)
 {
-	__put_unaligned_cpu16(val, p);
+	__put_unaligned_cpu16((u16 __force)cpu_to_le16(val), p);
 }
 
 static inline void put_unaligned_le32(u32 val, void *p)
 {
-	__put_unaligned_cpu32(val, p);
+	__put_unaligned_cpu32((u32 __force)cpu_to_le32(val), p);
 }
 
 static inline void put_unaligned_le64(u64 val, void *p)
 {
-	__put_unaligned_cpu64(val, p);
+	__put_unaligned_cpu64((u64 __force)cpu_to_le64(val), p);
 }
 
 #endif /* _LINUX_UNALIGNED_LE_STRUCT_H */

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-15 22:20                         ` Josh Poimboeuf
@ 2016-04-16  9:03                           ` Ingo Molnar
  2016-04-18 13:39                             ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Ingo Molnar @ 2016-04-16  9:03 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Denys Vlasenko, James Bottomley, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi, jamborm


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> > I don't think we know yet if there's a reliable way to turn the bug off.
> > 
> > Also, according to the gcc guys, this bug won't always result in a
> > truncated function, and may sometimes just make some inline function
> > call sites disappear:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14
> > 
> > though I haven't been able to confirm that experimentally.  But if it's
> > true, that means that objtool won't be able to detect all cases of the
> > bug and some function calls may just silently disappear!
> > 
> > There's a lot of activity in the bug now, so hopefully they'll be able
> > to tell us soon if there's a reliable way to avoid it and/or detect it.
> > 
> > BTW, Denys posted a workaround patch for the qla2xxxx code:
> > 
> >   https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlasenk@redhat.com
> 
> Martin Jambor wrote a succinct summary of the conditions needed for this
> bug:
> 
>   "This bug can occur when an inlineable function containing a call to
>   __builtin_constant_p, which checks a parameter or a value it
>   references and a (possibly indirect) caller of the function actually
>   passes a constant, but stores it using a type of a different size."
> 
> So to prevent it from happening elsewhere in the kernel, it sounds like
> we'd have to either remove all uses of __builtin_constant_p() or disable
> inlining completely.
> 
> There's also no reliable way to detect the bug has occurred, though
> objtool will detect it in cases when the function gets truncated.

So it appears to me that due to the hard to detect nature of the GCC bug the fix 
will probably be backported by them, so I think we should be fine with relying on 
objtool to detect weird code sequences in the kernel, and should work around 
specific instances of the bug.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-16  7:42                       ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Arnd Bergmann
@ 2016-04-18 13:22                         ` Josh Poimboeuf
  0 siblings, 0 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-18 13:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Denys Vlasenko, James Bottomley, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, linux-kernel,
	linux-scsi

On Sat, Apr 16, 2016 at 09:42:37AM +0200, Arnd Bergmann wrote:
> On Friday 15 April 2016 07:45:19 Ingo Molnar wrote:
> > 
> > * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > 
> > > > In fact, the following patch seems to fix it:
> > > > 
> > > > diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> > > > index bf66ea6..56b9e81 100644
> > > > --- a/include/scsi/scsi_transport_fc.h
> > > > +++ b/include/scsi/scsi_transport_fc.h
> > > > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
> > > >  	return result;
> > > >  }
> > > >  
> > > > -static inline u64 wwn_to_u64(u8 *wwn)
> > > > +static __always_inline u64 wwn_to_u64(u8 *wwn)
> > > >  {
> > > >  	return get_unaligned_be64(wwn);
> > > >  }
> > > 
> > > It is not a guarantee.
> > 
> > Of course it's a workaround - but is there any deterministic way to turn off this 
> > GCC bug (by activating some GCC command line switch), or do we have to live with 
> > objtool warning about this GCC?
> > 
> > Which, by the way, is pretty cool!
> 
> I have done a patch for the asm-generic/unaligned handling recently that
> reworks the implementation to avoid an ARM specific bug (gcc uses certain
> CPU instructions that require aligned data when we tell it that unaligned
> data is not).
> 
> It changes the code enough that the gcc bug might not be triggered any more,
> aside from generating far superior code in some cases.

I tried this patch, but unfortunately it doesn't make the gcc bug go
away.

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-16  9:03                           ` Ingo Molnar
@ 2016-04-18 13:39                             ` Josh Poimboeuf
  2016-04-18 14:07                               ` Arnd Bergmann
  2016-04-19  8:52                               ` Ingo Molnar
  0 siblings, 2 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-18 13:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, James Bottomley, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi, jamborm

On Sat, Apr 16, 2016 at 11:03:32AM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > > I don't think we know yet if there's a reliable way to turn the bug off.
> > > 
> > > Also, according to the gcc guys, this bug won't always result in a
> > > truncated function, and may sometimes just make some inline function
> > > call sites disappear:
> > > 
> > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14
> > > 
> > > though I haven't been able to confirm that experimentally.  But if it's
> > > true, that means that objtool won't be able to detect all cases of the
> > > bug and some function calls may just silently disappear!
> > > 
> > > There's a lot of activity in the bug now, so hopefully they'll be able
> > > to tell us soon if there's a reliable way to avoid it and/or detect it.
> > > 
> > > BTW, Denys posted a workaround patch for the qla2xxxx code:
> > > 
> > >   https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlasenk@redhat.com
> > 
> > Martin Jambor wrote a succinct summary of the conditions needed for this
> > bug:
> > 
> >   "This bug can occur when an inlineable function containing a call to
> >   __builtin_constant_p, which checks a parameter or a value it
> >   references and a (possibly indirect) caller of the function actually
> >   passes a constant, but stores it using a type of a different size."
> > 
> > So to prevent it from happening elsewhere in the kernel, it sounds like
> > we'd have to either remove all uses of __builtin_constant_p() or disable
> > inlining completely.
> > 
> > There's also no reliable way to detect the bug has occurred, though
> > objtool will detect it in cases when the function gets truncated.
> 
> So it appears to me that due to the hard to detect nature of the GCC bug the fix 
> will probably be backported by them, so I think we should be fine with relying on 
> objtool to detect weird code sequences in the kernel, and should work around 
> specific instances of the bug.

I agree.  So how should we work around the bug in this case?  There have
been several suggestions:

- change wwn_to_u64() to __always_inline

- change qla2x00_get_host_fabric_name() to skip the unnecessary call to
  wwn_to_u64()

- revert one of the two commits:
  bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
  ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")


-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-18 13:39                             ` Josh Poimboeuf
@ 2016-04-18 14:07                               ` Arnd Bergmann
  2016-04-18 14:12                                 ` Josh Poimboeuf
  2016-04-19  8:52                               ` Ingo Molnar
  1 sibling, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-18 14:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Denys Vlasenko, James Bottomley, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, linux-kernel,
	linux-scsi, jamborm

On Monday 18 April 2016 08:39:32 Josh Poimboeuf wrote:
> 
> I agree.  So how should we work around the bug in this case?  There have
> been several suggestions:
> 
> - change wwn_to_u64() to __always_inline
> 
> - change qla2x00_get_host_fabric_name() to skip the unnecessary call to
>   wwn_to_u64()
> 
> - revert one of the two commits:
>   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
>   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

What about the patch to change get_unaligned_be64() that I posted?

I think we want to merge that anyway, I just don't know if that helps
with this particular problem as well.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-18 14:07                               ` Arnd Bergmann
@ 2016-04-18 14:12                                 ` Josh Poimboeuf
  2016-04-18 14:21                                   ` Arnd Bergmann
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-18 14:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Denys Vlasenko, James Bottomley, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, linux-kernel,
	linux-scsi, jamborm

On Mon, Apr 18, 2016 at 04:07:51PM +0200, Arnd Bergmann wrote:
> On Monday 18 April 2016 08:39:32 Josh Poimboeuf wrote:
> > 
> > I agree.  So how should we work around the bug in this case?  There have
> > been several suggestions:
> > 
> > - change wwn_to_u64() to __always_inline
> > 
> > - change qla2x00_get_host_fabric_name() to skip the unnecessary call to
> >   wwn_to_u64()
> > 
> > - revert one of the two commits:
> >   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> >   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> What about the patch to change get_unaligned_be64() that I posted?
> 
> I think we want to merge that anyway, I just don't know if that helps
> with this particular problem as well.

I replied to your other email about that -- it doesn't seem to help this
issue.

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-18 14:12                                 ` Josh Poimboeuf
@ 2016-04-18 14:21                                   ` Arnd Bergmann
  0 siblings, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-18 14:21 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Denys Vlasenko, James Bottomley, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, linux-kernel,
	linux-scsi, jamborm

On Monday 18 April 2016 09:12:41 Josh Poimboeuf wrote:
> On Mon, Apr 18, 2016 at 04:07:51PM +0200, Arnd Bergmann wrote:
> > On Monday 18 April 2016 08:39:32 Josh Poimboeuf wrote:
> > > 
> > > I agree.  So how should we work around the bug in this case?  There have
> > > been several suggestions:
> > > 
> > > - change wwn_to_u64() to __always_inline
> > > 
> > > - change qla2x00_get_host_fabric_name() to skip the unnecessary call to
> > >   wwn_to_u64()
> > > 
> > > - revert one of the two commits:
> > >   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> > >   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> > 
> > What about the patch to change get_unaligned_be64() that I posted?
> > 
> > I think we want to merge that anyway, I just don't know if that helps
> > with this particular problem as well.
> 
> I replied to your other email about that -- it doesn't seem to help this
> issue.
> 

Ok, I see. I had problems with my mail server last week, your reply
must have been a victim of that as I never saw it (found it on the
web archive now).

I'd vote for the wwn_to_u64 change then as it should prevent the
same thing from happining in other drivers. I would prefer not to
see ef3fb2422ffe reverted, as that works around another gcc-6 bug
on ARM.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
  2016-04-18 13:39                             ` Josh Poimboeuf
  2016-04-18 14:07                               ` Arnd Bergmann
@ 2016-04-19  8:52                               ` Ingo Molnar
  2016-04-19 13:56                                 ` [PATCH] scsi: fc: force inlining of wwn conversion functions Josh Poimboeuf
  1 sibling, 1 reply; 45+ messages in thread
From: Ingo Molnar @ 2016-04-19  8:52 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Denys Vlasenko, James Bottomley, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, linux-kernel, Arnd Bergmann,
	linux-scsi, jamborm


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Sat, Apr 16, 2016 at 11:03:32AM +0200, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > > I don't think we know yet if there's a reliable way to turn the bug off.
> > > > 
> > > > Also, according to the gcc guys, this bug won't always result in a
> > > > truncated function, and may sometimes just make some inline function
> > > > call sites disappear:
> > > > 
> > > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14
> > > > 
> > > > though I haven't been able to confirm that experimentally.  But if it's
> > > > true, that means that objtool won't be able to detect all cases of the
> > > > bug and some function calls may just silently disappear!
> > > > 
> > > > There's a lot of activity in the bug now, so hopefully they'll be able
> > > > to tell us soon if there's a reliable way to avoid it and/or detect it.
> > > > 
> > > > BTW, Denys posted a workaround patch for the qla2xxxx code:
> > > > 
> > > >   https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlasenk@redhat.com
> > > 
> > > Martin Jambor wrote a succinct summary of the conditions needed for this
> > > bug:
> > > 
> > >   "This bug can occur when an inlineable function containing a call to
> > >   __builtin_constant_p, which checks a parameter or a value it
> > >   references and a (possibly indirect) caller of the function actually
> > >   passes a constant, but stores it using a type of a different size."
> > > 
> > > So to prevent it from happening elsewhere in the kernel, it sounds like
> > > we'd have to either remove all uses of __builtin_constant_p() or disable
> > > inlining completely.
> > > 
> > > There's also no reliable way to detect the bug has occurred, though
> > > objtool will detect it in cases when the function gets truncated.
> > 
> > So it appears to me that due to the hard to detect nature of the GCC bug the fix 
> > will probably be backported by them, so I think we should be fine with relying on 
> > objtool to detect weird code sequences in the kernel, and should work around 
> > specific instances of the bug.
> 
> I agree.  So how should we work around the bug in this case?  There have
> been several suggestions:
> 
> - change wwn_to_u64() to __always_inline
> 
> - change qla2x00_get_host_fabric_name() to skip the unnecessary call to
>   wwn_to_u64()
> 
> - revert one of the two commits:
>   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
>   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

The first option sounds like the best one by far: it does a change that is related 
to the GCC bug (tweaks inlining), has near zero impact and does not revert other 
useful progress.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-19  8:52                               ` Ingo Molnar
@ 2016-04-19 13:56                                 ` Josh Poimboeuf
  2016-04-22 23:17                                   ` Quinn Tran
  2016-04-25 16:07                                   ` Josh Poimboeuf
  0 siblings, 2 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-19 13:56 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: linux-scsi, linux-kernel, Denys Vlasenko, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, Arnd Bergmann,
	jamborm, Ingo Molnar, Himanshu Madhani, qla2xxx-upstream

objtool reports [1] the following warning:

  drivers/scsi/qla2xxx/qla_attr.o: warning: objtool: qla2x00_get_host_fabric_name() falls through to next function qla2x00_get_starget_port_name()

This warning is due to a gcc bug [2] which causes corrupt code:

  0000000000002f53 <qla2x00_get_host_fabric_name>:
      2f53:       55                      push   %rbp
      2f54:       48 89 e5                mov    %rsp,%rbp

  0000000000002f57 <qla2x00_get_fc_host_stats>:
      2f57:       55                      push   %rbp
      2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
      2f5d:       48 89 e5                mov    %rsp,%rbp
  ...

Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
setting up the frame pointer.  It falls through to the next function,
which is very bad.

It occurs with the combination of the following two recent commits:

  bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
  ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

The call chain which appears to trigger the problem is:

  qla2x00_get_host_fabric_name()
    wwn_to_u64()
      get_unaligned_be64()
        be64_to_cpup()
          __be64_to_cpup()

The bug requires very specific conditions to trigger.  According to Martin
Jambor (from the gcc bugzilla):

  "This bug can occur when an inlineable function containing a call to
  __builtin_constant_p, which checks a parameter or a value it
  references and a (possibly indirect) caller of the function actually
  passes a constant, but stores it using a type of a different size."

There's no reliable way to avoid (or even detect) the bug.  Until it
gets fixed in released versions of gcc, the least intrusive workaround
for this particular issue is to force the wwn conversion functions to be
inlined.

[1] https://lists.01.org/pipermail/kbuild-all/2016-April/019579.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/scsi/scsi_transport_fc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index bf66ea6..1919cd4 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -796,12 +796,12 @@ fc_remote_port_chkready(struct fc_rport *rport)
 	return result;
 }
 
-static inline u64 wwn_to_u64(u8 *wwn)
+static __always_inline u64 wwn_to_u64(u8 *wwn)
 {
 	return get_unaligned_be64(wwn);
 }
 
-static inline void u64_to_wwn(u64 inm, u8 *wwn)
+static __always_inline void u64_to_wwn(u64 inm, u8 *wwn)
 {
 	put_unaligned_be64(inm, wwn);
 }
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-19 13:56                                 ` [PATCH] scsi: fc: force inlining of wwn conversion functions Josh Poimboeuf
@ 2016-04-22 23:17                                   ` Quinn Tran
  2016-04-25 16:07                                   ` Josh Poimboeuf
  1 sibling, 0 replies; 45+ messages in thread
From: Quinn Tran @ 2016-04-22 23:17 UTC (permalink / raw)
  To: Josh Poimboeuf, James Bottomley, Martin K. Petersen
  Cc: linux-scsi, linux-kernel, Denys Vlasenko, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, Arnd Bergmann,
	jamborm@gcc.gnu.org, Ingo Molnar, Himanshu Madhani,
	Dept-Eng QLA2xxx Upstream

Current kernel (4.6.0-rc4+) + GCC 5.3.0 definitely truncated qla2x00_get_host_fabric_name() routine.  Just like Josh indicated, we’re dropping down to the next routine.


root@mars:/sys/class/fc_host/host3  2016-04-22 16:07:30
> cat fabric_name
Killed
——
static void
qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
{
    32d0:       e8 00 00 00 00          callq  32d5 <qla2x00_get_host_fabric_name+0x5>
    32d5:       55                      push   %rbp
    32d6:       48 89 e5                mov    %rsp,%rbp
    32d9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

00000000000032e0 <qla2x00_get_starget_node_name>:
qla2x00_get_starget_node_name():
/root/qt/linux.git/drivers/scsi/qla2xxx/qla_attr.c:1756
        fc_host_port_type(shost) = port_type;
}


----

Apr 22 16:07:50 mars kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
Apr 22 16:07:50 mars kernel: IP: [<ffffffff813f72d7>] scsi_is_host_device+0x7/0x20
Apr 22 16:07:50 mars kernel: PGD 7fe1c8067 PUD 7f5c72067 PMD 0 
Apr 22 16:07:50 mars kernel: Oops: 0000 [#1] SMP 
Apr 22 16:07:50 mars kernel: Modules linked in: qla2xxx scsi_transport_fc ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
...
dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: qla2xxx]
Apr 22 16:07:50 mars kernel: CPU: 8 PID: 10452 Comm: cat Tainted: G            E   4.6.0-rc4+ #2
Apr 22 16:07:50 mars kernel: Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
Apr 22 16:07:50 mars kernel: task: ffff8807fcd1a880 ti: ffff8807ff128000 task.ti: ffff8807ff128000
Apr 22 16:07:50 mars kernel: RIP: 0010:[<ffffffff813f72d7>]  [<ffffffff813f72d7>] scsi_is_host_device+0x7/0x20
Apr 22 16:07:50 mars kernel: RSP: 0018:ffff8807ff12bcf0  EFLAGS: 00010246
Apr 22 16:07:50 mars kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff880ffe8ade88
Apr 22 16:07:50 mars kernel: RDX: ffff8807f5db9000 RSI: ffff880ffed43340 RDI: 0000000000000000
Apr 22 16:07:50 mars kernel: RBP: ffff8807ff12bd08 R08: ffff88101f45ac38 R09: ffff8807fccef280
Apr 22 16:07:50 mars kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88101b7e6000
Apr 22 16:07:50 mars kernel: R13: ffff8807fdaf1f00 R14: ffff8800dad379c0 R15: 0000000000000001
Apr 22 16:07:50 mars kernel: FS:  00007fd569512700(0000) GS:ffff88081fc80000(0000) knlGS:0000000000000000
Apr 22 16:07:50 mars kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 22 16:07:50 mars kernel: CR2: 0000000000000058 CR3: 00000007f23f4000 CR4: 00000000000006e0
Apr 22 16:07:50 mars kernel: Stack:
Apr 22 16:07:50 mars kernel: ffffffffa0759db5 ffff88101b7e6000 ffff8807f5db9000 ffff8807ff12bd10
Apr 22 16:07:50 mars kernel: ffff8807ff12bd30 ffffffffa065b1bb ffff880ffed43340 ffffffff8166d950
Apr 22 16:07:50 mars kernel: ffff8807ff12bd50 ffffffff813c9e30 ffffffff815ccfb2 ffff8800dad379c0
Apr 22 16:07:50 mars kernel: Call Trace:
Apr 22 16:07:50 mars kernel: [<ffffffffa0759db5>] ? qla2x00_get_starget_node_name+0x25/0x90 [qla2xxx]
Apr 22 16:07:50 mars kernel: [<ffffffffa065b1bb>] ? show_fc_host_fabric_name+0x4b/0x80 [scsi_transport_fc]
Apr 22 16:07:50 mars kernel: [<ffffffff813c9e30>] ? dev_attr_show+0x20/0x50




Regards,
Quinn Tran






-----Original Message-----
From: <linux-scsi-owner@vger.kernel.org> on behalf of Josh Poimboeuf <jpoimboe@redhat.com>
Date: Tuesday, April 19, 2016 at 6:56 AM
To: James Bottomley <James.Bottomley@HansenPartnership.com>, "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi <linux-scsi@vger.kernel.org>, linux-kernel <linux-kernel@vger.kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Thomas Graf <tgraf@suug.ch>, Peter Zijlstra <peterz@infradead.org>, David Rientjes <rientjes@google.com>, Andrew Morton <akpm@linux-foundation.org>, Arnd Bergmann <arnd@arndb.de>, "jamborm@gcc.gnu.org" <jamborm@gcc.gnu.org>, Ingo Molnar <mingo@kernel.org>, Himanshu Madhani <himanshu.madhani@qlogic.com>, Dept-Eng QLA2xxx Upstream <qla2xxx-upstream@qlogic.com>
Subject: [PATCH] scsi: fc: force inlining of wwn conversion functions

>objtool reports [1] the following warning:
>
>  drivers/scsi/qla2xxx/qla_attr.o: warning: objtool: qla2x00_get_host_fabric_name() falls through to next function qla2x00_get_starget_port_name()
>
>This warning is due to a gcc bug [2] which causes corrupt code:
>
>  0000000000002f53 <qla2x00_get_host_fabric_name>:
>      2f53:       55                      push   %rbp
>      2f54:       48 89 e5                mov    %rsp,%rbp
>
>  0000000000002f57 <qla2x00_get_fc_host_stats>:
>      2f57:       55                      push   %rbp
>      2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
>      2f5d:       48 89 e5                mov    %rsp,%rbp
>  ...
>
>Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
>setting up the frame pointer.  It falls through to the next function,
>which is very bad.
>
>It occurs with the combination of the following two recent commits:
>
>  bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
>  ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
>
>The call chain which appears to trigger the problem is:
>
>  qla2x00_get_host_fabric_name()
>    wwn_to_u64()
>      get_unaligned_be64()
>        be64_to_cpup()
>          __be64_to_cpup()
>
>The bug requires very specific conditions to trigger.  According to Martin
>Jambor (from the gcc bugzilla):
>
>  "This bug can occur when an inlineable function containing a call to
>  __builtin_constant_p, which checks a parameter or a value it
>  references and a (possibly indirect) caller of the function actually
>  passes a constant, but stores it using a type of a different size."
>
>There's no reliable way to avoid (or even detect) the bug.  Until it
>gets fixed in released versions of gcc, the least intrusive workaround
>for this particular issue is to force the wwn conversion functions to be
>inlined.
>
>[1] https://lists.01.org/pipermail/kbuild-all/2016-April/019579.html
>[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>
>Reported-by: kbuild test robot <fengguang.wu@intel.com>
>Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>---
> include/scsi/scsi_transport_fc.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
>index bf66ea6..1919cd4 100644
>--- a/include/scsi/scsi_transport_fc.h
>+++ b/include/scsi/scsi_transport_fc.h
>@@ -796,12 +796,12 @@ fc_remote_port_chkready(struct fc_rport *rport)
> 	return result;
> }
> 
>-static inline u64 wwn_to_u64(u8 *wwn)
>+static __always_inline u64 wwn_to_u64(u8 *wwn)
> {
> 	return get_unaligned_be64(wwn);
> }
> 
>-static inline void u64_to_wwn(u64 inm, u8 *wwn)
>+static __always_inline void u64_to_wwn(u64 inm, u8 *wwn)
> {
> 	put_unaligned_be64(inm, wwn);
> }
>-- 
>2.4.11
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-19 13:56                                 ` [PATCH] scsi: fc: force inlining of wwn conversion functions Josh Poimboeuf
  2016-04-22 23:17                                   ` Quinn Tran
@ 2016-04-25 16:07                                   ` Josh Poimboeuf
  2016-04-26  2:40                                     ` Martin K. Petersen
  1 sibling, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-25 16:07 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: linux-scsi, linux-kernel, Denys Vlasenko, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, Arnd Bergmann,
	jamborm, Ingo Molnar, Himanshu Madhani, qla2xxx-upstream

James, 

Can you merge this patch for 4.6?

On Tue, Apr 19, 2016 at 08:56:00AM -0500, Josh Poimboeuf wrote:
> objtool reports [1] the following warning:
> 
>   drivers/scsi/qla2xxx/qla_attr.o: warning: objtool: qla2x00_get_host_fabric_name() falls through to next function qla2x00_get_starget_port_name()
> 
> This warning is due to a gcc bug [2] which causes corrupt code:
> 
>   0000000000002f53 <qla2x00_get_host_fabric_name>:
>       2f53:       55                      push   %rbp
>       2f54:       48 89 e5                mov    %rsp,%rbp
> 
>   0000000000002f57 <qla2x00_get_fc_host_stats>:
>       2f57:       55                      push   %rbp
>       2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
>       2f5d:       48 89 e5                mov    %rsp,%rbp
>   ...
> 
> Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
> setting up the frame pointer.  It falls through to the next function,
> which is very bad.
> 
> It occurs with the combination of the following two recent commits:
> 
>   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
>   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> The call chain which appears to trigger the problem is:
> 
>   qla2x00_get_host_fabric_name()
>     wwn_to_u64()
>       get_unaligned_be64()
>         be64_to_cpup()
>           __be64_to_cpup()
> 
> The bug requires very specific conditions to trigger.  According to Martin
> Jambor (from the gcc bugzilla):
> 
>   "This bug can occur when an inlineable function containing a call to
>   __builtin_constant_p, which checks a parameter or a value it
>   references and a (possibly indirect) caller of the function actually
>   passes a constant, but stores it using a type of a different size."
> 
> There's no reliable way to avoid (or even detect) the bug.  Until it
> gets fixed in released versions of gcc, the least intrusive workaround
> for this particular issue is to force the wwn conversion functions to be
> inlined.
> 
> [1] https://lists.01.org/pipermail/kbuild-all/2016-April/019579.html
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> 
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  include/scsi/scsi_transport_fc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> index bf66ea6..1919cd4 100644
> --- a/include/scsi/scsi_transport_fc.h
> +++ b/include/scsi/scsi_transport_fc.h
> @@ -796,12 +796,12 @@ fc_remote_port_chkready(struct fc_rport *rport)
>  	return result;
>  }
>  
> -static inline u64 wwn_to_u64(u8 *wwn)
> +static __always_inline u64 wwn_to_u64(u8 *wwn)
>  {
>  	return get_unaligned_be64(wwn);
>  }
>  
> -static inline void u64_to_wwn(u64 inm, u8 *wwn)
> +static __always_inline void u64_to_wwn(u64 inm, u8 *wwn)
>  {
>  	put_unaligned_be64(inm, wwn);
>  }
> -- 
> 2.4.11
> 

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-25 16:07                                   ` Josh Poimboeuf
@ 2016-04-26  2:40                                     ` Martin K. Petersen
  2016-04-26  3:37                                       ` James Bottomley
  0 siblings, 1 reply; 45+ messages in thread
From: Martin K. Petersen @ 2016-04-26  2:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: James Bottomley, Martin K. Petersen, linux-scsi, linux-kernel,
	Denys Vlasenko, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, Arnd Bergmann, jamborm, Ingo Molnar,
	Himanshu Madhani, qla2xxx-upstream

>>>>> "Josh" == Josh Poimboeuf <jpoimboe@redhat.com> writes:

Josh> Can you merge this patch for 4.6?

I am really not a big fan of working around compiler bugs in a device
driver.

Are we sure there are no other get_unaligned_be64() calls in the kernel
that suffer the same fate?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26  2:40                                     ` Martin K. Petersen
@ 2016-04-26  3:37                                       ` James Bottomley
  2016-04-26  7:22                                         ` Arnd Bergmann
  0 siblings, 1 reply; 45+ messages in thread
From: James Bottomley @ 2016-04-26  3:37 UTC (permalink / raw)
  To: Martin K. Petersen, Josh Poimboeuf
  Cc: linux-scsi, linux-kernel, Denys Vlasenko, Thomas Graf,
	Peter Zijlstra, David Rientjes, Andrew Morton, Arnd Bergmann,
	jamborm, Ingo Molnar, Himanshu Madhani, qla2xxx-upstream

On Mon, 2016-04-25 at 22:40 -0400, Martin K. Petersen wrote:
> > > > > > "Josh" == Josh Poimboeuf <jpoimboe@redhat.com> writes:
> 
> Josh> Can you merge this patch for 4.6?
> 
> I am really not a big fan of working around compiler bugs in a device
> driver.

Me neither

> Are we sure there are no other get_unaligned_be64() calls in the
> kernel that suffer the same fate?

Agree, plus, as I've said before, we have 3-4 weeks before we go final,
so we still have some time before a decision has to be made.  It looks
like the gcc people already have a patch for the compiler, so the
distributions could just push that out through channels.

James

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26  3:37                                       ` James Bottomley
@ 2016-04-26  7:22                                         ` Arnd Bergmann
  2016-04-26  8:35                                           ` Christoph Hellwig
  2016-04-26 13:06                                           ` Martin K. Petersen
  0 siblings, 2 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-26  7:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Martin K. Petersen, Josh Poimboeuf, linux-scsi, linux-kernel,
	Denys Vlasenko, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, jamborm, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream

On Monday 25 April 2016 20:37:31 James Bottomley wrote:
> On Mon, 2016-04-25 at 22:40 -0400, Martin K. Petersen wrote:
> > > > > > > "Josh" == Josh Poimboeuf <jpoimboe@redhat.com> writes:
> > 
> > Josh> Can you merge this patch for 4.6?
> > 
> > I am really not a big fan of working around compiler bugs in a device
> > driver.
> 
> Me neither
> 
> > Are we sure there are no other get_unaligned_be64() calls in the
> > kernel that suffer the same fate?
> 
> Agree, plus, as I've said before, we have 3-4 weeks before we go final,
> so we still have some time before a decision has to be made.  It looks
> like the gcc people already have a patch for the compiler, so the
> distributions could just push that out through channels.

I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to compilers
that have not been released yet in order to build a linux-4.6 kernel.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26  7:22                                         ` Arnd Bergmann
@ 2016-04-26  8:35                                           ` Christoph Hellwig
  2016-04-26 10:05                                             ` Arnd Bergmann
  2016-04-26 13:06                                           ` Martin K. Petersen
  1 sibling, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2016-04-26  8:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: James Bottomley, Martin K. Petersen, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, jamborm, Ingo Molnar,
	Himanshu Madhani, qla2xxx-upstream

On Tue, Apr 26, 2016 at 09:22:46AM +0200, Arnd Bergmann wrote:
> > Agree, plus, as I've said before, we have 3-4 weeks before we go final,
> > so we still have some time before a decision has to be made.  It looks
> > like the gcc people already have a patch for the compiler, so the
> > distributions could just push that out through channels.
> 
> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to compilers
> that have not been released yet in order to build a linux-4.6 kernel.

Agreed.  What about just removing the wrappers?  They seem fairly
pointless to start with.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26  8:35                                           ` Christoph Hellwig
@ 2016-04-26 10:05                                             ` Arnd Bergmann
  0 siblings, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-26 10:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Martin K. Petersen, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, jamborm, Ingo Molnar,
	Himanshu Madhani, qla2xxx-upstream

On Tuesday 26 April 2016 01:35:16 Christoph Hellwig wrote:
> On Tue, Apr 26, 2016 at 09:22:46AM +0200, Arnd Bergmann wrote:
> > > Agree, plus, as I've said before, we have 3-4 weeks before we go final,
> > > so we still have some time before a decision has to be made.  It looks
> > > like the gcc people already have a patch for the compiler, so the
> > > distributions could just push that out through channels.
> > 
> > I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> > gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to compilers
> > that have not been released yet in order to build a linux-4.6 kernel.
> 
> Agreed.  What about just removing the wrappers?  They seem fairly
> pointless to start with.

I think at this point it's mainly a question of whether we want such a
big (however trivial) patch in v4.6. We can certainly do that for 4.7,
but as a fixup for the existing problem, either the __always_inline
hack or using a macro should be sufficient:

diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index bf66ea6bed2b..51a98b182a67 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -796,15 +796,8 @@ fc_remote_port_chkready(struct fc_rport *rport)
 	return result;
 }
 
-static inline u64 wwn_to_u64(u8 *wwn)
-{
-	return get_unaligned_be64(wwn);
-}
-
-static inline void u64_to_wwn(u64 inm, u8 *wwn)
-{
-	put_unaligned_be64(inm, wwn);
-}
+#define wwn_to_u64(wwn) get_unaligned_be64(wwn)
+#define u64_to_wwn(inm, wwn) put_unaligned_be64(inm, wwn)
 
 /**
  * fc_vport_set_state() - called to set a vport's state. Saves the old state,

	Arnd

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26  7:22                                         ` Arnd Bergmann
  2016-04-26  8:35                                           ` Christoph Hellwig
@ 2016-04-26 13:06                                           ` Martin K. Petersen
  2016-04-26 15:58                                             ` Arnd Bergmann
  1 sibling, 1 reply; 45+ messages in thread
From: Martin K. Petersen @ 2016-04-26 13:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: James Bottomley, Martin K. Petersen, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, jamborm, Ingo Molnar,
	Himanshu Madhani, qla2xxx-upstream

>>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:

Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
Arnd> compilers that have not been released yet in order to build a
Arnd> linux-4.6 kernel.

I agree that compiler blacklisting is problematic and I'd like to avoid
it. The question is how far we go in the kernel to accommodate various
levels of brokenness.

In any case. Sticking compiler workarounds in device driver code is akin
to putting demolition orders on display on Alpha Centauri. At the very
minimum the patch should put a fat comment in the code stating that
these wrapper functions or #defines should not be changed in the future
because that'll break builds using gcc XYZ. But that does not solve the
problem for anybody else that might be doing something similar.
Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
seem like a rare and unusual programming pattern.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26 13:06                                           ` Martin K. Petersen
@ 2016-04-26 15:58                                             ` Arnd Bergmann
  2016-04-26 22:36                                               ` James Bottomley
  2016-04-27 11:05                                               ` Martin Jambor
  0 siblings, 2 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-26 15:58 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Josh Poimboeuf, linux-scsi, linux-kernel,
	Denys Vlasenko, Thomas Graf, Peter Zijlstra, David Rientjes,
	Andrew Morton, jamborm, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream

On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> >>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> 
> Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
> Arnd> compilers that have not been released yet in order to build a
> Arnd> linux-4.6 kernel.
> 
> I agree that compiler blacklisting is problematic and I'd like to avoid
> it. The question is how far we go in the kernel to accommodate various
> levels of brokenness.
> 
> In any case. Sticking compiler workarounds in device driver code is akin
> to putting demolition orders on display on Alpha Centauri. At the very
> minimum the patch should put a fat comment in the code stating that
> these wrapper functions or #defines should not be changed in the future
> because that'll break builds using gcc XYZ. But that does not solve the
> problem for anybody else that might be doing something similar.
> Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
> seem like a rare and unusual programming pattern.

It's not the driver really, it's the core scsi/fc layer, which makes
it a little dangerous that a random driver.

I agree that putting a comment in would also help. What I understand
from the bug report is that to trigger this bug you need these elements:

1. an inline function marked __always_inline
2. another inline function that is automatically inlined (not __always_inline)
3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
4. __builtin_compatible_p inside that inline function

The last point is what Denys introduced in the kernel with
bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some
byteswap operations"). So maybe it's better after all to revert that
patch, to have a higher confidence in the same bug not appearing
elsewhere. It's also really a workaround for another quirk of the
compiler, but that one only results in duplicated functions in object
code rather than functions that end in the middle.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26 15:58                                             ` Arnd Bergmann
@ 2016-04-26 22:36                                               ` James Bottomley
  2016-04-27  0:44                                                 ` Martin K. Petersen
  2016-04-27 11:05                                               ` Martin Jambor
  1 sibling, 1 reply; 45+ messages in thread
From: James Bottomley @ 2016-04-26 22:36 UTC (permalink / raw)
  To: Arnd Bergmann, Martin K. Petersen
  Cc: Josh Poimboeuf, linux-scsi, linux-kernel, Denys Vlasenko,
	Thomas Graf, Peter Zijlstra, David Rientjes, Andrew Morton,
	jamborm, Ingo Molnar, Himanshu Madhani, qla2xxx-upstream

On Tue, 2016-04-26 at 17:58 +0200, Arnd Bergmann wrote:
> On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> > > > > > > "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> > 
> > Arnd> I don't think we can realistically blacklist gcc
> > -4.9.{0,1,2,3},
> > Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade
> > to
> > Arnd> compilers that have not been released yet in order to build a
> > Arnd> linux-4.6 kernel.
> > 
> > I agree that compiler blacklisting is problematic and I'd like to 
> > avoid it. The question is how far we go in the kernel to 
> > accommodate various levels of brokenness.
> > 
> > In any case. Sticking compiler workarounds in device driver code is 
> > akin to putting demolition orders on display on Alpha Centauri. At 
> > the very minimum the patch should put a fat comment in the code 
> > stating that these wrapper functions or #defines should not be 
> > changed in the future because that'll break builds using gcc XYZ. 
> > But that does not solve the problem for anybody else that might be 
> > doing something similar. Converting between u64 and $RANDOM_TYPE in 
> > an inline wrapper does not seem like a rare and unusual programming
> > pattern.
> 
> It's not the driver really, it's the core scsi/fc layer, which makes
> it a little dangerous that a random driver.

Well, it's libfc; that's a fibre channel transport class mostly used by
FCoE drivers ... there's few enough of those to call it driver only.

> I agree that putting a comment in would also help. What I understand
> from the bug report is that to trigger this bug you need these
> elements:
> 
> 1. an inline function marked __always_inline
> 2. another inline function that is automatically inlined (not
> __always_inline)
> 3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
> 4. __builtin_compatible_p inside that inline function
> 
> The last point is what Denys introduced in the kernel with
> bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of 
> some byteswap operations"). So maybe it's better after all to revert 
> that patch, to have a higher confidence in the same bug not appearing
> elsewhere. It's also really a workaround for another quirk of the
> compiler, but that one only results in duplicated functions in object
> code rather than functions that end in the middle.

Yes, I think this is my preferred option.  That patch is nothing more
than an attempt to force the compiler to do something it didn't do but
should have.  If we apply the general rule that we shouldn't work
around compiler problems in the kernel code, then that should have been
disallowed first.  Plus, as the root cause of all of this, reverting
that patch will ensure that nothing else picks up this problem (at
least from the route we got it).

James

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26 22:36                                               ` James Bottomley
@ 2016-04-27  0:44                                                 ` Martin K. Petersen
  0 siblings, 0 replies; 45+ messages in thread
From: Martin K. Petersen @ 2016-04-27  0:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: Arnd Bergmann, Martin K. Petersen, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, jamborm, Ingo Molnar,
	Himanshu Madhani, qla2xxx-upstream

>>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:

>> The last point is what Denys introduced in the kernel with
>> bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of
>> some byteswap operations"). So maybe it's better after all to revert
>> that patch, to have a higher confidence in the same bug not appearing
>> elsewhere. It's also really a workaround for another quirk of the
>> compiler, but that one only results in duplicated functions in object
>> code rather than functions that end in the middle.

James> Yes, I think this is my preferred option.

Same here.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-26 15:58                                             ` Arnd Bergmann
  2016-04-26 22:36                                               ` James Bottomley
@ 2016-04-27 11:05                                               ` Martin Jambor
  2016-04-27 21:34                                                 ` Arnd Bergmann
  2016-04-27 22:00                                                 ` [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug Arnd Bergmann
  1 sibling, 2 replies; 45+ messages in thread
From: Martin Jambor @ 2016-04-27 11:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Martin K. Petersen, James Bottomley, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream, Jan Hubicka

Hi,

On Tue, Apr 26, 2016 at 05:58:20PM +0200, Arnd Bergmann wrote:
> On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> > >>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> > 
> > Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> > Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
> > Arnd> compilers that have not been released yet in order to build a
> > Arnd> linux-4.6 kernel.
> > 
> > I agree that compiler blacklisting is problematic and I'd like to avoid
> > it. The question is how far we go in the kernel to accommodate various
> > levels of brokenness.
> > 
> > In any case. Sticking compiler workarounds in device driver code is akin
> > to putting demolition orders on display on Alpha Centauri. At the very
> > minimum the patch should put a fat comment in the code stating that
> > these wrapper functions or #defines should not be changed in the future
> > because that'll break builds using gcc XYZ. But that does not solve the
> > problem for anybody else that might be doing something similar.
> > Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
> > seem like a rare and unusual programming pattern.
> 
> It's not the driver really, it's the core scsi/fc layer, which makes
> it a little dangerous that a random driver.
> 
> I agree that putting a comment in would also help. What I understand
> from the bug report is that to trigger this bug you need these elements:
> 
> 1. an inline function marked __always_inline
> 2. another inline function that is automatically inlined (not __always_inline)
> 3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
> 4. __builtin_compatible_p inside that inline function

The __always_inline requirement is not true.  In fact, if you look at
the example testcase filed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c7 you'll see it
uses __builtin_compatible_p in an __always inline function that is
called from one that is not tagged with that attribute.

And generally speaking, always inline is never a requirement, any call
or chain of calls that the inliner can decide to inline can lead to
the bug (if it complies with the condition below).

What is a requirement, though, is that __builtin_compatible_p is
called on something passed in an argument by reference or in an
aggregate (i.e. struct or array) argument.

So,

  int foo1 (unsigned long *ref)
  {
    if (__builtin_constant (*ref))
      ...
    else
      /* wrongly unreachable code */
  }

can lead to this issue, as can

  int foo2 (struct S s)
  {
    if ((__builtin_constant (s.l))
      ...
    else
      /* wrongly unreachable code */
  }

but

  int foo3 (unsigned long val)
  {
    if (__builtin_constant (val))
      ...
    else
      /* all OK */
  }

cannot, and is fine.  But please note that wrapping a foo[12]-like
function into a dereferencing wrapper might not help if foo[12] would
be early-inlined into such wrapper (GCC has two inliners, a very
simple early-inliner that only handles simple cases and a full-blown
IPA inliner that contains the bug).  I believe this can be ensured by
making the wrapper always_inline and never calling it indirectly (via
a pointer).  Honza (CCed), you know inlining heuristics better, please
correct me if my last statement is somehow inaccurate (or indeed if
you have a better idea how kernel developers can make sure they do not
hit the bug).

Thanks,

Martin

> 
> The last point is what Denys introduced in the kernel with
> bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some
> byteswap operations"). So maybe it's better after all to revert that
> patch, to have a higher confidence in the same bug not appearing
> elsewhere. It's also really a workaround for another quirk of the
> compiler, but that one only results in duplicated functions in object
> code rather than functions that end in the middle.
> 
> 	Arnd
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-27 11:05                                               ` Martin Jambor
@ 2016-04-27 21:34                                                 ` Arnd Bergmann
  2016-04-28 14:58                                                   ` Chris Metcalf
  2016-04-27 22:00                                                 ` [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug Arnd Bergmann
  1 sibling, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-27 21:34 UTC (permalink / raw)
  To: Martin Jambor
  Cc: Martin K. Petersen, James Bottomley, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream, Jan Hubicka, Chris Metcalf

On Wednesday 27 April 2016 13:05:03 Martin Jambor wrote:
> On Tue, Apr 26, 2016 at 05:58:20PM +0200, Arnd Bergmann wrote:
> > On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> > > >>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> > > 
> > > Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> > > Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
> > > Arnd> compilers that have not been released yet in order to build a
> > > Arnd> linux-4.6 kernel.
> > > 
> > > I agree that compiler blacklisting is problematic and I'd like to avoid
> > > it. The question is how far we go in the kernel to accommodate various
> > > levels of brokenness.
> > > 
> > > In any case. Sticking compiler workarounds in device driver code is akin
> > > to putting demolition orders on display on Alpha Centauri. At the very
> > > minimum the patch should put a fat comment in the code stating that
> > > these wrapper functions or #defines should not be changed in the future
> > > because that'll break builds using gcc XYZ. But that does not solve the
> > > problem for anybody else that might be doing something similar.
> > > Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
> > > seem like a rare and unusual programming pattern.
> > 
> > It's not the driver really, it's the core scsi/fc layer, which makes
> > it a little dangerous that a random driver.
> > 
> > I agree that putting a comment in would also help. What I understand
> > from the bug report is that to trigger this bug you need these elements:
> > 
> > 1. an inline function marked __always_inline
> > 2. another inline function that is automatically inlined (not __always_inline)
> > 3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
> > 4. __builtin_compatible_p inside that inline function
> 
> The __always_inline requirement is not true.  In fact, if you look at
> the example testcase filed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c7 you'll see it
> uses __builtin_compatible_p in an __always inline function that is
> called from one that is not tagged with that attribute.
>
> And generally speaking, always inline is never a requirement, any call
> or chain of calls that the inliner can decide to inline can lead to
> the bug (if it complies with the condition below).

Ok, thanks for the clarification, I thought you always had to have both
kinds of inline functions.
 
> What is a requirement, though, is that __builtin_compatible_p is
> called on something passed in an argument by reference or in an
> aggregate (i.e. struct or array) argument.
> 
> So,
> 
>   int foo1 (unsigned long *ref)
>   {
>     if (__builtin_constant (*ref))
>       ...
>     else
>       /* wrongly unreachable code */
>   }
> 
>   }
> 
> cannot, and is fine.  But please note that wrapping a foo[12]-like
> function into a dereferencing wrapper might not help if foo[12] would
> be early-inlined into such wrapper (GCC has two inliners, a very
> simple early-inliner that only handles simple cases and a full-blown
> IPA inliner that contains the bug).  I believe this can be ensured by
> making the wrapper always_inline and never calling it indirectly (via
> a pointer).  Honza (CCed), you know inlining heuristics better, please
> correct me if my last statement is somehow inaccurate (or indeed if
> you have a better idea how kernel developers can make sure they do not
> hit the bug).

I guess that means that any user of this code in the kernel:

static inline __attribute_const__ __u64 __fswab64(__u64 val)
{
#ifdef __HAVE_BUILTIN_BSWAP64__
        return __builtin_bswap64(val);
#elif defined (__arch_swab64)
        return __arch_swab64(val);
#elif defined(__SWAB_64_THRU_32__)
        __u32 h = val >> 32;
        __u32 l = val & ((1ULL << 32) - 1);
        return (((__u64)__fswab32(l)) << 32) | ((__u64)(__fswab32(h)));
#else
        return ___constant_swab64(val);
#endif
}

#define __swab64(x)                             \
        (__builtin_constant_p((__u64)(x)) ?     \
        ___constant_swab64(x) :                 \
        __fswab64(x))

static __always_inline __u64 __swab64p(const __u64 *p)
{       
#ifdef __arch_swab64p
        return __arch_swab64p(p);
#else
        return __swab64(*p);
#endif
}

has a chance of running into the same problem, and we may want to solve
it at the root. For architectures that define __HAVE_BUILTIN_BSWAP64__
(i.e. ARM, MIPS, POWERPC, S390, and x86 with gcc-4.4 or higher, 4.8
for __HAVE_BUILTIN_BSWAP16__), we can probably just change the logic
to avoid __builtin_constant_p() and always use __builtin_bswap64().

This won't help on TILE, which is the one architecture that sets
ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
Chris Metcalf should be able to figure out whether we can just
set ARCH_USE_BUILTIN_BSWAP for tile as well.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug
  2016-04-27 11:05                                               ` Martin Jambor
  2016-04-27 21:34                                                 ` Arnd Bergmann
@ 2016-04-27 22:00                                                 ` Arnd Bergmann
  2016-04-27 22:11                                                   ` Josh Poimboeuf
  1 sibling, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-27 22:00 UTC (permalink / raw)
  To: Martin Jambor
  Cc: Martin K. Petersen, James Bottomley, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream, Jan Hubicka

This is another attempt to avoid a regression in wwn_to_u64()
after that started using get_unaligned_be64(), which in turn
ran into a bug on gcc-4.9 through 6.1.

As part of the problem is how __builtin_constant_p gets evaluated
on an argument passed by reference into an inline function, this
avoids the use of __builtin_constant_p() for all architectures
that set CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not
set ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably
do not suffer from the problem in the qla2xxx driver, but they
might still run into it elsewhere.

I have not been able to reproduce the original problem, so I don't
know if this patch solves it, but at least it leads to simpler
code doing the same thing, so at least there should be no downsides.

Please test.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
index 3f10e5317b46..de56fd54428d 100644
--- a/include/uapi/linux/swab.h
+++ b/include/uapi/linux/swab.h
@@ -45,9 +45,7 @@
 
 static inline __attribute_const__ __u16 __fswab16(__u16 val)
 {
-#ifdef __HAVE_BUILTIN_BSWAP16__
-	return __builtin_bswap16(val);
-#elif defined (__arch_swab16)
+#if defined (__arch_swab16)
 	return __arch_swab16(val);
 #else
 	return ___constant_swab16(val);
@@ -56,9 +54,7 @@ static inline __attribute_const__ __u16 __fswab16(__u16 val)
 
 static inline __attribute_const__ __u32 __fswab32(__u32 val)
 {
-#ifdef __HAVE_BUILTIN_BSWAP32__
-	return __builtin_bswap32(val);
-#elif defined(__arch_swab32)
+#if defined(__arch_swab32)
 	return __arch_swab32(val);
 #else
 	return ___constant_swab32(val);
@@ -67,9 +63,7 @@ static inline __attribute_const__ __u32 __fswab32(__u32 val)
 
 static inline __attribute_const__ __u64 __fswab64(__u64 val)
 {
-#ifdef __HAVE_BUILTIN_BSWAP64__
-	return __builtin_bswap64(val);
-#elif defined (__arch_swab64)
+#if defined (__arch_swab64)
 	return __arch_swab64(val);
 #elif defined(__SWAB_64_THRU_32__)
 	__u32 h = val >> 32;
@@ -102,28 +96,40 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
  * __swab16 - return a byteswapped 16-bit value
  * @x: value to byteswap
  */
+#ifdef __HAVE_BUILTIN_BSWAP16__
+#define __swab16(x) __builtin_bswap16((__u16)(x))
+#else
 #define __swab16(x)				\
 	(__builtin_constant_p((__u16)(x)) ?	\
 	___constant_swab16(x) :			\
 	__fswab16(x))
+#endif
 
 /**
  * __swab32 - return a byteswapped 32-bit value
  * @x: value to byteswap
  */
+#ifdef __HAVE_BUILTIN_BSWAP32__
+#define __swab32(x) __builtin_bswap32((__u32)(x))
+#else
 #define __swab32(x)				\
 	(__builtin_constant_p((__u32)(x)) ?	\
 	___constant_swab32(x) :			\
 	__fswab32(x))
+#endif
 
 /**
  * __swab64 - return a byteswapped 64-bit value
  * @x: value to byteswap
  */
+#ifdef __HAVE_BUILTIN_BSWAP64__
+#define __swab64(x) __builtin_bswap64((__u64)(x))
+#else
 #define __swab64(x)				\
 	(__builtin_constant_p((__u64)(x)) ?	\
 	___constant_swab64(x) :			\
 	__fswab64(x))
+#endif
 
 /**
  * __swahw32 - return a word-swapped 32-bit value

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug
  2016-04-27 22:00                                                 ` [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug Arnd Bergmann
@ 2016-04-27 22:11                                                   ` Josh Poimboeuf
  2016-04-28 16:27                                                     ` Quinn Tran
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2016-04-27 22:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Martin Jambor, Martin K. Petersen, James Bottomley, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream, Jan Hubicka

On Thu, Apr 28, 2016 at 12:00:36AM +0200, Arnd Bergmann wrote:
> This is another attempt to avoid a regression in wwn_to_u64()
> after that started using get_unaligned_be64(), which in turn
> ran into a bug on gcc-4.9 through 6.1.
> 
> As part of the problem is how __builtin_constant_p gets evaluated
> on an argument passed by reference into an inline function, this
> avoids the use of __builtin_constant_p() for all architectures
> that set CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not
> set ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably
> do not suffer from the problem in the qla2xxx driver, but they
> might still run into it elsewhere.
> 
> I have not been able to reproduce the original problem, so I don't
> know if this patch solves it, but at least it leads to simpler
> code doing the same thing, so at least there should be no downsides.
> 
> Please test.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Nice patch.  I can confirm it fixes the issue with gcc 5.3.1.

Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

> diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
> index 3f10e5317b46..de56fd54428d 100644
> --- a/include/uapi/linux/swab.h
> +++ b/include/uapi/linux/swab.h
> @@ -45,9 +45,7 @@
>  
>  static inline __attribute_const__ __u16 __fswab16(__u16 val)
>  {
> -#ifdef __HAVE_BUILTIN_BSWAP16__
> -	return __builtin_bswap16(val);
> -#elif defined (__arch_swab16)
> +#if defined (__arch_swab16)
>  	return __arch_swab16(val);
>  #else
>  	return ___constant_swab16(val);
> @@ -56,9 +54,7 @@ static inline __attribute_const__ __u16 __fswab16(__u16 val)
>  
>  static inline __attribute_const__ __u32 __fswab32(__u32 val)
>  {
> -#ifdef __HAVE_BUILTIN_BSWAP32__
> -	return __builtin_bswap32(val);
> -#elif defined(__arch_swab32)
> +#if defined(__arch_swab32)
>  	return __arch_swab32(val);
>  #else
>  	return ___constant_swab32(val);
> @@ -67,9 +63,7 @@ static inline __attribute_const__ __u32 __fswab32(__u32 val)
>  
>  static inline __attribute_const__ __u64 __fswab64(__u64 val)
>  {
> -#ifdef __HAVE_BUILTIN_BSWAP64__
> -	return __builtin_bswap64(val);
> -#elif defined (__arch_swab64)
> +#if defined (__arch_swab64)
>  	return __arch_swab64(val);
>  #elif defined(__SWAB_64_THRU_32__)
>  	__u32 h = val >> 32;
> @@ -102,28 +96,40 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
>   * __swab16 - return a byteswapped 16-bit value
>   * @x: value to byteswap
>   */
> +#ifdef __HAVE_BUILTIN_BSWAP16__
> +#define __swab16(x) __builtin_bswap16((__u16)(x))
> +#else
>  #define __swab16(x)				\
>  	(__builtin_constant_p((__u16)(x)) ?	\
>  	___constant_swab16(x) :			\
>  	__fswab16(x))
> +#endif
>  
>  /**
>   * __swab32 - return a byteswapped 32-bit value
>   * @x: value to byteswap
>   */
> +#ifdef __HAVE_BUILTIN_BSWAP32__
> +#define __swab32(x) __builtin_bswap32((__u32)(x))
> +#else
>  #define __swab32(x)				\
>  	(__builtin_constant_p((__u32)(x)) ?	\
>  	___constant_swab32(x) :			\
>  	__fswab32(x))
> +#endif
>  
>  /**
>   * __swab64 - return a byteswapped 64-bit value
>   * @x: value to byteswap
>   */
> +#ifdef __HAVE_BUILTIN_BSWAP64__
> +#define __swab64(x) __builtin_bswap64((__u64)(x))
> +#else
>  #define __swab64(x)				\
>  	(__builtin_constant_p((__u64)(x)) ?	\
>  	___constant_swab64(x) :			\
>  	__fswab64(x))
> +#endif
>  
>  /**
>   * __swahw32 - return a word-swapped 32-bit value
> 

-- 
Josh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-27 21:34                                                 ` Arnd Bergmann
@ 2016-04-28 14:58                                                   ` Chris Metcalf
  2016-04-28 15:23                                                     ` Arnd Bergmann
  0 siblings, 1 reply; 45+ messages in thread
From: Chris Metcalf @ 2016-04-28 14:58 UTC (permalink / raw)
  To: Arnd Bergmann, Martin Jambor
  Cc: Martin K. Petersen, James Bottomley, Josh Poimboeuf, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, Ingo Molnar, Himanshu Madhani,
	qla2xxx-upstream, Jan Hubicka

(Resending as text/plain)

On 4/27/2016 5:34 PM, Arnd Bergmann wrote:
> This won't help on TILE, which is the one architecture that sets
> ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
> Chris Metcalf should be able to figure out whether we can just
> set ARCH_USE_BUILTIN_BSWAP for tile as well.

We certainly could enable ARCH_USE_BUILTIN_BSWAP.  The only problem is
that we never added explicit support for bswap16() in gcc, which is
efficiently done on tilegx via the "revbytes" instruction and a 48-bit
right-shift.  So gcc instead does a generic thing with four
instructions in three bundles, so really not as good as our asm/swab.h.

I'm not sure how to weigh the implications of converting to
builtin_bswap16 (and possibly upstreaming a better implementation to
gcc), vs. disabling ARCH_SUPPORTS_OPTIMIZED_INLINING (which no one
else but x86 uses anyway), vs. just ignoring the compiler bug and
hoping it's not an issue in practice :-)
-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-28 14:58                                                   ` Chris Metcalf
@ 2016-04-28 15:23                                                     ` Arnd Bergmann
  2016-04-28 15:48                                                       ` Chris Metcalf
  0 siblings, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2016-04-28 15:23 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Martin Jambor, Martin K. Petersen, James Bottomley,
	Josh Poimboeuf, linux-scsi, linux-kernel, Denys Vlasenko,
	Thomas Graf, Peter Zijlstra, David Rientjes, Andrew Morton,
	Ingo Molnar, Himanshu Madhani, qla2xxx-upstream, Jan Hubicka

On Thursday 28 April 2016 10:58:43 Chris Metcalf wrote:
> (Resending as text/plain)
> 
> On 4/27/2016 5:34 PM, Arnd Bergmann wrote:
> > This won't help on TILE, which is the one architecture that sets
> > ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
> > Chris Metcalf should be able to figure out whether we can just
> > set ARCH_USE_BUILTIN_BSWAP for tile as well.
> 
> We certainly could enable ARCH_USE_BUILTIN_BSWAP.  The only problem is
> that we never added explicit support for bswap16() in gcc, which is
> efficiently done on tilegx via the "revbytes" instruction and a 48-bit
> right-shift.  So gcc instead does a generic thing with four
> instructions in three bundles, so really not as good as our asm/swab.h.
> 
> I'm not sure how to weigh the implications of converting to
> builtin_bswap16 (and possibly upstreaming a better implementation to
> gcc), vs. disabling ARCH_SUPPORTS_OPTIMIZED_INLINING (which no one
> else but x86 uses anyway), vs. just ignoring the compiler bug and
> hoping it's not an issue in practice 

How about figuring out whether you hit the gcc bug on tile as a
first step?

Another idea would be to adapt this section in include/linux/compiler-gcc.h:

#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||                \
    !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
#define inline          inline          __attribute__((always_inline)) notrace
#define __inline__      __inline__      __attribute__((always_inline)) notrace
#define __inline        __inline        __attribute__((always_inline)) notrace
#else
/* A lot of inline functions can cause havoc with function tracing */
#define inline          inline          notrace
#define __inline__      __inline__      notrace
#define __inline        __inline        notrace
#endif

to work around the issue. We already check for gcc before 4.0, and
we could also check for the affected releases (4.9, 5.x, 6.1) in the
same place, possibly conditional on ARCH_USE_BUILTIN_BSWAP with
a comment pointing to the gcc bug tracker.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] scsi: fc: force inlining of wwn conversion functions
  2016-04-28 15:23                                                     ` Arnd Bergmann
@ 2016-04-28 15:48                                                       ` Chris Metcalf
  0 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2016-04-28 15:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Martin Jambor, Martin K. Petersen, James Bottomley,
	Josh Poimboeuf, linux-scsi, linux-kernel, Denys Vlasenko,
	Thomas Graf, Peter Zijlstra, David Rientjes, Andrew Morton,
	Ingo Molnar, Himanshu Madhani, qla2xxx-upstream, Jan Hubicka

On 4/28/2016 11:23 AM, Arnd Bergmann wrote:
> On Thursday 28 April 2016 10:58:43 Chris Metcalf wrote:
>> (Resending as text/plain)
>>
>> On 4/27/2016 5:34 PM, Arnd Bergmann wrote:
>>> This won't help on TILE, which is the one architecture that sets
>>> ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
>>> Chris Metcalf should be able to figure out whether we can just
>>> set ARCH_USE_BUILTIN_BSWAP for tile as well.
>> We certainly could enable ARCH_USE_BUILTIN_BSWAP.  The only problem is
>> that we never added explicit support for bswap16() in gcc, which is
>> efficiently done on tilegx via the "revbytes" instruction and a 48-bit
>> right-shift.  So gcc instead does a generic thing with four
>> instructions in three bundles, so really not as good as our asm/swab.h.
>>
>> I'm not sure how to weigh the implications of converting to
>> builtin_bswap16 (and possibly upstreaming a better implementation to
>> gcc), vs. disabling ARCH_SUPPORTS_OPTIMIZED_INLINING (which no one
>> else but x86 uses anyway), vs. just ignoring the compiler bug and
>> hoping it's not an issue in practice
> How about figuring out whether you hit the gcc bug on tile as a
> first step?

I don't have an affected build of gcc handy (just 4.8 and 4.4).  I will pass
this to our compiler folks and see what they know.

> Another idea would be to adapt this section in include/linux/compiler-gcc.h:
>
> #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||                \
>      !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
> #define inline          inline          __attribute__((always_inline)) notrace
> #define __inline__      __inline__      __attribute__((always_inline)) notrace
> #define __inline        __inline        __attribute__((always_inline)) notrace
> #else
> /* A lot of inline functions can cause havoc with function tracing */
> #define inline          inline          notrace
> #define __inline__      __inline__      notrace
> #define __inline        __inline        notrace
> #endif
>
> to work around the issue. We already check for gcc before 4.0, and
> we could also check for the affected releases (4.9, 5.x, 6.1) in the
> same place, possibly conditional on ARCH_USE_BUILTIN_BSWAP with
> a comment pointing to the gcc bug tracker.

This does seem like a more robust solution anyway, since more instances of the
bad inline pattern might get introduced in the future in other places.
I wouldn't make it conditional on ARCH_USE_BUILTIN_BSWAP for the same reason.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug
  2016-04-27 22:11                                                   ` Josh Poimboeuf
@ 2016-04-28 16:27                                                     ` Quinn Tran
  0 siblings, 0 replies; 45+ messages in thread
From: Quinn Tran @ 2016-04-28 16:27 UTC (permalink / raw)
  To: Josh Poimboeuf, Arnd Bergmann
  Cc: Martin Jambor, Martin K. Petersen, James Bottomley, linux-scsi,
	linux-kernel, Denys Vlasenko, Thomas Graf, Peter Zijlstra,
	David Rientjes, Andrew Morton, Ingo Molnar, Himanshu Madhani,
	Dept-Eng QLA2xxx Upstream, Jan Hubicka



-----Original Message-----
From: <linux-scsi-owner@vger.kernel.org> on behalf of Josh Poimboeuf <jpoimboe@redhat.com>
Date: Wednesday, April 27, 2016 at 3:11 PM
To: Arnd Bergmann <arnd@arndb.de>
Cc: Martin Jambor <mjambor@suse.cz>, "Martin K. Petersen" <martin.petersen@oracle.com>, James Bottomley <James.Bottomley@hansenpartnership.com>, linux-scsi <linux-scsi@vger.kernel.org>, linux-kernel <linux-kernel@vger.kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Thomas Graf <tgraf@suug.ch>, Peter Zijlstra <peterz@infradead.org>, David Rientjes <rientjes@google.com>, Andrew Morton <akpm@linux-foundation.org>, Ingo Molnar <mingo@kernel.org>, Himanshu Madhani <himanshu.madhani@qlogic.com>, Dept-Eng QLA2xxx Upstream <qla2xxx-upstream@qlogic.com>, Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug

>On Thu, Apr 28, 2016 at 12:00:36AM +0200, Arnd Bergmann wrote:
>> This is another attempt to avoid a regression in wwn_to_u64()
>> after that started using get_unaligned_be64(), which in turn
>> ran into a bug on gcc-4.9 through 6.1.
>> 
>> As part of the problem is how __builtin_constant_p gets evaluated
>> on an argument passed by reference into an inline function, this
>> avoids the use of __builtin_constant_p() for all architectures
>> that set CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not
>> set ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably
>> do not suffer from the problem in the qla2xxx driver, but they
>> might still run into it elsewhere.
>> 
>> I have not been able to reproduce the original problem, so I don't
>> know if this patch solves it, but at least it leads to simpler
>> code doing the same thing, so at least there should be no downsides.
>> 
>> Please test.
>> 
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
>Nice patch.  I can confirm it fixes the issue with gcc 5.3.1.
>
>Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
>Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
>
>> diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
>> index 3f10e5317b46..de56fd54428d 100644
>> --- a/include/uapi/linux/swab.h
>> +++ b/include/uapi/linux/swab.h
>> @@ -45,9 +45,7 @@
>>  
>>  static inline __attribute_const__ __u16 __fswab16(__u16 val)
>>  {
>> -#ifdef __HAVE_BUILTIN_BSWAP16__
>> -	return __builtin_bswap16(val);
>> -#elif defined (__arch_swab16)
>> +#if defined (__arch_swab16)
>>  	return __arch_swab16(val);
>>  #else
>>  	return ___constant_swab16(val);
>> @@ -56,9 +54,7 @@ static inline __attribute_const__ __u16 __fswab16(__u16 val)
>>  
>>  static inline __attribute_const__ __u32 __fswab32(__u32 val)
>>  {
>> -#ifdef __HAVE_BUILTIN_BSWAP32__
>> -	return __builtin_bswap32(val);
>> -#elif defined(__arch_swab32)
>> +#if defined(__arch_swab32)
>>  	return __arch_swab32(val);
>>  #else
>>  	return ___constant_swab32(val);
>> @@ -67,9 +63,7 @@ static inline __attribute_const__ __u32 __fswab32(__u32 val)
>>  
>>  static inline __attribute_const__ __u64 __fswab64(__u64 val)
>>  {
>> -#ifdef __HAVE_BUILTIN_BSWAP64__
>> -	return __builtin_bswap64(val);
>> -#elif defined (__arch_swab64)
>> +#if defined (__arch_swab64)
>>  	return __arch_swab64(val);
>>  #elif defined(__SWAB_64_THRU_32__)
>>  	__u32 h = val >> 32;
>> @@ -102,28 +96,40 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
>>   * __swab16 - return a byteswapped 16-bit value
>>   * @x: value to byteswap
>>   */
>> +#ifdef __HAVE_BUILTIN_BSWAP16__
>> +#define __swab16(x) __builtin_bswap16((__u16)(x))
>> +#else
>>  #define __swab16(x)				\
>>  	(__builtin_constant_p((__u16)(x)) ?	\
>>  	___constant_swab16(x) :			\
>>  	__fswab16(x))
>> +#endif
>>  
>>  /**
>>   * __swab32 - return a byteswapped 32-bit value
>>   * @x: value to byteswap
>>   */
>> +#ifdef __HAVE_BUILTIN_BSWAP32__
>> +#define __swab32(x) __builtin_bswap32((__u32)(x))
>> +#else
>>  #define __swab32(x)				\
>>  	(__builtin_constant_p((__u32)(x)) ?	\
>>  	___constant_swab32(x) :			\
>>  	__fswab32(x))
>> +#endif
>>  
>>  /**
>>   * __swab64 - return a byteswapped 64-bit value
>>   * @x: value to byteswap
>>   */
>> +#ifdef __HAVE_BUILTIN_BSWAP64__
>> +#define __swab64(x) __builtin_bswap64((__u64)(x))
>> +#else
>>  #define __swab64(x)				\
>>  	(__builtin_constant_p((__u64)(x)) ?	\
>>  	___constant_swab64(x) :			\
>>  	__fswab64(x))
>> +#endif
>>  
>>  /**
>>   * __swahw32 - return a word-swapped 32-bit value
>> 

The patch works. Thanks.

>
>-- 
>Josh
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-04-28 16:27 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-04 19:45 [PATCH] asm-generic: force inlining of some atomic_long operations Denys Vlasenko
2016-02-04 19:45 ` [PATCH] force inlining of some byteswap operations Denys Vlasenko
2016-02-05  7:28   ` Ingo Molnar
2016-04-13  3:36   ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Josh Poimboeuf
2016-04-13 12:12     ` Denys Vlasenko
2016-04-13 12:36       ` Josh Poimboeuf
2016-04-13 15:15         ` Josh Poimboeuf
2016-04-13 16:55           ` James Bottomley
2016-04-13 17:10             ` Josh Poimboeuf
2016-04-14 15:29               ` Denys Vlasenko
2016-04-14 15:57                 ` Josh Poimboeuf
2016-04-14 17:09                   ` Denys Vlasenko
2016-04-15  5:45                     ` Ingo Molnar
2016-04-15 13:47                       ` Josh Poimboeuf
2016-04-15 22:20                         ` Josh Poimboeuf
2016-04-16  9:03                           ` Ingo Molnar
2016-04-18 13:39                             ` Josh Poimboeuf
2016-04-18 14:07                               ` Arnd Bergmann
2016-04-18 14:12                                 ` Josh Poimboeuf
2016-04-18 14:21                                   ` Arnd Bergmann
2016-04-19  8:52                               ` Ingo Molnar
2016-04-19 13:56                                 ` [PATCH] scsi: fc: force inlining of wwn conversion functions Josh Poimboeuf
2016-04-22 23:17                                   ` Quinn Tran
2016-04-25 16:07                                   ` Josh Poimboeuf
2016-04-26  2:40                                     ` Martin K. Petersen
2016-04-26  3:37                                       ` James Bottomley
2016-04-26  7:22                                         ` Arnd Bergmann
2016-04-26  8:35                                           ` Christoph Hellwig
2016-04-26 10:05                                             ` Arnd Bergmann
2016-04-26 13:06                                           ` Martin K. Petersen
2016-04-26 15:58                                             ` Arnd Bergmann
2016-04-26 22:36                                               ` James Bottomley
2016-04-27  0:44                                                 ` Martin K. Petersen
2016-04-27 11:05                                               ` Martin Jambor
2016-04-27 21:34                                                 ` Arnd Bergmann
2016-04-28 14:58                                                   ` Chris Metcalf
2016-04-28 15:23                                                     ` Arnd Bergmann
2016-04-28 15:48                                                       ` Chris Metcalf
2016-04-27 22:00                                                 ` [PATCH, RFT] byteswap: try to avoid __builtin_constant_p gcc bug Arnd Bergmann
2016-04-27 22:11                                                   ` Josh Poimboeuf
2016-04-28 16:27                                                     ` Quinn Tran
2016-04-16  7:42                       ` This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations) Arnd Bergmann
2016-04-18 13:22                         ` Josh Poimboeuf
2016-02-04 19:45 ` [PATCH] force inlining of unaligned byteswap operations Denys Vlasenko
2016-02-05  7:28   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).