KVM Archive mirror
* [linus:master] [x86/bugs]  6613d82e61: general_protection_fault:#[##]
@ 2024-03-28  7:36 kernel test robot
From: kernel test robot @ 2024-03-28  7:36 UTC
  To: Pawan Gupta; +Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm, oliver.sang



Hello,


We reported a performance issue for this commit in
https://lore.kernel.org/all/202403041300.a7fb1462-yujie.liu@intel.com/

Now we have noticed a persistent crash:

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :100         99%          100:100   dmesg.EIP:restore_all_switch_stack
           :100         99%          100:100   dmesg.Kernel_panic-not_syncing:Fatal_exception
           :100         99%          100:100   dmesg.general_protection_fault:#[##]


Details below, FYI.


kernel test robot noticed "general_protection_fault:#[##]" on:

commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 70293240c5ce675a67bfc48f419b093023b862b3]
[test failed on linux-next/master 13ee4a7161b6fd938aef6688ff43b163f6d83e37]

in testcase: trinity
version: 
with the following parameters:

	runtime: 600s



compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202403281553.79f5a16f-lkp@intel.com


[   25.175767][  T670] VFS: Warning: trinity-c2 using old stat() call. Recompile your binary.
[   25.245597][  T669] general protection fault: 0000 [#1] PREEMPT SMP
[   25.246417][  T669] CPU: 1 PID: 669 Comm: trinity-c1 Not tainted 6.8.0-rc5-00004-g6613d82e617d #1 85a4928d2e6b42899c3861e57e26bdc646c4c5f9
[   25.247743][  T669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957) 
[ 25.249510][ T669] Code: 4c 24 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 <0f> 00 2d 00 94 d5 c1 cf 6a 00 68 88 6b d4 c1 eb 00 fc 0f a0 50 b8
All code
========
   0:	4c 24 10             	rex.WR and $0x10,%al
   3:	36 89 48 fc          	ss mov %ecx,-0x4(%rax)
   7:	8b 4c 24 0c          	mov    0xc(%rsp),%ecx
   b:	81 e1 ff ff 00 00    	and    $0xffff,%ecx
  11:	36 89 48 f8          	ss mov %ecx,-0x8(%rax)
  15:	8b 4c 24 08          	mov    0x8(%rsp),%ecx
  19:	36 89 48 f4          	ss mov %ecx,-0xc(%rax)
  1d:	8b 4c 24 04          	mov    0x4(%rsp),%ecx
  21:	36 89 48 f0          	ss mov %ecx,-0x10(%rax)
  25:	59                   	pop    %rcx
  26:	8d 60 f0             	lea    -0x10(%rax),%esp
  29:	58                   	pop    %rax
  2a:*	0f 00 2d 00 94 d5 c1 	verw   -0x3e2a6c00(%rip)        # 0xffffffffc1d59431		<-- trapping instruction
  31:	cf                   	iret
  32:	6a 00                	push   $0x0
  34:	68 88 6b d4 c1       	push   $0xffffffffc1d46b88
  39:	eb 00                	jmp    0x3b
  3b:	fc                   	cld
  3c:	0f a0                	push   %fs
  3e:	50                   	push   %rax
  3f:	b8                   	.byte 0xb8

Code starting with the faulting instruction
===========================================
   0:	0f 00 2d 00 94 d5 c1 	verw   -0x3e2a6c00(%rip)        # 0xffffffffc1d59407
   7:	cf                   	iret
   8:	6a 00                	push   $0x0
   a:	68 88 6b d4 c1       	push   $0xffffffffc1d46b88
   f:	eb 00                	jmp    0x11
  11:	fc                   	cld
  12:	0f a0                	push   %fs
  14:	50                   	push   %rax
  15:	b8                   	.byte 0xb8
[   25.251494][  T669] EAX: 00000000 EBX: 000001a0 ECX: 000001a1 EDX: 00000000
[   25.252271][  T669] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ffa2efdc
[   25.253037][  T669] DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
[   25.253892][  T669] CR0: 80050033 CR2: b7dabd6e CR3: 2cc341c0 CR4: 000406b0
[   25.254655][  T669] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   25.255413][  T669] DR6: fffe0ff0 DR7: 00000400
[   25.255952][  T669] Call Trace:
[ 25.256376][ T669] ? __die_body (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:478 kbuild/src/consumer/arch/x86/kernel/dumpstack.c:420) 
[ 25.256907][ T669] ? die_addr (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:?) 
[ 25.257411][ T669] ? exc_general_protection (kbuild/src/consumer/arch/x86/kernel/traps.c:698) 
[ 25.258067][ T669] ? __entry_text_start (??:?) 
[ 25.258691][ T669] ? irqentry_exit_to_user_mode (kbuild/src/consumer/kernel/entry/common.c:228) 
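The trapping bytes above, 0f 00 2d 00 94 d5 c1, are VERW (opcode 0f 00 /5) with ModRM byte 0x2d (mod=00, rm=101). The "rex.WR" and "%rip" in the listing show the bytes were decoded with 64-bit rules, where that ModRM form means a RIP-relative disp32; under 32-bit rules the same form means a plain disp32 offset. The C sketch below decodes only this one encoding under both rules so the two readings can be compared. The helper name decode_verw_disp32() is hypothetical and is not part of any disassembler or kernel script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of VERW m16 encoded as 0F 00 /5, ModRM mod=00 rm=101, plus disp32. */
#define VERW_DISP32_LEN 7

/*
 * Hypothetical helper: decode only "VERW [disp32]" (0F 00 2D d0 d1 d2 d3).
 * is_64bit = 1 applies 64-bit rules (RIP-relative), 0 applies 32-bit rules.
 * insn_addr is the address of the 0F byte.
 * Returns 1 and stores the effective address in *target, or 0 if the bytes
 * are not this exact form.
 */
static int decode_verw_disp32(const uint8_t *b, size_t len, int is_64bit,
                              uint64_t insn_addr, uint64_t *target)
{
    assert(b != NULL && target != NULL);
    if (len < VERW_DISP32_LEN || b[0] != 0x0f || b[1] != 0x00)
        return 0;

    uint8_t modrm = b[2];
    unsigned mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;
    if (mod != 0 || reg != 5 || rm != 5)   /* /5 selects VERW; 00/101 is the disp32 form */
        return 0;

    /* Assemble the little-endian displacement, then sign-extend it. */
    uint32_t raw = (uint32_t)b[3] | (uint32_t)b[4] << 8 |
                   (uint32_t)b[5] << 16 | (uint32_t)b[6] << 24;
    int64_t sdisp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                        : (int64_t)raw;

    if (is_64bit)
        /* RIP-relative: address of the next instruction plus the signed disp. */
        *target = insn_addr + VERW_DISP32_LEN + (uint64_t)sdisp;
    else
        /* 32-bit: the displacement is used directly as the (DS-relative) offset. */
        *target = raw;
    return 1;
}
```

With is_64bit = 1 and insn_addr = 0x2a this reproduces the 0xffffffffc1d59431 shown at offset 2a; with insn_addr = 0 it gives the 0xffffffffc1d59407 of the second listing. With is_64bit = 0 the same bytes name offset 0xc1d59400, an absolute operand of the same shape as the gcc-11 "verw 0xc1d5c700" quoted later in the thread.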


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240328/202403281553.79f5a16f-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linus:master] [x86/bugs]  6613d82e61: general_protection_fault:#[##]
@ 2024-03-28 21:17 ` Pawan Gupta
From: Pawan Gupta @ 2024-03-28 21:17 UTC
  To: kernel test robot; +Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm

On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
> compiler: clang-17
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202403281553.79f5a16f-lkp@intel.com
> 
> 
> [   25.175767][  T670] VFS: Warning: trinity-c2 using old stat() call. Recompile your binary.
> [   25.245597][  T669] general protection fault: 0000 [#1] PREEMPT SMP
> [   25.246417][  T669] CPU: 1 PID: 669 Comm: trinity-c1 Not tainted 6.8.0-rc5-00004-g6613d82e617d #1 85a4928d2e6b42899c3861e57e26bdc646c4c5f9
> [   25.247743][  T669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957) 
> [ 25.249510][ T669] Code: 4c 24 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 <0f> 00 2d 00 94 d5 c1 cf 6a 00 68 88 6b d4 c1 eb 00 fc 0f a0 50 b8
> All code
> ========
>    0:	4c 24 10             	rex.WR and $0x10,%al
>    3:	36 89 48 fc          	ss mov %ecx,-0x4(%rax)
>    7:	8b 4c 24 0c          	mov    0xc(%rsp),%ecx
>    b:	81 e1 ff ff 00 00    	and    $0xffff,%ecx
>   11:	36 89 48 f8          	ss mov %ecx,-0x8(%rax)
>   15:	8b 4c 24 08          	mov    0x8(%rsp),%ecx
>   19:	36 89 48 f4          	ss mov %ecx,-0xc(%rax)
>   1d:	8b 4c 24 04          	mov    0x4(%rsp),%ecx
>   21:	36 89 48 f0          	ss mov %ecx,-0x10(%rax)
>   25:	59                   	pop    %rcx
>   26:	8d 60 f0             	lea    -0x10(%rax),%esp
>   29:	58                   	pop    %rax
>   2a:*	0f 00 2d 00 94 d5 c1 	verw   -0x3e2a6c00(%rip)        # 0xffffffffc1d59431		<-- trapping instruction

This is due to 64-bit addressing with CONFIG_X86_32=y on clang.

I haven't tried clang myself, but I don't see this happening with gcc-11:

	entry_INT80_32:
	...
	<+446>:   mov    0x4(%esp),%ecx
	<+450>:   mov    %ecx,%ss:-0x10(%eax)
	<+454>:   pop    %ecx
	<+455>:   lea    -0x10(%eax),%esp
	<+458>:   pop    %eax
	<+459>:   verw   0xc1d5c700              <----------
	<+466>:   iret

>   31:	cf                   	iret
>   32:	6a 00                	push   $0x0
>   34:	68 88 6b d4 c1       	push   $0xffffffffc1d46b88
>   39:	eb 00                	jmp    0x3b
...

The config has CONFIG_X86_32=y, but it is possible that in a 32-bit build
with clang the 64-bit expansion of "VERW (_ASM_RIP(addr))" is being used,
i.e. __ASM_FORM_RAW(b) below:

  file: arch/x86/include/asm/asm.h
  ...
  #ifndef __x86_64__
  /* 32 bit */
  # define __ASM_SEL(a,b)         __ASM_FORM(a)
  # define __ASM_SEL_RAW(a,b)     __ASM_FORM_RAW(a)
  #else
  /* 64 bit */
  # define __ASM_SEL(a,b)         __ASM_FORM(b)
  # define __ASM_SEL_RAW(a,b)     __ASM_FORM_RAW(b)   <--------
  #endif
  ...
  /* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
  #define _ASM_RIP(x)     __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))

Possibly __x86_64__ is being defined with clang even when CONFIG_X86_32=y.
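The selection described above can be checked in isolation. The sketch below copies only the shape of the quoted asm.h logic into hypothetical SKETCH_* macros (they are not the kernel's definitions, which also route through __ASM_REGPFX) and reports which operand form the preprocessor picked for the current compiler target. Building it once for a 32-bit and once for a 64-bit target is an assumed way to probe the hypothesis, not something done in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified copies of the asm.h selection logic quoted above. */
#define SKETCH_FORM_RAW(x) #x
#ifndef __x86_64__
/* 32 bit: use the plain operand */
# define SKETCH_SEL_RAW(a, b) SKETCH_FORM_RAW(a)
#else
/* 64 bit: use the operand with the (%rip) suffix */
# define SKETCH_SEL_RAW(a, b) SKETCH_FORM_RAW(b)
#endif
#define SKETCH_ASM_RIP(x) SKETCH_SEL_RAW(x, x(%rip))

/* Operand text the preprocessor produced for a symbol named mds_verw_sel. */
static const char *verw_operand(void)
{
    return SKETCH_ASM_RIP(mds_verw_sel);
}

/* 1 if __x86_64__ was defined for this translation unit, else 0. */
static int built_as_64bit(void)
{
#ifdef __x86_64__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the selected operand carries the RIP-relative suffix. */
static int operand_is_rip_relative(void)
{
    return strstr(verw_operand(), "%rip") != NULL;
}

/* The selected form must agree with the target macro. */
static void check_operand_matches_target(void)
{
    assert(operand_is_rip_relative() == built_as_64bit());
}
```

On an x86-64 host, a -m32 build should report no %rip suffix; a -m32 build that still selected the %rip form would support the hypothesis above. A quicker check is to dump the predefined macros, e.g. `clang -m32 -dM -E -x c /dev/null`, and look for __x86_64__.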

I am not sure about the current level of 32-bit mode support in clang. This
seems inconclusive:

  https://discourse.llvm.org/t/x86-32-bit-testing/65480

Does anyone care about 32-bit mode builds with clang?


* Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]
@ 2024-04-14  6:41   ` Linux regression tracking (Thorsten Leemhuis)
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-04-14  6:41 UTC
  To: Pawan Gupta
  Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm,
	Linux kernel regressions list, kernel test robot

Hi, Thorsten here, the Linux kernel's regression tracker.

On 28.03.24 22:17, Pawan Gupta wrote:
> On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
>> compiler: clang-17
>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202403281553.79f5a16f-lkp@intel.com

TWIMC, a user reported general protection faults with dosemu that were
bisected to a 6.6.y backport of the commit that causes the problem
discussed in this thread (6613d82e617dd7 ("x86/bugs: Use ALTERNATIVE()
instead of mds_user_clear static key")).

The user compiles with gcc, so it might be a different problem. It
happens with 6.8.y as well.

The problem occurs with x86-32 kernels, but strangely only on some of
the x86-32 systems the reporter has (on the others everything works
fine). That makes me wonder if the commit exposed an older problem that
only happens on some machines.

For details see https://bugzilla.kernel.org/show_bug.cgi?id=218707
I could not CC the reporter here due to the bugzilla privacy policy; if
you want to get in contact, please use bugzilla.

Ciao, Thorsten

>> [   25.175767][  T670] VFS: Warning: trinity-c2 using old stat() call. Recompile your binary.
>> [   25.245597][  T669] general protection fault: 0000 [#1] PREEMPT SMP
>> [   25.246417][  T669] CPU: 1 PID: 669 Comm: trinity-c1 Not tainted 6.8.0-rc5-00004-g6613d82e617d #1 85a4928d2e6b42899c3861e57e26bdc646c4c5f9
>> [   25.247743][  T669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> [ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957) 
>> [ 25.249510][ T669] Code: 4c 24 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 <0f> 00 2d 00 94 d5 c1 cf 6a 00 68 88 6b d4 c1 eb 00 fc 0f a0 50 b8
>> All code
>> ========
>>    0:	4c 24 10             	rex.WR and $0x10,%al
>>    3:	36 89 48 fc          	ss mov %ecx,-0x4(%rax)
>>    7:	8b 4c 24 0c          	mov    0xc(%rsp),%ecx
>>    b:	81 e1 ff ff 00 00    	and    $0xffff,%ecx
>>   11:	36 89 48 f8          	ss mov %ecx,-0x8(%rax)
>>   15:	8b 4c 24 08          	mov    0x8(%rsp),%ecx
>>   19:	36 89 48 f4          	ss mov %ecx,-0xc(%rax)
>>   1d:	8b 4c 24 04          	mov    0x4(%rsp),%ecx
>>   21:	36 89 48 f0          	ss mov %ecx,-0x10(%rax)
>>   25:	59                   	pop    %rcx
>>   26:	8d 60 f0             	lea    -0x10(%rax),%esp
>>   29:	58                   	pop    %rax
>>   2a:*	0f 00 2d 00 94 d5 c1 	verw   -0x3e2a6c00(%rip)        # 0xffffffffc1d59431		<-- trapping instruction
> 
> This is due to 64-bit addressing with CONFIG_X86_32=y on clang.
> 
> I haven't tried with clang, but I don't see this happening with gcc-11:
> 
> 	entry_INT80_32:
> 	...
> 	<+446>:   mov    0x4(%esp),%ecx
> 	<+450>:   mov    %ecx,%ss:-0x10(%eax)
> 	<+454>:   pop    %ecx
> 	<+455>:   lea    -0x10(%eax),%esp
> 	<+458>:   pop    %eax
> 	<+459>:   verw   0xc1d5c700              <----------
> 	<+466>:   iret
> 
>>   31:	cf                   	iret
>>   32:	6a 00                	push   $0x0
>>   34:	68 88 6b d4 c1       	push   $0xffffffffc1d46b88
>>   39:	eb 00                	jmp    0x3b
> ...
> 
> The config has CONFIG_X86_32=y, but it is possible that in 32-bit build
> with clang, 64-bit mode expansion of "VERW (_ASM_RIP(addr))" is getting
> used i.e. __ASM_FORM_RAW(b) below:
> 
>   file: arch/x86/include/asm/asm.h
>   ...
>   #ifndef __x86_64__
>   /* 32 bit */
>   # define __ASM_SEL(a,b)         __ASM_FORM(a)
>   # define __ASM_SEL_RAW(a,b)     __ASM_FORM_RAW(a)
>   #else
>   /* 64 bit */
>   # define __ASM_SEL(a,b)         __ASM_FORM(b)
>   # define __ASM_SEL_RAW(a,b)     __ASM_FORM_RAW(b)   <--------
>   #endif
>   ...
>   /* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
>   #define _ASM_RIP(x)     __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))
> 
> Possibly __x86_64__ is being defined with clang even when CONFIG_X86_32=y.
> 
> I am not sure about current level of 32-bit mode support in clang. This
> seems inconclusive:
> 
>   https://discourse.llvm.org/t/x86-32-bit-testing/65480
> 
> Does anyone care about 32-bit mode builds with clang?


* Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]
@ 2024-04-17 18:54     ` Pawan Gupta
From: Pawan Gupta @ 2024-04-17 18:54 UTC
  To: Linux regressions mailing list
  Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm, kernel test robot

On Sun, Apr 14, 2024 at 08:41:52AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
> 
> On 28.03.24 22:17, Pawan Gupta wrote:
> > On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
> >> compiler: clang-17
> >> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >>
> >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >> the same patch/commit), kindly add following tags
> >> | Reported-by: kernel test robot <oliver.sang@intel.com>
> >> | Closes: https://lore.kernel.org/oe-lkp/202403281553.79f5a16f-lkp@intel.com
> 
> TWIMC, a user report general protection faults with dosemu that were
> bisected to a 6.6.y backport of the commit that causes the problem
> discussed in this thread (6613d82e617dd7 ("x86/bugs: Use ALTERNATIVE()
> instead of mds_user_clear static key")).
> 
> User compiles using gcc, so it might be a different problem. Happens
> with 6.8.y as well.
> 
> The problem occurs with x86-32 kernels, but strangely only on some of
> the x86-32 systems the reporter has (e.g. on some everything works
> fine). Makes me wonder if the commit exposed an older problem that only
> happens on some machines.
> 
> For details see https://bugzilla.kernel.org/show_bug.cgi?id=218707
> Could not CC the reporter here due to the bugzilla privacy policy; if
> you want to get in contact, please use bugzilla.

Sorry for the late response, I was off work. I will look into this and
get back. I might need help reproducing this issue, but let me first see
if I can reproduce with the info in the bugzilla.
