[ISSUE] `cxl destroy-region region0` causes kernel panic when cxl memory is occupied

NVDIMM Device and Persistent Memory development

From: "Cao, Quanquan/曹 全全" <caoqq@fujitsu.com>
To: Dave Jiang <dave.jiang@intel.com>, vishal.l.verma@intel.com
Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev
Subject: [ISSUE] `cxl destroy-region region0` causes kernel panic when cxl memory is occupied
Date: Fri, 3 Nov 2023 19:11:21 +0800
Message-ID: <0137fb34-7291-b88b-34aa-78471d57921b@fujitsu.com>

Hi guys,

I am writing to report a kernel panic that occurs when running 
'cxl destroy-region region0' while the CXL memory is occupied. Below is a 
description of the problem along with a test case that reproduces it.

Problem Description:

After 'create-region', if the CXL memory is occupied by a script and 
'disable-region' is then run without first running `daxctl offline-memory`, 
the result is a kernel panic.
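
For comparison, the ordering described above (offline the memory first, then disable and destroy the region) could be scripted roughly as follows. This is only a sketch: the helper names `run` and `teardown_region` and the dax device name `dax0.0` are assumptions, not taken from the report, and whether this ordering avoids the panic has not been confirmed here. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: tear down a CXL ram region, offlining its memory first.
# The region and dax device names are passed in by the caller; "dax0.0"
# in the usage note is an assumed name, check `daxctl list` on your system.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_region() {
    region=$1
    daxdev=$2
    # Stop at the first failing step: if the memory cannot be offlined
    # it is still in use, and the region should be left alone.
    run daxctl offline-memory "$daxdev" || return 1
    run cxl disable-region "$region" || return 1
    run cxl destroy-region "$region"
}
```

With DRY_RUN=1 set, `teardown_region region0 dax0.0` only prints the three commands in order, which makes it easy to review the sequence before running it for real.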

I investigated this briefly. The panic occurs while the region decode is 
being reset in preparation for removal, in the "destroy_region()" function 
in cxl/region.c. When the value of 
"/sys/bus/cxl/devices/root0/decoder0.0/region0/commit" is changed from 1 
to 0, the driver code is invoked to reset the region decode, which 
in turn leads to a kernel panic:

[  397.898809] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  397.908416] systemd[1]: segfault at 0 ip 0000000000000000 sp 00007ffcdc242520 error 14 in systemd[55555aef50)
[  397.910578] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  397.920233] systemd[1]: segfault at 0 ip 0000000000000000 sp 00007ffcdc2416a0 error 14 in systemd[55555aef50)
[  397.922309] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  397.933175] systemd[1]: segfault at 0 ip 0000000000000000 sp 00007ffcdc240820 error 14 in systemd[55555aef50)
[  397.935553] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  397.945611] systemd[1]: segfault at 0 ip 0000000000000000 sp 00007ffcdc23f9a0 error 14 in systemd[55555aef50)
[  397.947751] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  400.474068] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  400.474583] CPU: 2 PID: 1 Comm: systemd Tainted: G           O     N 6.6.0-rc6+ #1
[  400.474583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qem4
[  400.474583] Call Trace:
[  400.474583]  <TASK>
[  400.474583]  dump_stack_lvl+0x43/0x60
[  400.474583]  panic+0x32a/0x340
[  400.474583]  ? _raw_spin_unlock+0x15/0x30
[  400.474583]  do_exit+0x9a1/0xb30
[  400.474583]  do_group_exit+0x2d/0x80
[  400.474583]  get_signal+0x9c7/0xa00
[  400.474583]  arch_do_signal_or_restart+0x3a/0x280
[  400.474583]  exit_to_user_mode_prepare+0x192/0x1f0
[  400.474583]  irqentry_exit_to_user_mode+0x5/0x30
[  400.474583]  asm_exc_page_fault+0x22/0x30
[  400.474583] RIP: 0033:0x0
[  400.474583] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  400.474583] RSP: 002b:00007ffcdc1579a0 EFLAGS: 00000207
[  400.474583] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00007fb10db2796d
[  400.474583] RDX: 00007fb10db2796d RSI: 00000000ffffffff RDI: 00007ffcdc157c70
[  400.474583] RBP: 000000000000000b R08: 0000000000000000 R09: 0000000000000000
[  400.474583] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcdc94dce8
[  400.474583] R13: 00007ffcdc94dce0 R14: 00000000000004bb R15: 000000000000005d
[  400.474583]  </TASK>
[  400.474583] Kernel Offset: 0x20000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffff)
[  400.474583] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

According to the panic message, the systemd process (PID 1) hit a 
segmentation fault (segfault), and the resulting attempt to kill init 
caused the kernel panic.
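
Before anything writes 0 to the commit attribute mentioned above, its current value can be read back as a sanity check. The following is a minimal sketch, not part of the cxl tooling: the function name `region_commit_state` is hypothetical, and it only reads the attribute, it never writes to it.

```shell
#!/bin/sh
# Sketch: report whether a region's decode is currently committed.
# The argument is the region's sysfs directory, for example the
# /sys/bus/cxl/devices/root0/decoder0.0/region0 path quoted above.

region_commit_state() {
    region_dir=$1
    commit_file="$region_dir/commit"
    # A missing or unreadable attribute is reported as "unknown".
    [ -r "$commit_file" ] || { echo "unknown"; return 1; }
    case "$(cat "$commit_file")" in
        1) echo "committed" ;;
        0) echo "uncommitted" ;;
        *) echo "unknown" ;;
    esac
}
```

For the setup in this report, `region_commit_state /sys/bus/cxl/devices/root0/decoder0.0/region0` would be expected to print "committed" after 'create-region' and before the region is destroyed.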

Test Example:

1. echo online_movable > /sys/devices/system/memory/auto_online_blocks
2. cxl create-region -t ram -d decoder0.0 -m mem0
3. python consumemem.py         <------ execute script
4. cxl disable-region region0
5. cxl destroy-region region0   <------ kernel panic !!!

Thank you very much for taking the time to look into this issue. Looking 
forward to your response.

Best regards,
Quanquan Cao
caoqq@fujitsu.com
