Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 0e1cc95b4cc7293bb7b39175035e7f7e45c90977
Author:     Mel Gorman
AuthorDate: Tue Jun 30 14:57:27 2015 -0700
Commit:     Linus Torvalds
CommitDate: Tue Jun 30 19:44:56 2015 -0700

    mm: meminit: finish initialisation of struct pages before basic setup

    Waiman Long reported that 24TB machines hit OOM during basic setup when
    struct page initialisation was deferred. One approach is to initialise
    memory on demand but it interferes with page allocator paths. This patch
    creates dedicated threads to initialise memory before basic setup. It
    then blocks on a rw_semaphore until completion as a wait_queue and counter
    is overkill. This may be slower to boot but it's simplier overall and
    also gets rid of a section mangling which existed so kswapd could do the
    initialisation.

    [akpm@linux-foundation.org: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast]
    Signed-off-by: Mel Gorman
    Cc: Waiman Long
    Cc: Dave Hansen
    Cc: Scott Norton
    Tested-by: Daniel J Blueman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

+-----------------------------------------------------+------------+------------+-----------------+
|                                                     | 74033a798f | 0e1cc95b4c | v4.2-rc1_071220 |
+-----------------------------------------------------+------------+------------+-----------------+
| boot_successes                                      | 0          | 0          | 0               |
| boot_failures                                       | 132        | 35         | 13              |
| kernel_BUG_at_include/linux/mtd/map.h               | 132        | 35         | 13              |
| invalid_opcode                                      | 132        | 35         | 13              |
| RIP:mtd_do_chip_probe                               | 132        | 35         | 13              |
| Kernel_panic-not_syncing:Fatal_exception            | 132        | 35         | 13              |
| backtrace:do_map_probe                              | 132        | 35         | 13              |
| backtrace:init_sbc_gxx                              | 132        | 35         | 13              |
| backtrace:kernel_init_freeable                      | 132        | 35         | 13              |
| INFO:possible_recursive_locking_detected            | 0          | 16         | 13              |
| backtrace:page_alloc_init_late                      | 0          | 16         | 13              |
| backtrace:down_write                                | 0          | 16         | 13              |
| WARNING:at_kernel/locking/lockdep.c:#lock_release() | 0          | 19         |                 |
| backtrace:up_read                                   | 0          | 19         |                 |
| backtrace:deferred_init_memmap                      | 0          | 19         |                 |
+-----------------------------------------------------+------------+------------+-----------------+

Attached parent dmesg too, which looks like an independent bug.

[    0.084000] ..... host bus clock speed is 1000.0062 MHz.
[    0.084323]
[    0.084537] =============================================
[    0.085229] [ INFO: possible recursive locking detected ]
[    0.085913] 4.1.0-11369-g0e1cc95b4 #5 Not tainted
[    0.086524] ---------------------------------------------
[    0.087224] swapper/1 is trying to acquire lock:
[    0.087839]  (pgdat_init_rwsem){++++.+}, at: [] page_alloc_init_late+0x7f/0x90
[    0.088000]
[    0.088000] but task is already holding lock:
[    0.088000]  (pgdat_init_rwsem){++++.+}, at: [] page_alloc_init_late+0x13/0x90
[    0.088000]
[    0.088000] other info that might help us debug this:
[    0.088000]  Possible unsafe locking scenario:
[    0.088000]
[    0.088000]        CPU0
[    0.088000]        ----
[    0.088000]   lock(pgdat_init_rwsem);
[    0.088000]   lock(pgdat_init_rwsem);
[    0.088000]
[    0.088000]  *** DEADLOCK ***
[    0.088000]
[    0.088000]  May be due to missing lock nesting notation
[    0.088000]
[    0.088000] 1 lock held by swapper/1:
[    0.088000]  #0:  (pgdat_init_rwsem){++++.+}, at: [] page_alloc_init_late+0x13/0x90
[    0.088000]
[    0.088000] stack backtrace:
[    0.088000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.1.0-11369-g0e1cc95b4 #5
[    0.088000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    0.088000]  ffffffff83591d60 ffff880010ee7d78 ffffffff81a4cb82 ffff880010ee7e48
[    0.088000]  ffffffff810fb38c ffff880010ee7db8 00000000810e53f7 ffff880010ef0c70
[    0.088000]  0000000000000000 ffffffff83591d60 ffffffff834c0c00 00000000004b425a
[    0.088000] Call Trace:
[    0.088000]  [] dump_stack+0x19/0x1b
[    0.088000]  [] __lock_acquire+0xe3b/0xfeb
[    0.088000]  [] ? check_preemption_disabled+0x3c/0x196
[    0.088000]  [] lock_acquire+0x10e/0x198
[    0.088000]  [] ? page_alloc_init_late+0x7f/0x90
[    0.088000]  [] down_write+0x3d/0x8b
[    0.088000]  [] ? page_alloc_init_late+0x7f/0x90
[    0.088000]  [] page_alloc_init_late+0x7f/0x90
[    0.088000]  [] kernel_init_freeable+0x180/0x2c9
[    0.088000]  [] ? rest_init+0x155/0x155
[    0.088000]  [] kernel_init+0x9/0x152
[    0.088000]  [] ret_from_fork+0x3f/0x70
[    0.088000]  [] ? rest_init+0x155/0x155
[    0.088611] devtmpfs: initialized
[    0.098991] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.100542] xor: measuring software checksum speed

git bisect start d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754 v4.1 --
git bisect good e382608254e06c8109f40044f5e693f2e04f3899  # 22:59     22+     22  Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect  bad 5f1201d515819e7cfaaac3f0a30ff7b556261386  # 23:27      1-     20  Merge tag 'clk-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect good 88793e5c774ec69351ef6b5200bb59f532e41bca  # 23:36     22+     22  Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
git bisect good 7adf12b87f45a77d364464018fb8e9e1ac875152  # 23:41     22+     22  Merge tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good 8fff77551a9215a725650263e30fa105acca95ab  # 23:45     20+     20  Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect  bad 2d01eedf1d14432f4db5388a49dc5596a8c5bd02  # 23:50      1-      2  Merge branch 'akpm' (patches from Andrew)
git bisect good d5fb82137b6cd39e67c4321f4f5ce9b03d4d04e6  # 00:16     33+     35  Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 6ac15baacb6ecd87c66209627753b96ded3b4515  # 00:21     33+     33  Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect  bad 9ce71148b027e2bd27016139cae1c39401587695  # 00:25      1-      5  devpts: if initialization failed, don't crash when opening /dev/ptmx
git bisect  bad 460b865e53c347ebf110e50d499718cd9b39d810  # 00:30      5-     33  fs: document seq_open()'s usage of file->private_data
git bisect good 7e18adb4f80bea90d30b62158694d97c31f71d37  # 00:35     33+     33  mm: meminit: initialise remaining struct pages in parallel with kswapd
git bisect good ac5d2539b2382689b1cdb90bd60dcd49f61c2773  # 00:41     31+     31  mm: meminit: reduce number of times pageblocks are set during struct page init
git bisect  bad 0e1cc95b4cc7293bb7b39175035e7f7e45c90977  # 00:45     11-     24  mm: meminit: finish initialisation of struct pages before basic setup
git bisect good 74033a798f5a5db368126ee6f690111cf019bf7a  # 00:48     32+     32  mm: meminit: remove mminit_verify_page_links
# first bad commit: [0e1cc95b4cc7293bb7b39175035e7f7e45c90977] mm: meminit: finish initialisation of struct pages before basic setup
git bisect good 74033a798f5a5db368126ee6f690111cf019bf7a  # 00:51    100+    132  mm: meminit: remove mminit_verify_page_links
# extra tests with DEBUG_INFO
git bisect  bad 0e1cc95b4cc7293bb7b39175035e7f7e45c90977  # 00:54      0-     45  mm: meminit: finish initialisation of struct pages before basic setup
# extra tests on HEAD of linux-devel/devel-hourly-2015071220
git bisect  bad 1ae922e305feca3d8af890cf4601ef6a6cb5bbf1  # 00:54      0-     13  0day head guard for 'devel-hourly-2015071220'
# extra tests on tree/branch linus/master
git bisect  bad bc0195aad0daa2ad5b0d76cce22b167bc3435590  # 00:58      4-     45  Linux 4.2-rc2
# extra tests with first bad commit reverted
git bisect good 44813dd2ca45b1917d85ba59197678fdf069ce76  # 01:06     99+     99  Revert "mm: meminit: finish initialisation of struct pages before basic setup"
# extra tests on tree/branch linus/master
git bisect  bad bc0195aad0daa2ad5b0d76cce22b167bc3435590  # 01:06      0-     99  Linux 4.2-rc2
# extra tests on tree/branch next/master
git bisect  bad 2eb62d762a2112579f259903e62ba18d16c51f66  # 01:17      3-     23  Add linux-next specific files for 20150713
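
For anyone post-processing a log like the one above, the following is a minimal sketch, not part of the 0day tooling. It assumes the log has been saved one step per line, and both function names are hypothetical. It only extracts the verdict and commit hash of each step and the hash on the "# first bad commit:" line; it does not try to interpret the per-step run counts.

```shell
#!/bin/bash
# Hypothetical helpers for reading a 0day-style bisect log on stdin.
# Neither function belongs to the 0day infrastructure; the names are
# illustrative only.

# Print the 40-character hash from the "# first bad commit: [<sha>] ..." line.
first_bad_commit() {
	sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)].*/\1/p'
}

# Print "<verdict> <sha>" for every "git bisect good|bad <sha>" step,
# dropping the trailing "# time runs subject" annotation.
bisect_verdicts() {
	awk '$1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") { print $3, $4 }'
}
```

With the log saved to a file (bisect.log is an assumed name), `first_bad_commit < bisect.log` should print the hash of the offending commit, and `bisect_verdicts < bisect.log` lists one verdict per tested commit.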

This script may reproduce the error.

----------------------------------------------------------------------------
#!/bin/bash

kernel=$1
initrd=quantal-core-x86_64.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
	qemu-system-x86_64
	-enable-kvm
	-cpu kvm64
	-kernel $kernel
	-initrd $initrd
	-m 300
	-smp 2
	-device e1000,netdev=net0
	-netdev user,id=net0
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	systemd.log_level=err
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation