Hi Brian,

There you go.

$ md5sum vmlinux
82aaa694a174c0a29e78c05e73adf5d8  vmlinux

Yes, I can read it with this vmlinux image. Put all the files (vmcore, vmlinux, System.map) in one folder and run $ crash vmlinux vmcore
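
For example (the working directory name below is just illustrative; the
vmlinux has to come from the kernel-debuginfo build that matches the
running kernel exactly):

        $ mkdir crashdir && cd crashdir   # put vmcore, vmlinux, System.map here
        $ md5sum vmlinux                  # sanity-check against the sum above
        $ crash vmlinux vmcore            # System.map can also be passed as an extra argument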

Hugo


2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> Hi Brian,
>
> *Operating System Version:*
> Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
>
> *NODE 1*
>
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
>
>
> *NODE 2*
>
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
>
>
> Any thoughts would be appreciated.
>

I'm not able to fire up crash with these core files and the kernel debug
info from the following centos kernel debuginfo package:

kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm

It complains about a version mismatch between the vmlinux and core file.
I'm no crash expert... are you sure the cores above correspond to this
kernel? Does crash load up for you on said box if you run something like
the following?

        crash /usr/lib/debug/lib/modules/.../vmlinux vmcore

Note that you might need to install the above kernel-debuginfo package
to get the debug (vmlinux) file. If so, could you also upload that
debuginfo rpm somewhere?
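
Something along these lines, for example (the exact package and path
names are assumptions based on the usual CentOS debuginfo layout;
substitute the version string of the kernel that actually generated the
core):

        # rpm -ivh kernel-debuginfo-common-x86_64-<version>.rpm \
                   kernel-debuginfo-<version>.rpm
        # crash /usr/lib/debug/lib/modules/<version>/vmlinux vmcore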

Brian

> Thanks // Hugo
>
>
> 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>
> > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > Hi Folks,
> > >
> > > Running xfs_repair -n against all 32 disks did not report any errors.
> > > We have now deployed CentOS 6.6 for testing. (The previous kernel
> > > panic came from Ubuntu.)
> > > The CentOS nodes hit a kernel panic with the same daemon, but the
> > > problem may differ slightly.
> > >
> > >    - On Ubuntu it broke in xfs_dir2_sf_get_parent_ino+0xa/0x20.
> > >    - On CentOS it breaks in xfs_dir2_sf_getdents+0x2a0/0x3a0; the log
> > >      is below.
> > >
> >
> > I'd venture to guess it's the same behavior here. The previous kernel
> > had a callback for the parent inode number that was called via
> > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
> > inline here instead.
> >
> > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000001
> > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > <4>Oops: 0000 [#1] SMP
> > > <4>last sysfs file:
> > >
> > /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > <4>CPU 17
> > > <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
> > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
> > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
> > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
> > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
> > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > scsi_wait_scan]
> > > <4>
> > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > R518.v5P/X10DRi-T4+
> > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
> > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
> > > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
> > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
> > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > knlGS:0000000000000000
> > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
> > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
> > > task ffff880860f18ab0)
> > > <4>Stack:
> > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
> > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > ffff8808715b4580
> > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > ffffffffa035ab31
> > > <4>Call Trace:
> > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
> > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
> > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > <4> RSP <ffff880871f6de18>
> > > <4>CR2: 0000000000000001
> > >
> > ...
> > >
> > > I’ve got the vmcore dump from the operator. Would the vmcore help with
> > > troubleshooting this kind of issue?
> > >
> >
> > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
> > you provide the exact kernel version and post the vmcore somewhere?
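> >
> > For what it's worth, a couple of quick ways to confirm which kernel a
> > dump came from (assuming a compressed kdump-format vmcore and that the
> > dmesg buffer still holds the boot banner):
> >
> >         $ crash --osrelease vmcore
> >         $ grep 'Linux version' vmcore-dmesg.txt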
> >
> > Brian
> >
> > > Thanks // Hugo
> > >
> > >
> > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> > >
> > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > >>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
> > > > that -n will report problems only and prevent any modification by
> > repair.
> > > > >
> > > > > *We might run xfs_repair if we can identify which disk causes the
> > > > > issue.*
> > > >
> > > > If you do, please save the output, and if it finds anything, please
> > > > provide the output in this thread.
> > > >
> > > > Thanks,
> > > > -Eric
> > > >
> >
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> >
> >