From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 18 Jul 2015 03:39:10 +0800
Subject: Re: Data can't be wrote to XFS RIP [] xfs_dir2_sf_get_parent_ino+0xa/0x20
From: Kuo Hugo <tonytkdk@gmail.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Hugo Kuo, Eric Sandeen <sandeen@sandeen.net>, Darrell Bishop, xfs@oss.sgi.com
List-Id: XFS Filesystem from SGI

Hi all,

We may have hit this bug in OpenStack Swift: "Race condition quarantines
valid objects", https://bugs.launchpad.net/swift/+bug/1451520

This race condition may cause the problem which Brian mentioned
previously. The OpenStack community fixed this in Swift's code. Should it
be considered an XFS kernel bug as well?

Thanks for all your efforts.
Hugo
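The workload behind both crashes is a directory listing racing against
Swift's quarantine renames and deletes. As a rough, hypothetical sketch of
that access pattern (nothing below comes from the thread; the directory
name and timings are invented), a user-space stress loop looks something
like this. On a healthy kernel it must never oops, so it doubles as a
candidate reproducer:

```c
/*
 * Hypothetical stress sketch of the Swift-like workload: one thread
 * lists a small directory while another renames files into and out of
 * it.  Build with: gcc -O2 -pthread -o dir-race dir-race.c
 */
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static volatile int stop;
static const char *dir = "racedir";   /* invented name; keep the directory
                                         small so XFS stores it shortform */

static void *lister(void *arg)
{
    (void)arg;
    while (!stop) {
        DIR *d = opendir(dir);
        if (!d)
            continue;
        struct dirent *de;
        while ((de = readdir(d)) != NULL)
            ;                          /* walk every entry, like a hash listing */
        closedir(d);
    }
    return NULL;
}

static void *renamer(void *arg)
{
    (void)arg;
    char a[64], b[64];
    for (long i = 0; !stop; i++) {
        snprintf(a, sizeof(a), "%s/%ld.ts", dir, i);
        snprintf(b, sizeof(b), "%s/%ld.data", dir, i);
        int fd = open(a, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0)
            close(fd);
        rename(a, b);                  /* mimic quarantine's rename */
        unlink(b);                     /* ...and the eventual delete */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    mkdir(dir, 0755);
    pthread_create(&t1, NULL, lister, NULL);
    pthread_create(&t2, NULL, renamer, NULL);
    sleep(30);                         /* hammer the race for a while */
    stop = 1;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

To exercise a particular filesystem, change the directory path to a
location on the suspect mount before running it.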
2015-07-14 2:10 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:

> Hi Brian,
>
> The sdb is mounted on /srv/node/d224 on this server. That's one of the
> three disks the process was doing I/O to.
> The open files on /srv/node/d224 were created by the process that was
> storing/deleting data at the moment the kernel panic appeared.
>
> ```
> 36 ffff8808703ed6c0 ffff88086fd65540 ffff8805eb03ed88 REG
> /srv/node/d205/quarantined/objects/cd1d68f515006d443a54ff4f658091bc-a114bba1449b45238abf38dc741d7c27/1436254020.89801.ts
> 37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG [eventpoll]
> 38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG
> /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
> 39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG
> /srv/node/d224/tmp/tmpSpnrHg
> 40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR
> /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32
> ```
>
> I'll check the directory location for that inode number.
>
> Nice information.
>
> Thanks // Hugo
>
> 2015-07-14 1:01 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>
>> On Mon, Jul 13, 2015 at 10:06:39PM +0800, Kuo Hugo wrote:
>> > Hi Brian,
>> >
>> > Sorry for the wrong file in the previous message. I believe this is
>> > the right one.
>> >
>> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko.debug
>> >
>> > /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>> >
>> > MD5 : 27829c9c55f4f5b095d29a7de7c27254
>> >
>>
>> Yes, that works. I have a few bits of information so far, but nothing
>> obvious to me as to what caused the problem. Some info:
>>
>> - The crash is indeed at xfs_dir2_sf_get_inumber():
>>
>> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.h: 101
>> 0xffffffffa0362d60 <xfs_dir2_sf_getdents+672>:  cmpb   $0x0,0x1(%r12)
>> ...
>>
>> - %r12 above has a value of 0 and is set as follows:
>>
>> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.c: 727
>> 0xffffffffa0362b11 <xfs_dir2_sf_getdents+81>:   mov    0x50(%rdi),%r12
>>
>> ... which is the sfp pointer assignment in the getdents function:
>>
>>         sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
>>
>> This implies a NULL if_data.
>>
>> - The backtrace lists a couple of inodes on the stack in this frame. I'm
>> not sure why, but one looks like a valid directory and the other looks
>> bogus. The valid inode has an inode number of 13668207561.
>>
>> - The fsname for this inode is "sdb."
>>
>> - The inode does appear to have a non-NULL if_data:
>>
>>     ...
>>     if_u1 = {
>>       if_extents = 0xffff88084feaf5c0,
>>       if_ext_irec = 0xffff88084feaf5c0,
>>       if_data = 0xffff88084feaf5c0 "\004"
>>     },
>>     ...
>>
>> So it's not totally clear what's going on there. It might be interesting
>> to see what directory this refers to, if it still exists on the sdb fs.
>> For example, is it an external directory or some kind of internal
>> directory created by the application? You could use something like the
>> following to try and locate the directory based on inode number:
>>
>>         find <mntpath> -inum 13668207561
>>
>> Brian
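A note to connect the registers to the fault address: the oops quoted
later in this thread reports CR2: 0000000000000001, and 0x1 is exactly the
offset of the shortform header's i8count byte that `cmpb $0x0,0x1(%r12)`
tests, so a NULL sfp faults on address 1. The sketch below makes that
arithmetic concrete; the struct layout is an approximation for
illustration, not a copy of the RHEL 6 definition:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Approximate layout of the XFS shortform directory header in kernels
 * of this era (illustrative, not copied from the RHEL 6 source).
 */
struct xfs_dir2_sf_hdr {
    uint8_t count;    /* offset 0: number of entries */
    uint8_t i8count;  /* offset 1: entries needing 64-bit inode numbers */
    /* parent inode number and the entries follow */
};

int main(void)
{
    /*
     * xfs_dir2_sf_get_inumber() starts by testing sfp->hdr.i8count,
     * the byte at offset 1 from sfp (held in %r12 above).  With
     * sfp == NULL, that load touches address 0x1, the CR2 value
     * reported in the oops.
     */
    printf("i8count offset: %#zx\n",
           offsetof(struct xfs_dir2_sf_hdr, i8count));
    return 0;
}
```

Compiling and running this prints 0x1, matching the faulting address.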
>> > Thanks // Hugo
>> >
>> > 2015-07-13 20:52 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> >
>> > > On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
>> > > > Hi Brian,
>> > > >
>> > > > Is this the file you need?
>> > > >
>> > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
>> > > >
>> > > > $> modinfo xfs
>> > > >
>> > > > filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
>> > > > license: GPL
>> > > > description: SGI XFS with ACLs, security attributes, large block/inode
>> > > > numbers, no debug enabled
>> > > > author: Silicon Graphics, Inc.
>> > > > srcversion: 0C1B17926BDDA4F121479EE
>> > > > depends: exportfs
>> > > > vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
>> > > >
>> > >
>> > > No, this isn't the debug version. We need the one from the debug
>> > > package that was installed (/usr/lib/debug?).
>> > >
>> > > Brian
>> > >
>> > > > Thanks // Hugo
>> > > >
>> > > > 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> > > >
>> > > > > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
>> > > > > > Hi Brian,
>> > > > > >
>> > > > > > There you go.
>> > > > > >
>> > > > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
>> > > > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
>> > > > > >
>> > > > > > $ md5sum vmlinux
>> > > > > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
>> > > > > >
>> > > > > > Yes, I can read it with this vmlinux image. Put all the files
>> > > > > > (vmcore, vmlinux, System.map) in a folder and run:
>> > > > > > $ crash vmlinux vmcore
>> > > > > >
>> > > > >
>> > > > > Thanks, I can actually load that up now. Note that we'll probably
>> > > > > need the modules and whatnot (xfs.ko) also to be able to look at
>> > > > > any XFS bits. It might be easiest to just tar up and compress
>> > > > > whatever directory structure has the debug-enabled vmlinux and all
>> > > > > the kernel modules. Thanks.
>> > > > >
>> > > > > Brian
>> > > > >
>> > > > > > Hugo
>> > > > > >
>> > > > > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> > > > > >
>> > > > > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
>> > > > > > > > Hi Brian,
>> > > > > > > >
>> > > > > > > > *Operating System Version:*
>> > > > > > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
>> > > > > > > >
>> > > > > > > > *NODE 1*
>> > > > > > > >
>> > > > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
>> > > > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
>> > > > > > > >
>> > > > > > > > *NODE 2*
>> > > > > > > >
>> > > > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
>> > > > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
>> > > > > > > >
>> > > > > > > > Any thoughts would be appreciated
>> > > > > > > >
>> > > > > > >
>> > > > > > > I'm not able to fire up crash with these core files and the
>> > > > > > > kernel debug info from the following CentOS kernel debuginfo
>> > > > > > > package:
>> > > > > > >
>> > > > > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
>> > > > > > >
>> > > > > > > It complains about a version mismatch between the vmlinux and
>> > > > > > > core file. I'm no crash expert... are you sure the cores above
>> > > > > > > correspond to this kernel? Does crash load up for you on said
>> > > > > > > box if you run something like the following?
>> > > > > > >
>> > > > > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
>> > > > > > >
>> > > > > > > Note that you might need to install the above kernel-debuginfo
>> > > > > > > package to get the debug (vmlinux) file. If so, could you also
>> > > > > > > upload that debuginfo rpm somewhere?
>> > > > > > >
>> > > > > > > Brian
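An aside for readers following the vmcore analysis: once crash is up on
the debuginfo vmlinux, the XFS module's debug objects still have to be
attached before XFS stack frames and data structures resolve. A session
would look roughly like this (paths are illustrative, matching the files
exchanged above; `mod -s` is crash's command for loading a module's debug
data):

```
$ crash /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/vmlinux vmcore

crash> mod -s xfs /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
crash> bt    # backtrace of the panicked task, now with xfs symbols resolved
```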
>> > > > > > > > Thanks // Hugo
>> > > > > > > >
>> > > > > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> > > > > > > >
>> > > > > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
>> > > > > > > > > > Hi Folks,
>> > > > > > > > > >
>> > > > > > > > > > xfs_repair -n on all 32 disks turned up no errors.
>> > > > > > > > > > We are currently trying a CentOS 6.6 deployment for
>> > > > > > > > > > testing. (The previous kernel panic came from Ubuntu.)
>> > > > > > > > > > The CentOS nodes encountered a kernel panic with the
>> > > > > > > > > > same daemon, but the problem may differ a bit:
>> > > > > > > > > >
>> > > > > > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20
>> > > > > > > > > >      in Ubuntu.
>> > > > > > > > > >    - Here's the log from CentOS. It's broken on
>> > > > > > > > > >      xfs_dir2_sf_getdents+0x2a0/0x3a0
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I'd venture to guess it's the same behavior here. The
>> > > > > > > > > previous kernel had a callback for the parent inode number
>> > > > > > > > > that was called via xfs_dir2_sf_getdents(). Taking a look
>> > > > > > > > > at a 6.6 kernel, it has a static inline here instead.
>> > > > > > > > >
>> > > > > > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
>> > > > > > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > > > > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
>> > > > > > > > > > <4>Oops: 0000 [#1] SMP
>> > > > > > > > > > <4>last sysfs file:
>> > > > > > > > > > /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
>> > > > > > > > > > <4>CPU 17
>> > > > > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
>> > > > > > > > > > iptable_filter ipt_REDIRECT iptable_nat nf_nat
>> > > > > > > > > > nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables
>> > > > > > > > > > ip_vs ipv6 libcrc32c iTCO_wdt iTCO_vendor_support ses
>> > > > > > > > > > enclosure igb i2c_algo_bit sb_edac edac_core i2c_i801
>> > > > > > > > > > i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
>> > > > > > > > > > pps_core mdio power_meter acpi_ipmi ipmi_si
>> > > > > > > > > > ipmi_msghandler ext4 jbd2 mbcache sd_mod crc_t10dif
>> > > > > > > > > > mpt3sas scsi_transport_sas raid_class xhci_hcd ahci wmi
>> > > > > > > > > > dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
>> > > > > > > > > > scsi_wait_scan]
>> > > > > > > > > > <4>
>> > > > > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
>> > > > > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
>> > > > > > > > > > R518.v5P/X10DRi-T4+
>> > > > > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
>> > > > > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > > > > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
>> > > > > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
>> > > > > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
>> > > > > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
>> > > > > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
>> > > > > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
>> > > > > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
>> > > > > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > > > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
>> > > > > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > > > > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> > > > > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
>> > > > > > > > > > ffff880871f6c000, task ffff880860f18ab0)
>> > > > > > > > > > <4>Stack:
>> > > > > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
>> > > > > > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38 ffff8808715b4580
>> > > > > > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8 ffffffffa035ab31
>> > > > > > > > > > <4>Call Trace:
>> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > > > > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
>> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > > > > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
>> > > > > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
>> > > > > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
>> > > > > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
>> > > > > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>> > > > > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
>> > > > > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
>> > > > > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
>> > > > > > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > > > > > > > > > <4> RSP <ffff880871f6de18>
>> > > > > > > > > > <4>CR2: 0000000000000001
>> > > > > > > > > > ...
>> > > > > > > > > >
>> > > > > > > > > > I've got the vmcore dump from the operator. Does a
>> > > > > > > > > > vmcore help for troubleshooting this kind of issue?
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this
>> > > > > > > > > 6.6 kernel? Can you provide the exact kernel version and
>> > > > > > > > > post the vmcore somewhere?
>> > > > > > > > >
>> > > > > > > > > Brian
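Methodological footnote: the xfs_dir2_sf.h:101 and xfs_dir2_sf.c:727
references earlier in the thread are what crash prints when asked to map
the faulting RIP back to a source line. With the debug objects loaded, the
lookup is roughly (output abridged to what the thread already shows):

```
crash> dis -l 0xffffffffa0362d60
.../fs/xfs/xfs_dir2_sf.h: 101
0xffffffffa0362d60 <xfs_dir2_sf_getdents+672>:  cmpb   $0x0,0x1(%r12)
```

gdb can do the same mapping offline against the module's debug file, e.g.
`list *(xfs_dir2_sf_getdents+0x2a0)` inside `gdb xfs.ko.debug`.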
>> > > > > > > > > > Thanks // Hugo
>> > > > > > > > > >
>> > > > > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
>> > > > > > > > > >
>> > > > > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
>> > > > > > > > > > > >> - Have you tried an 'xfs_repair -n' of the affected
>> > > > > > > > > > > >> filesystem? Note that -n will report problems only
>> > > > > > > > > > > >> and prevent any modification by repair.
>> > > > > > > > > > > >
>> > > > > > > > > > > > *We might try xfs_repair if we can identify which
>> > > > > > > > > > > > disk causes the issue.*
>> > > > > > > > > > >
>> > > > > > > > > > > If you do, please save the output, and if it finds
>> > > > > > > > > > > anything, please provide the output in this thread.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > -Eric
>> > > > > > > > > > >
>> > > > > > > > > > _______________________________________________
>> > > > > > > > > > xfs mailing list
>> > > > > > > > > > xfs@oss.sgi.com
>> > > > > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
>> > > >
>> > > > _______________________________________________
>> > > > xfs mailing list
>> > > > xfs@oss.sgi.com
>> > > > http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs