* Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
@ 2015-06-18 11:56 Kuo Hugo
  2015-06-18 13:31 ` Brian Foster
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-06-18 11:56 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 5227 bytes --]

Hi folks,

We recently found the following XFS kernel message, and I don't really
know how to read it to figure out where the problem in the system lies.
Is there any known bug for
Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty? Or is the
problem in swift-object-se rather than in XFS itself?

swift-object-se means swift-object-server, a daemon that takes data in
over HTTP and writes it to XFS. I can't tell whether the problem comes
from XFS or from the swift-object-server daemon.
Any ideas would be appreciated.

Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000001
Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
[<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
1044eba067 PMD 0
Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
raid_class ptp mdio scsi_transport_sas pps_core
Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
04/28/2014
Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
0018:ffff8808e87e5e38 EFLAGS: 00010202
Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
RBX: 0000000000000004 RCX: 0000000000000000
Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
RSI: 0000000000000002 RDI: 0000000000000000
Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
R08: 000000020079e3b9 R09: 0000000000000004
Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
R11: 00000000000005b0 R12: ffff88104d0c0800
Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
R14: ffff88004988f000 R15: 0000000000000000
Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
knlGS:0000000000000000
Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
CR3: 0000000bcb9b1000 CR4: 00000000001407e0
Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]
Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
schedule_preempt_disabled+0x29/0x70
Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
xfs_readdir+0xeb/0x110 [xfs]
Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
xfs_file_readdir+0x2b/0x40 [xfs]
Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
iterate_dir+0xa5/0xe0
Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
vtime_account_user+0x54/0x60
Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
SyS_getdents+0x92/0x120
Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
fillonedir+0xe0/0xe0
Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
tracesys+0x7e/0xe6
Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
tracesys+0xe1/0xe6
Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
[<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
ba3fdf319346b7e6 ]---

Thanks // Hugo Kuo

[-- Attachment #1.2: Type: text/html, Size: 33895 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-06-18 11:56 Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 Kuo Hugo
@ 2015-06-18 13:31 ` Brian Foster
  2015-06-18 14:29   ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-06-18 13:31 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: xfs

On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> Hi folks,
> 
> Recently we found the following kernel message of XFS. I don’t really know
> how to read it in the right way to figure out the problem in the system.
> Is there any known bug for
> Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty ? Or the problem is
> on the swift-object-se rather than XFS itself ?
> 

Nothing that I know of, but others might have seen something like this.

> swift-object-se means swift-object-server which is a daemon handles data
> from http to XFS. I can’t address the problem came from XFS or the daemon
> swift-object-server.
> Any idea would be appreciated.
> 
> Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000001
> Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]

So that looks like a NULL header down in xfs_dir2_sf_get_ino(), as
hdr->i8count is at a 1 byte offset in the structure.
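
For reference, the shortform directory header and the inode accessors in a
3.13-era tree look roughly like this (layout simplified; treat exact field
and helper names as approximate):

        typedef struct xfs_dir2_sf_hdr {
                __uint8_t       count;          /* count of entries */
                __uint8_t       i8count;        /* count of 8-byte inode numbers */
                __uint8_t       parent[8];      /* parent directory inode number */
        } xfs_dir2_sf_hdr_t;                    /* packed on disk */

        static xfs_ino_t
        xfs_dir2_sf_get_ino(struct xfs_dir2_sf_hdr *hdr, xfs_dir2_inou_t *from)
        {
                if (hdr->i8count)       /* offset 1: faults at 0x1 when hdr == NULL */
                        return get_unaligned_be64(&from->i8.i) & 0x00ffffffffffffffULL;
                else
                        return get_unaligned_be32(&from->i4.i);
        }

        static xfs_ino_t
        xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf_hdr *hdr)
        {
                return xfs_dir2_sf_get_ino(hdr, (xfs_dir2_inou_t *)hdr->parent);
        }

With hdr == NULL, the i8count load dereferences address 0x1, which matches
the faulting address (CR2) reported in the oops.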

> Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
> 1044eba067 PMD 0
> Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
> xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
> x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
> aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
> glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
> ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
> hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
> raid_class ptp mdio scsi_transport_sas pps_core
> Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
> Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
> Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
> 04/28/2014
> Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
> ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
> xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
> 0018:ffff8808e87e5e38 EFLAGS: 00010202
> Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
> RBX: 0000000000000004 RCX: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
> RSI: 0000000000000002 RDI: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
> R08: 000000020079e3b9 R09: 0000000000000004
> Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
> R11: 00000000000005b0 R12: ffff88104d0c0800
> Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
> R14: ffff88004988f000 R15: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
> 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
> knlGS:0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
> CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
> ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
> ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
> 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
> xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]

We're called from here attempting to list a directory, which appears to
be the following block of code:

	...
	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
	...
        if (ctx->pos <= dotdot_offset) {
                ino = dp->d_ops->sf_get_parent_ino(sfp);
                ctx->pos = dotdot_offset & 0x7fffffff;
                if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
                        return 0;
        }

It wants to emit the ".." directory entry and apparently the in-core
data fork is NULL. There's an assertion against that earlier in the
function so I take it the expectation is that this has been read/set
beforehand. In fact, if this is a short form directory I also take it
this should be set to if_inline_data, which appears to be part of the
fork allocation itself.
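
For context, the in-core data fork in that kernel looks roughly like the
following (heavily simplified, only the parts relevant here are shown):

        typedef struct xfs_ifork {
                int             if_bytes;       /* bytes in if_u1 */
                unsigned char   if_flags;       /* XFS_IFINLINE for local/shortform */
                union {
                        char    *if_data;       /* inline data; what getdents reads */
                        void    *if_extents;    /* extent list for other formats */
                } if_u1;
                union {
                        char    if_inline_data[XFS_INLINE_DATA]; /* small inline buffer */
                        xfs_dev_t if_rdev;      /* device number for special inodes */
                } if_u2;
        } xfs_ifork_t;

When the inode is read in, xfs_iformat_local() points if_u1.if_data either
at if_u2.if_inline_data or, for larger shortform data, at a small
kmem_alloc'd buffer, so it should never be NULL by the time getdents runs.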

It's not immediately clear to me how this could happen. First off, it
would probably be good to determine whether this is a runtime issue or
due to some kind of on-disk problem. Some questions:

- Is this (and how often) reproducible?
- Have you identified which directory in your fs that the object server
  is attempting to enumerate when this occurs?
- Do you have any other, related output in /var/log/messages prior to
  this event? E.g., corruption messages or anything of that nature?
- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
  that -n will report problems only and prevent any modification by
  repair.

Brian

> Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
> schedule_preempt_disabled+0x29/0x70
> Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
> xfs_readdir+0xeb/0x110 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
> xfs_file_readdir+0x2b/0x40 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
> iterate_dir+0xa5/0xe0
> Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
> vtime_account_user+0x54/0x60
> Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
> SyS_getdents+0x92/0x120
> Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
> fillonedir+0xe0/0xe0
> Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
> tracesys+0x7e/0xe6
> Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
> tracesys+0xe1/0xe6
> Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
> ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
> 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
> aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
> Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
> ba3fdf319346b7e6 ]---
> 
> Thanks // Hugo Kuo
> ​

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-06-18 13:31 ` Brian Foster
@ 2015-06-18 14:29   ` Kuo Hugo
  2015-06-18 14:59     ` Eric Sandeen
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-06-18 14:29 UTC (permalink / raw)
  To: Brian Foster, xfs; +Cc: Hugo Kuo, darrell


[-- Attachment #1.1: Type: text/plain, Size: 9286 bytes --]

Hi all,


>- Is this (and how often) reproducible?

*This is the third time this has happened, on three different servers, in
the past 5 days.*

>- Have you identified which directory in your fs that the object server is
attempting to enumerate when this occurs?

*There are multiple object-server workers reading and writing over 30 XFS
disks in each server. I don't have a clue yet which object-server request
triggers the kernel panic. I'm still investigating.*

>- Do you have any other, related output in /var/log/messages prior to this
event? E.g., corruption messages or anything of that nature?

*There seems to be no useful information in /var/log/syslog:*

```
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]:
Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC
authentication
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]:
Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun 18 06:10:01 r1obj03 CRON[13595]: (swift) CMD ((date; test -f
/etc/swift/object-server.conf && /opt/ss/bin/swift-recon-cron
/etc/swift/object-server.conf || /opt/ss/bin/swift-recon-cron
/etc/swift/object-server/1.conf) >> /var/log/swift-recon-cron.log 2>&1)
Jun 18 06:10:14 r1obj03 kernel: [7631629.083099] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000001
```

>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note that
-n will report problems only and prevent any modification by repair.

*We might try xfs_repair if we can identify which disk causes the issue.*

Thanks // Hugo Kuo

2015-06-18 21:31 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> > Hi folks,
> >
> > Recently we found the following kernel message of XFS. I don’t really
> know
> > how to read it in the right way to figure out the problem in the system.
> > Is there any known bug for
> > Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty ? Or the problem
> is
> > on the swift-object-se rather than XFS itself ?
> >
>
> Nothing that I know of, but others might have seen something like this.
>
> > swift-object-se means swift-object-server which is a daemon handles data
> > from http to XFS. I can’t address the problem came from XFS or the daemon
> > swift-object-server.
> > Any idea would be appreciated.
> >
> > Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
> > kernel NULL pointer dereference at 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
> > [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
>
> So that looks like a NULL header down in xfs_dir2_sf_get_ino(), as
> hdr->i8count is at a 1 byte offset in the structure.
>
> > Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
> > 1044eba067 PMD 0
> > Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> > Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
> > xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
> > nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
> > x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
> > aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
> > glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
> > ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
> > hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
> > raid_class ptp mdio scsi_transport_sas pps_core
> > Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
> > Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> > Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
> > Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
> > 04/28/2014
> > Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
> > ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> > 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
> > xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
> > 0018:ffff8808e87e5e38 EFLAGS: 00010202
> > Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
> > RBX: 0000000000000004 RCX: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
> > RSI: 0000000000000002 RDI: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
> > R08: 000000020079e3b9 R09: 0000000000000004
> > Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
> > R11: 00000000000005b0 R12: ffff88104d0c0800
> > Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
> > R14: ffff88004988f000 R15: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
> > 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
> > knlGS:0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
> > 0000 CR0: 0000000080050033
> > Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
> > CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
> > ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> > Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
> > ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> > Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
> > 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> > Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
> > xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]
>
> We're called from here attempting to list a directory, which appears to
> be the following block of code:
>
>         ...
>         sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>         ...
>         if (ctx->pos <= dotdot_offset) {
>                 ino = dp->d_ops->sf_get_parent_ino(sfp);
>                 ctx->pos = dotdot_offset & 0x7fffffff;
>                 if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
>                         return 0;
>         }
>
> It wants to emit the ".." directory entry and apparently the in-core
> data fork is NULL. There's an assertion against that earlier in the
> function so I take it the expectation is that this has been read/set
> beforehand. In fact, if this is a short form directory I also take it
> this should be set to if_inline_data, which appears to be part of the
> fork allocation itself.
>
> It's not immediately clear to me how this could happen. First off, it
> would probably be good to determine whether this is a runtime issue or
> due to some kind of on-disk problem. Some questions:
>
> - Is this (and how often) reproducible?
> - Have you identified which directory in your fs that the object server
>   is attempting to enumerate when this occurs?
> - Do you have any other, related output in /var/log/messages prior to
>   this event? E.g., corruption messages or anything of that nature?
> - Have you tried an 'xfs_repair -n' of the affected filesystem? Note
>   that -n will report problems only and prevent any modification by
>   repair.
>
> Brian
>
> > Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
> > schedule_preempt_disabled+0x29/0x70
> > Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
> > xfs_readdir+0xeb/0x110 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
> > xfs_file_readdir+0x2b/0x40 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
> > iterate_dir+0xa5/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
> > vtime_account_user+0x54/0x60
> > Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
> > SyS_getdents+0x92/0x120
> > Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
> > fillonedir+0xe0/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
> > tracesys+0x7e/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
> > tracesys+0xe1/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
> > ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
> > 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
> > aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> > Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
> > [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
> > Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
> > ba3fdf319346b7e6 ]---
> >
> > Thanks // Hugo Kuo
> > ​
>
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
>

[-- Attachment #1.2: Type: text/html, Size: 11872 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-06-18 14:29   ` Kuo Hugo
@ 2015-06-18 14:59     ` Eric Sandeen
  2015-07-09 10:57       ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2015-06-18 14:59 UTC (permalink / raw)
  To: Kuo Hugo, Brian Foster, xfs; +Cc: Hugo Kuo, darrell

On 6/18/15 9:29 AM, Kuo Hugo wrote:
>>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note that -n will report problems only and prevent any modification by repair.
> 
> *We might to to xfs_repair if we can address which disk causes the issue. *

If you do, please save the output, and if it finds anything, please provide the output in this thread.

Thanks,
-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-06-18 14:59     ` Eric Sandeen
@ 2015-07-09 10:57       ` Kuo Hugo
  2015-07-09 12:51         ` Brian Foster
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-09 10:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Hugo Kuo, Brian Foster, darrell, xfs


[-- Attachment #1.1: Type: text/plain, Size: 7558 bytes --]

Hi Folks,

Running xfs_repair -n against all 32 disks turned up no errors at all.
We have now deployed CentOS 6.6 for testing (the previous kernel panic
came from Ubuntu).
The CentOS nodes hit a kernel panic with the same daemon, but the problem
may be a bit different.

   - On Ubuntu it crashed in xfs_dir2_sf_get_parent_ino+0xa/0x20.
   - Here's the log from CentOS; this time it crashed in
   xfs_dir2_sf_getdents+0x2a0/0x3a0

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
<1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
<4>PGD 1072327067 PUD 1072328067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file:
/sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
<4>CPU 17
<4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
<4>
<4>Pid: 4454, comm: swift-object-se Not tainted
2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
R518.v5P/X10DRi-T4+
<4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
<4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
<4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
<4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
<4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
<4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
<4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
<4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
task ffff880860f18ab0)
<4>Stack:
<4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
<4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38 ffff8808715b4580
<4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8 ffffffffa035ab31
<4>Call Trace:
<4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
<4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
<4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
<4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
<4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
<4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
<4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
<4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
<4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
<41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
<1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
<4> RSP <ffff880871f6de18>
<4>CR2: 0000000000000001

PID: 4454   TASK: ffff880860f18ab0  CPU: 17  COMMAND: "swift-object-se"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff881073604900 ffff8810749a9440 ffff8808740b9728 CHR  /dev/null
  1 ffff881073604900 ffff8810749a9440 ffff8808740b9728 CHR  /dev/null
  2 ffff881073604900 ffff8810749a9440 ffff8808740b9728 CHR  /dev/null
  3 ffff881074222840 ffff88106e184980 ffff88106e16cd48 SOCK
  4 ffff881072952c00 ffff88106e1848c0 ffff8810711cca08 SOCK
  5 ffff88087154b2c0 ffff88044f0f1780 ffff880018e51a08 SOCK
  6 ffff8810716f2600 ffff88107122d5c0 ffff881071315cb8 REG  /tmp/ffi1ECJ8Z
  7 ffff88086bd0a6c0 ffff88086fc4b840 ffff88086fc4a100 REG  /tmp/ffiIArHUO
  8 ffff88106f516ec0 ffff881067d9ae00 ffff8808745aa5e8 REG  [eventpoll]
  9 ffff88106ed35b40 ffff88106e15b200 ffff88106e201cc8 SOCK
 10 ffff88106f31ae00 ffff881074ad75c0 ffff88106e169a08 SOCK
 11 ffff88106ede2740 ffff881067f9a8c0 ffff8808745aa5e8 REG  [eventpoll]
 12 ffff880122e8bc80 ffff8808745a8240 ffff881074379d48 CHR  /dev/urandom
 13 ffff88087162e200 ffff88086fd87200 ffff88086fe82748 SOCK
 14 ffff88087135e840 ffff88086fc653c0 ffff88086fe82488 SOCK
 15 ffff88106f36f900 ffff88106e263680 ffff8808745aa5e8 REG  [eventpoll]
 16 ffff8810737f8680 ffff88106e3e9c80 ffff8808745aa5e8 REG  [eventpoll]
 17 ffff881073635540 ffff88106e1f9240 ffff8808745aa5e8 REG  [eventpoll]
 18 ffff88106ef5ba40 ffff88105bb6b080 ffff8808745aa5e8 REG  [eventpoll]
 19 ffff881074222300 ffff88105b975300 ffff8808745aa5e8 REG  [eventpoll]
 20 ffff881073770f00 ffff881013478080 ffff8808745aa5e8 REG  [eventpoll]
 21 ffff8810737f8bc0 ffff88106e3e9500 ffff8808745aa5e8 REG  [eventpoll]
 22 ffff88106ef5bc80 ffff88105bb72e00 ffff8808745aa5e8 REG  [eventpoll]
 23 ffff88106ef25e00 ffff88106e3e9bc0 ffff8808745aa5e8 REG  [eventpoll]
 24 ffff881071950d80 ffff8810383ee980 ffff8808745aa5e8 REG  [eventpoll]
 25 ffff88106ecca600 ffff881067d96840 ffff8808745aa5e8 REG  [eventpoll]
 26 ffff8808737ec740 ffff880855d95cc0 ffff8808745aa5e8 REG  [eventpoll]
 27 ffff88107345d3c0 ffff880fc46160c0 ffff8808745aa5e8 REG  [eventpoll]
 28 ffff88086bf2d600 ffff880777987b00 ffff880159ac0448 SOCK
 29 ffff8808737e9240 ffff880855c80a40 ffff8808745aa5e8 REG  [eventpoll]
 30 ffff88106f5e0140 ffff880ff5752440 ffff8808745aa5e8 REG  [eventpoll]
 31 ffff8808703a19c0 ffff8807847c8e40 ffff8808745aa5e8 REG  [eventpoll]
 32 ffff88086bd738c0 ffff88033be10800 ffff8806b47f2c08 SOCK
 33 ffff88087119eb40 ffff8806916b48c0 ffff8804cd27e648 SOCK
 34 ffff880870aed480 ffff8806b3fc4900 ffff880015583588 REG
/srv/node/d199/objects/12860/2c0/323cc020fd7dbd6c12472cd1c10742c0/1436266036.98015.ts
 35 ffff88106eeb0e00 ffff88101347de40 ffff8808745aa5e8 REG  [eventpoll]
 36 ffff8808703ed6c0 ffff88086fd65540 ffff8805eb03ed88 REG
/srv/node/d205/quarantined/objects/cd1d68f515006d443a54ff4f658091bc-a114bba1449b45238abf38dc741d7c27/1436254020.89801.ts
 37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG  [eventpoll]
 38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG
/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
 39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG
/srv/node/d224/tmp/tmpSpnrHg
 40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR
/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32
 41 ffff880871fce240 ffff880951136c00 ffff880bacf63d88 DIR
/srv/node/d199/objects/12860/2c0/323cc020fd7dbd6c12472cd1c10742c0

I've got the vmcore dump from the operator. Would a vmcore help with
troubleshooting this kind of issue?

Thanks // Hugo

2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:

> On 6/18/15 9:29 AM, Kuo Hugo wrote:
> >>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
> that -n will report problems only and prevent any modification by repair.
> >
> > *We might to to xfs_repair if we can address which disk causes the
> issue. *
>
> If you do, please save the output, and if it finds anything, please
> provide the output in this thread.
>
> Thanks,
> -Eric
>

[-- Attachment #1.2: Type: text/html, Size: 22938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 10:57       ` Kuo Hugo
@ 2015-07-09 12:51         ` Brian Foster
  2015-07-09 13:20           ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-09 12:51 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs

On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> Hi Folks,
> 
> As the results of 32 disks with xfs_repair -n seems no any error shows up.
> We currently tried to deploy CentOS 6.6 for testing. (The previous kernel
> panic was came from Ubuntu).
> The CentOS nodes encountered kernel panic with same daemon but the problem
> may a bit differ.
> 
>    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in Ubuntu.
>    - Here’s the log in CentOS. It’s broken on
>    xfs_dir2_sf_getdents+0x2a0/0x3a0
> 

I'd venture to guess it's the same behavior here. The previous kernel
had a callback for the parent inode number that was called via
xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
inline here instead.
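
In a 2.6.32-era tree the corresponding path in xfs_dir2_sf_getdents() is
roughly the following (helper and macro names approximate), so a NULL sfp
again faults at offset 1 when i8count is read, which matches CR2 in the
trace below:

        sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
        ...
        ino = xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent);

        /* xfs_dir2_sf.h, approximately: */
        static inline xfs_intino_t
        xfs_dir2_sf_get_inumber(xfs_dir2_sf_t *sfp, xfs_dir2_inou_t *from)
        {
                return sfp->hdr.i8count == 0 ?
                        (xfs_intino_t)XFS_GET_DIR_INO4(from->i4) :
                        (xfs_intino_t)XFS_GET_DIR_INO8(from->i8);
        }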

> <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> <4>PGD 1072327067 PUD 1072328067 PMD 0
> <4>Oops: 0000 [#1] SMP
> <4>last sysfs file:
> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> <4>CPU 17
> <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
> ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
> i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
> mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
> sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
> wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> scsi_wait_scan]
> <4>
> <4>Pid: 4454, comm: swift-object-se Not tainted
> 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> R518.v5P/X10DRi-T4+
> <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
> <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
> <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
> <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
> <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
> <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
> task ffff880860f18ab0)
> <4>Stack:
> <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
> <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38 ffff8808715b4580
> <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8 ffffffffa035ab31
> <4>Call Trace:
> <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
> 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
> <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> <4> RSP <ffff880871f6de18>
> <4>CR2: 0000000000000001
> 
...
> 
> I’ve got the vmcore dump from operator. Does vmcore help for
> troubleshooting kind issue ?
> 

Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
you provide the exact kernel version and post the vmcore somewhere?

Brian

> Thanks // Hugo
> ​
> 
> 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> 
> > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > >>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
> > that -n will report problems only and prevent any modification by repair.
> > >
> > > *We might to to xfs_repair if we can address which disk causes the
> > issue. *
> >
> > If you do, please save the output, and if it finds anything, please
> > provide the output in this thread.
> >
> > Thanks,
> > -Eric
> >

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 12:51         ` Brian Foster
@ 2015-07-09 13:20           ` Kuo Hugo
  2015-07-09 13:27             ` Kuo Hugo
  2015-07-09 15:18             ` Brian Foster
  0 siblings, 2 replies; 23+ messages in thread
From: Kuo Hugo @ 2015-07-09 13:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs


[-- Attachment #1.1: Type: text/plain, Size: 5608 bytes --]

Hi Brian,

*Operating System Version:*
Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final

*NODE 1*

https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt


*NODE 2*

https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt


Any thoughts would be appreciated.

Thanks // Hugo


2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > Hi Folks,
> >
> > As the results of 32 disks with xfs_repair -n seems no any error shows
> up.
> > We currently tried to deploy CentOS 6.6 for testing. (The previous kernel
> > panic was came from Ubuntu).
> > The CentOS nodes encountered kernel panic with same daemon but the
> problem
> > may a bit differ.
> >
> >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in Ubuntu.
> >    - Here’s the log in CentOS. It’s broken on
> >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> >
>
> I'd venture to guess it's the same behavior here. The previous kernel
> had a callback for the parent inode number that was called via
> xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
> inline here instead.
>
> > <1>BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000001
> > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > <4>Oops: 0000 [#1] SMP
> > <4>last sysfs file:
> >
> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > <4>CPU 17
> > <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
> > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
> > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
> > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
> > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
> > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > scsi_wait_scan]
> > <4>
> > <4>Pid: 4454, comm: swift-object-se Not tainted
> > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > R518.v5P/X10DRi-T4+
> > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
> > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
> > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
> > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
> > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> knlGS:0000000000000000
> > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
> > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
> > task ffff880860f18ab0)
> > <4>Stack:
> > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
> > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> ffff8808715b4580
> > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> ffffffffa035ab31
> > <4>Call Trace:
> > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
> > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
> > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > <4> RSP <ffff880871f6de18>
> > <4>CR2: 0000000000000001
> >
> ...
> >
> > I’ve got the vmcore dump from operator. Does vmcore help for
> > troubleshooting kind issue ?
> >
>
> Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
> you provide the exact kernel version and post the vmcore somewhere?
>
> Brian
>
> > Thanks // Hugo
> > ​
> >
> > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> >
> > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > >>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
> > > that -n will report problems only and prevent any modification by
> repair.
> > > >
> > > > *We might to to xfs_repair if we can address which disk causes the
> > > issue. *
> > >
> > > If you do, please save the output, and if it finds anything, please
> > > provide the output in this thread.
> > >
> > > Thanks,
> > > -Eric
> > >
>
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
>

[-- Attachment #1.2: Type: text/html, Size: 7904 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 13:20           ` Kuo Hugo
@ 2015-07-09 13:27             ` Kuo Hugo
  2015-07-09 15:18             ` Brian Foster
  1 sibling, 0 replies; 23+ messages in thread
From: Kuo Hugo @ 2015-07-09 13:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs


[-- Attachment #1.1: Type: text/plain, Size: 6218 bytes --]

For the vmcore files, please use these links instead; they will remain
valid for the next 24 hrs.
Thanks

https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore?temp_url_sig=e54df337458dc48b3c7e211d9e36bc4df1939c33&temp_url_expires=1436534130

https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02?temp_url_sig=663c6abd17ce2ee8eac92f8b0e388e7a0ec3d052&temp_url_expires=1436534130

2015-07-09 21:20 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:

> Hi Brian,
>
> *Operating System Version:*
> Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
>
> *NODE 1*
>
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
>
>
> *NODE 2*
>
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
>
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
>
>
> Any thoughts would be appreciate
>
> Thanks // Hugo
>
>
> 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>
>> On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
>> > Hi Folks,
>> >
>> > As the results of 32 disks with xfs_repair -n seems no any error shows
>> up.
>> > We currently tried to deploy CentOS 6.6 for testing. (The previous
>> kernel
>> > panic was came from Ubuntu).
>> > The CentOS nodes encountered kernel panic with same daemon but the
>> problem
>> > may a bit differ.
>> >
>> >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in Ubuntu.
>> >    - Here’s the log in CentOS. It’s broken on
>> >    xfs_dir2_sf_getdents+0x2a0/0x3a0
>> >
>>
>> I'd venture to guess it's the same behavior here. The previous kernel
>> had a callback for the parent inode number that was called via
>> xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
>> inline here instead.
>>
>> > <1>BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000001
>> > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > <4>PGD 1072327067 PUD 1072328067 PMD 0
>> > <4>Oops: 0000 [#1] SMP
>> > <4>last sysfs file:
>> >
>> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
>> > <4>CPU 17
>> > <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
>> > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
>> > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
>> > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
>> > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
>> > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
>> > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
>> > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
>> > scsi_wait_scan]
>> > <4>
>> > <4>Pid: 4454, comm: swift-object-se Not tainted
>> > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
>> > R518.v5P/X10DRi-T4+
>> > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
>> > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
>> > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
>> > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
>> > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
>> > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
>> > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
>> > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
>> knlGS:0000000000000000
>> > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
>> > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> > <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
>> > task ffff880860f18ab0)
>> > <4>Stack:
>> > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
>> > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
>> ffff8808715b4580
>> > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
>> ffffffffa035ab31
>> > <4>Call Trace:
>> > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
>> > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
>> > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
>> > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
>> > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
>> > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>> > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
>> > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
>> > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
>> > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > <4> RSP <ffff880871f6de18>
>> > <4>CR2: 0000000000000001
>> >
>> ...
>> >
>> > I’ve got the vmcore dump from operator. Does vmcore help for
>> > troubleshooting kind issue ?
>> >
>>
>> Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
>> you provide the exact kernel version and post the vmcore somewhere?
>>
>> Brian
>>
>> > Thanks // Hugo
>> > ​
>> >
>> > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
>> >
>> > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
>> > > >>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
>> > > that -n will report problems only and prevent any modification by
>> repair.
>> > > >
>> > > > *We might to to xfs_repair if we can address which disk causes the
>> > > issue. *
>> > >
>> > > If you do, please save the output, and if it finds anything, please
>> > > provide the output in this thread.
>> > >
>> > > Thanks,
>> > > -Eric
>> > >
>>
>> > _______________________________________________
>> > xfs mailing list
>> > xfs@oss.sgi.com
>> > http://oss.sgi.com/mailman/listinfo/xfs
>>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 9087 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 13:20           ` Kuo Hugo
  2015-07-09 13:27             ` Kuo Hugo
@ 2015-07-09 15:18             ` Brian Foster
  2015-07-09 16:40               ` Kuo Hugo
  1 sibling, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-09 15:18 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs

On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> Hi Brian,
> 
> *Operating System Version:*
> Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> 
> *NODE 1*
> 
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> 
> 
> *NODE 2*
> 
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> 
> 
> Any thoughts would be appreciate
> 

I'm not able to fire up crash with these core files and the kernel debug
info from the following centos kernel debuginfo package:

kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm

It complains about a version mismatch between the vmlinux and core file.
I'm no crash expert... are you sure the cores above correspond to this
kernel? Does crash load up for you on said box if you run something like
the following?

	crash /usr/lib/debug/lib/modules/.../vmlinux vmcore

Note that you might need to install the above kernel-debuginfo package
to get the debug (vmlinux) file. If so, could you also upload that
debuginfo rpm somewhere?
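
If it helps narrow down the mismatch, the release string recorded in a dump
can usually be printed directly (assuming a reasonably recent crash build),
and it has to match the vmlinux's release exactly:

        crash --osrelease vmcore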

Brian

> Thanks // Hugo
> 
> 
> 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> 
> > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > Hi Folks,
> > >
> > > As the results of 32 disks with xfs_repair -n seems no any error shows
> > up.
> > > We currently tried to deploy CentOS 6.6 for testing. (The previous kernel
> > > panic was came from Ubuntu).
> > > The CentOS nodes encountered kernel panic with same daemon but the
> > problem
> > > may a bit differ.
> > >
> > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in Ubuntu.
> > >    - Here’s the log in CentOS. It’s broken on
> > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > >
> >
> > I'd venture to guess it's the same behavior here. The previous kernel
> > had a callback for the parent inode number that was called via
> > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
> > inline here instead.
> >
> > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000001
> > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > <4>Oops: 0000 [#1] SMP
> > > <4>last sysfs file:
> > >
> > /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > <4>CPU 17
> > > <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
> > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
> > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
> > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
> > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
> > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > scsi_wait_scan]
> > > <4>
> > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > R518.v5P/X10DRi-T4+
> > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
> > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
> > > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
> > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
> > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > knlGS:0000000000000000
> > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
> > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
> > > task ffff880860f18ab0)
> > > <4>Stack:
> > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38 ffff880874749cc0
> > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > ffff8808715b4580
> > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > ffffffffa035ab31
> > > <4>Call Trace:
> > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
> > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
> > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > <4> RSP <ffff880871f6de18>
> > > <4>CR2: 0000000000000001
> > >
> > ...
> > >
> > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > troubleshooting kind issue ?
> > >
> >
> > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
> > you provide the exact kernel version and post the vmcore somewhere?
> >
> > Brian
> >
> > > Thanks // Hugo
> > > ​
> > >
> > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> > >
> > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > >>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
> > > > that -n will report problems only and prevent any modification by
> > repair.
> > > > >
> > > > > *We might to to xfs_repair if we can address which disk causes the
> > > > issue. *
> > > >
> > > > If you do, please save the output, and if it finds anything, please
> > > > provide the output in this thread.
> > > >
> > > > Thanks,
> > > > -Eric
> > > >
> >
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> >
> >

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 15:18             ` Brian Foster
@ 2015-07-09 16:40               ` Kuo Hugo
  2015-07-09 18:32                 ` Brian Foster
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-09 16:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs


[-- Attachment #1.1: Type: text/plain, Size: 7408 bytes --]

Hi Brian,

There you go.

https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64

$ md5sum vmlinux
82aaa694a174c0a29e78c05e73adf5d8  vmlinux

Yes, I can read it with this vmlinux image. Put all the files
(vmcore, vmlinux, System.map) in one folder and run crash against them.
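
A minimal sketch of that sequence, assuming the three downloads sit in one
directory (the directory name is just an example):

        cd /path/to/dumpdir        # vmlinux, System.map and vmcore downloaded here
        md5sum vmlinux             # should print 82aaa694a174c0a29e78c05e73adf5d8
        crash vmlinux vmcore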

Hugo
​

2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > Hi Brian,
> >
> > *Operating System Version:*
> > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> >
> > *NODE 1*
> >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> >
> >
> > *NODE 2*
> >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> >
> >
> > Any thoughts would be appreciate
> >
>
> I'm not able to fire up crash with these core files and the kernel debug
> info from the following centos kernel debuginfo package:
>
> kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
>
> It complains about a version mismatch between the vmlinux and core file.
> I'm no crash expert... are you sure the cores above correspond to this
> kernel? Does crash load up for you on said box if you run something like
> the following?
>
>         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
>
> Note that you might need to install the above kernel-debuginfo package
> to get the debug (vmlinux) file. If so, could you also upload that
> debuginfo rpm somewhere?
>
> Brian
>
> > Thanks // Hugo
> >
> >
> > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >
> > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > Hi Folks,
> > > >
> > > > As the results of 32 disks with xfs_repair -n seems no any error
> shows
> > > up.
> > > > We currently tried to deploy CentOS 6.6 for testing. (The previous
> kernel
> > > > panic was came from Ubuntu).
> > > > The CentOS nodes encountered kernel panic with same daemon but the
> > > problem
> > > > may a bit differ.
> > > >
> > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in Ubuntu.
> > > >    - Here’s the log in CentOS. It’s broken on
> > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > >
> > >
> > > I'd venture to guess it's the same behavior here. The previous kernel
> > > had a callback for the parent inode number that was called via
> > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
> > > inline here instead.
> > >
> > > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > > 0000000000000001
> > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > <4>Oops: 0000 [#1] SMP
> > > > <4>last sysfs file:
> > > >
> > >
> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > <4>CPU 17
> > > > <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
> > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
> > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
> > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
> > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
> > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > > scsi_wait_scan]
> > > > <4>
> > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > R518.v5P/X10DRi-T4+
> > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
> > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
> > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
> > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
> > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > knlGS:0000000000000000
> > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
> > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
> > > > task ffff880860f18ab0)
> > > > <4>Stack:
> > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> ffff880874749cc0
> > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > ffff8808715b4580
> > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > ffffffffa035ab31
> > > > <4>Call Trace:
> > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
> > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
> > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > <4> RSP <ffff880871f6de18>
> > > > <4>CR2: 0000000000000001
> > > >
> > > ...
> > > >
> > > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > > troubleshooting kind issue ?
> > > >
> > >
> > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
> > > you provide the exact kernel version and post the vmcore somewhere?
> > >
> > > Brian
> > >
> > > > Thanks // Hugo
> > > > ​
> > > >
> > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> > > >
> > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > >>- Have you tried an 'xfs_repair -n' of the affected filesystem?
> Note
> > > > > that -n will report problems only and prevent any modification by
> > > repair.
> > > > > >
> > > > > > *We might to to xfs_repair if we can address which disk causes
> the
> > > > > issue. *
> > > > >
> > > > > If you do, please save the output, and if it finds anything, please
> > > > > provide the output in this thread.
> > > > >
> > > > > Thanks,
> > > > > -Eric
> > > > >
> > >
> > > > _______________________________________________
> > > > xfs mailing list
> > > > xfs@oss.sgi.com
> > > > http://oss.sgi.com/mailman/listinfo/xfs
> > >
> > >
>

[-- Attachment #1.2: Type: text/html, Size: 12697 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 16:40               ` Kuo Hugo
@ 2015-07-09 18:32                 ` Brian Foster
  2015-07-10  5:36                   ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-09 18:32 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs

On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> Hi Brain,
> 
> There you go.
> 
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> 
> $ md5sum vmlinux
> 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> 
> Yes, I can read it with this vmlinux image. Put all files
> (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
> 

Thanks, I can actually load that up now. Note that we'll probably need
the modules and whatnot (xfs.ko) also to be able to look at any XFS
bits. It might be easiest to just tar up and compress whatever directory
structure has the debug-enabled vmlinux and all the kernel modules.
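Something along these lines would do, assuming the stock debuginfo layout
(the tarball name is arbitrary):

        tar czf kernel-debug-2.6.32-504.23.4.el6.tar.gz \
                /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/

That tree should contain the debug vmlinux plus the per-module debug
objects, and crash can then pick up the module symbols from it with
something like 'mod -S <that directory>'.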
Thanks.

Brian

> Hugo
> ​
> 
> 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> 
> > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > > Hi Brian,
> > >
> > > *Operating System Version:*
> > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> > >
> > > *NODE 1*
> > >
> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> > >
> > >
> > > *NODE 2*
> > >
> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> > >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> > >
> > >
> > > Any thoughts would be appreciate
> > >
> >
> > I'm not able to fire up crash with these core files and the kernel debug
> > info from the following centos kernel debuginfo package:
> >
> > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> >
> > It complains about a version mismatch between the vmlinux and core file.
> > I'm no crash expert... are you sure the cores above correspond to this
> > kernel? Does crash load up for you on said box if you run something like
> > the following?
> >
> >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> >
> > Note that you might need to install the above kernel-debuginfo package
> > to get the debug (vmlinux) file. If so, could you also upload that
> > debuginfo rpm somewhere?
> >
> > Brian
> >
> > > Thanks // Hugo
> > >
> > >
> > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > >
> > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > > Hi Folks,
> > > > >
> > > > > As the results of 32 disks with xfs_repair -n seems no any error
> > shows
> > > > up.
> > > > > We currently tried to deploy CentOS 6.6 for testing. (The previous
> > kernel
> > > > > panic was came from Ubuntu).
> > > > > The CentOS nodes encountered kernel panic with same daemon but the
> > > > problem
> > > > > may a bit differ.
> > > > >
> > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in Ubuntu.
> > > > >    - Here’s the log in CentOS. It’s broken on
> > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > >
> > > >
> > > > I'd venture to guess it's the same behavior here. The previous kernel
> > > > had a callback for the parent inode number that was called via
> > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a static
> > > > inline here instead.
> > > >
> > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > > > 0000000000000001
> > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > > <4>Oops: 0000 [#1] SMP
> > > > > <4>last sysfs file:
> > > > >
> > > >
> > /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > > <4>CPU 17
> > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs iptable_filter
> > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac edac_core
> > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp pps_core
> > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2 mbcache
> > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd ahci
> > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > > > scsi_wait_scan]
> > > > > <4>
> > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > > R518.v5P/X10DRi-T4+
> > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007faa74006203
> > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09: 0000000000000004
> > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12: 0000000000000000
> > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15: ffff8808715b4580
> > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > > knlGS:0000000000000000
> > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4: 00000000001407e0
> > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > <4>Process swift-object-se (pid: 4454, threadinfo ffff880871f6c000,
> > > > > task ffff880860f18ab0)
> > > > > <4>Stack:
> > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> > ffff880874749cc0
> > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > > ffff8808715b4580
> > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > > ffffffffa035ab31
> > > > > <4>Call Trace:
> > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8 85 c0
> > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00 00 00
> > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > <4> RSP <ffff880871f6de18>
> > > > > <4>CR2: 0000000000000001
> > > > >
> > > > ...
> > > > >
> > > > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > > > troubleshooting kind issue ?
> > > > >
> > > >
> > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6 kernel? Can
> > > > you provide the exact kernel version and post the vmcore somewhere?
> > > >
> > > > Brian
> > > >
> > > > > Thanks // Hugo
> > > > > ​
> > > > >
> > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> > > > >
> > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > > >>- Have you tried an 'xfs_repair -n' of the affected filesystem?
> > Note
> > > > > > that -n will report problems only and prevent any modification by
> > > > repair.
> > > > > > >
> > > > > > > *We might to to xfs_repair if we can address which disk causes
> > the
> > > > > > issue. *
> > > > > >
> > > > > > If you do, please save the output, and if it finds anything, please
> > > > > > provide the output in this thread.
> > > > > >
> > > > > > Thanks,
> > > > > > -Eric
> > > > > >
> > > >
> > > > > _______________________________________________
> > > > > xfs mailing list
> > > > > xfs@oss.sgi.com
> > > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > >
> > > >
> >

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-09 18:32                 ` Brian Foster
@ 2015-07-10  5:36                   ` Kuo Hugo
  2015-07-10 10:39                     ` Kuo Hugo
  2015-07-13 12:52                     ` Brian Foster
  0 siblings, 2 replies; 23+ messages in thread
From: Kuo Hugo @ 2015-07-10  5:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs


[-- Attachment #1.1: Type: text/plain, Size: 9196 bytes --]

Hi Brian,

Is this the file you need?

https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko

$> modinfo xfs

filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
license: GPL
description: SGI XFS with ACLs, security attributes, large block/inode
numbers, no debug enabled
author: Silicon Graphics, Inc.
srcversion: 0C1B17926BDDA4F121479EE
depends: exportfs
vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion

Thanks // Hugo
​

2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> > Hi Brain,
> >
> > There you go.
> >
> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> >
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> >
> > $ md5sum vmlinux
> > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> >
> > Yes, I can read it with this vmlinux image. Put all files
> > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
> >
>
> Thanks, I can actually load that up now. Note that we'll probably need
> the modules and whatnot (xfs.ko) also to be able to look at any XFS
> bits. It might be easiest to just tar up and compress whatever directory
> structure has the debug-enabled vmlinux and all the kernel modules.
> Thanks.
>
> Brian
>
> > Hugo
> > ​
> >
> > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >
> > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > > > Hi Brian,
> > > >
> > > > *Operating System Version:*
> > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> > > >
> > > > *NODE 1*
> > > >
> > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > > >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> > > >
> > > >
> > > > *NODE 2*
> > > >
> > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> > > >
> > >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> > > >
> > > >
> > > > Any thoughts would be appreciate
> > > >
> > >
> > > I'm not able to fire up crash with these core files and the kernel
> debug
> > > info from the following centos kernel debuginfo package:
> > >
> > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> > >
> > > It complains about a version mismatch between the vmlinux and core
> file.
> > > I'm no crash expert... are you sure the cores above correspond to this
> > > kernel? Does crash load up for you on said box if you run something
> like
> > > the following?
> > >
> > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> > >
> > > Note that you might need to install the above kernel-debuginfo package
> > > to get the debug (vmlinux) file. If so, could you also upload that
> > > debuginfo rpm somewhere?
> > >
> > > Brian
> > >
> > > > Thanks // Hugo
> > > >
> > > >
> > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > >
> > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > > > Hi Folks,
> > > > > >
> > > > > > As the results of 32 disks with xfs_repair -n seems no any error
> > > shows
> > > > > up.
> > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
> previous
> > > kernel
> > > > > > panic was came from Ubuntu).
> > > > > > The CentOS nodes encountered kernel panic with same daemon but
> the
> > > > > problem
> > > > > > may a bit differ.
> > > > > >
> > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in
> Ubuntu.
> > > > > >    - Here’s the log in CentOS. It’s broken on
> > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > >
> > > > >
> > > > > I'd venture to guess it's the same behavior here. The previous
> kernel
> > > > > had a callback for the parent inode number that was called via
> > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a
> static
> > > > > inline here instead.
> > > > >
> > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > > > > 0000000000000001
> > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> [xfs]
> > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > > > <4>Oops: 0000 [#1] SMP
> > > > > > <4>last sysfs file:
> > > > > >
> > > > >
> > >
> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > > > <4>CPU 17
> > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
> iptable_filter
> > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac
> edac_core
> > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
> pps_core
> > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2
> mbcache
> > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd
> ahci
> > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > > > > scsi_wait_scan]
> > > > > > <4>
> > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > > > R518.v5P/X10DRi-T4+
> > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
> 0000000000000000
> > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> 00007faa74006203
> > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
> 0000000000000004
> > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
> 0000000000000000
> > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
> ffff8808715b4580
> > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > > > knlGS:0000000000000000
> > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
> 00000000001407e0
> > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
> ffff880871f6c000,
> > > > > > task ffff880860f18ab0)
> > > > > > <4>Stack:
> > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> > > ffff880874749cc0
> > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > > > ffff8808715b4580
> > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > > > ffffffffa035ab31
> > > > > > <4>Call Trace:
> > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8
> 85 c0
> > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00
> 00 00
> > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> [xfs]
> > > > > > <4> RSP <ffff880871f6de18>
> > > > > > <4>CR2: 0000000000000001
> > > > > >
> > > > > ...
> > > > > >
> > > > > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > > > > troubleshooting kind issue ?
> > > > > >
> > > > >
> > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
> kernel? Can
> > > > > you provide the exact kernel version and post the vmcore somewhere?
> > > > >
> > > > > Brian
> > > > >
> > > > > > Thanks // Hugo
> > > > > > ​
> > > > > >
> > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> > > > > >
> > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
> filesystem?
> > > Note
> > > > > > > that -n will report problems only and prevent any modification
> by
> > > > > repair.
> > > > > > > >
> > > > > > > > *We might to to xfs_repair if we can address which disk
> causes
> > > the
> > > > > > > issue. *
> > > > > > >
> > > > > > > If you do, please save the output, and if it finds anything,
> please
> > > > > > > provide the output in this thread.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Eric
> > > > > > >
> > > > >
> > > > > > _______________________________________________
> > > > > > xfs mailing list
> > > > > > xfs@oss.sgi.com
> > > > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > > >
> > > > >
> > >
>

[-- Attachment #1.2: Type: text/html, Size: 15294 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-10  5:36                   ` Kuo Hugo
@ 2015-07-10 10:39                     ` Kuo Hugo
  2015-07-10 16:25                       ` Kuo Hugo
  2015-07-13 12:52                     ` Brian Foster
  1 sibling, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-10 10:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs


[-- Attachment #1.1.1: Type: text/plain, Size: 9635 bytes --]

We observed this message on the node this morning. Could it be a related
issue?


​

2015-07-10 13:36 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:

> Hi Brain,
>
> Is this the file which you need ?
>
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
>
> $> modinfo xfs
>
> filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
> license: GPL
> description: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
> author: Silicon Graphics, Inc.
> srcversion: 0C1B17926BDDA4F121479EE
> depends: exportfs
> vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
>
> Thanks // Hugo
> ​
>
> 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>
>> On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
>> > Hi Brain,
>> >
>> > There you go.
>> >
>> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
>> >
>> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
>> >
>> > $ md5sum vmlinux
>> > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
>> >
>> > Yes, I can read it with this vmlinux image. Put all files
>> > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
>> >
>>
>> Thanks, I can actually load that up now. Note that we'll probably need
>> the modules and whatnot (xfs.ko) also to be able to look at any XFS
>> bits. It might be easiest to just tar up and compress whatever directory
>> structure has the debug-enabled vmlinux and all the kernel modules.
>> Thanks.
>>
>> Brian
>>
>> > Hugo
>> > ​
>> >
>> > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> >
>> > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
>> > > > Hi Brian,
>> > > >
>> > > > *Operating System Version:*
>> > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
>> > > >
>> > > > *NODE 1*
>> > > >
>> > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
>> > > >
>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
>> > > >
>> > > >
>> > > > *NODE 2*
>> > > >
>> > > >
>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
>> > > >
>> > >
>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
>> > > >
>> > > >
>> > > > Any thoughts would be appreciate
>> > > >
>> > >
>> > > I'm not able to fire up crash with these core files and the kernel
>> debug
>> > > info from the following centos kernel debuginfo package:
>> > >
>> > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
>> > >
>> > > It complains about a version mismatch between the vmlinux and core
>> file.
>> > > I'm no crash expert... are you sure the cores above correspond to this
>> > > kernel? Does crash load up for you on said box if you run something
>> like
>> > > the following?
>> > >
>> > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
>> > >
>> > > Note that you might need to install the above kernel-debuginfo package
>> > > to get the debug (vmlinux) file. If so, could you also upload that
>> > > debuginfo rpm somewhere?
>> > >
>> > > Brian
>> > >
>> > > > Thanks // Hugo
>> > > >
>> > > >
>> > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> > > >
>> > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
>> > > > > > Hi Folks,
>> > > > > >
>> > > > > > As the results of 32 disks with xfs_repair -n seems no any error
>> > > shows
>> > > > > up.
>> > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
>> previous
>> > > kernel
>> > > > > > panic was came from Ubuntu).
>> > > > > > The CentOS nodes encountered kernel panic with same daemon but
>> the
>> > > > > problem
>> > > > > > may a bit differ.
>> > > > > >
>> > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in
>> Ubuntu.
>> > > > > >    - Here’s the log in CentOS. It’s broken on
>> > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
>> > > > > >
>> > > > >
>> > > > > I'd venture to guess it's the same behavior here. The previous
>> kernel
>> > > > > had a callback for the parent inode number that was called via
>> > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a
>> static
>> > > > > inline here instead.
>> > > > >
>> > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
>> > > > > 0000000000000001
>> > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
>> [xfs]
>> > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
>> > > > > > <4>Oops: 0000 [#1] SMP
>> > > > > > <4>last sysfs file:
>> > > > > >
>> > > > >
>> > >
>> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
>> > > > > > <4>CPU 17
>> > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
>> iptable_filter
>> > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
>> > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
>> > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac
>> edac_core
>> > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
>> pps_core
>> > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2
>> mbcache
>> > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
>> xhci_hcd ahci
>> > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
>> > > > > > scsi_wait_scan]
>> > > > > > <4>
>> > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
>> > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
>> > > > > > R518.v5P/X10DRi-T4+
>> > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
>> > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
>> > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
>> 0000000000000000
>> > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
>> 00007faa74006203
>> > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
>> 0000000000000004
>> > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
>> 0000000000000000
>> > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
>> ffff8808715b4580
>> > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
>> > > > > knlGS:0000000000000000
>> > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
>> 00000000001407e0
>> > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
>> ffff880871f6c000,
>> > > > > > task ffff880860f18ab0)
>> > > > > > <4>Stack:
>> > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
>> > > ffff880874749cc0
>> > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
>> > > > > ffff8808715b4580
>> > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
>> > > > > ffffffffa035ab31
>> > > > > > <4>Call Trace:
>> > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
>> > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
>> > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
>> > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
>> > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
>> > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>> > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8
>> 85 c0
>> > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00
>> 00 00
>> > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
>> > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
>> [xfs]
>> > > > > > <4> RSP <ffff880871f6de18>
>> > > > > > <4>CR2: 0000000000000001
>> > > > > >
>> > > > > ...
>> > > > > >
>> > > > > > I’ve got the vmcore dump from operator. Does vmcore help for
>> > > > > > troubleshooting kind issue ?
>> > > > > >
>> > > > >
>> > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
>> kernel? Can
>> > > > > you provide the exact kernel version and post the vmcore
>> somewhere?
>> > > > >
>> > > > > Brian
>> > > > >
>> > > > > > Thanks // Hugo
>> > > > > > ​
>> > > > > >
>> > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
>> > > > > >
>> > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
>> > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
>> filesystem?
>> > > Note
>> > > > > > > that -n will report problems only and prevent any
>> modification by
>> > > > > repair.
>> > > > > > > >
>> > > > > > > > *We might to to xfs_repair if we can address which disk
>> causes
>> > > the
>> > > > > > > issue. *
>> > > > > > >
>> > > > > > > If you do, please save the output, and if it finds anything,
>> please
>> > > > > > > provide the output in this thread.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > -Eric
>> > > > > > >
>> > > > >
>> > > > > > _______________________________________________
>> > > > > > xfs mailing list
>> > > > > > xfs@oss.sgi.com
>> > > > > > http://oss.sgi.com/mailman/listinfo/xfs
>> > > > >
>> > > > >
>> > >
>>
>
>

[-- Attachment #1.1.2: Type: text/html, Size: 15887 bytes --]

[-- Attachment #1.2: r2obj01_sdh_xfs_error.png --]
[-- Type: image/png, Size: 56919 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-10 10:39                     ` Kuo Hugo
@ 2015-07-10 16:25                       ` Kuo Hugo
  0 siblings, 0 replies; 23+ messages in thread
From: Kuo Hugo @ 2015-07-10 16:25 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs


[-- Attachment #1.1.1: Type: text/plain, Size: 10229 bytes --]

Finally got the results of xfs_repair -n on all disks for this server.
From what I can see there are no warnings or errors.

https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/r2obj01.burton.com-xfs-repair-n.txt
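
For reference, a loop along these lines is how such a report can be
generated (the device glob is a placeholder, and xfs_repair needs each
filesystem unmounted first):

        for dev in /dev/sd?1; do                     # placeholder device glob
                echo "=== $dev ==="
                xfs_repair -n "$dev"                 # -n: report only, no modifications
        done > r2obj01.burton.com-xfs-repair-n.txt 2>&1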

Thanks // Hugo

2015-07-10 18:39 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:

> We observed this message this morning on the node. Is it possible a
> related issue?
>
>
> ​
>
> 2015-07-10 13:36 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:
>
>> Hi Brain,
>>
>> Is this the file which you need ?
>>
>> https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
>>
>> $> modinfo xfs
>>
>> filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
>> license: GPL
>> description: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
>> author: Silicon Graphics, Inc.
>> srcversion: 0C1B17926BDDA4F121479EE
>> depends: exportfs
>> vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
>>
>> Thanks // Hugo
>> ​
>>
>> 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>>
>>> On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
>>> > Hi Brain,
>>> >
>>> > There you go.
>>> >
>>> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
>>> >
>>> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
>>> >
>>> > $ md5sum vmlinux
>>> > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
>>> >
>>> > Yes, I can read it with this vmlinux image. Put all files
>>> > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
>>> >
>>>
>>> Thanks, I can actually load that up now. Note that we'll probably need
>>> the modules and whatnot (xfs.ko) also to be able to look at any XFS
>>> bits. It might be easiest to just tar up and compress whatever directory
>>> structure has the debug-enabled vmlinux and all the kernel modules.
>>> Thanks.
>>>
>>> Brian
>>>
>>> > Hugo
>>> > ​
>>> >
>>> > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>>> >
>>> > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
>>> > > > Hi Brian,
>>> > > >
>>> > > > *Operating System Version:*
>>> > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
>>> > > >
>>> > > > *NODE 1*
>>> > > >
>>> > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
>>> > > >
>>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
>>> > > >
>>> > > >
>>> > > > *NODE 2*
>>> > > >
>>> > > >
>>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
>>> > > >
>>> > >
>>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
>>> > > >
>>> > > >
>>> > > > Any thoughts would be appreciate
>>> > > >
>>> > >
>>> > > I'm not able to fire up crash with these core files and the kernel
>>> debug
>>> > > info from the following centos kernel debuginfo package:
>>> > >
>>> > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
>>> > >
>>> > > It complains about a version mismatch between the vmlinux and core
>>> file.
>>> > > I'm no crash expert... are you sure the cores above correspond to
>>> this
>>> > > kernel? Does crash load up for you on said box if you run something
>>> like
>>> > > the following?
>>> > >
>>> > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
>>> > >
>>> > > Note that you might need to install the above kernel-debuginfo
>>> package
>>> > > to get the debug (vmlinux) file. If so, could you also upload that
>>> > > debuginfo rpm somewhere?
>>> > >
>>> > > Brian
>>> > >
>>> > > > Thanks // Hugo
>>> > > >
>>> > > >
>>> > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>>> > > >
>>> > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
>>> > > > > > Hi Folks,
>>> > > > > >
>>> > > > > > As the results of 32 disks with xfs_repair -n seems no any
>>> error
>>> > > shows
>>> > > > > up.
>>> > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
>>> previous
>>> > > kernel
>>> > > > > > panic was came from Ubuntu).
>>> > > > > > The CentOS nodes encountered kernel panic with same daemon but
>>> the
>>> > > > > problem
>>> > > > > > may a bit differ.
>>> > > > > >
>>> > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in
>>> Ubuntu.
>>> > > > > >    - Here’s the log in CentOS. It’s broken on
>>> > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
>>> > > > > >
>>> > > > >
>>> > > > > I'd venture to guess it's the same behavior here. The previous
>>> kernel
>>> > > > > had a callback for the parent inode number that was called via
>>> > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a
>>> static
>>> > > > > inline here instead.
>>> > > > >
>>> > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
>>> > > > > 0000000000000001
>>> > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
>>> [xfs]
>>> > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
>>> > > > > > <4>Oops: 0000 [#1] SMP
>>> > > > > > <4>last sysfs file:
>>> > > > > >
>>> > > > >
>>> > >
>>> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
>>> > > > > > <4>CPU 17
>>> > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
>>> iptable_filter
>>> > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
>>> > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
>>> > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac
>>> edac_core
>>> > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
>>> pps_core
>>> > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2
>>> mbcache
>>> > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
>>> xhci_hcd ahci
>>> > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
>>> > > > > > scsi_wait_scan]
>>> > > > > > <4>
>>> > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
>>> > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
>>> > > > > > R518.v5P/X10DRi-T4+
>>> > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
>>> > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>>> > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
>>> > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
>>> 0000000000000000
>>> > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
>>> 00007faa74006203
>>> > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
>>> 0000000000000004
>>> > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
>>> 0000000000000000
>>> > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
>>> ffff8808715b4580
>>> > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
>>> > > > > knlGS:0000000000000000
>>> > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
>>> 00000000001407e0
>>> > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>> 0000000000000000
>>> > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>>> 0000000000000400
>>> > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
>>> ffff880871f6c000,
>>> > > > > > task ffff880860f18ab0)
>>> > > > > > <4>Stack:
>>> > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
>>> > > ffff880874749cc0
>>> > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
>>> > > > > ffff8808715b4580
>>> > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
>>> > > > > ffffffffa035ab31
>>> > > > > > <4>Call Trace:
>>> > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>>> > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
>>> > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>>> > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
>>> > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
>>> > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
>>> > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
>>> > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>>> > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8
>>> 85 c0
>>> > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00
>>> 00 00
>>> > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f
>>> b6
>>> > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
>>> [xfs]
>>> > > > > > <4> RSP <ffff880871f6de18>
>>> > > > > > <4>CR2: 0000000000000001
>>> > > > > >
>>> > > > > ...
>>> > > > > >
>>> > > > > > I’ve got the vmcore dump from operator. Does vmcore help for
>>> > > > > > troubleshooting kind issue ?
>>> > > > > >
>>> > > > >
>>> > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
>>> kernel? Can
>>> > > > > you provide the exact kernel version and post the vmcore
>>> somewhere?
>>> > > > >
>>> > > > > Brian
>>> > > > >
>>> > > > > > Thanks // Hugo
>>> > > > > > ​
>>> > > > > >
>>> > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
>>> > > > > >
>>> > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
>>> > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
>>> filesystem?
>>> > > Note
>>> > > > > > > that -n will report problems only and prevent any
>>> modification by
>>> > > > > repair.
>>> > > > > > > >
>>> > > > > > > > *We might to to xfs_repair if we can address which disk
>>> causes
>>> > > the
>>> > > > > > > issue. *
>>> > > > > > >
>>> > > > > > > If you do, please save the output, and if it finds anything,
>>> please
>>> > > > > > > provide the output in this thread.
>>> > > > > > >
>>> > > > > > > Thanks,
>>> > > > > > > -Eric
>>> > > > > > >
>>> > > > >
>>> > > > > > _______________________________________________
>>> > > > > > xfs mailing list
>>> > > > > > xfs@oss.sgi.com
>>> > > > > > http://oss.sgi.com/mailman/listinfo/xfs
>>> > > > >
>>> > > > >
>>> > >
>>>
>>
>>
>

[-- Attachment #1.1.2: Type: text/html, Size: 16657 bytes --]

[-- Attachment #1.2: r2obj01_sdh_xfs_error.png --]
[-- Type: image/png, Size: 56919 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-10  5:36                   ` Kuo Hugo
  2015-07-10 10:39                     ` Kuo Hugo
@ 2015-07-13 12:52                     ` Brian Foster
  2015-07-13 14:06                       ` Kuo Hugo
  1 sibling, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-13 12:52 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs

On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
> Hi Brain,
> 
> Is this the file which you need ?
> 
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
> 
> $> modinfo xfs
> 
> filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
> license: GPL
> description: SGI XFS with ACLs, security attributes, large block/inode
> numbers, no debug enabled
> author: Silicon Graphics, Inc.
> srcversion: 0C1B17926BDDA4F121479EE
> depends: exportfs
> vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
> 

No, this isn't the debug version. We need the one from the debug package
that was installed (/usr/lib/debug?).
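
If it isn't obvious where the package dropped it, something like this
should turn it up (the path shown is where kernel-debuginfo normally
installs it):

        find /usr/lib/debug -name 'xfs.ko*'
        # typically /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug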

Brian

> Thanks // Hugo
> ​
> 
> 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> 
> > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> > > Hi Brain,
> > >
> > > There you go.
> > >
> > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> > >
> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> > >
> > > $ md5sum vmlinux
> > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> > >
> > > Yes, I can read it with this vmlinux image. Put all files
> > > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
> > >
> >
> > Thanks, I can actually load that up now. Note that we'll probably need
> > the modules and whatnot (xfs.ko) also to be able to look at any XFS
> > bits. It might be easiest to just tar up and compress whatever directory
> > structure has the debug-enabled vmlinux and all the kernel modules.
> > Thanks.
> >
> > Brian
> >
> > > Hugo
> > > ​
> > >
> > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > >
> > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > > > > Hi Brian,
> > > > >
> > > > > *Operating System Version:*
> > > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> > > > >
> > > > > *NODE 1*
> > > > >
> > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > > > >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> > > > >
> > > > >
> > > > > *NODE 2*
> > > > >
> > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> > > > >
> > > >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> > > > >
> > > > >
> > > > > Any thoughts would be appreciate
> > > > >
> > > >
> > > > I'm not able to fire up crash with these core files and the kernel
> > debug
> > > > info from the following centos kernel debuginfo package:
> > > >
> > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> > > >
> > > > It complains about a version mismatch between the vmlinux and core
> > file.
> > > > I'm no crash expert... are you sure the cores above correspond to this
> > > > kernel? Does crash load up for you on said box if you run something
> > like
> > > > the following?
> > > >
> > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> > > >
> > > > Note that you might need to install the above kernel-debuginfo package
> > > > to get the debug (vmlinux) file. If so, could you also upload that
> > > > debuginfo rpm somewhere?
> > > >
> > > > Brian
> > > >
> > > > > Thanks // Hugo
> > > > >
> > > > >
> > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > > >
> > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > > > > Hi Folks,
> > > > > > >
> > > > > > > As the results of 32 disks with xfs_repair -n seems no any error
> > > > shows
> > > > > > up.
> > > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
> > previous
> > > > kernel
> > > > > > > panic was came from Ubuntu).
> > > > > > > The CentOS nodes encountered kernel panic with same daemon but
> > the
> > > > > > problem
> > > > > > > may a bit differ.
> > > > > > >
> > > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in
> > Ubuntu.
> > > > > > >    - Here’s the log in CentOS. It’s broken on
> > > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > > >
> > > > > >
> > > > > > I'd venture to guess it's the same behavior here. The previous
> > kernel
> > > > > > had a callback for the parent inode number that was called via
> > > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a
> > static
> > > > > > inline here instead.
> > > > > >
> > > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > > > > > 0000000000000001
> > > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> > [xfs]
> > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > > > > <4>Oops: 0000 [#1] SMP
> > > > > > > <4>last sysfs file:
> > > > > > >
> > > > > >
> > > >
> > /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > > > > <4>CPU 17
> > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
> > iptable_filter
> > > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> > > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac
> > edac_core
> > > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
> > pps_core
> > > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2
> > mbcache
> > > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class xhci_hcd
> > ahci
> > > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > > > > > scsi_wait_scan]
> > > > > > > <4>
> > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > > > > R518.v5P/X10DRi-T4+
> > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
> > 0000000000000000
> > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> > 00007faa74006203
> > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
> > 0000000000000004
> > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
> > 0000000000000000
> > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
> > ffff8808715b4580
> > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
> > 00000000001407e0
> > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
> > ffff880871f6c000,
> > > > > > > task ffff880860f18ab0)
> > > > > > > <4>Stack:
> > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> > > > ffff880874749cc0
> > > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > > > > ffff8808715b4580
> > > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > > > > ffffffffa035ab31
> > > > > > > <4>Call Trace:
> > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55 b8
> > 85 c0
> > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00 00
> > 00 00
> > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41 0f b6
> > > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> > [xfs]
> > > > > > > <4> RSP <ffff880871f6de18>
> > > > > > > <4>CR2: 0000000000000001
> > > > > > >
> > > > > > ...
> > > > > > >
> > > > > > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > > > > > troubleshooting kind issue ?
> > > > > > >
> > > > > >
> > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
> > kernel? Can
> > > > > > you provide the exact kernel version and post the vmcore somewhere?
> > > > > >
> > > > > > Brian
> > > > > >
> > > > > > > Thanks // Hugo
> > > > > > > ​
> > > > > > >
> > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net>:
> > > > > > >
> > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
> > filesystem?
> > > > Note
> > > > > > > > that -n will report problems only and prevent any modification
> > by
> > > > > > repair.
> > > > > > > > >
> > > > > > > > > *We might to to xfs_repair if we can address which disk
> > causes
> > > > the
> > > > > > > > issue. *
> > > > > > > >
> > > > > > > > If you do, please save the output, and if it finds anything,
> > please
> > > > > > > > provide the output in this thread.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > -Eric
> > > > > > > >
> > > > > >
> > > > > > > _______________________________________________
> > > > > > > xfs mailing list
> > > > > > > xfs@oss.sgi.com
> > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > > > >
> > > > > >
> > > >
> >

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-13 12:52                     ` Brian Foster
@ 2015-07-13 14:06                       ` Kuo Hugo
  2015-07-13 17:01                         ` Brian Foster
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-13 14:06 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs


[-- Attachment #1.1: Type: text/plain, Size: 10985 bytes --]

Hi Brian,

Sorry for the wrong file in the previous message. I believe this is the right one.

https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko.debug

/usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug

MD5 : 27829c9c55f4f5b095d29a7de7c27254

Thanks // Hugo
​

2015-07-13 20:52 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
> > Hi Brain,
> >
> > Is this the file which you need ?
> >
> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
> >
> > $> modinfo xfs
> >
> > filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
> > license: GPL
> > description: SGI XFS with ACLs, security attributes, large block/inode
> > numbers, no debug enabled
> > author: Silicon Graphics, Inc.
> > srcversion: 0C1B17926BDDA4F121479EE
> > depends: exportfs
> > vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
> >
>
> No, this isn't the debug version. We need the one from the debug package
> that was installed (/usr/lib/debug?).
>
> Brian
>
> > Thanks // Hugo
> > ​
> >
> > 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >
> > > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> > > > Hi Brain,
> > > >
> > > > There you go.
> > > >
> > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> > > >
> > >
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> > > >
> > > > $ md5sum vmlinux
> > > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> > > >
> > > > Yes, I can read it with this vmlinux image. Put all files
> > > > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
> > > >
> > >
> > > Thanks, I can actually load that up now. Note that we'll probably need
> > > the modules and whatnot (xfs.ko) also to be able to look at any XFS
> > > bits. It might be easiest to just tar up and compress whatever
> directory
> > > structure has the debug-enabled vmlinux and all the kernel modules.
> > > Thanks.
> > >
> > > Brian
> > >
> > > > Hugo
> > > > ​
> > > >
> > > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > >
> > > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > > > > > Hi Brian,
> > > > > >
> > > > > > *Operating System Version:*
> > > > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> > > > > >
> > > > > > *NODE 1*
> > > > > >
> > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > > > > >
> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> > > > > >
> > > > > >
> > > > > > *NODE 2*
> > > > > >
> > > > > >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> > > > > >
> > > > >
> > >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> > > > > >
> > > > > >
> > > > > > Any thoughts would be appreciate
> > > > > >
> > > > >
> > > > > I'm not able to fire up crash with these core files and the kernel
> > > debug
> > > > > info from the following centos kernel debuginfo package:
> > > > >
> > > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> > > > >
> > > > > It complains about a version mismatch between the vmlinux and core
> > > file.
> > > > > I'm no crash expert... are you sure the cores above correspond to
> this
> > > > > kernel? Does crash load up for you on said box if you run something
> > > like
> > > > > the following?
> > > > >
> > > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> > > > >
> > > > > Note that you might need to install the above kernel-debuginfo
> package
> > > > > to get the debug (vmlinux) file. If so, could you also upload that
> > > > > debuginfo rpm somewhere?
> > > > >
> > > > > Brian
> > > > >
> > > > > > Thanks // Hugo
> > > > > >
> > > > > >
> > > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > > > >
> > > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > > > > > Hi Folks,
> > > > > > > >
> > > > > > > > As the results of 32 disks with xfs_repair -n seems no any
> error
> > > > > shows
> > > > > > > up.
> > > > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
> > > previous
> > > > > kernel
> > > > > > > > panic was came from Ubuntu).
> > > > > > > > The CentOS nodes encountered kernel panic with same daemon
> but
> > > the
> > > > > > > problem
> > > > > > > > may a bit differ.
> > > > > > > >
> > > > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in
> > > Ubuntu.
> > > > > > > >    - Here’s the log in CentOS. It’s broken on
> > > > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > > > >
> > > > > > >
> > > > > > > I'd venture to guess it's the same behavior here. The previous
> > > kernel
> > > > > > > had a callback for the parent inode number that was called via
> > > > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a
> > > static
> > > > > > > inline here instead.
> > > > > > >
> > > > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > > > > > > 0000000000000001
> > > > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > [xfs]
> > > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > > > > > <4>Oops: 0000 [#1] SMP
> > > > > > > > <4>last sysfs file:
> > > > > > > >
> > > > > > >
> > > > >
> > >
> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > > > > > <4>CPU 17
> > > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
> > > iptable_filter
> > > > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4
> nf_conntrack
> > > > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac
> > > edac_core
> > > > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
> > > pps_core
> > > > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2
> > > mbcache
> > > > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
> xhci_hcd
> > > ahci
> > > > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > > > > > > scsi_wait_scan]
> > > > > > > > <4>
> > > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > > > > > R518.v5P/X10DRi-T4+
> > > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
> > > 0000000000000000
> > > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> > > 00007faa74006203
> > > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
> > > 0000000000000004
> > > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
> > > 0000000000000000
> > > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
> > > ffff8808715b4580
> > > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
> > > 00000000001407e0
> > > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > 0000000000000400
> > > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
> > > ffff880871f6c000,
> > > > > > > > task ffff880860f18ab0)
> > > > > > > > <4>Stack:
> > > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> > > > > ffff880874749cc0
> > > > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > > > > > ffff8808715b4580
> > > > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > > > > > ffffffffa035ab31
> > > > > > > > <4>Call Trace:
> > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55
> b8
> > > 85 c0
> > > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00
> 00
> > > 00 00
> > > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41
> 0f b6
> > > > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > [xfs]
> > > > > > > > <4> RSP <ffff880871f6de18>
> > > > > > > > <4>CR2: 0000000000000001
> > > > > > > >
> > > > > > > ...
> > > > > > > >
> > > > > > > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > > > > > > troubleshooting kind issue ?
> > > > > > > >
> > > > > > >
> > > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
> > > kernel? Can
> > > > > > > you provide the exact kernel version and post the vmcore
> somewhere?
> > > > > > >
> > > > > > > Brian
> > > > > > >
> > > > > > > > Thanks // Hugo
> > > > > > > > ​
> > > > > > > >
> > > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net
> >:
> > > > > > > >
> > > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
> > > filesystem?
> > > > > Note
> > > > > > > > > that -n will report problems only and prevent any
> modification
> > > by
> > > > > > > repair.
> > > > > > > > > >
> > > > > > > > > > *We might to to xfs_repair if we can address which disk
> > > causes
> > > > > the
> > > > > > > > > issue. *
> > > > > > > > >
> > > > > > > > > If you do, please save the output, and if it finds
> anything,
> > > please
> > > > > > > > > provide the output in this thread.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > -Eric
> > > > > > > > >
> > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > xfs mailing list
> > > > > > > > xfs@oss.sgi.com
> > > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > > > > >
> > > > > > >
> > > > >
> > >
>
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
>

[-- Attachment #1.2: Type: text/html, Size: 18846 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-13 14:06                       ` Kuo Hugo
@ 2015-07-13 17:01                         ` Brian Foster
  2015-07-13 18:10                           ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-13 17:01 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, darrell, xfs

On Mon, Jul 13, 2015 at 10:06:39PM +0800, Kuo Hugo wrote:
> Hi Brain,
> 
> Sorry for the wrong file in previous message. I believe this the right one.
> 
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko.debug
> 
> /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
> 
> MD5 : 27829c9c55f4f5b095d29a7de7c27254
> 

Yes, that works. I have a few bits of information so far, but nothing
obvious to me as to what caused the problem. Some info:

- The crash is indeed at xfs_dir2_sf_get_inumber():

/usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.h: 101
0xffffffffa0362d60 <xfs_dir2_sf_getdents+672>:  cmpb   $0x0,0x1(%r12)
...

- %r12 above has a value of 0 and is set as follows:

/usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.c: 727
0xffffffffa0362b11 <xfs_dir2_sf_getdents+81>:   mov    0x50(%rdi),%r12

... which is the sfp pointer assignment in the getdents function:

	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;

This implies a NULL if_data.
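
To make the connection to the fault address explicit: with a NULL
if_data, sfp is NULL, so the i8count check in
xfs_dir2_sf_get_inumber() becomes a one-byte load from offset 1 of a
NULL pointer -- which lines up with CR2 = 0x1 and the
"cmpb $0x0,0x1(%r12)" above. A minimal userspace sketch of that
arithmetic (struct layout paraphrased from the 2.6.32-era headers, not
the exact RHEL source):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* paraphrased shortform directory header layout */
typedef struct xfs_dir2_sf_hdr {
	uint8_t	count;		/* offset 0: number of entries */
	uint8_t	i8count;	/* offset 1: count of 8-byte inode numbers */
	uint8_t	parent[8];	/* parent inode number (4 or 8 bytes used) */
} xfs_dir2_sf_hdr_t;

typedef struct xfs_dir2_sf {
	xfs_dir2_sf_hdr_t hdr;	/* directory entries follow the header */
} xfs_dir2_sf_t;

int main(void)
{
	/* "sfp->hdr.i8count == 0" with sfp == NULL reads from this offset */
	printf("i8count offset: %zu\n",
	       offsetof(xfs_dir2_sf_t, hdr.i8count));	/* prints 1 */
	return 0;
}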

- The backtrace lists a couple of inodes on the stack in this frame. I'm
not sure why, but one looks like a valid directory and the other looks
bogus. The valid inode has an inode number of 13668207561.

- The fsname for this inode is "sdb."

- The inode does appear to have a non-NULL if_data:

    ...
    if_u1 = {
      if_extents = 0xffff88084feaf5c0, 
      if_ext_irec = 0xffff88084feaf5c0, 
      if_data = 0xffff88084feaf5c0 "\004"
    }, 
    ...

So it's not totally clear what's going on there. It might be interesting
to see what directory this refers to, if it still exists on the sdb fs.
For example, is it an external directory or some kind of internal
directory created by the application? You could use something like the
following to try and locate the directory based on inode number:

	find <mntpath> -inum 13668207561

Brian

> Thanks // Hugo
> ​
> 
> 2015-07-13 20:52 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> 
> > On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
> > > Hi Brain,
> > >
> > > Is this the file which you need ?
> > >
> > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
> > >
> > > $> modinfo xfs
> > >
> > > filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
> > > license: GPL
> > > description: SGI XFS with ACLs, security attributes, large block/inode
> > > numbers, no debug enabled
> > > author: Silicon Graphics, Inc.
> > > srcversion: 0C1B17926BDDA4F121479EE
> > > depends: exportfs
> > > vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
> > >
> >
> > No, this isn't the debug version. We need the one from the debug package
> > that was installed (/usr/lib/debug?).
> >
> > Brian
> >
> > > Thanks // Hugo
> > > ​
> > >
> > > 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > >
> > > > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> > > > > Hi Brain,
> > > > >
> > > > > There you go.
> > > > >
> > > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> > > > >
> > > >
> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> > > > >
> > > > > $ md5sum vmlinux
> > > > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> > > > >
> > > > > Yes, I can read it with this vmlinux image. Put all files
> > > > > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux vmcore
> > > > >
> > > >
> > > > Thanks, I can actually load that up now. Note that we'll probably need
> > > > the modules and whatnot (xfs.ko) also to be able to look at any XFS
> > > > bits. It might be easiest to just tar up and compress whatever
> > directory
> > > > structure has the debug-enabled vmlinux and all the kernel modules.
> > > > Thanks.
> > > >
> > > > Brian
> > > >
> > > > > Hugo
> > > > > ​
> > > > >
> > > > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > > >
> > > > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > > > > > > Hi Brian,
> > > > > > >
> > > > > > > *Operating System Version:*
> > > > > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> > > > > > >
> > > > > > > *NODE 1*
> > > > > > >
> > > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > > > > > >
> > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> > > > > > >
> > > > > > >
> > > > > > > *NODE 2*
> > > > > > >
> > > > > > >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> > > > > > >
> > > > > >
> > > >
> > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> > > > > > >
> > > > > > >
> > > > > > > Any thoughts would be appreciate
> > > > > > >
> > > > > >
> > > > > > I'm not able to fire up crash with these core files and the kernel
> > > > debug
> > > > > > info from the following centos kernel debuginfo package:
> > > > > >
> > > > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> > > > > >
> > > > > > It complains about a version mismatch between the vmlinux and core
> > > > file.
> > > > > > I'm no crash expert... are you sure the cores above correspond to
> > this
> > > > > > kernel? Does crash load up for you on said box if you run something
> > > > like
> > > > > > the following?
> > > > > >
> > > > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> > > > > >
> > > > > > Note that you might need to install the above kernel-debuginfo
> > package
> > > > > > to get the debug (vmlinux) file. If so, could you also upload that
> > > > > > debuginfo rpm somewhere?
> > > > > >
> > > > > > Brian
> > > > > >
> > > > > > > Thanks // Hugo
> > > > > > >
> > > > > > >
> > > > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > > > > >
> > > > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > > > > > > Hi Folks,
> > > > > > > > >
> > > > > > > > > As the results of 32 disks with xfs_repair -n seems no any
> > error
> > > > > > shows
> > > > > > > > up.
> > > > > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
> > > > previous
> > > > > > kernel
> > > > > > > > > panic was came from Ubuntu).
> > > > > > > > > The CentOS nodes encountered kernel panic with same daemon
> > but
> > > > the
> > > > > > > > problem
> > > > > > > > > may a bit differ.
> > > > > > > > >
> > > > > > > > >    - It was broken on xfs_dir2_sf_get_parent_ino+0xa/0x20 in
> > > > Ubuntu.
> > > > > > > > >    - Here’s the log in CentOS. It’s broken on
> > > > > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'd venture to guess it's the same behavior here. The previous
> > > > kernel
> > > > > > > > had a callback for the parent inode number that was called via
> > > > > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it has a
> > > > static
> > > > > > > > inline here instead.
> > > > > > > >
> > > > > > > > > <1>BUG: unable to handle kernel NULL pointer dereference at
> > > > > > > > 0000000000000001
> > > > > > > > > <1>IP: [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > [xfs]
> > > > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > > > > > > <4>Oops: 0000 [#1] SMP
> > > > > > > > > <4>last sysfs file:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > > > > > > <4>CPU 17
> > > > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
> > > > iptable_filter
> > > > > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4
> > nf_conntrack
> > > > > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit sb_edac
> > > > edac_core
> > > > > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca ptp
> > > > pps_core
> > > > > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4 jbd2
> > > > mbcache
> > > > > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
> > xhci_hcd
> > > > ahci
> > > > > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> > > > > > > > > scsi_wait_scan]
> > > > > > > > > <4>
> > > > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > > > > > > R518.v5P/X10DRi-T4+
> > > > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
> > > > 0000000000000000
> > > > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> > > > 00007faa74006203
> > > > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
> > > > 0000000000000004
> > > > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
> > > > 0000000000000000
> > > > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
> > > > ffff8808715b4580
> > > > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > > > > > > knlGS:0000000000000000
> > > > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
> > > > 00000000001407e0
> > > > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > > 0000000000000400
> > > > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
> > > > ffff880871f6c000,
> > > > > > > > > task ffff880860f18ab0)
> > > > > > > > > <4>Stack:
> > > > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> > > > > > ffff880874749cc0
> > > > > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > > > > > > ffff8808715b4580
> > > > > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > > > > > > ffffffffa035ab31
> > > > > > > > > <4>Call Trace:
> > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff 55
> > b8
> > > > 85 c0
> > > > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00 00
> > 00
> > > > 00 00
> > > > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03 41
> > 0f b6
> > > > > > > > > <1>RIP  [<ffffffffa0362d60>] xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > [xfs]
> > > > > > > > > <4> RSP <ffff880871f6de18>
> > > > > > > > > <4>CR2: 0000000000000001
> > > > > > > > >
> > > > > > > > ...
> > > > > > > > >
> > > > > > > > > I’ve got the vmcore dump from operator. Does vmcore help for
> > > > > > > > > troubleshooting kind issue ?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
> > > > kernel? Can
> > > > > > > > you provide the exact kernel version and post the vmcore
> > somewhere?
> > > > > > > >
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > > Thanks // Hugo
> > > > > > > > > ​
> > > > > > > > >
> > > > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <sandeen@sandeen.net
> > >:
> > > > > > > > >
> > > > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
> > > > filesystem?
> > > > > > Note
> > > > > > > > > > that -n will report problems only and prevent any
> > modification
> > > > by
> > > > > > > > repair.
> > > > > > > > > > >
> > > > > > > > > > > *We might to to xfs_repair if we can address which disk
> > > > causes
> > > > > > the
> > > > > > > > > > issue. *
> > > > > > > > > >
> > > > > > > > > > If you do, please save the output, and if it finds
> > anything,
> > > > please
> > > > > > > > > > provide the output in this thread.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > -Eric
> > > > > > > > > >
> > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > xfs mailing list
> > > > > > > > > xfs@oss.sgi.com
> > > > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> >
> >

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-13 17:01                         ` Brian Foster
@ 2015-07-13 18:10                           ` Kuo Hugo
  2015-07-17 19:39                             ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-13 18:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs


[-- Attachment #1.1: Type: text/plain, Size: 15139 bytes --]

Hi Brian,

sdb is mounted on /srv/node/d224 on this server. That's one of the three
disks the process was doing I/O on.
The open files on /srv/node/d224 were opened by the process to store/delete
data at the moment the kernel panic appeared.

```
36 ffff8808703ed6c0 ffff88086fd65540 ffff8805eb03ed88 REG /srv/node/d205/quarantined/objects/cd1d68f515006d443a54ff4f658091bc-a114bba1449b45238abf38dc741d7c27/1436254020.89801.ts
37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG [eventpoll]
38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG /srv/node/d224/tmp/tmpSpnrHg
40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32
```

I'll check the directory path for that inode number.

That's very helpful information.

Thanks // Hugo

2015-07-14 1:01 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Mon, Jul 13, 2015 at 10:06:39PM +0800, Kuo Hugo wrote:
> > Hi Brain,
> >
> > Sorry for the wrong file in previous message. I believe this the right
> one.
> >
> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko.debug
> >
> >
> /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
> >
> > MD5 : 27829c9c55f4f5b095d29a7de7c27254
> >
>
> Yes, that works. I have a few bits of information so far, but nothing
> obvious to me as to what caused the problem. Some info:
>
> - The crash is indeed at xfs_dir2_sf_get_inumber():
>
> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.h:
> 101
> 0xffffffffa0362d60 <xfs_dir2_sf_getdents+672>:  cmpb   $0x0,0x1(%r12)
> ...
>
> - %r12 above has a value of 0 and is set as follows:
>
> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.c:
> 727
> 0xffffffffa0362b11 <xfs_dir2_sf_getdents+81>:   mov    0x50(%rdi),%r12
>
> ... which is the sfp pointer assignment in the getdents function:
>
>         sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
>
> This implies a NULL if_data.
>
> - The backtrace lists a couple of inodes on the stack in this frame. I'm
> not sure why, but one looks like a valid directory and the other looks
> bogus. The valid inode has an inode number of 13668207561.
>
> - The fsname for this inode is "sdb."
>
> - The inode does appear to have a non-NULL if_data:
>
>     ...
>     if_u1 = {
>       if_extents = 0xffff88084feaf5c0,
>       if_ext_irec = 0xffff88084feaf5c0,
>       if_data = 0xffff88084feaf5c0 "\004"
>     },
>     ...
>
> So it's not totally clear what's going on there. It might be interesting
> to see what directory this refers to, if it still exists on the sdb fs.
> For example, is it an external directory or some kind of internal
> directory created by the application? You could use something like the
> following to try and locate the directory based on inode number:
>
>         find <mntpath> -inum 13668207561
>
> Brian
>
> > Thanks // Hugo
> > ​
> >
> > 2015-07-13 20:52 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >
> > > On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
> > > > Hi Brain,
> > > >
> > > > Is this the file which you need ?
> > > >
> > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
> > > >
> > > > $> modinfo xfs
> > > >
> > > > filename:
> /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
> > > > license: GPL
> > > > description: SGI XFS with ACLs, security attributes, large
> block/inode
> > > > numbers, no debug enabled
> > > > author: Silicon Graphics, Inc.
> > > > srcversion: 0C1B17926BDDA4F121479EE
> > > > depends: exportfs
> > > > vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
> > > >
> > >
> > > No, this isn't the debug version. We need the one from the debug
> package
> > > that was installed (/usr/lib/debug?).
> > >
> > > Brian
> > >
> > > > Thanks // Hugo
> > > > ​
> > > >
> > > > 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > >
> > > > > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> > > > > > Hi Brain,
> > > > > >
> > > > > > There you go.
> > > > > >
> > > > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> > > > > >
> > > > >
> > >
> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> > > > > >
> > > > > > $ md5sum vmlinux
> > > > > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> > > > > >
> > > > > > Yes, I can read it with this vmlinux image. Put all files
> > > > > > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux
> vmcore
> > > > > >
> > > > >
> > > > > Thanks, I can actually load that up now. Note that we'll probably
> need
> > > > > the modules and whatnot (xfs.ko) also to be able to look at any XFS
> > > > > bits. It might be easiest to just tar up and compress whatever
> > > directory
> > > > > structure has the debug-enabled vmlinux and all the kernel modules.
> > > > > Thanks.
> > > > >
> > > > > Brian
> > > > >
> > > > > > Hugo
> > > > > > ​
> > > > > >
> > > > > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> > > > > >
> > > > > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> > > > > > > > Hi Brian,
> > > > > > > >
> > > > > > > > *Operating System Version:*
> > > > > > > > Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> > > > > > > >
> > > > > > > > *NODE 1*
> > > > > > > >
> > > > > > > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> > > > > > > >
> > > > >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> > > > > > > >
> > > > > > > >
> > > > > > > > *NODE 2*
> > > > > > > >
> > > > > > > >
> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> > > > > > > >
> > > > > > > >
> > > > > > > > Any thoughts would be appreciate
> > > > > > > >
> > > > > > >
> > > > > > > I'm not able to fire up crash with these core files and the
> kernel
> > > > > debug
> > > > > > > info from the following centos kernel debuginfo package:
> > > > > > >
> > > > > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> > > > > > >
> > > > > > > It complains about a version mismatch between the vmlinux and
> core
> > > > > file.
> > > > > > > I'm no crash expert... are you sure the cores above correspond
> to
> > > this
> > > > > > > kernel? Does crash load up for you on said box if you run
> something
> > > > > like
> > > > > > > the following?
> > > > > > >
> > > > > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> > > > > > >
> > > > > > > Note that you might need to install the above kernel-debuginfo
> > > package
> > > > > > > to get the debug (vmlinux) file. If so, could you also upload
> that
> > > > > > > debuginfo rpm somewhere?
> > > > > > >
> > > > > > > Brian
> > > > > > >
> > > > > > > > Thanks // Hugo
> > > > > > > >
> > > > > > > >
> > > > > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com
> >:
> > > > > > > >
> > > > > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> > > > > > > > > > Hi Folks,
> > > > > > > > > >
> > > > > > > > > > As the results of 32 disks with xfs_repair -n seems no
> any
> > > error
> > > > > > > shows
> > > > > > > > > up.
> > > > > > > > > > We currently tried to deploy CentOS 6.6 for testing. (The
> > > > > previous
> > > > > > > kernel
> > > > > > > > > > panic was came from Ubuntu).
> > > > > > > > > > The CentOS nodes encountered kernel panic with same
> daemon
> > > but
> > > > > the
> > > > > > > > > problem
> > > > > > > > > > may a bit differ.
> > > > > > > > > >
> > > > > > > > > >    - It was broken on
> xfs_dir2_sf_get_parent_ino+0xa/0x20 in
> > > > > Ubuntu.
> > > > > > > > > >    - Here’s the log in CentOS. It’s broken on
> > > > > > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'd venture to guess it's the same behavior here. The
> previous
> > > > > kernel
> > > > > > > > > had a callback for the parent inode number that was called
> via
> > > > > > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it
> has a
> > > > > static
> > > > > > > > > inline here instead.
> > > > > > > > >
> > > > > > > > > > <1>BUG: unable to handle kernel NULL pointer dereference
> at
> > > > > > > > > 0000000000000001
> > > > > > > > > > <1>IP: [<ffffffffa0362d60>]
> xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > [xfs]
> > > > > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> > > > > > > > > > <4>Oops: 0000 [#1] SMP
> > > > > > > > > > <4>last sysfs file:
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> > > > > > > > > > <4>CPU 17
> > > > > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
> > > > > iptable_filter
> > > > > > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4
> > > nf_conntrack
> > > > > > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> > > > > > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit
> sb_edac
> > > > > edac_core
> > > > > > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca
> ptp
> > > > > pps_core
> > > > > > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4
> jbd2
> > > > > mbcache
> > > > > > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
> > > xhci_hcd
> > > > > ahci
> > > > > > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last
> unloaded:
> > > > > > > > > > scsi_wait_scan]
> > > > > > > > > > <4>
> > > > > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> > > > > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> > > > > > > > > > R518.v5P/X10DRi-T4+
> > > > > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> > > > > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> > > > > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> > > > > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
> > > > > 0000000000000000
> > > > > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> > > > > 00007faa74006203
> > > > > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
> > > > > 0000000000000004
> > > > > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
> > > > > 0000000000000000
> > > > > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
> > > > > ffff8808715b4580
> > > > > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
> > > > > 00000000001407e0
> > > > > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > > 0000000000000000
> > > > > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > > > 0000000000000400
> > > > > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
> > > > > ffff880871f6c000,
> > > > > > > > > > task ffff880860f18ab0)
> > > > > > > > > > <4>Stack:
> > > > > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> > > > > > > ffff880874749cc0
> > > > > > > > > > <4><d> 0000000100000103 ffff8802381f8c00 ffff880871f6df38
> > > > > > > > > ffff8808715b4580
> > > > > > > > > > <4><d> 0000000000000082 ffff8802381f8d88 ffff880871f6dec8
> > > > > > > > > ffffffffa035ab31
> > > > > > > > > > <4>Call Trace:
> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> > > > > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50 [xfs]
> > > > > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> > > > > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> > > > > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> > > > > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> > > > > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0 ff
> 55
> > > b8
> > > > > 85 c0
> > > > > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84 00
> 00
> > > 00
> > > > > 00 00
> > > > > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03
> 41
> > > 0f b6
> > > > > > > > > > <1>RIP  [<ffffffffa0362d60>]
> xfs_dir2_sf_getdents+0x2a0/0x3a0
> > > > > [xfs]
> > > > > > > > > > <4> RSP <ffff880871f6de18>
> > > > > > > > > > <4>CR2: 0000000000000001
> > > > > > > > > >
> > > > > > > > > ...
> > > > > > > > > >
> > > > > > > > > > I’ve got the vmcore dump from operator. Does vmcore help
> for
> > > > > > > > > > troubleshooting kind issue ?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this 6.6
> > > > > kernel? Can
> > > > > > > > > you provide the exact kernel version and post the vmcore
> > > somewhere?
> > > > > > > > >
> > > > > > > > > Brian
> > > > > > > > >
> > > > > > > > > > Thanks // Hugo
> > > > > > > > > > ​
> > > > > > > > > >
> > > > > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <
> sandeen@sandeen.net
> > > >:
> > > > > > > > > >
> > > > > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> > > > > > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
> > > > > filesystem?
> > > > > > > Note
> > > > > > > > > > > that -n will report problems only and prevent any
> > > modification
> > > > > by
> > > > > > > > > repair.
> > > > > > > > > > > >
> > > > > > > > > > > > *We might to to xfs_repair if we can address which
> disk
> > > > > causes
> > > > > > > the
> > > > > > > > > > > issue. *
> > > > > > > > > > >
> > > > > > > > > > > If you do, please save the output, and if it finds
> > > anything,
> > > > > please
> > > > > > > > > > > provide the output in this thread.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > -Eric
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > xfs mailing list
> > > > > > > > > > xfs@oss.sgi.com
> > > > > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> > > > _______________________________________________
> > > > xfs mailing list
> > > > xfs@oss.sgi.com
> > > > http://oss.sgi.com/mailman/listinfo/xfs
> > >
> > >
>

[-- Attachment #1.2: Type: text/html, Size: 25890 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-13 18:10                           ` Kuo Hugo
@ 2015-07-17 19:39                             ` Kuo Hugo
  2015-07-20 11:46                               ` Brian Foster
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-17 19:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs


[-- Attachment #1.1: Type: text/plain, Size: 16032 bytes --]

Hi all,

We may have hit this bug in OpenStack Swift:
Race condition quarantines valid objects
https://bugs.launchpad.net/swift/+bug/1451520

This race condition may be the cause of the problem Brian mentioned previously.
The OpenStack community fixed this in Swift's code. Should it be considered a
kernel bug in XFS as well?
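
For illustration only, the racy pattern described in that bug looks roughly
like the sketch below (hypothetical paths, not the actual Swift code, and not
a confirmed reproducer for this crash): one process keeps listing a hash
directory while another renames it into quarantine and back.

```
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* hypothetical paths, for illustration only */
#define BASE "/srv/node/d224/objects/hashdir"
#define QUAR "/srv/node/d224/quarantined/objects/hashdir"

static void lister(void)
{
	for (;;) {
		DIR *d = opendir(BASE);	/* may fail while the dir is renamed away */
		if (!d)
			continue;
		struct dirent *de;
		while ((de = readdir(d)) != NULL)	/* getdents() under the hood */
			;
		closedir(d);
	}
}

static void quarantiner(void)
{
	for (;;) {
		rename(BASE, QUAR);	/* "quarantine" the whole directory */
		rename(QUAR, BASE);	/* and put it back for the next round */
	}
}

int main(void)
{
	mkdir(BASE, 0755);	/* assumes the parent directories already exist */
	if (fork() == 0) {
		lister();
		_exit(0);
	}
	quarantiner();		/* runs until interrupted */
	return 0;
}
```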

Thanks for all your efforts.
Hugo


2015-07-14 2:10 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:

> Hi Brain,
>
> The sdb is mounted on /srv/node/d224 on this server. That's one of three
> disks that the process was making I/O on it.
> Those openfiles on /srv/node/d224 were made by the process to
> storing/deleting data at that moment while the kernel panic appeared.
>
> ```
> 36 ffff8808703ed6c0 ffff88086fd65540 ffff8805eb03ed88 REG
> /srv/node/d205/quarantined/objects/cd1d68f515006d443a54ff4f658091bc-
> a114bba1449b45238abf38dc741d7c27/1436254020.89801.ts 37 ffff8810718343c0
> ffff88105b9d32c0 ffff8808745aa5e8 REG [eventpoll] 38 ffff8808713da780
> ffff880010c9a900 ffff88096368a188 REG /srv/node/d224/quarantined/objects/
> b146865bf8034bfc42570b747c341b32/1436266042.57775.ts 39 ffff880871cb03c0
> ffff880495a8b380 ffff8808a5e6c988 REG /srv/node/d224/tmp/tmpSpnrHg 40
> ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR
> /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32
> ```
>
> I'll check the dir location of the inode number.
>
> Nice information.
>
> Thanks // Hugo
>
> 2015-07-14 1:01 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>
>> On Mon, Jul 13, 2015 at 10:06:39PM +0800, Kuo Hugo wrote:
>> > Hi Brain,
>> >
>> > Sorry for the wrong file in previous message. I believe this the right
>> one.
>> >
>> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko.debug
>> >
>> >
>> /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>> >
>> > MD5 : 27829c9c55f4f5b095d29a7de7c27254
>> >
>>
>> Yes, that works. I have a few bits of information so far, but nothing
>> obvious to me as to what caused the problem. Some info:
>>
>> - The crash is indeed at xfs_dir2_sf_get_inumber():
>>
>> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.h:
>> 101
>> 0xffffffffa0362d60 <xfs_dir2_sf_getdents+672>:  cmpb   $0x0,0x1(%r12)
>> ...
>>
>> - %r12 above has a value of 0 and is set as follows:
>>
>> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.c:
>> 727
>> 0xffffffffa0362b11 <xfs_dir2_sf_getdents+81>:   mov    0x50(%rdi),%r12
>>
>> ... which is the sfp pointer assignment in the getdents function:
>>
>>         sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
>>
>> This implies a NULL if_data.
>>
>> - The backtrace lists a couple of inodes on the stack in this frame. I'm
>> not sure why, but one looks like a valid directory and the other looks
>> bogus. The valid inode has an inode number of 13668207561.
>>
>> - The fsname for this inode is "sdb."
>>
>> - The inode does appear to have a non-NULL if_data:
>>
>>     ...
>>     if_u1 = {
>>       if_extents = 0xffff88084feaf5c0,
>>       if_ext_irec = 0xffff88084feaf5c0,
>>       if_data = 0xffff88084feaf5c0 "\004"
>>     },
>>     ...
>>
>> So it's not totally clear what's going on there. It might be interesting
>> to see what directory this refers to, if it still exists on the sdb fs.
>> For example, is it an external directory or some kind of internal
>> directory created by the application? You could use something like the
>> following to try and locate the directory based on inode number:
>>
>>         find <mntpath> -inum 13668207561
>>
>> Brian
>>
>> > Thanks // Hugo
>> > ​
>> >
>> > 2015-07-13 20:52 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> >
>> > > On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
>> > > > Hi Brain,
>> > > >
>> > > > Is this the file which you need ?
>> > > >
>> > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
>> > > >
>> > > > $> modinfo xfs
>> > > >
>> > > > filename:
>> /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
>> > > > license: GPL
>> > > > description: SGI XFS with ACLs, security attributes, large
>> block/inode
>> > > > numbers, no debug enabled
>> > > > author: Silicon Graphics, Inc.
>> > > > srcversion: 0C1B17926BDDA4F121479EE
>> > > > depends: exportfs
>> > > > vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
>> > > >
>> > >
>> > > No, this isn't the debug version. We need the one from the debug
>> package
>> > > that was installed (/usr/lib/debug?).
>> > >
>> > > Brian
>> > >
>> > > > Thanks // Hugo
>> > > > ​
>> > > >
>> > > > 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> > > >
>> > > > > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
>> > > > > > Hi Brain,
>> > > > > >
>> > > > > > There you go.
>> > > > > >
>> > > > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
>> > > > > >
>> > > > >
>> > >
>> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
>> > > > > >
>> > > > > > $ md5sum vmlinux
>> > > > > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
>> > > > > >
>> > > > > > Yes, I can read it with this vmlinux image. Put all files
>> > > > > > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux
>> vmcore
>> > > > > >
>> > > > >
>> > > > > Thanks, I can actually load that up now. Note that we'll probably
>> need
>> > > > > the modules and whatnot (xfs.ko) also to be able to look at any
>> XFS
>> > > > > bits. It might be easiest to just tar up and compress whatever
>> > > directory
>> > > > > structure has the debug-enabled vmlinux and all the kernel
>> modules.
>> > > > > Thanks.
>> > > > >
>> > > > > Brian
>> > > > >
>> > > > > > Hugo
>> > > > > > ​
>> > > > > >
>> > > > > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
>> > > > > >
>> > > > > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
>> > > > > > > > Hi Brian,
>> > > > > > > >
>> > > > > > > > *Operating System Version:*
>> > > > > > > >
>> Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
>> > > > > > > >
>> > > > > > > > *NODE 1*
>> > > > > > > >
>> > > > > > > >
>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
>> > > > > > > >
>> > > > >
>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > *NODE 2*
>> > > > > > > >
>> > > > > > > >
>> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
>> > > > > > > >
>> > > > > > >
>> > > > >
>> > >
>> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Any thoughts would be appreciate
>> > > > > > > >
>> > > > > > >
>> > > > > > > I'm not able to fire up crash with these core files and the
>> kernel
>> > > > > debug
>> > > > > > > info from the following centos kernel debuginfo package:
>> > > > > > >
>> > > > > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
>> > > > > > >
>> > > > > > > It complains about a version mismatch between the vmlinux and
>> core
>> > > > > file.
>> > > > > > > I'm no crash expert... are you sure the cores above
>> correspond to
>> > > this
>> > > > > > > kernel? Does crash load up for you on said box if you run
>> something
>> > > > > like
>> > > > > > > the following?
>> > > > > > >
>> > > > > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
>> > > > > > >
>> > > > > > > Note that you might need to install the above kernel-debuginfo
>> > > package
>> > > > > > > to get the debug (vmlinux) file. If so, could you also upload
>> that
>> > > > > > > debuginfo rpm somewhere?
>> > > > > > >
>> > > > > > > Brian
>> > > > > > >
>> > > > > > > > Thanks // Hugo
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com
>> >:
>> > > > > > > >
>> > > > > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
>> > > > > > > > > > Hi Folks,
>> > > > > > > > > >
>> > > > > > > > > > As the results of 32 disks with xfs_repair -n seems no
>> any
>> > > error
>> > > > > > > shows
>> > > > > > > > > up.
>> > > > > > > > > > We currently tried to deploy CentOS 6.6 for testing.
>> (The
>> > > > > previous
>> > > > > > > kernel
>> > > > > > > > > > panic was came from Ubuntu).
>> > > > > > > > > > The CentOS nodes encountered kernel panic with same
>> daemon
>> > > but
>> > > > > the
>> > > > > > > > > problem
>> > > > > > > > > > may a bit differ.
>> > > > > > > > > >
>> > > > > > > > > >    - It was broken on
>> xfs_dir2_sf_get_parent_ino+0xa/0x20 in
>> > > > > Ubuntu.
>> > > > > > > > > >    - Here’s the log in CentOS. It’s broken on
>> > > > > > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I'd venture to guess it's the same behavior here. The
>> previous
>> > > > > kernel
>> > > > > > > > > had a callback for the parent inode number that was
>> called via
>> > > > > > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it
>> has a
>> > > > > static
>> > > > > > > > > inline here instead.
>> > > > > > > > >
>> > > > > > > > > > <1>BUG: unable to handle kernel NULL pointer
>> dereference at
>> > > > > > > > > 0000000000000001
>> > > > > > > > > > <1>IP: [<ffffffffa0362d60>]
>> xfs_dir2_sf_getdents+0x2a0/0x3a0
>> > > > > [xfs]
>> > > > > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
>> > > > > > > > > > <4>Oops: 0000 [#1] SMP
>> > > > > > > > > > <4>last sysfs file:
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > >
>> > > > >
>> > >
>> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
>> > > > > > > > > > <4>CPU 17
>> > > > > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
>> > > > > iptable_filter
>> > > > > > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4
>> > > nf_conntrack
>> > > > > > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
>> > > > > > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit
>> sb_edac
>> > > > > edac_core
>> > > > > > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca
>> ptp
>> > > > > pps_core
>> > > > > > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4
>> jbd2
>> > > > > mbcache
>> > > > > > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
>> > > xhci_hcd
>> > > > > ahci
>> > > > > > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last
>> unloaded:
>> > > > > > > > > > scsi_wait_scan]
>> > > > > > > > > > <4>
>> > > > > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
>> > > > > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
>> > > > > > > > > > R518.v5P/X10DRi-T4+
>> > > > > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
>> > > > > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
>> > > > > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
>> > > > > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
>> > > > > 0000000000000000
>> > > > > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
>> > > > > 00007faa74006203
>> > > > > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
>> > > > > 0000000000000004
>> > > > > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
>> > > > > 0000000000000000
>> > > > > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
>> > > > > ffff8808715b4580
>> > > > > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
>> > > > > > > > > knlGS:0000000000000000
>> > > > > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > > > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
>> > > > > 00000000001407e0
>> > > > > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> > > > > 0000000000000000
>> > > > > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> > > > > 0000000000000400
>> > > > > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
>> > > > > ffff880871f6c000,
>> > > > > > > > > > task ffff880860f18ab0)
>> > > > > > > > > > <4>Stack:
>> > > > > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
>> > > > > > > ffff880874749cc0
>> > > > > > > > > > <4><d> 0000000100000103 ffff8802381f8c00
>> ffff880871f6df38
>> > > > > > > > > ffff8808715b4580
>> > > > > > > > > > <4><d> 0000000000000082 ffff8802381f8d88
>> ffff880871f6dec8
>> > > > > > > > > ffffffffa035ab31
>> > > > > > > > > > <4>Call Trace:
>> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > > > > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
>> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
>> > > > > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50
>> [xfs]
>> > > > > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
>> > > > > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
>> > > > > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
>> > > > > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>> > > > > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0
>> ff 55
>> > > b8
>> > > > > 85 c0
>> > > > > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84
>> 00 00
>> > > 00
>> > > > > 00 00
>> > > > > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03
>> 41
>> > > 0f b6
>> > > > > > > > > > <1>RIP  [<ffffffffa0362d60>]
>> xfs_dir2_sf_getdents+0x2a0/0x3a0
>> > > > > [xfs]
>> > > > > > > > > > <4> RSP <ffff880871f6de18>
>> > > > > > > > > > <4>CR2: 0000000000000001
>> > > > > > > > > >
>> > > > > > > > > ...
>> > > > > > > > > >
>> > > > > > > > > > I’ve got the vmcore dump from operator. Does vmcore
>> help for
>> > > > > > > > > > troubleshooting kind issue ?
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this
>> 6.6
>> > > > > kernel? Can
>> > > > > > > > > you provide the exact kernel version and post the vmcore
>> > > somewhere?
>> > > > > > > > >
>> > > > > > > > > Brian
>> > > > > > > > >
>> > > > > > > > > > Thanks // Hugo
>> > > > > > > > > > ​
>> > > > > > > > > >
>> > > > > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <
>> sandeen@sandeen.net
>> > > >:
>> > > > > > > > > >
>> > > > > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
>> > > > > > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
>> > > > > filesystem?
>> > > > > > > Note
>> > > > > > > > > > > that -n will report problems only and prevent any
>> > > modification
>> > > > > by
>> > > > > > > > > repair.
>> > > > > > > > > > > >
>> > > > > > > > > > > > *We might to to xfs_repair if we can address which
>> disk
>> > > > > causes
>> > > > > > > the
>> > > > > > > > > > > issue. *
>> > > > > > > > > > >
>> > > > > > > > > > > If you do, please save the output, and if it finds
>> > > anything,
>> > > > > please
>> > > > > > > > > > > provide the output in this thread.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > -Eric
>> > > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > _______________________________________________
>> > > > > > > > > > xfs mailing list
>> > > > > > > > > > xfs@oss.sgi.com
>> > > > > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > >
>> > > > >
>> > >
>> > > > _______________________________________________
>> > > > xfs mailing list
>> > > > xfs@oss.sgi.com
>> > > > http://oss.sgi.com/mailman/listinfo/xfs
>> > >
>> > >
>>
>
>

[-- Attachment #1.2: Type: text/html, Size: 27131 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-17 19:39                             ` Kuo Hugo
@ 2015-07-20 11:46                               ` Brian Foster
  2015-07-20 14:30                                 ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-20 11:46 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs

On Sat, Jul 18, 2015 at 03:39:10AM +0800, Kuo Hugo wrote:
> Hi all,
> 

FYI, the top-posting here is making this thread increasingly difficult
to follow. Please reply inline if you can.

> We may hit this bug in OpenStack Swift  :
> Race condition quarantines valid objects
> https://bugs.launchpad.net/swift/+bug/1451520
> 
> This race condition may cause the problem which Brain mentioned previously.
> OpenStack community fixed this in Swift's code. Should it be a kernel bug
> of XFS too  ?
> 

I don't know much about the Swift bug. A BUG() or crash in the kernel is
generally always a kernel bug, regardless of what userspace is doing. It
certainly could be that whatever userspace is doing to trigger the
kernel bug is a bug in the userspace application, but either way it
shouldn't cause the kernel to crash. By the same token, if Swift is
updated to fix the aforementioned bug and the kernel crash no longer
reproduces, that doesn't necessarily mean the kernel bug is fixed (just
potentially hidden).

Were you able to track down the directory inode mentioned in the
previous message? Is it some kind of internal directory used by the
application (e.g., perhaps related to the quarantine mechanism mentioned
in the bug)?

Brian

> Thanks for all your efforts.
> Hugo
> 
> 
> 2015-07-14 2:10 GMT+08:00 Kuo Hugo <tonytkdk@gmail.com>:
> 
> > Hi Brain,
> >
> > The sdb is mounted on /srv/node/d224 on this server. That's one of three
> > disks that the process was making I/O on it.
> > Those openfiles on /srv/node/d224 were made by the process to
> > storing/deleting data at that moment while the kernel panic appeared.
> >
> > ```
> > 36 ffff8808703ed6c0 ffff88086fd65540 ffff8805eb03ed88 REG
> > /srv/node/d205/quarantined/objects/cd1d68f515006d443a54ff4f658091bc-
> > a114bba1449b45238abf38dc741d7c27/1436254020.89801.ts 37 ffff8810718343c0
> > ffff88105b9d32c0 ffff8808745aa5e8 REG [eventpoll] 38 ffff8808713da780
> > ffff880010c9a900 ffff88096368a188 REG /srv/node/d224/quarantined/objects/
> > b146865bf8034bfc42570b747c341b32/1436266042.57775.ts 39 ffff880871cb03c0
> > ffff880495a8b380 ffff8808a5e6c988 REG /srv/node/d224/tmp/tmpSpnrHg 40
> > ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR
> > /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32
> > ```
> >
> > I'll check the dir location of the inode number.
> >
> > Nice information.
> >
> > Thanks // Hugo
> >
> > 2015-07-14 1:01 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >
> >> On Mon, Jul 13, 2015 at 10:06:39PM +0800, Kuo Hugo wrote:
> >> > Hi Brain,
> >> >
> >> > Sorry for the wrong file in previous message. I believe this the right
> >> one.
> >> >
> >> > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko.debug
> >> >
> >> >
> >> /usr/lib/debug/lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
> >> >
> >> > MD5 : 27829c9c55f4f5b095d29a7de7c27254
> >> >
> >>
> >> Yes, that works. I have a few bits of information so far, but nothing
> >> obvious to me as to what caused the problem. Some info:
> >>
> >> - The crash is indeed at xfs_dir2_sf_get_inumber():
> >>
> >> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.h:
> >> 101
> >> 0xffffffffa0362d60 <xfs_dir2_sf_getdents+672>:  cmpb   $0x0,0x1(%r12)
> >> ...
> >>
> >> - %r12 above has a value of 0 and is set as follows:
> >>
> >> /usr/src/debug/kernel-2.6.32-504.23.4.el6/linux-2.6.32-504.23.4.el6.x86_64/fs/xfs/xfs_dir2_sf.c:
> >> 727
> >> 0xffffffffa0362b11 <xfs_dir2_sf_getdents+81>:   mov    0x50(%rdi),%r12
> >>
> >> ... which is the sfp pointer assignment in the getdents function:
> >>
> >>         sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
> >>
> >> This implies a NULL if_data.
> >>
> >> - The backtrace lists a couple of inodes on the stack in this frame. I'm
> >> not sure why, but one looks like a valid directory and the other looks
> >> bogus. The valid inode has an inode number of 13668207561.
> >>
> >> - The fsname for this inode is "sdb."
> >>
> >> - The inode does appear to have a non-NULL if_data:
> >>
> >>     ...
> >>     if_u1 = {
> >>       if_extents = 0xffff88084feaf5c0,
> >>       if_ext_irec = 0xffff88084feaf5c0,
> >>       if_data = 0xffff88084feaf5c0 "\004"
> >>     },
> >>     ...
> >>
> >> So it's not totally clear what's going on there. It might be interesting
> >> to see what directory this refers to, if it still exists on the sdb fs.
> >> For example, is it an external directory or some kind of internal
> >> directory created by the application? You could use something like the
> >> following to try and locate the directory based on inode number:
> >>
> >>         find <mntpath> -inum 13668207561
> >>
> >> Brian
> >>
> >> > Thanks // Hugo
> >> > ​
> >> >
> >> > 2015-07-13 20:52 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >> >
> >> > > On Fri, Jul 10, 2015 at 01:36:41PM +0800, Kuo Hugo wrote:
> >> > > > Hi Brain,
> >> > > >
> >> > > > Is this the file which you need ?
> >> > > >
> >> > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/xfs.ko
> >> > > >
> >> > > > $> modinfo xfs
> >> > > >
> >> > > > filename:
> >> /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/xfs/xfs.ko
> >> > > > license: GPL
> >> > > > description: SGI XFS with ACLs, security attributes, large
> >> block/inode
> >> > > > numbers, no debug enabled
> >> > > > author: Silicon Graphics, Inc.
> >> > > > srcversion: 0C1B17926BDDA4F121479EE
> >> > > > depends: exportfs
> >> > > > vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversion
> >> > > >
> >> > >
> >> > > No, this isn't the debug version. We need the one from the debug
> >> package
> >> > > that was installed (/usr/lib/debug?).
> >> > >
> >> > > Brian
> >> > >
> >> > > > Thanks // Hugo
> >> > > > ​
> >> > > >
> >> > > > 2015-07-10 2:32 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >> > > >
> >> > > > > On Fri, Jul 10, 2015 at 12:40:00AM +0800, Kuo Hugo wrote:
> >> > > > > > Hi Brain,
> >> > > > > >
> >> > > > > > There you go.
> >> > > > > >
> >> > > > > > https://cloud.swiftstack.com/v1/AUTH_hugo/public/vmlinux
> >> > > > > >
> >> > > > >
> >> > >
> >> https://cloud.swiftstack.com/v1/AUTH_hugo/public/System.map-2.6.32-504.23.4.el6.x86_64
> >> > > > > >
> >> > > > > > $ md5sum vmlinux
> >> > > > > > 82aaa694a174c0a29e78c05e73adf5d8  vmlinux
> >> > > > > >
> >> > > > > > Yes, I can read it with this vmlinux image. Put all files
> >> > > > > > (vmcore,vmlinux,System.map) in a folder and run $crash vmlinux
> >> vmcore
> >> > > > > >
> >> > > > >
> >> > > > > Thanks, I can actually load that up now. Note that we'll probably
> >> need
> >> > > > > the modules and whatnot (xfs.ko) also to be able to look at any
> >> XFS
> >> > > > > bits. It might be easiest to just tar up and compress whatever
> >> > > directory
> >> > > > > structure has the debug-enabled vmlinux and all the kernel
> >> modules.
> >> > > > > Thanks.
> >> > > > >
> >> > > > > Brian
> >> > > > >
> >> > > > > > Hugo
> >> > > > > > ​
> >> > > > > >
> >> > > > > > 2015-07-09 23:18 GMT+08:00 Brian Foster <bfoster@redhat.com>:
> >> > > > > >
> >> > > > > > > On Thu, Jul 09, 2015 at 09:20:00PM +0800, Kuo Hugo wrote:
> >> > > > > > > > Hi Brian,
> >> > > > > > > >
> >> > > > > > > > *Operating System Version:*
> >> > > > > > > >
> >> Linux-2.6.32-504.23.4.el6.x86_64-x86_64-with-centos-6.6-Final
> >> > > > > > > >
> >> > > > > > > > *NODE 1*
> >> > > > > > > >
> >> > > > > > > >
> >> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore
> >> > > > > > > >
> >> > > > >
> >> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg.txt
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > *NODE 2*
> >> > > > > > > >
> >> > > > > > > >
> >> > > https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore_r2obj02
> >> > > > > > > >
> >> > > > > > >
> >> > > > >
> >> > >
> >> https://cloud.swiftstack.com/v1/AUTH_burton/brtnswift/vmcore-dmesg_r2obj02.txt
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Any thoughts would be appreciate
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > I'm not able to fire up crash with these core files and the
> >> kernel
> >> > > > > debug
> >> > > > > > > info from the following centos kernel debuginfo package:
> >> > > > > > >
> >> > > > > > > kernel-debuginfo-2.6.32-504.23.4.el6.centos.plus.x86_64.rpm
> >> > > > > > >
> >> > > > > > > It complains about a version mismatch between the vmlinux and
> >> core
> >> > > > > file.
> >> > > > > > > I'm no crash expert... are you sure the cores above
> >> correspond to
> >> > > this
> >> > > > > > > kernel? Does crash load up for you on said box if you run
> >> something
> >> > > > > like
> >> > > > > > > the following?
> >> > > > > > >
> >> > > > > > >         crash /usr/lib/debug/lib/modules/.../vmlinux vmcore
> >> > > > > > >
> >> > > > > > > Note that you might need to install the above kernel-debuginfo
> >> > > package
> >> > > > > > > to get the debug (vmlinux) file. If so, could you also upload
> >> that
> >> > > > > > > debuginfo rpm somewhere?
> >> > > > > > >
> >> > > > > > > Brian
> >> > > > > > >
> >> > > > > > > > Thanks // Hugo
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > 2015-07-09 20:51 GMT+08:00 Brian Foster <bfoster@redhat.com
> >> >:
> >> > > > > > > >
> >> > > > > > > > > On Thu, Jul 09, 2015 at 06:57:55PM +0800, Kuo Hugo wrote:
> >> > > > > > > > > > Hi Folks,
> >> > > > > > > > > >
> >> > > > > > > > > > As the results of 32 disks with xfs_repair -n seems no
> >> any
> >> > > error
> >> > > > > > > shows
> >> > > > > > > > > up.
> >> > > > > > > > > > We currently tried to deploy CentOS 6.6 for testing.
> >> (The
> >> > > > > previous
> >> > > > > > > kernel
> >> > > > > > > > > > panic was came from Ubuntu).
> >> > > > > > > > > > The CentOS nodes encountered kernel panic with same
> >> daemon
> >> > > but
> >> > > > > the
> >> > > > > > > > > problem
> >> > > > > > > > > > may a bit differ.
> >> > > > > > > > > >
> >> > > > > > > > > >    - It was broken on
> >> xfs_dir2_sf_get_parent_ino+0xa/0x20 in
> >> > > > > Ubuntu.
> >> > > > > > > > > >    - Here’s the log in CentOS. It’s broken on
> >> > > > > > > > > >    xfs_dir2_sf_getdents+0x2a0/0x3a0
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I'd venture to guess it's the same behavior here. The
> >> previous
> >> > > > > kernel
> >> > > > > > > > > had a callback for the parent inode number that was
> >> called via
> >> > > > > > > > > xfs_dir2_sf_getdents(). Taking a look at a 6.6 kernel, it
> >> has a
> >> > > > > static
> >> > > > > > > > > inline here instead.
> >> > > > > > > > >
> >> > > > > > > > > > <1>BUG: unable to handle kernel NULL pointer
> >> dereference at
> >> > > > > > > > > 0000000000000001
> >> > > > > > > > > > <1>IP: [<ffffffffa0362d60>]
> >> xfs_dir2_sf_getdents+0x2a0/0x3a0
> >> > > > > [xfs]
> >> > > > > > > > > > <4>PGD 1072327067 PUD 1072328067 PMD 0
> >> > > > > > > > > > <4>Oops: 0000 [#1] SMP
> >> > > > > > > > > > <4>last sysfs file:
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > >
> >> > > > >
> >> > >
> >> /sys/devices/pci0000:80/0000:80:03.2/0000:83:00.0/host10/port-10:1/expander-10:1/port-10:1:16/end_device-10:1:16/target10:0:25/10:0:25:0/block/sdz/queue/rotational
> >> > > > > > > > > > <4>CPU 17
> >> > > > > > > > > > <4>Modules linked in: xt_conntrack tun xfs exportfs
> >> > > > > iptable_filter
> >> > > > > > > > > > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4
> >> > > nf_conntrack
> >> > > > > > > > > > nf_defrag_ipv4 ip_tables ip_vs ipv6 libcrc32c iTCO_wdt
> >> > > > > > > > > > iTCO_vendor_support ses enclosure igb i2c_algo_bit
> >> sb_edac
> >> > > > > edac_core
> >> > > > > > > > > > i2c_i801 i2c_core sg shpchp lpc_ich mfd_core ixgbe dca
> >> ptp
> >> > > > > pps_core
> >> > > > > > > > > > mdio power_meter acpi_ipmi ipmi_si ipmi_msghandler ext4
> >> jbd2
> >> > > > > mbcache
> >> > > > > > > > > > sd_mod crc_t10dif mpt3sas scsi_transport_sas raid_class
> >> > > xhci_hcd
> >> > > > > ahci
> >> > > > > > > > > > wmi dm_mirror dm_region_hash dm_log dm_mod [last
> >> unloaded:
> >> > > > > > > > > > scsi_wait_scan]
> >> > > > > > > > > > <4>
> >> > > > > > > > > > <4>Pid: 4454, comm: swift-object-se Not tainted
> >> > > > > > > > > > 2.6.32-504.23.4.el6.x86_64 #1 Silicon Mechanics Storform
> >> > > > > > > > > > R518.v5P/X10DRi-T4+
> >> > > > > > > > > > <4>RIP: 0010:[<ffffffffa0362d60>]  [<ffffffffa0362d60>]
> >> > > > > > > > > > xfs_dir2_sf_getdents+0x2a0/0x3a0 [xfs]
> >> > > > > > > > > > <4>RSP: 0018:ffff880871f6de18  EFLAGS: 00010202
> >> > > > > > > > > > <4>RAX: 0000000000000000 RBX: 0000000000000004 RCX:
> >> > > > > 0000000000000000
> >> > > > > > > > > > <4>RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> >> > > > > 00007faa74006203
> >> > > > > > > > > > <4>RBP: ffff880871f6de68 R08: 000000032eb04bc9 R09:
> >> > > > > 0000000000000004
> >> > > > > > > > > > <4>R10: 0000000000008030 R11: 0000000000000246 R12:
> >> > > > > 0000000000000000
> >> > > > > > > > > > <4>R13: 0000000000000002 R14: ffff88106eff7000 R15:
> >> > > > > ffff8808715b4580
> >> > > > > > > > > > <4>FS:  00007faa85425700(0000) GS:ffff880028360000(0000)
> >> > > > > > > > > knlGS:0000000000000000
> >> > > > > > > > > > <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > > > > > > > > > <4>CR2: 0000000000000001 CR3: 0000001072325000 CR4:
> >> > > > > 00000000001407e0
> >> > > > > > > > > > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> >> > > > > 0000000000000000
> >> > > > > > > > > > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> >> > > > > 0000000000000400
> >> > > > > > > > > > <4>Process swift-object-se (pid: 4454, threadinfo
> >> > > > > ffff880871f6c000,
> >> > > > > > > > > > task ffff880860f18ab0)
> >> > > > > > > > > > <4>Stack:
> >> > > > > > > > > > <4> ffff880871f6de28 ffffffff811a4bb0 ffff880871f6df38
> >> > > > > > > ffff880874749cc0
> >> > > > > > > > > > <4><d> 0000000100000103 ffff8802381f8c00
> >> ffff880871f6df38
> >> > > > > > > > > ffff8808715b4580
> >> > > > > > > > > > <4><d> 0000000000000082 ffff8802381f8d88
> >> ffff880871f6dec8
> >> > > > > > > > > ffffffffa035ab31
> >> > > > > > > > > > <4>Call Trace:
> >> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> >> > > > > > > > > > <4> [<ffffffffa035ab31>] xfs_readdir+0xe1/0x130 [xfs]
> >> > > > > > > > > > <4> [<ffffffff811a4bb0>] ? filldir+0x0/0xe0
> >> > > > > > > > > > <4> [<ffffffffa038fe29>] xfs_file_readdir+0x39/0x50
> >> [xfs]
> >> > > > > > > > > > <4> [<ffffffff811a4e30>] vfs_readdir+0xc0/0xe0
> >> > > > > > > > > > <4> [<ffffffff8119bd86>] ? final_putname+0x26/0x50
> >> > > > > > > > > > <4> [<ffffffff811a4fb9>] sys_getdents+0x89/0xf0
> >> > > > > > > > > > <4> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> >> > > > > > > > > > <4>Code: 01 00 00 00 48 c7 c6 38 6b 3a a0 48 8b 7d c0
> >> ff 55
> >> > > b8
> >> > > > > 85 c0
> >> > > > > > > > > > 0f 85 af 00 00 00 49 8b 37 e9 ec fd ff ff 66 0f 1f 84
> >> 00 00
> >> > > 00
> >> > > > > 00 00
> >> > > > > > > > > > <41> 80 7c 24 01 00 0f 84 9c 00 00 00 45 0f b6 44 24 03
> >> 41
> >> > > 0f b6
> >> > > > > > > > > > <1>RIP  [<ffffffffa0362d60>]
> >> xfs_dir2_sf_getdents+0x2a0/0x3a0
> >> > > > > [xfs]
> >> > > > > > > > > > <4> RSP <ffff880871f6de18>
> >> > > > > > > > > > <4>CR2: 0000000000000001
> >> > > > > > > > > >
> >> > > > > > > > > ...
> >> > > > > > > > > >
> >> > > > > > > > > > I’ve got the vmcore dump from operator. Does vmcore
> >> help for
> >> > > > > > > > > > troubleshooting kind issue ?
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Hmm, well it couldn't hurt. Is the vmcore based on this
> >> 6.6
> >> > > > > kernel? Can
> >> > > > > > > > > you provide the exact kernel version and post the vmcore
> >> > > somewhere?
> >> > > > > > > > >
> >> > > > > > > > > Brian
> >> > > > > > > > >
> >> > > > > > > > > > Thanks // Hugo
> >> > > > > > > > > > ​
> >> > > > > > > > > >
> >> > > > > > > > > > 2015-06-18 22:59 GMT+08:00 Eric Sandeen <
> >> sandeen@sandeen.net
> >> > > >:
> >> > > > > > > > > >
> >> > > > > > > > > > > On 6/18/15 9:29 AM, Kuo Hugo wrote:
> >> > > > > > > > > > > >>- Have you tried an 'xfs_repair -n' of the affected
> >> > > > > filesystem?
> >> > > > > > > Note
> >> > > > > > > > > > > that -n will report problems only and prevent any
> >> > > modification
> >> > > > > by
> >> > > > > > > > > repair.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > *We might to to xfs_repair if we can address which
> >> disk
> >> > > > > causes
> >> > > > > > > the
> >> > > > > > > > > > > issue. *
> >> > > > > > > > > > >
> >> > > > > > > > > > > If you do, please save the output, and if it finds
> >> > > anything,
> >> > > > > please
> >> > > > > > > > > > > provide the output in this thread.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks,
> >> > > > > > > > > > > -Eric
> >> > > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > _______________________________________________
> >> > > > > > > > > > xfs mailing list
> >> > > > > > > > > > xfs@oss.sgi.com
> >> > > > > > > > > > http://oss.sgi.com/mailman/listinfo/xfs
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > >
> >> > > > >
> >> > >
> >> > > > _______________________________________________
> >> > > > xfs mailing list
> >> > > > xfs@oss.sgi.com
> >> > > > http://oss.sgi.com/mailman/listinfo/xfs
> >> > >
> >> > >
> >>
> >
> >

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-20 11:46                               ` Brian Foster
@ 2015-07-20 14:30                                 ` Kuo Hugo
  2015-07-20 15:12                                   ` Brian Foster
  0 siblings, 1 reply; 23+ messages in thread
From: Kuo Hugo @ 2015-07-20 14:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs


[-- Attachment #1.1: Type: text/plain, Size: 3100 bytes --]

Hi Brian,

>I don’t know much about the Swift bug. A BUG() or crash in the kernel is
>generally always a kernel bug, regardless of what userspace is doing. It
>certainly could be that whatever userspace is doing to trigger the kernel
>bug is a bug in the userspace application, but either way it shouldn’t
>cause the kernel to crash. By the same token, if Swift is updated to fix
>the aforementioned bug and the kernel crash no longer reproduces, that
>doesn’t necessarily mean the kernel bug is fixed (just potentially hidden).

Understood.

[Previous Message]

The valid inode has an inode number of 13668207561.
- The fsname for this inode is "sdb."
- The inode does appear to have a non-NULL if_data:

    if_u1 = {
      if_extents = 0xffff88084feaf5c0,
      if_ext_irec = 0xffff88084feaf5c0,
      if_data = 0xffff88084feaf5c0 "\004"
    },

        find <mntpath> -inum 13668207561

Q1: Were you able to track down the directory inode mentioned in the
previous message?

Ans: Yes, it’s the directory shown below. /srv/node/d224 is the mount
point of /dev/sdb, so this is the original location of the path. The
directory now contains the file 1436266052.71893.ts, which is 0 bytes in size.


[root@r2obj01 ~]# find /srv/node/d224 -inum 13668207561
/srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32

[root@r2obj01 ~]# ls -lrt
/srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32
-rw------- 1 swift swift 0 Jul 7 22:37 1436266052.71893.ts

Q2: Is it some kind of internal directory used by the application (e.g.,
perhaps related to the quarantine mechanism mentioned in the bug)?

Ans: Yes, it’s a directory that the application was accessing.


 37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG  [eventpoll]
 38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG
/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
 39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG
/srv/node/d224/tmp/tmpSpnrHg

 40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR
/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32

The operation above in swift-object-server was a Python call renaming the file
/srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
to
/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
via:

os.rename(old, new)

It crashed at this point. As noted in Q1, the inode number points to the
directory
/srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32.

We found multiple (over 10) DELETE requests from the application against the
target file at almost the same moment. A DELETE removes the original file in
the directory and creates a new empty .ts file there. I suspect that multiple
os.rename() calls on the same file in that directory can trigger the kernel
panic.

And the file
/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
was not created.
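
To make the suspicion concrete, here is a rough sketch of what I think the
competing workers end up doing. This is purely illustrative Python, not the
real Swift code; the directory paths are taken from above, the timestamps are
invented, and error handling is minimal.

    import errno
    import os
    import threading

    SRC = "/srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32"
    DST = "/srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32"

    def delete_object(timestamp):
        # a DELETE drops a new empty tombstone into the object directory ...
        open(os.path.join(SRC, "%s.ts" % timestamp), "w").close()
        # ... then lists the directory and quarantines any older entry it finds
        for name in os.listdir(SRC):
            if name == "%s.ts" % timestamp:
                continue
            try:
                os.makedirs(DST)              # this sketch makes the quarantine dir on demand
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise
            try:
                os.rename(os.path.join(SRC, name), os.path.join(DST, name))
            except OSError as e:
                if e.errno != errno.ENOENT:   # a competing worker already moved it
                    raise

    # 10+ workers handling DELETEs of the same object at almost the same moment
    for i in range(10):
        threading.Thread(target=delete_object, args=("1436266042.%05d" % i,)).start()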

Regards // Hugo
​

[-- Attachment #1.2: Type: text/html, Size: 13843 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-20 14:30                                 ` Kuo Hugo
@ 2015-07-20 15:12                                   ` Brian Foster
  2015-07-22  8:54                                     ` Kuo Hugo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-07-20 15:12 UTC (permalink / raw)
  To: Kuo Hugo; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs

On Mon, Jul 20, 2015 at 10:30:31PM +0800, Kuo Hugo wrote:
> Hi Brain,
> 
> >I don’t know much about the Swift bug. A BUG() or crash in the kernel is
> generally always a kernel bug, regardless of what userspace is doing. It
> >certainly could be that whatever userspace is doing to trigger the kernel
> bug is a bug in the userspace application, but either way it shouldn’t
> cause the >kernel to crash. By the same token, if Swift is updated to fix
> the aforementioned bug and the kernel crash no longer reproduces, that
> doesn’t >necessarily mean the kernel bug is fixed (just potentially hidden).
> 
> Understand.
> 
> [Previous Message]
> 
> The valid inode has an inode number of 13668207561.
> - The fsname for this inode is "sdb."
> - The inode does appear to have a non-NULL if_data:
> 
>     if_u1 = {
>       if_extents = 0xffff88084feaf5c0,
>       if_ext_irec = 0xffff88084feaf5c0,
>       if_data = 0xffff88084feaf5c0 "\004"
>     },
> 
>         find <mntpath> -inum 13668207561
> 
> Q1: Were you able to track down the directory inode mentioned in the
> previous message?
> 
> Ans: Yes, it’s the directory/file as below. /srv/node/d224 is the mount
> point of /dev/sdb . This is the original location of the path. This folder
> includes the file 1436266052.71893.ts now. The .ts file is 0 size
> 
> 
> [root@r2obj01 ~]# find /srv/node/d224 -inum 13668207561
> /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32
> 
> [root@r2obj01 ~]# ls -lrt
> /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32
> -rw------- 1 swift swift 0 Jul 7 22:37 1436266052.71893.ts
> 
> Q2: Is it some kind of internal directory used by the application (e.g.,
> perhaps related to the quarantine mechanism mentioned in the bug)?
> 
> Ans: Yes, it’s a directory which accessing by application.
> 

Ok, so I take it that we have a directory per object based on some kind
of hash. The directory presumably contains the object along with
whatever metadata is tracked.
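
(From the paths in the last mail, the layout looks roughly like

    /srv/node/<disk>/objects/<partition>/<hash suffix>/<object hash>/<timestamp>.<ext>

with the quarantine area mirroring the <object hash> directory under
/srv/node/<disk>/quarantined/objects/. That's just my reading of the paths, so
correct me if I have it wrong.)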

> 
>  37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG  [eventpoll]
>  38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG
> /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
>  39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG
> /srv/node/d224/tmp/tmpSpnrHg
> 
>  40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR
> /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32
> 
> The above operation in the swift-object-server was doing python function
> call to rename the file
> * /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts*
> as
> */srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts*
> 
> os.rename(old, new)
> 
> And it crashed at this point. In the Q1, we found the inum is pointing to
> the directory
> /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32 .
> 

The original stacktrace shows the crash in a readdir request. I'm sure
there are multiple things going on here (and there are a couple rename
traces in the vmcore sitting on locks), of course, but where does the
information about the rename come from?

> We found that multiple(over 10) DELETE from application against the target
> file at almost same moment. The DELETE is removing the original file in the
> directory and create new empty .ts file in this directory. I suspect that
> multiple os.rename on the same file in that directory will cause the kernel
> panic.
> 
> And the file
> /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
> was not created.
> 

I'm not quite following here because I don't have enough context about
what the application server is doing. So far, it sounds like we somehow
have multiple threads competing to rename the same file..? Is there
anything else in this directory at the time this sequence executes
(e.g., a file with object data that also gets quarantined)?

Ideally, we'd ultimately like to translate this into a sequence of
operations as seen by the fs that hopefully trigger the problem. We
might have to start by reproducing through the application server.
Looking back at that bug report, it sounds like a 'DELETE' is a
high-level server operation that can consist of multiple sub-operations
at the filesystem level (e.g., list, conditional rename if *.ts file
exists, etc.). Do you have enough information through any of the above
to try and run something against Swift that might explicitly reproduce
the problem? For example, have one thread that creates and recreates the
same object repeatedly and many more competing threads that try to
remove (or whatever results in the quarantine) it? Note that I'm just
grasping at straws here, you might be able to design a more accurate
reproducer based on what it looks like is happening within Swift.
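
A very rough, untested sketch of the kind of loop I have in mind, just to show
the shape of a reproducer (the /mnt/scratch paths, file names, thread counts
and timings are all arbitrary; point it at a throwaway XFS filesystem, not at
production):

    import errno
    import os
    import threading
    import time

    SRC = "/mnt/scratch/objdir"        # stands in for a per-object directory
    DST = "/mnt/scratch/quarantine"    # stands in for the quarantine directory

    def creator():
        # recreate the same tombstone name over and over so the directory stays
        # tiny, like the per-object directories in your traces
        while True:
            open(os.path.join(SRC, "0000000000.00000.ts"), "w").close()

    def mover():
        # compete to rename whatever is in the directory into the quarantine dir
        while True:
            for name in os.listdir(SRC):
                try:
                    os.rename(os.path.join(SRC, name), os.path.join(DST, name))
                except OSError as e:
                    if e.errno != errno.ENOENT:   # lost the race to another mover
                        raise

    def lister():
        while True:
            os.listdir(SRC)                       # getdents, where the crash hit

    for d in (SRC, DST):
        try:
            os.makedirs(d)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise

    threads = ([threading.Thread(target=creator)]
               + [threading.Thread(target=mover) for _ in range(10)]
               + [threading.Thread(target=lister) for _ in range(4)])
    for t in threads:
        t.daemon = True
        t.start()
    time.sleep(300)    # hammer the directory for a while, then exit

Running several copies of that as separate processes (closer to how the object
servers behave) is probably more likely to hit it than threads in a single
interpreter, but you get the idea.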

Brian

> Regards // Hugo
> ​

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
  2015-07-20 15:12                                   ` Brian Foster
@ 2015-07-22  8:54                                     ` Kuo Hugo
  0 siblings, 0 replies; 23+ messages in thread
From: Kuo Hugo @ 2015-07-22  8:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs


[-- Attachment #1.1: Type: text/plain, Size: 2534 bytes --]

Hi Brian,

>The original stacktrace shows the crash in a readdir request. I'm sure
>there are multiple things going on here (and there are a couple rename
>traces in the vmcore sitting on locks), of course, but where does the
>information about the rename come from?

I tracked the source code of the application. It moves data to a quarantined
area (another folder on the same disk) under certain conditions. The bug
report describes a condition where a DELETE of an object (which creates an
empty file in a directory), combined with a listing of that directory, causes
the data to be moved (os.rename) to the quarantined area (another folder).
The os.rename call is the only place where the application touches the
quarantined folder.

>I'm not quite following here because I don't have enough context about
>what the application server is doing. So far, it sounds like we somehow
>have multiple threads competing to rename the same file..? Is there
>anything else in this directory at the time this sequence executes
>(e.g., a file with object data that also gets quarantined)?

The behavior above (a bug in the application) should not trigger a kernel
panic. Yes, there are multiple threads competing to DELETE (create an empty
file) in the same directory while also moving the existing file to the
quarantined area. I think this is the root cause of the kernel panic. The
scenario is 10 application workers raising 10 threads that do the same thing
at the same moment.

>Ideally, we'd ultimately like to translate this into a sequence of
>operations as seen by the fs that hopefully trigger the problem. We
>might have to start by reproducing through the application server.
>Looking back at that bug report, it sounds like a 'DELETE' is a
>high-level server operation that can consist of multiple sub-operations
>at the filesystem level (e.g., list, conditional rename if *.ts file
>exists, etc.). Do you have enough information through any of the above
>to try and run something against Swift that might explicitly reproduce
>the problem? For example, have one thread that creates and recreates the
>same object repeatedly and many more competing threads that try to
>remove (or whatever results in the quarantine) it? Note that I'm just
>grasping at straws here, you might be able to design a more accurate
>reproducer based on what it looks like is happening within Swift.

We observed this issue on a production cluster. It's hard to get spare gear
with 100% identical hardware to test on right now.
I'll try to figure out an approach to reproduce it and will update this mail
thread if I manage to.

Thanks // Hugo

[-- Attachment #1.2: Type: text/html, Size: 4874 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-07-22  8:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-18 11:56 Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 Kuo Hugo
2015-06-18 13:31 ` Brian Foster
2015-06-18 14:29   ` Kuo Hugo
2015-06-18 14:59     ` Eric Sandeen
2015-07-09 10:57       ` Kuo Hugo
2015-07-09 12:51         ` Brian Foster
2015-07-09 13:20           ` Kuo Hugo
2015-07-09 13:27             ` Kuo Hugo
2015-07-09 15:18             ` Brian Foster
2015-07-09 16:40               ` Kuo Hugo
2015-07-09 18:32                 ` Brian Foster
2015-07-10  5:36                   ` Kuo Hugo
2015-07-10 10:39                     ` Kuo Hugo
2015-07-10 16:25                       ` Kuo Hugo
2015-07-13 12:52                     ` Brian Foster
2015-07-13 14:06                       ` Kuo Hugo
2015-07-13 17:01                         ` Brian Foster
2015-07-13 18:10                           ` Kuo Hugo
2015-07-17 19:39                             ` Kuo Hugo
2015-07-20 11:46                               ` Brian Foster
2015-07-20 14:30                                 ` Kuo Hugo
2015-07-20 15:12                                   ` Brian Foster
2015-07-22  8:54                                     ` Kuo Hugo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.