Hi all,

>- Is this (and how often) reproducible?

*This is the third time it has happened, on three different servers, in the past 5 days.*

>- Have you identified which directory in your fs that the object server is attempting to enumerate when this occurs?

*There are multiple object server workers reading and writing on over 30 XFS disks in a server. I don't have a clue which object server request causes the kernel panic. I'm still investigating.*

>- Do you have any other, related output in /var/log/messages prior to this event? E.g., corruption messages or anything of that nature?

*There seems to be no useful information in /var/log/syslog:*

```
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun 18 06:10:01 r1obj03 CRON[13595]: (swift) CMD ((date; test -f /etc/swift/object-server.conf && /opt/ss/bin/swift-recon-cron /etc/swift/object-server.conf || /opt/ss/bin/swift-recon-cron /etc/swift/object-server/1.conf) >> /var/log/swift-recon-cron.log 2>&1)
Jun 18 06:10:14 r1obj03 kernel: [7631629.083099] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
```

>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note that -n will report problems only and prevent any modification by repair.

*We may run xfs_repair once we can identify which disk causes the issue.*

Thanks // Hugo Kuo

2015-06-18 21:31 GMT+08:00 Brian Foster:

> On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> > Hi folks,
> >
> > Recently we found the following XFS kernel message. I don’t really know
> > how to read it in the right way to figure out the problem in the system.
> > Is there any known bug for
> > Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty? Or is the problem
> > in swift-object-se rather than in XFS itself?
> >
>
> Nothing that I know of, but others might have seen something like this.
>
> > swift-object-se means swift-object-server, a daemon that handles data
> > from HTTP to XFS. I can’t tell whether the problem comes from XFS or
> > from the swift-object-server daemon.
> > Any idea would be appreciated.
> >
> > Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP: [] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
>
> So that looks like a NULL header down in xfs_dir2_sf_get_ino(), as
> hdr->i8count is at a 1 byte offset in the structure.
>
> > Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD 1044eba067 PMD 0
> > Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> > Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in: xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca raid_class ptp mdio scsi_transport_sas pps_core
> > Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401 Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> > Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
> > Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000 ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> > 0010:[] [] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP: 0018:ffff8808e87e5e38 EFLAGS: 00010202
> > Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360 RBX: 0000000000000004 RCX: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88 R08: 000000020079e3b9 R09: 0000000000000004
> > Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0 R11: 00000000000005b0 R12: ffff88104d0c0800
> > Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20 R14: ffff88004988f000 R15: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS: 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000) knlGS:0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001 CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88 ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> > Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8 ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> > Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> > Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [] ? xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]
>
> We're called from here attempting to list a directory, which appears to
> be the following block of code:
>
>         ...
>         sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>         ...
>         if (ctx->pos <= dotdot_offset) {
>                 ino = dp->d_ops->sf_get_parent_ino(sfp);
>                 ctx->pos = dotdot_offset & 0x7fffffff;
>                 if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
>                         return 0;
>         }
>
> It wants to emit the ".." directory entry and apparently the in-core
> data fork is NULL. There's an assertion against that earlier in the
> function so I take it the expectation is that this has been read/set
> beforehand. In fact, if this is a short form directory I also take it
> this should be set to if_inline_data, which appears to be part of the
> fork allocation itself.
>
> It's not immediately clear to me how this could happen. First off, it
> would probably be good to determine whether this is a runtime issue or
> due to some kind of on-disk problem. Some questions:
>
> - Is this (and how often) reproducible?
> - Have you identified which directory in your fs that the object server
>   is attempting to enumerate when this occurs?
> - Do you have any other, related output in /var/log/messages prior to
>   this event? E.g., corruption messages or anything of that nature?
> - Have you tried an 'xfs_repair -n' of the affected filesystem? Note
>   that -n will report problems only and prevent any modification by
>   repair.
>
> Brian
>
> > Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [] ? schedule_preempt_disabled+0x29/0x70
> > Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [] xfs_readdir+0xeb/0x110 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [] xfs_file_readdir+0x2b/0x40 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [] iterate_dir+0xa5/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [] ? vtime_account_user+0x54/0x60
> > Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [] SyS_getdents+0x92/0x120
> > Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [] ? fillonedir+0xe0/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [] ?
> > tracesys+0x7e/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [] tracesys+0xe1/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48 ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8 aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> > Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP [] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP
> > Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace ba3fdf319346b7e6 ]---
> >
> > Thanks // Hugo Kuo