From: Kuo Hugo <tonytkdk@gmail.com>
To: Brian Foster <bfoster@redhat.com>, xfs@oss.sgi.com
Cc: Hugo Kuo <hugo@swiftstack.com>, darrell@swiftstack.com
Subject: Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
Date: Thu, 18 Jun 2015 22:29:09 +0800
Message-ID: <CA++_uhuLbTT32v4ZeVkP3WOjBEeMU0O1NZDcv66a67hZ=pj9Sw@mail.gmail.com>
In-Reply-To: <20150618133122.GC43254@bfoster.bfoster>


Hi all,


> - Is this (and how often) reproducible?

*This is the third time it has happened, on three different servers, in the
past 5 days.*

> - Have you identified which directory in your fs that the object server is
>   attempting to enumerate when this occurs?

*There are multiple object server workers reading and writing across more
than 30 XFS disks in a server. I don't have a clue which object server
request causes the kernel panic. I'm still investigating.*

> - Do you have any other, related output in /var/log/messages prior to this
>   event? E.g., corruption messages or anything of that nature?

*There seems to be no useful information in /var/log/syslog:*

```
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun 18 06:10:01 r1obj03 CRON[13595]: (swift) CMD ((date; test -f /etc/swift/object-server.conf && /opt/ss/bin/swift-recon-cron /etc/swift/object-server.conf || /opt/ss/bin/swift-recon-cron /etc/swift/object-server/1.conf) >> /var/log/swift-recon-cron.log 2>&1)
Jun 18 06:10:14 r1obj03 kernel: [7631629.083099] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
```

> - Have you tried an 'xfs_repair -n' of the affected filesystem? Note that
>   -n will report problems only and prevent any modification by repair.

*We may run xfs_repair once we can identify which disk causes the issue.*

Thanks // Hugo Kuo

2015-06-18 21:31 GMT+08:00 Brian Foster <bfoster@redhat.com>:

> On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> > Hi folks,
> >
> > Recently we found the following kernel message of XFS. I don’t really know
> > how to read it in the right way to figure out the problem in the system.
> > Is there any known bug for
> > Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty ? Or the problem is
> > on the swift-object-se rather than XFS itself ?
> >
>
> Nothing that I know of, but others might have seen something like this.
>
> > swift-object-se means swift-object-server which is a daemon handles data
> > from http to XFS. I can’t address the problem came from XFS or the daemon
> > swift-object-server.
> > Any idea would be appreciated.
> >
> > Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
> > kernel NULL pointer dereference at 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
> > [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
>
> So that looks like a NULL header down in xfs_dir2_sf_get_ino(), as
> hdr->i8count is at a 1 byte offset in the structure.
>
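
To make the offset argument above concrete, here is a minimal C sketch. It
assumes a simplified stand-in for the short-form directory header
(`struct model_sf_hdr` and `model_i8count_addr()` are hypothetical names,
not kernel symbols) and only models the layout described: a one-byte entry
count, then i8count, then the parent inode number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the short-form directory header (assumption:
 * two one-byte counters followed by the packed parent inode number).
 * This is a model of the layout, not the kernel's definition.
 */
struct model_sf_hdr {
	uint8_t count;      /* number of entries */
	uint8_t i8count;    /* number of entries using 8-byte inode numbers */
	uint8_t parent[8];  /* parent inode number, stored unaligned */
};

/* The layout the oops implies: i8count at byte 1, parent at byte 2. */
static_assert(offsetof(struct model_sf_hdr, i8count) == 1, "i8count offset");
static_assert(offsetof(struct model_sf_hdr, parent) == 2, "parent offset");

/*
 * Address that a read of hdr->i8count would touch for a given header
 * address. Computed arithmetically so no invalid pointer is dereferenced.
 */
static uintptr_t model_i8count_addr(uintptr_t hdr_addr)
{
	return hdr_addr + offsetof(struct model_sf_hdr, i8count);
}
```

With a header address of 0 the helper returns 1, which matches CR2
(0000000000000001) with RDI at 0 in the register dump. It also agrees with
the faulting bytes marked in the Code: line, `<0f> b6 7f 01`, which decode
to a one-byte load from 0x1(%rdi).
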
> > Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
> > 1044eba067 PMD 0
> > Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> > Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
> > xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
> > nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
> > x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
> > aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
> > glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
> > ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
> > hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
> > raid_class ptp mdio scsi_transport_sas pps_core
> > Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
> > Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> > Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
> > Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
> > 04/28/2014
> > Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
> > ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> > 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
> > xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
> > 0018:ffff8808e87e5e38 EFLAGS: 00010202
> > Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
> > RBX: 0000000000000004 RCX: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
> > RSI: 0000000000000002 RDI: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
> > R08: 000000020079e3b9 R09: 0000000000000004
> > Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
> > R11: 00000000000005b0 R12: ffff88104d0c0800
> > Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
> > R14: ffff88004988f000 R15: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
> > 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
> > knlGS:0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
> > 0000 CR0: 0000000080050033
> > Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
> > CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
> > ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> > Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
> > ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> > Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
> > 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> > Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
> > xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]
>
> We're called from here attempting to list a directory, which appears to
> be the following block of code:
>
>         ...
>         sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>         ...
>         if (ctx->pos <= dotdot_offset) {
>                 ino = dp->d_ops->sf_get_parent_ino(sfp);
>                 ctx->pos = dotdot_offset & 0x7fffffff;
>                 if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
>                         return 0;
>         }
>
> It wants to emit the ".." directory entry and apparently the in-core
> data fork is NULL. There's an assertion against that earlier in the
> function so I take it the expectation is that this has been read/set
> beforehand. In fact, if this is a short form directory I also take it
> this should be set to if_inline_data, which appears to be part of the
> fork allocation itself.
>
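
As a rough illustration of the expectation described above, the following
user-space sketch models a fork whose data pointer targets a buffer embedded
in the fork itself. Only the field names if_u1.if_data and if_inline_data
come from the discussion; `struct model_ifork`, `MODEL_INLINE_DATA`, the
buffer size, and both helpers are hypothetical, and the real kernel
structure has more members and more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_INLINE_DATA 32	/* hypothetical inline buffer size */

/* Simplified stand-in for an in-core inode fork; not the kernel's struct. */
struct model_ifork {
	int if_bytes;						/* bytes of local data */
	union { char *if_data; } if_u1;				/* pointer readers use */
	union { char if_inline_data[MODEL_INLINE_DATA]; } if_u2; /* storage in the fork */
};

/*
 * Set up a short-form (local) fork in this model: the payload is copied
 * into the inline buffer and if_data is pointed at it, so the pointer
 * refers to memory inside the fork allocation itself.
 * Returns 0 on success, -1 if the payload is empty or does not fit.
 */
static int model_init_local(struct model_ifork *ifp, const void *src, size_t size)
{
	if (size == 0 || size > MODEL_INLINE_DATA)
		return -1;
	memcpy(ifp->if_u2.if_inline_data, src, size);
	ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
	ifp->if_bytes = (int)size;
	return 0;
}

/* Check a reader could make before handing if_data to a header accessor. */
static int model_fork_data_valid(const struct model_ifork *ifp)
{
	return ifp->if_u1.if_data != NULL && ifp->if_bytes > 0;
}
```

In this model, once model_init_local() has succeeded, if_data can only be
NULL again if something clears or overwrites it later, which is why a NULL
value at readdir time points at either a setup path that never ran or a
later modification of the fork.
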
> It's not immediately clear to me how this could happen. First off, it
> would probably be good to determine whether this is a runtime issue or
> due to some kind of on-disk problem. Some questions:
>
> - Is this (and how often) reproducible?
> - Have you identified which directory in your fs that the object server
>   is attempting to enumerate when this occurs?
> - Do you have any other, related output in /var/log/messages prior to
>   this event? E.g., corruption messages or anything of that nature?
> - Have you tried an 'xfs_repair -n' of the affected filesystem? Note
>   that -n will report problems only and prevent any modification by
>   repair.
>
> Brian
>
> > Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
> > schedule_preempt_disabled+0x29/0x70
> > Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
> > xfs_readdir+0xeb/0x110 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
> > xfs_file_readdir+0x2b/0x40 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
> > iterate_dir+0xa5/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
> > vtime_account_user+0x54/0x60
> > Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
> > SyS_getdents+0x92/0x120
> > Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
> > fillonedir+0xe0/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
> > tracesys+0x7e/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
> > tracesys+0xe1/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
> > ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
> > 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
> > aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> > Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
> > [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
> > Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
> > ba3fdf319346b7e6 ]---
> >
> > Thanks // Hugo Kuo
> >
>
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
>

