From: Kuo Hugo
Date: Mon, 20 Jul 2015 22:30:31 +0800
Subject: Re: Data can't be wrote to XFS RIP [] xfs_dir2_sf_get_parent_ino+0xa/0x20
To: Brian Foster
Cc: Hugo Kuo, Eric Sandeen, Darrell Bishop, xfs@oss.sgi.com
In-Reply-To: <20150720114648.GB53450@bfoster.bfoster>

Hi Brian,

> I don't know much about the Swift bug. A BUG() or crash in the kernel is
> generally always a kernel bug, regardless of what userspace is doing.
> It certainly could be that whatever userspace is doing to trigger the kernel
> bug is a bug in the userspace application, but either way it shouldn't cause
> the kernel to crash. By the same token, if Swift is updated to fix the
> aforementioned bug and the kernel crash no longer reproduces, that doesn't
> necessarily mean the kernel bug is fixed (just potentially hidden).

Understood.

[Previous Message]

The valid inode has an inode number of 13668207561.
- The fsname for this inode is "sdb."
- The inode does appear to have a non-NULL if_data:

    if_u1 = {
      if_extents = 0xffff88084feaf5c0,
      if_ext_irec = 0xffff88084feaf5c0,
      if_data = 0xffff88084feaf5c0 "\004"
    },

find -inum 13668207561

Q1: Were you able to track down the directory inode mentioned in the previous message?

Ans: Yes, it's the directory (and the file it contains) shown below. /srv/node/d224 is the mount point of /dev/sdb, and this is the original location of the path. The directory now contains the file 1436266052.71893.ts, which is 0 bytes in size.

    [root@r2obj01 ~]# find /srv/node/d224 -inum 13668207561
    /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32
    [root@r2obj01 ~]# ls -lrt /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32
    -rw------- 1 swift swift 0 Jul 7 22:37 1436266052.71893.ts

Q2: Is it some kind of internal directory used by the application (e.g., perhaps related to the quarantine mechanism mentioned in the bug)?

Ans: Yes, it's a directory that the application accesses.
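For anyone who wants to repeat the Q1 lookup from a script rather than with find, the helper below walks a tree with the Python standard library and matches entries by inode number. It is a hypothetical sketch (find_by_inode is not part of Swift or of any XFS tool). Unlike plain find -inum, it also requires the device number to match that of the starting directory, because inode numbers are only unique within a single filesystem.

```python
import os
from typing import Iterator


def find_by_inode(root: str, inum: int) -> Iterator[str]:
    """Yield every path under `root` (including `root`) with inode `inum`.

    Hypothetical helper, roughly equivalent to `find <root> -inum <inum>`,
    except that matches must also be on the same device as `root`.
    Symlinks are not followed: os.walk defaults to followlinks=False and
    os.lstat inspects the link itself.
    """
    root_dev = os.lstat(root).st_dev

    def matches(path: str) -> bool:
        st = os.lstat(path)
        return st.st_ino == inum and st.st_dev == root_dev

    if matches(root):
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if matches(path):
                    yield path
            except FileNotFoundError:
                # The entry vanished between listing and stat, e.g. removed
                # by a concurrent DELETE; skip it instead of failing.
                continue
```

For example, find_by_inode("/srv/node/d224", 13668207561) would be expected to yield the same directory that find reported, provided the tree has not changed since. Entries removed while the walk is in progress are skipped rather than raising an error.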
    37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG [eventpoll]
    38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
    39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG /srv/node/d224/tmp/tmpSpnrHg
    40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32

In the operation above, swift-object-server was making a Python function call to rename the file

    /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts

to

    /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts

using

    os.rename(old, new)

and it crashed at this point.

In Q1, we found that the inode number points to the directory /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32. We also found that the application issued multiple (more than 10) DELETE requests against the target file at almost the same moment. Each DELETE removes the original file from the directory and creates a new empty .ts file in the same directory. I suspect that multiple os.rename() calls on the same file in that directory, issued at nearly the same time, cause the kernel panic.

Also, the file /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts was not created.

Regards // Hugo

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs