From: Kuo Hugo <tonytkdk@gmail.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Hugo Kuo <hugo@swiftstack.com>,
	Eric Sandeen <sandeen@sandeen.net>,
	Darrell Bishop <darrell@swiftstack.com>,
	xfs@oss.sgi.com
Subject: Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
Date: Wed, 22 Jul 2015 16:54:11 +0800
Message-ID: <CA++_uhu-2Eq02WAvzDo=TQBZzMmAmfAn4H1gPnDZGV+JsqAHWw@mail.gmail.com>
In-Reply-To: <20150720151256.GA17816@bfoster.bfoster>


Hi Brian,

>The original stacktrace shows the crash in a readdir request. I'm sure
>there are multiple things going on here (and there are a couple rename
>traces in the vmcore sitting on locks), of course, but where does the
>information about the rename come from?

I traced through the application's source code. It moves data to a
quarantine area (another folder on the same disk) under some conditions.
The bug report describes one such condition: a DELETE of an object (which
creates an empty file in a directory) followed by a listing of that
directory causes the data to be moved (os.rename) to the quarantine area
(another folder). That os.rename call is the only place where the
application touches the quarantine folder.
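
To make the sequence above concrete, here is a minimal sketch of what one
request does at the filesystem level. It is not the application's code: the
function name delete_object, the file names, the directory layout and the
"quarantine everything except the new marker" condition are all assumptions
made for illustration.

```python
import os
import tempfile
import time


def delete_object(obj_dir, quarantine_dir):
    """Hypothetical stand-in for the application's DELETE handling."""
    # DELETE is recorded by creating an empty marker file in the directory.
    marker_name = "%.5f.ts" % time.time()
    open(os.path.join(obj_dir, marker_name), "w").close()

    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    # The directory is then listed (a readdir at the filesystem level).
    for name in sorted(os.listdir(obj_dir)):
        # Assumed condition: anything other than the new marker is moved.
        if name == marker_name:
            continue
        os.rename(os.path.join(obj_dir, name),
                  os.path.join(quarantine_dir, name))
        moved.append(name)
    return moved


# Throwaway directories standing in for one disk.
root = tempfile.mkdtemp()
obj_dir = os.path.join(root, "objects", "hash")
quarantine_dir = os.path.join(root, "quarantined")
os.makedirs(obj_dir)
open(os.path.join(obj_dir, "old.data"), "w").close()

moved = delete_object(obj_dir, quarantine_dir)
```

Both directories sit under the same root so that os.rename stays within one
filesystem, as in the case described above.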

>I'm not quite following here because I don't have enough context about
>what the application server is doing. So far, it sounds like we somehow
>have multiple threads competing to rename the same file..? Is there
>anything else in this directory at the time this sequence executes
>(e.g., a file with object data that also gets quarantined)?

The previous behavior (a bug in the application) should not trigger a
kernel panic. Yes, there are multiple threads competing to DELETE (create
an empty file) in the same directory and also to move the existing file to
the quarantine area. I think this is the root cause of the kernel panic.
The scenario is 10 application workers raising 10 threads to do the same
thing at the same moment.
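
A rough sketch of that race, as a possible starting point for a reproducer.
Everything here is an assumption: the paths, the file names, and the reading
of "10 workers" as ten threads in one process. It only imitates the access
pattern and is not known to trigger the panic.

```python
import os
import tempfile
import threading

NUM_WORKERS = 10  # assumed reading of the "10 workers" in the scenario


def worker(idx, obj_dir, quarantine_dir, barrier, winners):
    barrier.wait()  # release all threads at the same moment
    # Each worker creates its own empty marker file in the shared directory.
    open(os.path.join(obj_dir, "worker%d.ts" % idx), "w").close()
    # Each worker then lists the directory and tries to move the old file.
    for name in os.listdir(obj_dir):
        if name.endswith(".ts"):
            continue
        try:
            os.rename(os.path.join(obj_dir, name),
                      os.path.join(quarantine_dir, name))
            winners.append(idx)
        except FileNotFoundError:
            pass  # another worker already moved this file


root = tempfile.mkdtemp()
obj_dir = os.path.join(root, "objects")
quarantine_dir = os.path.join(root, "quarantined")
os.makedirs(obj_dir)
os.makedirs(quarantine_dir)
open(os.path.join(obj_dir, "old.data"), "w").close()

barrier = threading.Barrier(NUM_WORKERS)
winners = []
threads = [
    threading.Thread(target=worker,
                     args=(i, obj_dir, quarantine_dir, barrier, winners))
    for i in range(NUM_WORKERS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Only one rename of old.data can succeed; the other workers either no longer
see the file in their listing or get FileNotFoundError. A real reproducer
would loop this against an XFS mount and keep recreating the object.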

>Ideally, we'd ultimately like to translate this into a sequence of
>operations as seen by the fs that hopefully trigger the problem. We
>might have to start by reproducing through the application server.
>Looking back at that bug report, it sounds like a 'DELETE' is a
>high-level server operation that can consist of multiple sub-operations
>at the filesystem level (e.g., list, conditional rename if *.ts file
>exists, etc.). Do you have enough information through any of the above
>to try and run something against Swift that might explicitly reproduce
>the problem? For example, have one thread that creates and recreates the
>same object repeatedly and many more competing threads that try to
>remove (or whatever results in the quarantine) it? Note that I'm just
>grasping at straws here, you might be able to design a more accurate
>reproducer based on what it looks like is happening within Swift.

We observed this issue on a production cluster. It's hard to get spare
hardware that is 100% identical to test on at the moment.
I'll try to figure out a way to reproduce it, and I'll update this mail
thread if I manage to.

Thanks // Hugo


