All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: David Jeffery <djeffery@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: add missing ilock around dio write last extent alignment
Date: Mon, 14 Sep 2015 09:24:55 -0400	[thread overview]
Message-ID: <20150914132455.GA22770@bfoster.bfoster> (raw)
In-Reply-To: <20150913235835.GV26895@dastard>

On Mon, Sep 14, 2015 at 09:58:35AM +1000, Dave Chinner wrote:
> On Wed, Sep 09, 2015 at 10:43:32AM -0400, Brian Foster wrote:
> > The iomap codepath (via get_blocks()) acquires and release the inode
> > lock in the case of a direct write that requires block allocation. This
> > is because xfs_iomap_write_direct() allocates a transaction, which means
> > the ilock must be dropped and reacquired after the transaction is
> > allocated and reserved.
> > 
> > xfs_iomap_write_direct() invokes xfs_iomap_eof_align_last_fsb() before
> > the transaction is created and thus before the ilock is reacquired. This
> > can lead to calls to xfs_iread_extents() and reads of the in-core extent
> > list without any synchronization (via xfs_bmap_eof() and
> > xfs_bmap_last_extent()). xfs_iread_extents() assert fails if the ilock
> > is not held, but this is not currently seen in practice as the current
> > callers had already invoked xfs_bmapi_read().
> > 
> > What has been seen in practice are reports of crashes down in the
> > xfs_bmap_eof() codepath on direct writes due to seemingly bogus pointer
> > references from xfs_iext_get_ext(). While an explicit reproducer is not
> > currently available to confirm the cause of the problem, crash analysis
> > and code inspection from David Jeffrey had identified the insufficient
> > locking.
> > 
> > xfs_iomap_eof_align_last_fsb() is called from other contexts with the
> > inode lock already held. __xfs_get_blocks() acquires and drops the ilock
> > with variable flags. Therefore, take the simple approach to cycle ilock
> > around the last extent alignment call from xfs_iomap_write_direct().
> > 
> > Reported-by: David Jeffery <djeffery@redhat.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_iomap.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 1f86033..4d7534e 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -142,7 +142,9 @@ xfs_iomap_write_direct(
> >  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> >  	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
> >  	if ((offset + count) > XFS_ISIZE(ip)) {
> > +		xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  		error = xfs_iomap_eof_align_last_fsb(mp, ip, extsz, &last_fsb);
> > +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> 
> XFS_ILOCK_SHARED?
> 

I suspect that is technically sufficient in this particular call path
given that we've called xfs_bmapi_read(). The problem is that there is a
call to xfs_iread_extents() buried a few calls deep in
xfs_bmap_last_extent(). My understanding is that we need the exclusive
lock because it's not safe for multiple threads to populate the in-core
extent list at the same time, so I don't really want to replace the
existing race with a landmine should the context happen to change in the
future.

> Also, looking at __xfs_get_blocks(), we drop the ilock immediately
> before calling xfs_iomap_write_direct(), which we already hold in
> shared mode for the xfs_bmapi_read() for direct IO.
> 
> Can we push that lock dropping into xfs_iomap_write_direct() after
> we've done the xfs_iomap_eof_align_last_fsb() call and before we do
> transaction reservations so we don't need an extra lock round-trip
> here? e.g. xfs_iomap_write_delay() is called under the lock context
> held by __xfs_get_blocks()....
> 

That was my initial thought when looking at this code... e.g., to just
carry the lock over and drop it prior to transaction setup. I didn't go
that route because __xfs_get_blocks() uses a variable locking mode and
it seemed ugly to pass along the lock mode to xfs_iomap_direct_write().
Further, given the above it also looked like we'd have to check and
cycle the ilock EXCL if it were ILOCK_SHARED. Finally,
xfs_iomap_direct_write() has a call to xfs_qm_dqattach() which itself
acquires ILOCK_EXCL. Looking at xfs_iomap_write_delay(), we do have a
dqattach_locked() variant but it also expects to have ILOCK_EXCL.

Hmm, so in the common case both the extent list and a quota are handled
once and thus the only notable lock cycle is the align_last_fsb() case.
I think we could do something like this:

- Create a shared lock safe variant of xfs_iomap_eof_align_last_fsb() to
  be called from xfs_iomap_write_direct().
- __xfs_get_blocks() continues to call xfs_ilock_data_map_shared(), but
  unconditionally demotes XFS_ILOCK_EXCL to XFS_ILOCK_SHARED before
  calling xfs_iomap_write_direct().
- xfs_iomap_write_direct() moves the xfs_qm_dqattach() call to
  immediately before the transaction allocation. E.g., it executes the
  existing align_last_fsb() bits and whatnot under XFS_ILOCK_SHARED, drops
  the lock, potentially attaches the quota and carries on as normal with
  the transaction.

The only thing I'm not sure about is the shared lock safe version of
xfs_iomap_eof_align_last_fsb(). The xfs_iread_extents() call is a few
calls deep and xfs_bmap_last_extent() is called from other contexts. I
suppose we could call it as is and pull up an assert to check for
XFS_IFEXTENTS such that the situation is explicitly documented in the
appropriate context (we do already have the assert in
xfs_iread_extents() if it were called). Also, I take it we can safely
assume the in-core extent list is still around if we still hold the lock
from the xfs_bmapi_read() call. Thoughts? I guess I'll float another
patch...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-09-14 13:25 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-09 14:43 [PATCH] xfs: add missing ilock around dio write last extent alignment Brian Foster
2015-09-13 23:58 ` Dave Chinner
2015-09-14 13:24   ` Brian Foster [this message]
2015-09-16 22:34     ` Dave Chinner
2015-09-17 12:16       ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150914132455.GA22770@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=djeffery@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.