intel-xe.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: Nirmoy Das <nirmoy.das@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, Oak Zeng <oak.zeng@intel.com>
Subject: Re: [RFC] drm/xe: Ensure write lock is held when calling xe_vma_rebind()
Date: Thu, 6 Jun 2024 16:30:17 +0000	[thread overview]
Message-ID: <ZmHkGWrFnLMD+sZW@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <566daea7-d438-4c63-a491-851d4dc804f7@intel.com>

On Thu, Jun 06, 2024 at 12:09:39PM +0200, Nirmoy Das wrote:
> Hi Matt,
> 
> On 6/5/2024 6:44 PM, Matthew Brost wrote:
> > On Wed, Jun 05, 2024 at 12:16:07PM +0200, Nirmoy Das wrote:
> > > xe_vma_rebind() expects write vm lock is held so start out with a
> > > read lock and upgrade if a vma rebind is required.
> > > 
> > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1894
> > > Fixes: bf69918b7199 ("drm/xe: Use xe_vma_ops to implement page fault rebinds")
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Oak Zeng <oak.zeng@intel.com>
> > > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_gt_pagefault.c | 25 +++++++++----------------
> > >   1 file changed, 9 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > index 040dd142c49c..543cfa9892f6 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > @@ -153,24 +153,15 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> > >   		return -EINVAL;
> > >   retry_userptr:
> > > -	/*
> > > -	 * TODO: Avoid exclusive lock if VM doesn't have userptrs, or
> > > -	 * start out read-locked?
> > > -	 */
> > > -	down_write(&vm->lock);
> > > -	write_locked = true;
> > > +	/* start out with read-locked and upgrade of vma is needed a rebind */
> > > +	down_read(&vm->lock);
> > > +	write_locked = false;
> > >   	vma = lookup_vma(vm, pf->page_addr);
> > >   	if (!vma) {
> > >   		ret = -EINVAL;
> > >   		goto unlock_vm;
> > >   	}
> > > -	if (!xe_vma_is_userptr(vma) ||
> > > -	    !xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
> > > -		downgrade_write(&vm->lock);
> > > -		write_locked = false;
> > > -	}
> > > -
> > >   	trace_xe_vma_pagefault(vma);
> > >   	atomic = access_is_atomic(pf->access_type);
> > > @@ -179,9 +170,14 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> > >   	if (vma_is_valid(tile, vma) && !atomic)
> > >   		goto unlock_vm;
> > > +	/* xe_vma_rebind() expects a write lock so upgrade the vm lock */
> > > +	up_read(&vm->lock);
> > > +	down_write(&vm->lock);
> > You can't do this, or even if it works it is bad, bad practice to drop
> > the lock as someone else could come in and acquire the lock in write
> > mode. In this case, a user unbind of the VMA could happen and the VMA
> > could be destoryed.
> 
> I didn't verify but assumed vma is ref_counted but anyways I knew this
> wasn't correct
> 

VMAs are not ref counted which indeed is very tricky and exposes us to a
bunch of possible bugs. The current scheme is if the VMA is in GPUVM tree,
it is safe to use the VMA while under the vm->lock. Once the VMA is
removed from GPUVM tree, it can only be referenced in the pending
destroy code.

I did post a patch to refcount VMAs a while back that got lost in the
stack [1].

Matt

[1] https://patchwork.freedesktop.org/series/132573/

> so tagged it RFC :)
> 
> >   Now the VMA refs below are accessing corrupt memory.
> > 
> > I have on going refactor [1] here which just takes the vm->lock in write
> > mode for page faults. I suggest we just review / use that now.
> I will do that.
> > 
> > Long term I think we need to audit the read / write usage of vm->lock as
> > I'm almost certain we have broken the intended usage of this lock. Or
> > honestly maybe just change this lock to a mutex as I'm unsure if the
> > read vs. write buys us anything and complicates the driver.
> 
> Yes, that might be simpler.
> 
> 
> Thanks,
> 
> Nirmoy
> 
> > 
> > Matt
> > 
> > [1] https://patchwork.freedesktop.org/series/133628/
> > 
> > > +	write_locked = true;
> > >   	/* TODO: Validate fault */
> > > -	if (xe_vma_is_userptr(vma) && write_locked) {
> > > +	if (xe_vma_is_userptr(vma) &&
> > > +	    xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
> > >   		struct xe_userptr_vma *uvma = to_userptr_vma(vma);
> > >   		spin_lock(&vm->userptr.invalidated_lock);
> > > @@ -191,9 +187,6 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> > >   		ret = xe_vma_userptr_pin_pages(uvma);
> > >   		if (ret)
> > >   			goto unlock_vm;
> > > -
> > > -		downgrade_write(&vm->lock);
> > > -		write_locked = false;
> > >   	}
> > >   	/* Lock VM and BOs dma-resv */
> > > -- 
> > > 2.42.0
> > > 

  reply	other threads:[~2024-06-06 16:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05 10:16 [RFC] drm/xe: Ensure write lock is held when calling xe_vma_rebind() Nirmoy Das
2024-06-05 10:55 ` ✓ CI.Patch_applied: success for " Patchwork
2024-06-05 10:55 ` ✓ CI.checkpatch: " Patchwork
2024-06-05 10:56 ` ✓ CI.KUnit: " Patchwork
2024-06-05 11:07 ` ✓ CI.Build: " Patchwork
2024-06-05 11:08 ` ✗ CI.Hooks: failure " Patchwork
2024-06-05 11:09 ` ✓ CI.checksparse: success " Patchwork
2024-06-05 11:41 ` ✓ CI.BAT: " Patchwork
2024-06-05 16:44 ` [RFC] " Matthew Brost
2024-06-06 10:09   ` Nirmoy Das
2024-06-06 16:30     ` Matthew Brost [this message]
2024-06-06  1:43 ` ✗ CI.FULL: failure for " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmHkGWrFnLMD+sZW@DUT025-TGLU.fm.intel.com \
    --to=matthew.brost@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=nirmoy.das@intel.com \
    --cc=oak.zeng@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).