* [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
@ 2022-08-19 20:09 Karol Herbst
  2022-08-22 21:15 ` Lyude Paul
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Karol Herbst @ 2022-08-19 20:09 UTC
  To: linux-kernel
  Cc: Ben Skeggs, Lyude Paul, dri-devel, nouveau, Karol Herbst, stable

It is a bit unclear to us why that's helping, but it does and unbreaks
suspend/resume on a lot of GPUs without any known drawbacks.

Cc: stable@vger.kernel.org # v5.15+
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
Signed-off-by: Karol Herbst <kherbst@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 35bb0bb3fe61..126b3c6e12f9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
 		if (ret == 0) {
 			ret = nouveau_fence_new(chan, false, &fence);
 			if (ret == 0) {
+				/* TODO: figure out a better solution here
+				 *
+				 * wait on the fence here explicitly as going through
+				 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+				 *
+				 * Without this the operation can timeout and we'll fallback to a
+				 * software copy, which might take several minutes to finish.
+				 */
+				nouveau_fence_wait(fence, false, false);
 				ret = ttm_bo_move_accel_cleanup(bo,
 								&fence->base,
 								evict, false,
-- 
2.37.1



* Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
  2022-08-19 20:09 [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf Karol Herbst
@ 2022-08-22 21:15 ` Lyude Paul
  2022-09-20 10:42 ` Salvatore Bonaccorso
  2022-11-19  5:20 ` [Nouveau] " Computer Enthusiastic
  2 siblings, 0 replies; 7+ messages in thread
From: Lyude Paul @ 2022-08-22 21:15 UTC
  To: Karol Herbst, linux-kernel; +Cc: Ben Skeggs, dri-devel, nouveau, stable

Reviewed-by: Lyude Paul <lyude@redhat.com>

On Fri, 2022-08-19 at 22:09 +0200, Karol Herbst wrote:
> It is a bit unclear to us why that's helping, but it does and unbreaks
> suspend/resume on a lot of GPUs without any known drawbacks.
> 
> Cc: stable@vger.kernel.org # v5.15+
> Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> Signed-off-by: Karol Herbst <kherbst@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 35bb0bb3fe61..126b3c6e12f9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
>  		if (ret == 0) {
>  			ret = nouveau_fence_new(chan, false, &fence);
>  			if (ret == 0) {
> +				/* TODO: figure out a better solution here
> +				 *
> +				 * wait on the fence here explicitly as going through
> +				 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> +				 *
> +				 * Without this the operation can timeout and we'll fallback to a
> +				 * software copy, which might take several minutes to finish.
> +				 */
> +				nouveau_fence_wait(fence, false, false);
>  				ret = ttm_bo_move_accel_cleanup(bo,
>  								&fence->base,
>  								evict, false,

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



* Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
  2022-08-19 20:09 [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf Karol Herbst
  2022-08-22 21:15 ` Lyude Paul
@ 2022-09-20 10:42 ` Salvatore Bonaccorso
  2022-09-20 11:36   ` Karol Herbst
  2022-11-19  5:20 ` [Nouveau] " Computer Enthusiastic
  2 siblings, 1 reply; 7+ messages in thread
From: Salvatore Bonaccorso @ 2022-09-20 10:42 UTC
  To: Karol Herbst
  Cc: linux-kernel, Ben Skeggs, Lyude Paul, dri-devel, nouveau, stable,
	Computer Enthusiastic

Hi,

On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> It is a bit unclear to us why that's helping, but it does and unbreaks
> suspend/resume on a lot of GPUs without any known drawbacks.
> 
> Cc: stable@vger.kernel.org # v5.15+
> Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> Signed-off-by: Karol Herbst <kherbst@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 35bb0bb3fe61..126b3c6e12f9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
>  		if (ret == 0) {
>  			ret = nouveau_fence_new(chan, false, &fence);
>  			if (ret == 0) {
> +				/* TODO: figure out a better solution here
> +				 *
> +				 * wait on the fence here explicitly as going through
> +				 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> +				 *
> +				 * Without this the operation can timeout and we'll fallback to a
> +				 * software copy, which might take several minutes to finish.
> +				 */
> +				nouveau_fence_wait(fence, false, false);
>  				ret = ttm_bo_move_accel_cleanup(bo,
>  								&fence->base,
>  								evict, false,
> -- 
> 2.37.1
> 
> 

While this is marked for 5.15+ only, a Debian user was seeing the
suspend issue on 5.10.y as well and confirmed that the commit also
fixes it in the 5.10.y series:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69

Karol, Lyude, should this be picked for 5.10.y as well?

Regards,
Salvatore


* Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
  2022-09-20 10:42 ` Salvatore Bonaccorso
@ 2022-09-20 11:36   ` Karol Herbst
  2022-09-20 11:59     ` Salvatore Bonaccorso
  0 siblings, 1 reply; 7+ messages in thread
From: Karol Herbst @ 2022-09-20 11:36 UTC
  To: Salvatore Bonaccorso
  Cc: linux-kernel, Ben Skeggs, Lyude Paul, dri-devel, nouveau, stable,
	Computer Enthusiastic

On Tue, Sep 20, 2022 at 12:42 PM Salvatore Bonaccorso <carnil@debian.org> wrote:
>
> Hi,
>
> On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> > It is a bit unclear to us why that's helping, but it does and unbreaks
> > suspend/resume on a lot of GPUs without any known drawbacks.
> >
> > Cc: stable@vger.kernel.org # v5.15+
> > Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index 35bb0bb3fe61..126b3c6e12f9 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> >               if (ret == 0) {
> >                       ret = nouveau_fence_new(chan, false, &fence);
> >                       if (ret == 0) {
> > +                             /* TODO: figure out a better solution here
> > +                              *
> > +                              * wait on the fence here explicitly as going through
> > +                              * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> > +                              *
> > +                              * Without this the operation can timeout and we'll fallback to a
> > +                              * software copy, which might take several minutes to finish.
> > +                              */
> > +                             nouveau_fence_wait(fence, false, false);
> >                               ret = ttm_bo_move_accel_cleanup(bo,
> >                                                               &fence->base,
> >                                                               evict, false,
> > --
> > 2.37.1
> >
> >
>
> While this is marked for 5.15+ only, a Debian user was seeing the
> suspend issue on 5.10.y as well and confirmed that the commit also
> fixes it in the 5.10.y series:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69
>
> Karol, Lyude, should this be picked for 5.10.y as well?
>

mhh from the original report 5.10 was fine, but maybe something got
backported and it broke it? I'll try to do some testing on my machine
and see what I can figure out, but it could also be a debian only
issue at this point.

> Regards,
> Salvatore
>



* Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
  2022-09-20 11:36   ` Karol Herbst
@ 2022-09-20 11:59     ` Salvatore Bonaccorso
  2022-09-30 21:09       ` Computer Enthusiastic
  0 siblings, 1 reply; 7+ messages in thread
From: Salvatore Bonaccorso @ 2022-09-20 11:59 UTC
  To: Karol Herbst
  Cc: linux-kernel, Ben Skeggs, Lyude Paul, dri-devel, nouveau, stable,
	Computer Enthusiastic

Hi,

On Tue, Sep 20, 2022 at 01:36:32PM +0200, Karol Herbst wrote:
> On Tue, Sep 20, 2022 at 12:42 PM Salvatore Bonaccorso <carnil@debian.org> wrote:
> >
> > Hi,
> >
> > On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> > > It is a bit unclear to us why that's helping, but it does and unbreaks
> > > suspend/resume on a lot of GPUs without any known drawbacks.
> > >
> > > Cc: stable@vger.kernel.org # v5.15+
> > > Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > ---
> > >  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > index 35bb0bb3fe61..126b3c6e12f9 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> > >               if (ret == 0) {
> > >                       ret = nouveau_fence_new(chan, false, &fence);
> > >                       if (ret == 0) {
> > > +                             /* TODO: figure out a better solution here
> > > +                              *
> > > +                              * wait on the fence here explicitly as going through
> > > +                              * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> > > +                              *
> > > +                              * Without this the operation can timeout and we'll fallback to a
> > > +                              * software copy, which might take several minutes to finish.
> > > +                              */
> > > +                             nouveau_fence_wait(fence, false, false);
> > >                               ret = ttm_bo_move_accel_cleanup(bo,
> > >                                                               &fence->base,
> > >                                                               evict, false,
> > > --
> > > 2.37.1
> > >
> > >
> >
> > While this is marked for 5.15+ only, a Debian user was seeing the
> > suspend issue on 5.10.y as well and confirmed that the commit also
> > fixes it in the 5.10.y series:
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69
> >
> > Karol, Lyude, should this be picked for 5.10.y as well?
> >
> 
> mhh from the original report 5.10 was fine, but maybe something got
> backported and it broke it? I'll try to do some testing on my machine
> and see what I can figure out, but it could also be a debian only
> issue at this point.

Right, this is a possibility, thanks for looking into it!

Computer Enthusiastic, can you also reproduce the problem on an
upstream kernel without Debian patches, taken directly from the
5.10.y series (latest is 5.10.144), and verify the fix there?

Regards,
Salvatore


* Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
  2022-09-20 11:59     ` Salvatore Bonaccorso
@ 2022-09-30 21:09       ` Computer Enthusiastic
  0 siblings, 0 replies; 7+ messages in thread
From: Computer Enthusiastic @ 2022-09-30 21:09 UTC
  To: Salvatore Bonaccorso
  Cc: Karol Herbst, linux-kernel, Ben Skeggs, Lyude Paul, dri-devel,
	nouveau, stable


Hello,

On Tue, 20 Sep 2022 at 13:59, Salvatore Bonaccorso
<carnil@debian.org> wrote:
[..]
> Computer Enthusiastic, can you also reproduce the problem on an
> upstream kernel without Debian patches, taken directly from the
> 5.10.y series (latest is 5.10.144), and verify the fix there?
>
> Regards,
> Salvatore

I've tested the vanilla kernel 5.10.145 (the latest as of one week
ago) without the Debian kernel patches, but using the kernel config
file from the latest kernel for Debian Stable:
- without Karol's patch: both suspend to RAM and hibernate to disk
always fail with the usual behavior (suspending or hibernating takes
a very long time, then resume fails with a garbled screen)
- with Karol's patch: both suspend and hibernate succeed, and the
system resumes correctly afterwards.

The kernel was tested with the following graphics adapter:
Graphics:  Device-1: NVIDIA G96CM [GeForce 9600M GT] driver: nouveau v: kernel
           Device-2: Suyin Acer HD Crystal Eye webcam type: USB driver: uvcvideo
           Display: x11 server: X.Org 1.20.11 driver: loaded: modesetting unloaded: fbdev,vesa
           resolution: 1280x800~60Hz
           OpenGL: renderer: NV96 v: 3.3 Mesa 20.3.5

Therefore, the 5.10.y kernel series needs to be patched to work
correctly, at least with the aforementioned graphics card.

The scripts I used to compile the kernels are attached for further
reference and verification.

Hope that helps.

[-- Attachment #2: vanilla-kernel-build-5.10.145 --]
[-- Type: application/octet-stream, Size: 572 bytes --]

# Download source code
wget -nc https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.10.145.tar.xz
tar xf linux-5.10.145.tar.xz

# Automate subversion index
SUBVERSION_INDEX="1"

# Delete from previous builds
cd linux-5.10.145
rm -rf ./debian
rm -rf ../linux.orig/
rm -rf ../linux-upstream*

cp /boot/config-5.10.0-18-amd64 .config
make olddefconfig

scripts/config --disable SYSTEM_TRUSTED_KEYRING
scripts/config --set-str SYSTEM_TRUSTED_KEYS ''

# Build kernel
time make -j 8 deb-pkg LOCALVERSION=-vanilla KDEB_PKGVERSION=$(make kernelversion)-$SUBVERSION_INDEX

exit 0

[-- Attachment #3: vanilla-kernel-build-5.10.145-patched --]
[-- Type: application/octet-stream, Size: 781 bytes --]

# Download source code
wget -nc https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.10.145.tar.xz
tar xf linux-5.10.145.tar.xz

# Get patch (quote the URL so the shell does not treat '?' as a glob)
wget -nc -O nouveau.patch 'https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=3640cdccbe75b8922e5bfc0191dd37e3aaa24833'

# Automate subversion index
SUBVERSION_INDEX="1"

# Delete from previous builds
cd linux-5.10.145
rm -rf ./debian
rm -rf ../linux.orig/
rm -rf ../linux-upstream*

cp /boot/config-5.10.0-18-amd64 .config
make olddefconfig

scripts/config --disable SYSTEM_TRUSTED_KEYRING
scripts/config --set-str SYSTEM_TRUSTED_KEYS ''

# Apply patch
patch -p 1 < ../nouveau.patch || exit 1

# Build kernel
time make -j 8 deb-pkg LOCALVERSION=-patched KDEB_PKGVERSION=$(make kernelversion)-$SUBVERSION_INDEX

exit 0
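
As a sanity check before a long build, it may help to confirm that the
patch actually landed in the tree. The sketch below is not part of the
attached scripts. The function name has_fence_wait_fix is hypothetical,
and it assumes that the comment text and the nouveau_fence_wait() call
added by the patch are enough to identify it in nouveau_bo.c.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached scripts): report whether a
# kernel source tree contains the explicit fence wait added by the patch.
# Assumption: the strings below, taken from the patch, identify the fix.
has_fence_wait_fix() {
    src="$1/drivers/gpu/drm/nouveau/nouveau_bo.c"
    # Return 2 if the argument does not look like a kernel source tree.
    [ -f "$src" ] || { echo "missing: $src" >&2; return 2; }
    # -F matches the strings literally; -q only sets the exit status.
    grep -qF 'wait on the fence here explicitly' "$src" &&
        grep -qF 'nouveau_fence_wait(fence, false, false);' "$src"
}
```

Called as "has_fence_wait_fix . || exit 1" right after the patch step
(the working directory is already linux-5.10.145 there), this would stop
the patched build early if the fix is missing.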


* Re: [Nouveau] [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
  2022-08-19 20:09 [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf Karol Herbst
  2022-08-22 21:15 ` Lyude Paul
  2022-09-20 10:42 ` Salvatore Bonaccorso
@ 2022-11-19  5:20 ` Computer Enthusiastic
  2 siblings, 0 replies; 7+ messages in thread
From: Computer Enthusiastic @ 2022-11-19  5:20 UTC
  To: Karol Herbst, stable; +Cc: linux-kernel, nouveau, dri-devel, Ben Skeggs

Hello,

On Fri, 19 Aug 2022 at 22:09, Karol Herbst
<kherbst@redhat.com> wrote:
>
> It is a bit unclear to us why that's helping, but it does and unbreaks
> suspend/resume on a lot of GPUs without any known drawbacks.
>
> Cc: stable@vger.kernel.org # v5.15+
> Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> Signed-off-by: Karol Herbst <kherbst@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 35bb0bb3fe61..126b3c6e12f9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
>                 if (ret == 0) {
>                         ret = nouveau_fence_new(chan, false, &fence);
>                         if (ret == 0) {
> +                               /* TODO: figure out a better solution here
> +                                *
> +                                * wait on the fence here explicitly as going through
> +                                * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> +                                *
> +                                * Without this the operation can timeout and we'll fallback to a
> +                                * software copy, which might take several minutes to finish.
> +                                */
> +                               nouveau_fence_wait(fence, false, false);
>                                 ret = ttm_bo_move_accel_cleanup(bo,
>                                                                 &fence->base,
>                                                                 evict, false,
> --
> 2.37.1
>

Would it be possible to land the aforementioned patch in the 5.10.x
kernel series? It is currently marked for kernel versions >= 5.15.x
only.

Thanks.

