All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tomas Elf <tomas.elf@intel.com>
Cc: "Intel-GFX@Lists.FreeDesktop.Org"
	<Intel-GFX@Lists.FreeDesktop.Org>,
	Ian Lister <ian.lister@intel.com>
Subject: Re: [RFC 10/11] drm/i915: Debugfs interface for per-engine hang recovery.
Date: Tue, 9 Jun 2015 13:27:03 +0100	[thread overview]
Message-ID: <20150609122703.GA26371@nuc-i3427.alporthouse.com> (raw)
In-Reply-To: <5576CB84.7020204@intel.com>

On Tue, Jun 09, 2015 at 12:18:28PM +0100, Tomas Elf wrote:
> On 08/06/2015 18:45, Chris Wilson wrote:
> >On Mon, Jun 08, 2015 at 06:03:28PM +0100, Tomas Elf wrote:
> >>1. The i915_wedged_set function allows us to schedule three forms of hang recovery:
> >>
> >>	a) Legacy hang recovery: By passing e.g. -1 we trigger the legacy full
> >>	GPU reset recovery path.
> >>
> >>	b) Single engine hang recovery: By passing an engine ID in the interval
> >>	of [0, I915_NUM_RINGS) we can schedule hang recovery of any single
> >>	engine assuming that the context submission consistency requirements
> >>	are met (otherwise the hang recovery path will simply exit early and
> >>	wait for another hang detection). The values are assumed to use up bits
> >>	3:0 only since we certainly do not support as many as 16 engines.
> >>
> >>	This mode is supported since there are several legacy test applications
> >>	that rely on this interface.
> >
> >Are there? I don't see them in igt - and let's not start making debugfs
> >ABI.
> 
> They're not in IGT only internal to VPG. I guess we could limit
> these changes and adapt the internal test suite in VPG instead of
> upstreaming changes that only VPG validation cares about.

Also note that there are quite a few concurrent hang tests in igt that
this series should aim to fix.

You will be expected to provide basic validation tests for igt as well,
which will be using the debugfs interface I guess.

> >>	c) Multiple engine hang recovery: By passing in an engine flag mask in
> >>	bits 31:8 (bit 8 corresponds to engine 0 = RCS, bit 9 corresponds to
> >>	engine 1 = VCS etc) we can schedule any combination of engine hang
> >>	recoveries as we please. For example, by passing in the value 0x3 << 8
> >>	we would schedule hang recovery for engines 0 and 1 (RCS and VCS) at
> >>	the same time.
> >
> >Seems fine. But I don't see the reason for the extra complication.
> 
> I wanted to make sure that we could test multiple concurrent hang
> recoveries, but to be fair nobody is actually using this at the
> moment so unless someone actually _needs_ this we probably don't
> need to upstream it.
> 
> I guess we could leave it in its currently upstreamed form where it
> only allows full GPU reset. Or would it be of use to anyone to
> support per-engine recovery?

I like the per-engine flags, I was just arguing about having the
interface do both seems overly compliated (when the existing behaviour
can be retrained to use -1).

> >>	If bits in fields 3:0 and 31:8 are both used then single engine hang
> >>	recovery mode takes presidence and bits 31:8 are ignored.
> >>
> >>2. The i915_wedged_get function produces a set of statistics related to:
> >
> >Add it to hangcheck_info instead.
> 
> Yeah, I considered that but I felt that hangcheck_info had too much
> text and it would be too much of a hassle to parse out the data. But
> having spoken to the validation guys it seems like they're fine with
> updating the parser. So I could update hangcheck_info with this new
> information.

It can be more or less just be searching for the start of your info block.
A quick string search on new debugfs name, old debugfs name could even
provide backwards compatibility in the test.

> >i915_wedged_get could be updated to give the ring mask of wedged rings?
> >If that concept exists.
> >-Chris
> >
> 
> Nah, no need, I'll just add the information to hangcheck_info.
> Besides, wedged_get needs to provide more information than just the
> current wedged state. It also provides information about the number
> of resets, the number of watchdog timeouts etc. So it's not that
> easy to summarise it as a ring mask.

We are still talking about today's single valued debugfs/i915_wedged
rather than the extended info?

Oh, whilst I am thinking of it, you could also add the reset stats to
the error state.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-06-09 12:27 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-08 17:03 [RFC 00/11] TDR/watchdog timeout support for gen8 Tomas Elf
2015-06-08 17:03 ` [RFC 01/11] drm/i915: Early exit from semaphore_waits_for for execlist mode Tomas Elf
2015-06-08 17:36   ` Chris Wilson
2015-06-09 11:02     ` Tomas Elf
2015-06-16 13:44   ` Daniel Vetter
2015-06-16 15:46     ` Tomas Elf
2015-06-16 16:50       ` Chris Wilson
2015-06-16 17:07         ` Tomas Elf
2015-06-17 11:43       ` Daniel Vetter
2015-06-08 17:03 ` [RFC 02/11] drm/i915: Introduce uevent for full GPU reset Tomas Elf
2015-06-16 13:43   ` Daniel Vetter
2015-06-16 15:43     ` Tomas Elf
2015-06-16 16:55       ` Chris Wilson
2015-06-16 17:32         ` Tomas Elf
2015-06-16 19:33           ` Chris Wilson
2015-06-17 11:49             ` Daniel Vetter
2015-06-17 12:51               ` Chris Wilson
2015-06-08 17:03 ` [RFC 03/11] drm/i915: Add reset stats entry point for per-engine reset Tomas Elf
2015-06-08 17:33   ` Chris Wilson
2015-06-09 11:06     ` Tomas Elf
2015-06-16 13:48     ` Daniel Vetter
2015-06-16 13:54       ` Chris Wilson
2015-06-16 15:55         ` Daniel Vetter
2015-06-18 11:12         ` Dave Gordon
2015-06-11  9:14   ` Dave Gordon
2015-06-16 13:49   ` Daniel Vetter
2015-06-16 15:54     ` Tomas Elf
2015-06-17 11:51       ` Daniel Vetter
2015-06-08 17:03 ` [RFC 04/11] drm/i915: Adding TDR / per-engine reset support for gen8 Tomas Elf
2015-06-08 17:03 ` [RFC 05/11] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Tomas Elf
2015-06-08 17:24   ` Chris Wilson
2015-06-09 11:08     ` Tomas Elf
2015-06-09 11:11   ` Chris Wilson
2015-06-08 17:03 ` [RFC 06/11] drm/i915: Disable warnings for TDR interruptions in the display driver Tomas Elf
2015-06-08 17:53   ` Chris Wilson
2015-06-08 17:03 ` [RFC 07/11] drm/i915: Reinstate hang recovery work queue Tomas Elf
2015-06-08 17:03 ` [RFC 08/11] drm/i915: Watchdog timeout support for gen8 Tomas Elf
2015-06-08 17:03 ` [RFC 09/11] drm/i915: Fake lost context interrupts through forced CSB check Tomas Elf
2015-06-08 17:03 ` [RFC 10/11] drm/i915: Debugfs interface for per-engine hang recovery Tomas Elf
2015-06-08 17:45   ` Chris Wilson
2015-06-09 11:18     ` Tomas Elf
2015-06-09 12:27       ` Chris Wilson [this message]
2015-06-09 17:28         ` Tomas Elf
2015-06-11  9:32     ` Dave Gordon
2015-06-08 17:03 ` [RFC 11/11] drm/i915: TDR/watchdog trace points Tomas Elf
2015-06-23 10:05 ` [RFC 00/11] TDR/watchdog timeout support for gen8 Daniel Vetter
2015-06-23 10:47   ` Tomas Elf
2015-06-23 11:38     ` Daniel Vetter
2015-06-23 14:06       ` Tomas Elf
2015-06-23 15:20         ` Daniel Vetter
2015-06-23 15:35           ` Daniel Vetter
2015-06-25 10:38             ` Tomas Elf
2015-07-03 11:15 ` Mika Kuoppala
2015-07-03 17:41   ` Tomas Elf
2015-07-09 18:47 ` Chris Wilson
2015-07-10 15:24   ` Tomas Elf
2015-07-10 15:48     ` Tomas Elf
2015-07-11 18:15       ` Chris Wilson
2015-07-11 18:22     ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150609122703.GA26371@nuc-i3427.alporthouse.com \
    --to=chris@chris-wilson.co.uk \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=ian.lister@intel.com \
    --cc=tomas.elf@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.