From: Tomas Elf <tomas.elf@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel@ffwll.ch>,
	Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [RFC 02/11] drm/i915: Introduce uevent for full GPU reset.
Date: Tue, 16 Jun 2015 18:32:22 +0100
Message-ID: <55805DA6.5040909@intel.com>
In-Reply-To: <20150616165502.GK11933@nuc-i3427.alporthouse.com>

On 16/06/2015 17:55, Chris Wilson wrote:
> On Tue, Jun 16, 2015 at 04:43:55PM +0100, Tomas Elf wrote:
>> On 16/06/2015 14:43, Daniel Vetter wrote:
>>> On Mon, Jun 08, 2015 at 06:03:20PM +0100, Tomas Elf wrote:
>>>> The TDR ULT used to validate this patch series requires a special uevent for
>>>> full GPU resets in order to distinguish between different kinds of resets.
>>>>
>>>> Signed-off-by: Tomas Elf <tomas.elf@intel.com>
>>>
>>> Why duplicate the uevent we send out from i915_reset_and_wakeup? At least
>>> I can't spot what this gets us in addition to the existing one.
>>> -Daniel
>>
>> Look at this line:
>>>> +		reset_event[0] = kasprintf(GFP_KERNEL, "%s", "GPU RESET=0");
>>
>> It doesn't exist in reset_and_wakeup (specifically, the "GPU
>> RESET=0" part). It's a uevent that happens at the time of the actual
>> GPU reset (GDRST register write). In the subsequent TDR commit we
>> add another one to the point of the actual engine reset, which also
>> includes information about what exact engine was reset.
>>
>> The uevents in reset_and_wakeup only tell the user that an error has
>> been detected and that some kind of reset has happened, these new
>> uevents specify exactly what kind of reset has happened. This
>> particular one on its own it's not very meaningful since there is
>> only one supported form of reset at this point but once we add
>> engine reset support it's useful to be able to discern the types of
>> resets from each other (GPU reset, RCS engine reset, VCS engine
>> reset, VCS2 engine reset, BCS engine reset, VECS engine reset).
>>
>> Does that make sense?
>
> The ultimate question is how do you envisage these uevents being used?
>
> At present, we have abrtd listening out for when to grab the
> /sys/drm/cardX/error and maybe for GL robustness (though I would imagine
> if they thought such events useful we would have had demands for a DRM
> event on the guilty/victim fd).
>
> Does it really make sense to send uevents for both hang, partial-reset,
> and full-reset?
> -Chris
>

The reason we have such a detailed set of uevents is primarily
testing. Our internal VPG tests check for these uevents to make sure
that the expected recovery mode is actually being used. That matters
because the TDR driver code contains reset promotion logic that decides
which recovery mode to use, and if that logic somehow gets broken the
driver might go with the wrong recovery mode. It is therefore worth
testing, and for that the uevents need to be there. Of course, the
argument "our internal VPG tests do this" might not hold water, since
the tests haven't been upstreamed. If that's the case then I don't have
a strong opinion about which uevent goes where, and we could go with
whatever set of uevents you prefer.
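
To illustrate the kind of check I mean, here is a minimal user-space
sketch, not the actual VPG test code. It assumes the test has already
read a raw uevent payload (a sequence of NUL-terminated "KEY=VALUE"
strings, as delivered on the kernel uevent netlink socket) and only
needs to decide whether it reports the "GPU RESET=0" event from this
patch. The names uevent_get() and is_full_gpu_reset() are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the value stored under `key` in a raw uevent payload, or NULL
 * if the key is absent. `buf` holds `len` bytes made up of consecutive
 * NUL-terminated strings; entries without a matching "KEY=" prefix
 * (including the leading "action@devpath" string) are skipped.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen;
	size_t off = 0;

	assert(buf != NULL && key != NULL);
	klen = strlen(key);

	while (off < len) {
		const char *entry = buf + off;
		const char *end = memchr(entry, '\0', len - off);
		size_t elen;

		if (end == NULL)
			break;	/* unterminated tail: ignore it */
		elen = (size_t)(end - entry);

		/* Match "key=" exactly, so "RESET" does not match "GPU RESET". */
		if (elen > klen && entry[klen] == '=' &&
		    strncmp(entry, key, klen) == 0)
			return entry + klen + 1;

		off += elen + 1;
	}
	return NULL;
}

/* Return 1 if the payload carries "GPU RESET=0" (full GPU reset), else 0. */
int is_full_gpu_reset(const char *buf, size_t len)
{
	const char *val = uevent_get(buf, len, "GPU RESET");

	return val != NULL && strcmp(val, "0") == 0;
}
```

A test would call is_full_gpu_reset() on each payload it receives after
injecting a hang and fail if the expected event never shows up. The
per-engine uevents from the later TDR patch could be checked the same
way with a different key, but their exact strings are not shown here,
so I have left them out.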

Also, it might not be worth the hassle to have the reset_done_event at
recovery completion at the end of i915_reset_and_wakeup() /
i915_error_work_func() _as_well_as_ the respective uevent after each
actual GPU/engine reset completion, since reset_done_event doesn't
tell you much that you didn't already know from the post-reset
uevent. So I would be OK with removing reset_done_event.

Thanks,
Tomas
