All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Dave Gordon <david.s.gordon@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
Date: Wed, 17 Jun 2015 14:18:12 +0200	[thread overview]
Message-ID: <20150617121812.GS23637@phenom.ffwll.local> (raw)
In-Reply-To: <1434393394-21002-8-git-send-email-david.s.gordon@intel.com>

On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
> In order to fully initialise the default contexts, we have to execute
> batchbuffer commands on the GPU engines. But in the case of GuC-based
> batch submission, we can't do that until any required firmware has
> been loaded, which may not be possible during driver load, because the
> filesystem(s) containing the firmware may not be mounted until later.
> 
> Therefore, we now allow the first call to the firmware-loading code to
> return -EAGAIN to indicate that it's not yet ready, and that it should
> be retried when the device is first opened from user code, by which
> time we expect that all required filesystems will have been mounted.
> The late-retry code will then re-attempt to load the firmware if the
> early attempt failed.
> 
> If the late retry fails, the current open-in-progress will fail, but
> the recovery code will disable GuC submission and reset the GPU and
> driver. The next open will therefore be in non-GuC mode, and will be
> allowed to complete even if the GuC cannot be loaded or used.
> 
> Issue: VIZ-4884
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Signed-off-by: Alex Dai <yu.dai@intel.com>

I'm not really sold on this super-flexible fallback scheme implemented
here. Because such fallback schemes means more code to test (which no on
will do likely) or just even bigger fireworks when we actually hit them in
reality when something goes wrong. Imo if anything goes wrong in the setup
we just throw in the towel and fail the driver loading.

There's only one exception: If something fails with GT init we declare the
gpu wedged but proceed with all the modeset setup. This makes sense
because we need all the code to handle a wedge gpu anyway, dead-on-boot
gpus happen occasionally and it's really not nice to greet the user with a
black screen. But more fallbacks are imo just headache.

Hence when the guc fails we imo really shouldn't bother with fallbacks,
but instead just declare the thing wedged and carry on.

That should also allow us to simplify the firmware loading: We can do that
in an async worker and if the blob isn't there in time then we just move
on.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h         |    2 ++
>  drivers/gpu/drm/i915/i915_gem.c         |    9 +++++-
>  drivers/gpu/drm/i915/i915_gem_context.c |   52 ++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   48 ++++++++++++++++++++++++++++
>  4 files changed, 105 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f47cde7..a1fc278 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1837,6 +1837,7 @@ struct drm_i915_private {
>  	/* hda/i915 audio component */
>  	bool audio_component_registered;
>  
> +	bool contexts_ready;
>  	uint32_t hw_context_size;
>  	struct list_head context_list;
>  
> @@ -2614,6 +2615,7 @@ void i915_queue_hangcheck(struct drm_device *dev);
>  __printf(3, 4)
>  void i915_handle_error(struct drm_device *dev, bool wedged,
>  		       const char *fmt, ...);
> +void i915_handle_guc_error(struct drm_device *dev, int err);
>  
>  extern void intel_irq_init(struct drm_i915_private *dev_priv);
>  extern void intel_hpd_init(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index cd4a865..d1a8862 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5025,8 +5025,15 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>  	/* We can't enable contexts until all firmware is loaded */
>  	ret = intel_guc_ucode_load(dev, false);
> +	if (ret == -EAGAIN) {
> +		ret = 0;
> +		goto out;		/* too early */
> +	}
> +
>  	ret = i915_gem_context_enable(dev_priv);
> -	if (ret && ret != -EIO) {
> +	if (ret == 0) {
> +		dev_priv->contexts_ready = true;
> +	} else if (ret && ret != -EIO) {
>  		DRM_ERROR("Context enable failed %d\n", ret);
>  		i915_gem_cleanup_ringbuffer(dev);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 133afcf..debbfc9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -447,23 +447,65 @@ static int context_idr_cleanup(int id, void *p, void *data)
>  	return 0;
>  }
>  
> +/* Complete any late initialisation here */
> +static int i915_gem_context_first_open(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	/*
> +	 * We can't enable contexts until all firmware is loaded. This
> +	 * call shouldn't return -EAGAIN because we pass wait=true, but
> +	 * it can still fail with code -EIO if the GuC doesn't respond,
> +	 * or -ENOEXEC if the GuC firmware image is invalid.
> +	 */
> +	ret = intel_guc_ucode_load(dev, true);
> +	WARN_ON(ret == -EAGAIN);
> +
> +	/*
> +	 * If an error occurred and GuC submission has been requested, we can
> +	 * attempt recovery by disabling GuC submission and reinitialising
> +	 * the GPU and driver. We then fail this open() anyway, but the next
> +	 * attempt will find that GuC submission is already disabled, and so
> +	 * proceed to complete context initialisation in non-GuC mode instead.
> +	 */
> +	if (ret && i915.enable_guc_submission) {
> +		i915_handle_guc_error(dev, ret);
> +		return ret;
> +	}
> +
> +	ret = i915_gem_context_enable(dev_priv);
> +	if (ret == 0)
> +		dev_priv->contexts_ready = true;
> +	return ret;
> +}
> +
>  int i915_gem_context_open(struct drm_device *dev, struct drm_file *file)
>  {
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	struct intel_context *ctx;
> +	int ret = 0;
>  
>  	idr_init(&file_priv->context_idr);
>  
>  	mutex_lock(&dev->struct_mutex);
> -	ctx = i915_gem_create_context(dev, file_priv);
> +
> +	if (!dev_priv->contexts_ready)
> +		ret = i915_gem_context_first_open(dev);
> +
> +	if (ret == 0) {
> +		ctx = i915_gem_create_context(dev, file_priv);
> +		if (IS_ERR(ctx))
> +			ret = PTR_ERR(ctx);
> +	}
> +
>  	mutex_unlock(&dev->struct_mutex);
>  
> -	if (IS_ERR(ctx)) {
> +	if (ret)
>  		idr_destroy(&file_priv->context_idr);
> -		return PTR_ERR(ctx);
> -	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  void i915_gem_context_close(struct drm_device *dev, struct drm_file *file)
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 56db9e74..f7dcf8d 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2665,6 +2665,54 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
>  	i915_reset_and_wakeup(dev);
>  }
>  
> +/**
> + * i915_handle_error - handle a GuC error
> + * @dev: drm device
> + *
> + * If the GuC can't be (re-)initialised, disable GuC submission and
> + * then reset and reinitialise the rest of the GPU, so that we can
> + * fall back to operating in ELSP mode. Don't bother capturing error
> + * state, because it probably isn't relevant here.
> + *
> + * Unlike i915_handle_error() above, this is called with the global
> + * struct_mutex held, so we need to release it after setting the
> + * reset-in-progress bit so that other threads can make progress,
> + * and reacquire it after the reset is complete.
> + */
> +void i915_handle_guc_error(struct drm_device *dev, int err)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	DRM_ERROR("GuC failure %d, disabling GuC submission\n", err);
> +	i915.enable_guc_submission = false;
> +
> +	i915_report_and_clear_eir(dev);	/* unlikely? */
> +
> +	atomic_set_mask(I915_RESET_IN_PROGRESS_FLAG,
> +			&dev_priv->gpu_error.reset_counter);
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	/*
> +	 * Wakeup waiting processes so that the reset function
> +	 * i915_reset_and_wakeup doesn't deadlock trying to grab
> +	 * various locks. By bumping the reset counter first, the woken
> +	 * processes will see a reset in progress and back off,
> +	 * releasing their locks and then wait for the reset completion.
> +	 * We must do this for _all_ gpu waiters that might hold locks
> +	 * that the reset work needs to acquire.
> +	 *
> +	 * Note: The wake_up serves as the required memory barrier to
> +	 * ensure that the waiters see the updated value of the reset
> +	 * counter atomic_t.
> +	 */
> +	i915_error_wake_up(dev_priv, false);
> +
> +	i915_reset_and_wakeup(dev);
> +
> +	mutex_lock(&dev->struct_mutex);
> +}
> +
>  /* Called from drm generic code, passed 'crtc' which
>   * we use as a pipe index
>   */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2015-06-17 12:15 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
2015-06-15 18:36 ` [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c Dave Gordon
2015-06-15 20:09   ` Chris Wilson
2015-06-17  7:23     ` Dave Gordon
2015-06-17 12:02       ` Daniel Vetter
2015-06-18 11:49         ` Dave Gordon
2015-06-18 12:10           ` Chris Wilson
2015-06-18 18:07             ` Dave Gordon
2015-06-19  8:44               ` Chris Wilson
2015-06-22 11:59                 ` Dave Gordon
2015-06-22 12:37                   ` Chris Wilson
2015-06-23 16:54                     ` Dave Gordon
2015-06-18 14:31           ` Daniel Vetter
2015-06-18 18:28             ` Dave Gordon
2015-06-24  9:32               ` Daniel Vetter
2015-06-25 12:28                 ` Dave Gordon
2015-06-24  9:40               ` Chris Wilson
2015-06-15 18:36 ` [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support Dave Gordon
2015-06-17 12:05   ` Daniel Vetter
2015-06-18 12:11     ` Dave Gordon
2015-06-18 14:49       ` Daniel Vetter
2015-06-18 15:27         ` Chris Wilson
2015-06-18 15:35           ` Daniel Vetter
2015-06-18 15:49             ` Chris Wilson
2015-06-19  8:43         ` Dave Gordon
2015-06-24 10:29           ` Daniel Vetter
2015-07-06 12:44             ` Dave Gordon
2015-07-06 13:24               ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 03/15] drm/i915: Add GuC-related module parameters Dave Gordon
2015-06-15 18:36 ` [PATCH 04/15] drm/i915: Add GuC-related header files Dave Gordon
2015-06-15 20:20   ` Chris Wilson
2015-06-17 15:01     ` Dave Gordon
2015-06-23 18:10       ` Dave Gordon
2015-06-24  7:41     ` Dave Gordon
2015-06-24  9:37       ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
2015-06-15 20:30   ` Chris Wilson
2015-06-18 17:53     ` Yu Dai
2015-06-18 20:12       ` Chris Wilson
2015-06-19 14:34         ` Dave Gordon
2015-06-18 18:54     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status Dave Gordon
2015-06-16  9:40   ` Chris Wilson
2015-06-19  7:49     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open Dave Gordon
2015-06-16  9:35   ` Chris Wilson
2015-06-19  9:42     ` Dave Gordon
2015-06-17 12:18   ` Daniel Vetter [this message]
2015-06-19  9:19     ` Dave Gordon
2015-06-24 10:15       ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 08/15] drm/i915: Move execlists defines from .c to .h Dave Gordon
2015-06-16  9:37   ` Chris Wilson
2015-06-17  7:31     ` Dave Gordon
2015-06-17  7:54       ` Chris Wilson
2015-06-17  7:59       ` Chris Wilson
2015-06-22 13:05         ` Dave Gordon
2015-06-15 18:36 ` [PATCH 09/15] drm/i915: GuC submission setup, phase 1 Dave Gordon
2015-06-15 21:32   ` Chris Wilson
2015-06-19 17:02     ` Dave Gordon
2015-06-19 17:22       ` Dave Gordon
2015-06-16 11:44   ` Chris Wilson
2015-06-15 18:36 ` [PATCH 10/15] drm/i915: Enable GuC firmware log Dave Gordon
2015-06-15 21:40   ` Chris Wilson
2015-06-16  9:26   ` Tvrtko Ursulin
2015-06-16 11:40     ` Chris Wilson
2015-06-16 12:29       ` Tvrtko Ursulin
2015-06-15 18:36 ` [PATCH 11/15] drm/i915: Implementation of GuC client Dave Gordon
2015-06-15 21:55   ` Chris Wilson
2015-06-19 17:55     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 12/15] drm/i915: Interrupt routing for GuC submission Dave Gordon
2015-06-16  9:24   ` Chris Wilson
2015-06-17  8:20     ` Dave Gordon
2015-06-17 12:22       ` Daniel Vetter
2015-06-17 12:41         ` Daniel Vetter
2015-06-23 11:33           ` Dave Gordon
2015-06-23 23:48             ` Yu Dai
2015-06-24 10:02               ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 13/15] drm/i915: Integrate GuC-based command submission Dave Gordon
2015-06-16  9:22   ` Chris Wilson
2015-06-19 18:18     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics Dave Gordon
2015-06-16  9:28   ` Chris Wilson
2015-06-24  8:27     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 15/15] Documentation/drm: kerneldoc for GuC Dave Gordon
2015-06-15 18:36 ` [PATCH 16/15] drm/i915: Enable GuC submission, where supported Dave Gordon
2015-06-17 12:43 ` [PATCH 00/15] Batch submission via GuC Daniel Vetter
2015-06-25  7:23   ` Dave Gordon
2015-06-25  8:05     ` Chris Wilson
2015-06-24 12:16 ` Daniel Vetter
2015-06-24 12:57   ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150617121812.GS23637@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=david.s.gordon@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.