From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51529) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z5clk-0007XU-0p for qemu-devel@nongnu.org; Thu, 18 Jun 2015 12:34:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z5cle-0005Zj-Sp for qemu-devel@nongnu.org; Thu, 18 Jun 2015 12:34:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53124) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z5cle-0005ZX-Lf for qemu-devel@nongnu.org; Thu, 18 Jun 2015 12:34:18 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (Postfix) with ESMTPS id 2D7DD38DCA5 for ; Thu, 18 Jun 2015 16:34:18 +0000 (UTC) From: Markus Armbruster References: <1434525861-21768-1-git-send-email-mst@redhat.com> Date: Thu, 18 Jun 2015 18:34:15 +0200 In-Reply-To: <1434525861-21768-1-git-send-email-mst@redhat.com> (Michael S. Tsirkin's message of "Wed, 17 Jun 2015 09:24:40 +0200") Message-ID: <87a8vxrn5k.fsf@blackfin.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v2 0/3] error: allow local errors to trigger abort List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" Cc: kwolf@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com "Michael S. Tsirkin" writes: > It's a common idiom: > > Error *local_err = NULL; > .... > foo(&local_err); > ... > if (local_err) { > error_propagate(errp, local_err); > return; > } > > Unfortunately it means that call to foo(&local_err) will > not abort even if errp is set to error_abort. > > Instead, we get an abort at error_propagate which is too late, > that is, the quality of the stack trace is degraded in that it no longer > pinpoints the actual cause of failure. > > To fix, add an API to check errp and set local_err to error_abort > if errp is error_abort. > > This is out of RFC but I'm still not converting all users: > let's merge these patches, then I'll convert all users > on top. Let's take a step back and review intended use of Error before and after this series. Before: * Parameter Error **errp (by convention the last one) To create and return an error, do: error_setg(errp, ...); To propagate an existing error (typically one you received from a callee), do: error_propagate(errp, ...); You're not supposed to examine errp, let alone dereference it. * Actual argument - to receive an error: address of an Error * variable, its value must be null before the call, and may be examined afterwards - to ignore errors: null pointer - to abort on error: &error_abort This leads to a few patterns: T foo(..., Error **errp) { // pattern: receive, test and propagate error Error *err = NULL; bar(..., &err); if (err) { error_propagate(errp, err); return ...; } // pattern: set error if (...) { error_setg(errp, ...); return ...; } // pattern: pass through error // really the first pattern less the test simplified baz(..., errp); return ...; } Your patch modifies the "actual argument to receive an error" clause: the variable must now be either null or the value of error_init_local(errp). If it's the latter, then it must be passed to error_propagate(errp, err). We acquire a new pattern: // pattern: receive, test and propagate error Error *err = error_init_local(errp); bar(..., &err); if (err) { error_propagate(errp, err); return ...; } Let's see whether it can fully replace the existing pattern. Before you can convert err = NULL to err = error_init_local(errp), you have to: * find the appropriate errp (usually trivial) * double-check we error_propagate(errp, err) on every path leaving the function It's perfectly possible that we *don't* propagate on all paths: Error *err = NULL; bar(..., &err); if (error is non-recoverable) { error_propagate(errp, err); return ...; } recover and carry on Here, error_init_local(errp) would be *wrong*. The Error boilerplate is annoying, but at least there are few ways to get it wrong (the most common one is "Error * variable not null before the call"). I'm most reluctant to add more ways to get it wrong. Is the gain worth all this additional complexity? The gain is certainly real: backtrace shows where the error got created instead of where it was propagated to &error_abort. But the existing backtraces have never bothered me. When I get one ending at an error_propagate(errp, err), and I need the real one instead, finding it in the debugger is easy enough. The complexity does bother me. Here's an utterly trivial way to get some of the gain for none of the complexity: make error_setg() & friends store caller's __FILE__ and __LINE__.