Date: Thu, 18 Jun 2015 14:36:06 -0400
From: "Emilio G. Cota" <cota@braap.org>
Message-ID: <20150618183606.GA15998@flamenco>
References: <20150617005222.GA18884@flamenco>
	<CAFEAcA_y0d3T3368Ya9rnvKVTCsHvonnQguGAs=kaDdMS3Gh2w@mail.gmail.com>
	<20150617213625.GA16082@flamenco>
	<CAFEAcA-Rw+urpNKVfj=k1as+YaT72z+OAU5_gw0HyKsoLoMbaQ@mail.gmail.com>
	<20150618142309.GA12305@flamenco>
	<CAFEAcA8uUhiac86G1qchNQAEHJs0R6_s1BD5+qfMHU0Gw2ec5Q@mail.gmail.com>
In-Reply-To: <CAFEAcA8uUhiac86G1qchNQAEHJs0R6_s1BD5+qfMHU0Gw2ec5Q@mail.gmail.com>
Subject: Re: [Qemu-devel] linux-user crashes on clone(2) when run on ppc host
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Riku Voipio <riku.voipio@iki.fi>, "qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>, Alexander Graf <agraf@suse.de>, QEMU Developers <qemu-devel@nongnu.org>

On Thu, Jun 18, 2015 at 15:55:54 +0100, Peter Maydell wrote:
> I'd forgotten we had that mutex. However it's not actually
> a sufficient fix for the problem. What needs to happen is
> that:
>  (a) somebody actually sits down and figures out what data
> structures we have and what locking/per-cpuness/etc they need,
> ie a design
>  (b) somebody implements that design

That is exactly what I'm doing, both (a) and (b). In doing so I found
this crash, and I don't think it is due to races in QEMU; it seems
to be a ppc64 issue.

> This is happening as part of the TCG multithreading work:
>   http://wiki.qemu.org/Features/tcg-multithread

I'm closely following these discussions. My goal is to
have a sane multithreaded linux-user first, and then move on
to full-system.

> This is the bug we've had kicking around for a while about
> multithreading races:
>  https://bugs.launchpad.net/qemu/+bug/1098729

I've tried to reproduce it with current QEMU but can't, although I
could easily trigger it with the QEMU packaged in Ubuntu 14.04. I've
added a message to the bug thread stating this.

> As just one example race, consider the possibility that
> thread A calls tb_gen_code, which calls tb_alloc, which
> calls tb_flush, which clears the whole code cache, and then
> tb_gen_code starts generating code over the top of a TB
> that thread B was in the middle of executing from...

Agreed, this needs to be fixed. However, it is not the problem I'm
reporting here.

> >> On 17 June 2015 at 22:36, Emilio G. Cota <cota@braap.org> wrote:
> >> > I don't think this is a race because it also breaks when
> >> > run on a single core (with taskset -c 0).
> >
> > As I said, this problem doesn't seem to be a race.
> 
> The multiple threads will still all be racing with each
> other on the single core.

How? Even the bug reported on Launchpad cannot be triggered (on a
relatively old QEMU) when the program is pinned to a single host core.

> In general I don't see much benefit in detailed investigation
> into the precise reason why a guest program crashes when
> the whole area is known to be fundamentally not designed
> right...

I'm working on such a design, not just on paper but with working
code. For instance, I'm running QEMU on ppc64 to test an initial
solution to the TSO-on-RMO problem, i.e. the mismatch between the
guest's memory consistency model (TSO) and the host's weaker one
(RMO). Unfortunately, ppc64 is the only SMP RMO machine I could get
access to. I'd be happy to test this on an ARM SMP machine instead,
and forget about ppc64 for now, if I could.

To me this looks like a ppc64 issue, and I'd be very grateful
if ppc64 folks could take a look.

Thanks,

		Emilio