From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47939)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <peter.maydell@linaro.org>) id 1Z5bEp-00082j-1X
	for qemu-devel@nongnu.org; Thu, 18 Jun 2015 10:56:20 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <peter.maydell@linaro.org>) id 1Z5bEl-0000yc-RM
	for qemu-devel@nongnu.org; Thu, 18 Jun 2015 10:56:18 -0400
Received: from mail-yh0-f41.google.com ([209.85.213.41]:35727)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <peter.maydell@linaro.org>) id 1Z5bEl-0000yQ-OS
	for qemu-devel@nongnu.org; Thu, 18 Jun 2015 10:56:15 -0400
Received: by yhak3 with SMTP id k3so57886263yha.2
	for <qemu-devel@nongnu.org>; Thu, 18 Jun 2015 07:56:15 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20150618142309.GA12305@flamenco>
References: <20150617005222.GA18884@flamenco>
	<CAFEAcA_y0d3T3368Ya9rnvKVTCsHvonnQguGAs=kaDdMS3Gh2w@mail.gmail.com>
	<20150617213625.GA16082@flamenco>
	<CAFEAcA-Rw+urpNKVfj=k1as+YaT72z+OAU5_gw0HyKsoLoMbaQ@mail.gmail.com>
	<20150618142309.GA12305@flamenco>
From: Peter Maydell <peter.maydell@linaro.org>
Date: Thu, 18 Jun 2015 15:55:54 +0100
Message-ID: <CAFEAcA8uUhiac86G1qchNQAEHJs0R6_s1BD5+qfMHU0Gw2ec5Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Subject: Re: [Qemu-devel] linux-user crashes on clone(2) when run on ppc host
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Emilio G. Cota" <cota@braap.org>
Cc: Riku Voipio <riku.voipio@iki.fi>, "qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>, Alexander Graf <agraf@suse.de>, QEMU Developers <qemu-devel@nongnu.org>

On 18 June 2015 at 15:23, Emilio G. Cota <cota@braap.org> wrote:
> On Thu, Jun 18, 2015 at 08:42:40 +0100, Peter Maydell wrote:
>> > What data structures are you referring to? Are they ppc-specific?
>>
>> None of the code generation data structures are locked at all --
>> if two threads try to generate code at the same time they'll
>> tend to clobber each other.
>
> AFAICT tb_gen_code is called with a mutex held (the sequence is
> mutex->tb_find_fast->tb_find_slow->tb_gen_code in cpu-exec.c)
>
> The only call to tb_gen_code that in usermode is not holding
> the lock is in cpu_breakpoint_insert->breakpoint_invalidate->
> tb_invalidate_phys_page_range->tb_gen_code. I'm not using
> gdb so I guess I cannot trigger this.
>
> Am I missing something?

I'd forgotten we had that mutex. However, it's not actually
a sufficient fix for the problem. What needs to happen is
that:
 (a) somebody actually sits down and figures out what data
structures we have and what locking/per-cpuness/etc. they need,
i.e. produces a design
 (b) somebody implements that design

This is happening as part of the TCG multithreading work:
  http://wiki.qemu.org/Features/tcg-multithread

This is the bug we've had kicking around for a while about
multithreading races:
 https://bugs.launchpad.net/qemu/+bug/1098729

As just one example race, consider the possibility that
thread A calls tb_gen_code, which calls tb_alloc, which
calls tb_flush, which clears the whole code cache, and then
tb_gen_code starts generating code over the top of a TB
that thread B was in the middle of executing from...
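
To make that interleaving concrete, here is a toy model in C. It is
only a sketch, not QEMU code: ToyTB, toy_tb_alloc() and friends are
made-up names that merely mirror the shape of tb_alloc/tb_flush/
tb_gen_code, and the two-entry cache size is an assumption chosen to
force a flush quickly. It replays the bad ordering step by step in a
single thread, so the outcome is deterministic:

```c
/* Toy model of the race, NOT QEMU code: every name here is made up. */
#include <stdio.h>
#include <string.h>

#define TOY_MAX_TBS 2
#define TOY_TB_SIZE 16

typedef struct ToyTB {
    char code[TOY_TB_SIZE];   /* stands in for generated host code */
} ToyTB;

static ToyTB toy_cache[TOY_MAX_TBS];
static int toy_nb_tbs;

/* Discard every TB; slots will be reused from the start. */
static void toy_tb_flush(void)
{
    toy_nb_tbs = 0;
}

/* Hand out the next slot, flushing the whole cache when it is full. */
static ToyTB *toy_tb_alloc(void)
{
    if (toy_nb_tbs == TOY_MAX_TBS) {
        toy_tb_flush();
    }
    return &toy_cache[toy_nb_tbs++];
}

/* "Generate code" for a guest block by writing a label into a fresh TB. */
static ToyTB *toy_tb_gen_code(const char *label)
{
    ToyTB *tb = toy_tb_alloc();
    snprintf(tb->code, sizeof(tb->code), "%s", label);
    return tb;
}

/*
 * Replay one bad interleaving step by step in a single thread.
 * Returns 1 if the TB that "thread B" was executing got overwritten.
 */
static int toy_replay_race(void)
{
    toy_nb_tbs = 0;
    ToyTB *b_tb = toy_tb_gen_code("B-block");  /* B starts executing this */
    toy_tb_gen_code("other");                  /* cache is now full */
    toy_tb_gen_code("A-block");                /* A: flush, then reuse slot 0 */
    return strcmp(b_tb->code, "B-block") != 0;
}
```

Calling toy_replay_race() from a main() returns 1: the pointer that
"thread B" still holds now refers to the contents written for thread A.
Holding a lock only around code generation would not help in this
model, because B is not generating code at the time, merely executing
it.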

>> On 17 June 2015 at 22:36, Emilio G. Cota <cota@braap.org> wrote:
>> > I don't think this is a race because it also breaks when
>> > run on a single core (with taskset -c 0).
>
> As I said, this problem doesn't seem to be a race.

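The multiple threads will still all be racing with each
other on the single core: pinning removes parallelism but
not preemption, so the host kernel can still switch between
them at arbitrary points.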
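
As a generic illustration of why one core is enough for a race (a toy
sketch, not QEMU code; ToyThread, toy_load() and toy_store() are
hypothetical names), here is a non-atomic increment split into its
load and store halves, with a context switch replayed by hand between
them:

```c
/* Toy model, not QEMU code: a non-atomic increment split into steps. */
typedef struct ToyThread {
    int tmp;   /* value the thread loaded before being preempted */
} ToyThread;

static int toy_counter;

/* First half of "counter++": read the shared value. */
static void toy_load(ToyThread *t)
{
    t->tmp = toy_counter;
}

/* Second half of "counter++": write back the stale value plus one. */
static void toy_store(ToyThread *t)
{
    toy_counter = t->tmp + 1;
}

/* One core, two threads, a context switch between load and store. */
static int toy_single_core_lost_update(void)
{
    ToyThread a, b;

    toy_counter = 0;
    toy_load(&a);    /* A reads 0 */
    /* scheduler preempts A and runs B on the same core */
    toy_load(&b);    /* B reads 0 */
    toy_store(&b);   /* B writes 1 */
    /* scheduler switches back to A */
    toy_store(&a);   /* A writes 1: B's increment is lost */
    return toy_counter;
}
```

Two increments ran, but toy_single_core_lost_update() returns 1, and
at no point did two threads execute simultaneously.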

In general I don't see much benefit in detailed investigation
into the precise reason why a guest program crashes when
the whole area is known to be fundamentally not designed
right...

thanks
-- PMM