From: Peter Maydell
Date: Thu, 18 Jun 2015 15:55:54 +0100
Subject: Re: [Qemu-devel] linux-user crashes on clone(2) when run on ppc host
To: "Emilio G. Cota"
Cc: Riku Voipio, "qemu-ppc@nongnu.org", Alexander Graf, QEMU Developers
In-Reply-To: <20150618142309.GA12305@flamenco>
References: <20150617005222.GA18884@flamenco> <20150617213625.GA16082@flamenco> <20150618142309.GA12305@flamenco>

On 18 June 2015 at 15:23, Emilio G. Cota wrote:
> On Thu, Jun 18, 2015 at 08:42:40 +0100, Peter Maydell wrote:
>> > What data structures are you referring to? Are they ppc-specific?
>>
>> None of the code generation data structures are locked at all --
>> if two threads try to generate code at the same time they'll
>> tend to clobber each other.
>
> AFAICT tb_gen_code is called with a mutex held (the sequence is
> mutex->tb_find_fast->tb_find_slow->tb_gen_code in cpu-exec.c)
>
> The only call to tb_gen_code that in usermode is not holding
> the lock is in cpu_breakpoint_insert->breakpoint_invalidate->
> tb_invalidate_phys_page_range->tb_gen_code. I'm not using
> gdb so I guess I cannot trigger this.
>
> Am I missing something?

I'd forgotten we had that mutex. However, it's not actually a
sufficient fix for the problem. What needs to happen is that:

 (a) somebody actually sits down and figures out what data structures
     we have and what locking/per-cpuness/etc. they need, i.e. a design
 (b) somebody implements that design

This is happening as part of the TCG multithreading work:
http://wiki.qemu.org/Features/tcg-multithread

This is the bug we've had kicking around for a while about
multithreading races:
https://bugs.launchpad.net/qemu/+bug/1098729

As just one example race, consider the possibility that thread A
calls tb_gen_code, which calls tb_alloc, which calls tb_flush, which
clears the whole code cache, and then tb_gen_code starts generating
code over the top of a TB that thread B was in the middle of
executing from...

>> On 17 June 2015 at 22:36, Emilio G. Cota wrote:
>> > I don't think this is a race because it also breaks when
>> > run on a single core (with taskset -c 0).
>
> As I said, this problem doesn't seem to be a race.

The multiple threads will still all be racing with each other on
the single core, because the kernel can preempt one thread at any
point and run another.

In general I don't see much benefit in investigating in detail exactly
why a guest program crashes when the whole area is known to be
fundamentally not designed right...

thanks
-- PMM
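The tb_flush race described above, and why pinning to one core does not avoid it, can be sketched as a small deterministic toy model in C. This is an illustration under stated assumptions, not QEMU code: `toy_tb_alloc`, `toy_tb_flush`, `toy_gen_code` and `stale_words_executed` are hypothetical stand-ins for tb_alloc, tb_flush and tb_gen_code, the "code cache" is just an array of ints, and thread B's execution is assumed not to take the translation mutex. The single-core interleaving is replayed by hand, so the result is reproducible.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical stand-ins, NOT QEMU's real
 * tb_alloc()/tb_flush()/tb_gen_code() or their data structures. */
#define CACHE_WORDS 8   /* tiny "code cache": room for exactly two TBs */
#define TB_WORDS    4   /* every toy TB is 4 words long */

static int cache[CACHE_WORDS];
static int next_free;   /* bump allocator, like a code-buffer pointer */

/* Stand-in for tb_flush(): discard every TB by rewinding the allocator. */
static void toy_tb_flush(void)
{
    next_free = 0;
}

/* Stand-in for tb_alloc(): when the cache is full, flush all of it. */
static int toy_tb_alloc(void)
{
    if (next_free + TB_WORDS > CACHE_WORDS) {
        toy_tb_flush();
    }
    int start = next_free;
    next_free += TB_WORDS;
    assert(start + TB_WORDS <= CACHE_WORDS);
    return start;
}

/* Stand-in for tb_gen_code(): allocate space and "emit" code tagged with
 * the guest block id. Assume the caller holds the translation mutex. */
static int toy_gen_code(int block_id)
{
    int start = toy_tb_alloc();
    for (int i = 0; i < TB_WORDS; i++) {
        cache[start + i] = block_id;
    }
    return start;
}

/* Replays one single-core interleaving by hand and returns how many words
 * thread B executed that no longer belonged to the TB it started running. */
int stale_words_executed(void)
{
    memset(cache, 0, sizeof cache);
    next_free = 0;

    int tb1 = toy_gen_code(1);   /* fills words 0..3 */
    toy_gen_code(2);             /* fills words 4..7: cache is now full */

    int stale = 0;
    /* Thread B runs TB 1 without the mutex (execution never takes it). */
    for (int pc = 0; pc < TB_WORDS; pc++) {
        if (pc == 2) {
            /* B is preempted; thread A takes the mutex and translates
             * block 3. The cache is full, so alloc flushes and reuses
             * word 0, overwriting the TB that B is still executing. */
            toy_gen_code(3);
        }
        if (cache[tb1 + pc] != 1) {
            stale++;             /* B is now running A's newly emitted code */
        }
    }
    return stale;
}
```

Calling `stale_words_executed()` returns 2: the last two words thread B runs come from block 3, even though only one thread ever ran at a time and thread A held the mutex throughout. The lock serialises code generation, but nothing stops the flush from reusing memory that another thread is still executing from.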