From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44845)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Z5eeM-0004NG-Rc for qemu-devel@nongnu.org;
	Thu, 18 Jun 2015 14:35:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Z5eeG-0003IA-H8 for qemu-devel@nongnu.org;
	Thu, 18 Jun 2015 14:34:54 -0400
Received: from out5-smtp.messagingengine.com ([66.111.4.29]:42871)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Z5eeG-0003Hr-Bp for qemu-devel@nongnu.org;
	Thu, 18 Jun 2015 14:34:48 -0400
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
	by mailout.nyi.internal (Postfix) with ESMTP id C767A20AD4 for ;
	Thu, 18 Jun 2015 14:34:47 -0400 (EDT)
Date: Thu, 18 Jun 2015 14:36:06 -0400
From: "Emilio G. Cota"
Message-ID: <20150618183606.GA15998@flamenco>
References: <20150617005222.GA18884@flamenco>
	<20150617213625.GA16082@flamenco>
	<20150618142309.GA12305@flamenco>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [Qemu-devel] linux-user crashes on clone(2) when run on ppc host
To: Peter Maydell
Cc: Riku Voipio, "qemu-ppc@nongnu.org", Alexander Graf, QEMU Developers

On Thu, Jun 18, 2015 at 15:55:54 +0100, Peter Maydell wrote:
> I'd forgotten we had that mutex. However it's not actually
> a sufficient fix for the problem. What needs to happen is
> that:
>  (a) somebody actually sits down and figures out what data
>  structures we have and what locking/per-cpuness/etc they need,
>  ie a design
>  (b) somebody implements that design

I'm doing exactly (a) and (b). In doing so I've found this crash,
and I think that it is not due to races in QEMU--it seems to be
a ppc64 issue.

> This is happening as part of the TCG multithreading work:
>  http://wiki.qemu.org/Features/tcg-multithread

I'm closely following these discussions. My goal is to have
a sane multithreaded linux-user first, and then move on to
full-system.

> This is the bug we've had kicking around for a while about
> multithreading races:
>  https://bugs.launchpad.net/qemu/+bug/1098729

I've tried to reproduce it, but I can't (I could easily
trigger it with the qemu that is packaged by Ubuntu 14.04).
I've added a message to the thread stating this.

> As just one example race, consider the possibility that
> thread A calls tb_gen_code, which calls tb_alloc, which
> calls tb_flush, which clears the whole code cache, and then
> tb_gen_code starts generating code over the top of a TB
> that thread B was in the middle of executing from...

Agreed, this needs to be fixed. It is certainly not the problem
I'm reporting here, however.

> >> On 17 June 2015 at 22:36, Emilio G. Cota wrote:
> >> > I don't think this is a race because it also breaks when
> >> > run on a single core (with taskset -c 0).
> >
> > As I said, this problem doesn't seem to be a race.
>
> The multiple threads will still all be racing with each
> other on the single core.

How? Even the bug reported on Launchpad cannot be triggered
(on a relatively ancient qemu) if the program is pinned to
just one host core.

> In general I don't see much benefit in detailed investigation
> into the precise reason why a guest program crashes when
> the whole area is known to be fundamentally not designed
> right...

I'm working on such a design, not just on paper but with
working code. For instance, the reason I'm trying to run qemu
on ppc64 is to test an initial solution to the TSO on RMO
problem, i.e. the memory consistency mismatch between guest and
host. ppc64 is unfortunately the only SMP RMO machine I could
get access to--I'd be happy to test this on an ARM SMP and
forget about ppc64 for now if I could.
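
For readers unfamiliar with the TSO on RMO mismatch mentioned above,
here is a minimal message-passing sketch in C11. It is an illustration
only: it is not QEMU code and not the solution being tested, and every
name in it (payload, ready, run_message_passing) is hypothetical. It
shows the ordering a TSO guest takes for granted (a data store becoming
visible before the later flag store) and that an RMO host such as ppc64
only guarantees when explicit ordering is requested.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical litmus test; names are illustrative, not from QEMU. */
static int payload;        /* plain data written before the flag */
static atomic_int ready;   /* flag that publishes the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release store: the payload store may not be reordered after it.
     * TSO hardware keeps store-store order by itself; on an RMO host
     * the compiler has to emit a barrier here (typically lwsync on
     * ppc64). */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    /* Acquire load: the payload load may not be reordered before it. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the flag is published */
    }
    *seen = payload;
    return NULL;
}

/* Runs one round and returns the payload value the consumer observed. */
int run_message_passing(void)
{
    pthread_t prod, cons;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);

    rc = pthread_create(&cons, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&prod, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

With the release/acquire pair in place the consumer can only observe
42. If both accesses to the flag were relaxed instead, an RMO host
would be allowed to let the consumer read a stale payload, which is the
kind of reordering a TSO guest never expects.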

To me this looks like a ppc64 issue, and I'd be very grateful if
ppc64 folks could take a look.

Thanks,

		Emilio