From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44619)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Z5V08-00064R-N3 for qemu-devel@nongnu.org; Thu, 18 Jun 2015 04:16:46 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1Z5V07-0005ud-Cr
 for qemu-devel@nongnu.org; Thu, 18 Jun 2015 04:16:44 -0400
Received: from hall.aurel32.net ([2001:bc8:30d7:101::1]:35158)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Z5V07-0005u9-4Y for qemu-devel@nongnu.org; Thu, 18 Jun 2015 04:16:43 -0400
Date: Thu, 18 Jun 2015 10:16:40 +0200
From: Aurelien Jarno
Message-ID: <20150618081640.GK931@aurel32.net>
References: <20150617124158.3316.54954.stgit@PASHA-ISP>
 <20150617141901.GE19635@aurel32.net>
 <001101d0a996$19a72f80$4cf58e80$@Dovgaluk@ispras.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <001101d0a996$19a72f80$4cf58e80$@Dovgaluk@ispras.ru>
Subject: Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Pavel Dovgaluk
Cc: pbonzini@redhat.com, rth7680@gmail.com, leon.alrae@imgtec.com, qemu-devel@nongnu.org

On 2015-06-18 10:12, Pavel Dovgaluk wrote:
> > From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> > On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > > In icount mode every translation block looks as follows:
> > >
> > > if icount < n then exit
> > > icount -= n
> > > instr1
> > > instr2
> > > ...
> > > instrn
> > > exit
> > >
> > > When one of these instructions initiates an exception, icount should be
> > > restored and adjusted number of instructions should be subtracted from icount
> > > instead of initial n.
> > >
> > > tlb_fill function passes retaddr to raise_exception, which allows restoring
> > > current instructions in TB and correct icount calculation.
> > >
> > > When exception triggered with other function (e.g. by embedding call to
> > > exception raising helper into TB), then PC is not passed as retaddr and
> > > correct icount is not recovered. In such cases icount will be decreased
> > > by the value equal to the size of TB.
> >
> > Looking at how icount work, I see it's basically a variable in the CPU
> > state (icount_decr.u16.low), which is already accessed from the TB.
> > Couldn't we adjust it using additional code before generating an
> > exception, when in icount mode.
> >
> > For example for MIPS, we can add some code before generate_exception
> > which use the value from s->gen_opc_icount[j] to adjust
> > the variable icount_decr.u16.low.
>
> It is possible, but it will incur additional overhead, because we will
> have to update icount every time the exception might be generated.
> We'll have to update icount value before and after every helper call,
> that can cause an exception:
>
> icount -= n
> ...
> instr_k
> icount += n - k
> helper
> icount -= n - k
> ...
>
> And this overhead will slowdown the code even if no exception occur.

That's where I might disagree. Retranslation seems a very good idea on
paper, but in practice it doesn't always seem to bring the performance
improvement it should. In addition, it seems to be highly dependent on
the target.

Just to give some numbers, on MIPS (as your patch originally concerns
this architecture), 40% of code generation is actually due to
retranslation. The problem is that over time we have improved the code
generation a lot (liveness analysis, better register allocation,
constant propagation, ...) and thus we have increased the code
generation time. While this clearly has benefits when the code is
actually executed, it has none when the code is simply retranslated.
In short, we now spend more time than before finding the CPU state
corresponding to an exception.
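
To make the two accounting strategies discussed above concrete, here is
a toy model in C. It is only a sketch and not QEMU code: the function
names, the plain integer budget and the "fixups" counter are all
hypothetical, and it assumes the worst case where every instruction in
the block calls a helper that may raise. The entry check
("if icount < n then exit") is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model only; nothing here is a QEMU function.
 * n         = number of guest instructions in the translation block
 * fault_idx = index (0-based) of the instruction that raises, or -1 if none
 */

/*
 * Lazy strategy: charge the whole block at entry and repair the budget
 * only when an exception actually happens (this stands in for the
 * restore-after-retranslation path).
 */
int32_t run_tb_lazy(int32_t icount, int n, int fault_idx)
{
    icount -= n;                     /* charged up front at TB entry */
    if (fault_idx >= 0) {
        assert(fault_idx < n);
        icount += n - fault_idx;     /* refund instructions that never ran */
    }
    return icount;
}

/*
 * Eager strategy: make the budget exact before every call that may raise
 * and undo the adjustment afterwards, as in the quoted
 * "icount += n - k; helper; icount -= n - k" sequence.
 * *fixups counts the extra updates executed.
 */
int32_t run_tb_eager(int32_t icount, int n, int fault_idx, int *fixups)
{
    icount -= n;                     /* charged up front at TB entry */
    for (int k = 0; k < n; k++) {
        icount += n - k;             /* exact value before the helper call */
        (*fixups)++;
        if (k == fault_idx) {
            return icount;           /* exception leaves the block here */
        }
        icount -= n - k;             /* back to the up-front charge */
        (*fixups)++;
    }
    return icount;
}
```

With a budget of 100, a block of 5 instructions and a fault at index 2,
both functions return 98. The eager variant pays for its fixups even
when fault_idx is -1, which is the overhead Pavel describes; the lazy
variant pays nothing on the normal path but needs the state recovery
step when an exception does occur.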

A simple way to show the cost of retranslation is to apply the patch
below, which disables retranslation and saves the CPU state before each
instruction:

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 1d128ee..5238d71 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -19435,6 +19435,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb,
     LOG_DISAS("\ntb %p idx %d hflags %04x\n", tb, ctx.mem_idx, ctx.hflags);
     gen_tb_start(tb);
     while (ctx.bstate == BS_NONE) {
+        save_cpu_state(&ctx, 1);
         if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
             QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
                 if (bp->pc == ctx.pc) {
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..3d4c017 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     int64_t ti;
 #endif
 
+    return -1;
+
 #ifdef CONFIG_PROFILER
     ti = profile_getclock();
 #endif

On an x86 host, this patch brings a 5% boot time improvement for a MIPS
guest. One of the reasons is that the TCG code generator has good
knowledge of which TCG ops or helpers can trigger an exception, so it
can optimize out part of the instructions saving the CPU state. I guess
that host CPUs have also evolved over time, now being superscalar and
out-of-order, so that saving the CPU state can be done "in the
background". Also, it's just a quick and dirty patch; we can probably do
even better.

All of that to say that I am worried about performance if more paths go
through the retranslation code, especially on MIPS, where it seems to be
costly. That said, I haven't really looked in detail at other targets or
hosts.

Now, to come back to your patches, we might want to simply fix icount
first, even if it has some performance impact, and deal with the
retranslation issue separately, as it concerns more than just icount.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net