QEMU-Devel Archive mirror
 help / color / mirror / Atom feed
From: Programmingkid <programmingkidx@gmail.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Richard Henderson <richard.henderson@linaro.org>,
	Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
	qemu-devel@nongnu.org
Subject: Re: tb_flush() calls causing long Windows XP boot times
Date: Fri, 11 Jun 2021 11:01:35 -0400	[thread overview]
Message-ID: <3D29C466-BB81-4BCA-96E9-A46721B1ED59@gmail.com> (raw)
In-Reply-To: <874ke4iqf8.fsf@linaro.org>



> On Jun 11, 2021, at 7:24 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
> 
> 
> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
> 
>> On 10/06/2021 14:14, Peter Maydell wrote:
>> 
>>> On Thu, 10 Jun 2021 at 14:02, Programmingkid <programmingkidx@gmail.com> wrote:
>>>> 
>>>> Hi Richard,
>>>> 
>>>> There is a function called breakpoint_invalidate() in cpu.c that
>>>> calls a function called tb_flush(). I have determined that this
>>>> call is being made over 200,000 times when Windows XP boots.
>>>> Disabling this function makes Windows XP boot way faster than
>>>> before. The time went down from around 3 minutes to 20 seconds when
>>>> I applied the patch below.
>>>> 
>>>> After I applied the patch I ran several tests in my VM's to see if anything broke. I could not find any problems. Here is the list my VM's I tested:
>>>> 
>>>> Mac OS 10.8 in qemu-system-x86_64
>>>> Windows 7 in qemu-system-x86_64
>>>> Windows XP in qemu-system-i386
>>>> Mac OS 10.4 in qemu-system-ppc
>>>> 
>>>> I would be happy if the patch below was accepted but I would like to know your thoughts.
>>> 
>>>>  cpu.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/cpu.c b/cpu.c
>>>> index bfbe5a66f9..297c2e4281 100644
>>>> --- a/cpu.c
>>>> +++ b/cpu.c
>>>> @@ -253,7 +253,7 @@ static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
>>>>       * Flush the whole TB cache to force re-translation of such TBs.
>>>>       * This is heavyweight, but we're debugging anyway.
>>>>       */
>>>> -    tb_flush(cpu);
>>>> +    /* tb_flush(cpu); */
>>>>  }
>>>>  #endif
>>> The patch is clearly wrong -- this function is called when a CPU
>>> breakpoint
>>> is added or removed, and we *must* drop generated code which either
>>> (a) includes code to take the breakpoint exception and now should not
>>> or (b) doesn't include code to take the breakpoint exception and now should.
>>> Otherwise we will incorrectly take/not take a breakpoint exception when
>>> that stale code is executed.
>>> As the comment notes, the assumption is that we won't be adding and
>>> removing breakpoints except when we're debugging and therefore
>>> performance is not critical. Windows XP is clearly doing something
>>> we weren't expecting, so we should ideally have a look at whether
>>> we can be a bit more efficient about not throwing the whole TB
>>> cache away.
>> 
>> FWIW this was reported a while back on Launchpad as
>> https://bugs.launchpad.net/qemu/+bug/1883593 where the performance
>> regression was traced back to:
>> 
>> commit b55f54bc965607c45b5010a107a792ba333ba654
>> Author: Max Filippov <jcmvbkbc@gmail.com>
>> Date:   Wed Nov 27 14:06:01 2019 -0800
>> 
>>    exec: flush CPU TB cache in breakpoint_invalidate
>> 
>>    When a breakpoint is inserted at location for which there's currently no
>>    virtual to physical translation no action is taken on CPU TB cache. If a
>>    TB for that virtual address already exists but is not visible ATM the
>>    breakpoint won't be hit next time an instruction at that address will be
>>    executed.
>> 
>>    Flush entire CPU TB cache in breakpoint_invalidate to force
>>    re-translation of all TBs for the breakpoint address.
>> 
>>    This change fixes the following scenario:
>>    - linux user application is running
>>    - a breakpoint is inserted from QEMU gdbstub for a user address that is
>>      not currently present in the target CPU TLB
>>    - an instruction at that address is executed, but the external debugger
>>      doesn't get control.
>> 
>>    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>    Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
>>    Message-Id: <20191127220602.10827-2-jcmvbkbc@gmail.com>
>>    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> This was a reversion of a reversion (see 406bc339b0 and a9353fe89). So
> we've bounced between solutions several times. Fundamentally if it gets
> tricky to ensure your translated state matches the actual machine state
> the easiest option is to throw everything away and start again.
> 
>> Perhaps Windows XP is constantly executing some kind of breakpoint
>> instruction or updating some CPU debug registers during boot?
> 
> It would be useful to identify what exactly is triggering the changes
> here. If it's old fashioned breakpoint instructions being inserted into
> memory we will need to ensure our invalidating of old translations is
> rock solid. If we are updating our internal breakpoint/watchpoint
> tracking as a result of the guest messing with debug registers maybe we
> can do something a bit more finessed?
> 
>> 
>> 
>> ATB,
>> 
>> Mark.
> 
> 
> -- 
> Alex Bennée

Hello Alex,

The good news is the source code to Windows XP is available online: https://github.com/cryptoAlgorithm/nt5src

  reply	other threads:[~2021-06-11 15:02 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10 12:59 tb_flush() calls causing long Windows XP boot times Programmingkid
2021-06-10 13:14 ` Peter Maydell
2021-06-10 13:24   ` Mark Cave-Ayland
2021-06-11 11:24     ` Alex Bennée
2021-06-11 15:01       ` Programmingkid [this message]
2021-06-11 17:13         ` Paolo Bonzini
2021-06-11 18:22           ` Alex Bennée
2021-06-13 14:03             ` Mark Cave-Ayland
2021-06-14 14:37               ` Alex Bennée
2021-06-15 13:58                 ` Programmingkid
2021-06-16  1:58                   ` Richard Henderson
2021-06-16  8:59                     ` Mark Cave-Ayland
2021-06-16 12:53                       ` Alex Bennée
2021-06-16 13:06                         ` Peter Maydell
2021-06-16 15:30                           ` Alex Bennée
2021-06-16 13:21                       ` Alex Bennée
2021-06-16 12:12                     ` Programmingkid
2021-06-10 13:38   ` Programmingkid
2021-06-14 22:19 ` no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D29C466-BB81-4BCA-96E9-A46721B1ED59@gmail.com \
    --to=programmingkidx@gmail.com \
    --cc=alex.bennee@linaro.org \
    --cc=mark.cave-ayland@ilande.co.uk \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).