From: Max Chou <max.chou@sifive.com>
To: Richard Henderson <richard.henderson@linaro.org>,
qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: dbarboza@ventanamicro.com, Riku Voipio <riku.voipio@iki.fi>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [RFC PATCH 4/6] accel/tcg: Inline cpu_mmu_lookup function
Date: Sun, 18 Feb 2024 01:27:53 +0800 [thread overview]
Message-ID: <79830f29-485c-4755-1ba7-57aa76ccc9d7@sifive.com> (raw)
In-Reply-To: <4a2b1c91-0f25-4474-ad5c-13d9993a16f5@linaro.org>
Hi Richard,
After I tried to reduce the overhead of unnecessary segment flow and
memory plugin callbacks, I observed that there several functions consume
most of runtime from perf report.
So these three inline patches and previous patch were created to reduce
the overhead without modifying the functions.
The following are the experiment results.
The benchmark target is the bench-memcpy executable generated from the
glibc repository (release 2.38 with RVV support patch [1]).
- Execution command
`qemu-riscv64 -E TIMEOUTFACTOR=10 -R 1G -L {glibc build folder}/rootfs
-cpu
rv64,zba=true,zbb=true,v=true,vlen=256,vext_spec=v1.0,rvv_ta_all_1s=true,rvv_ma_all_1s=true
bench-memcpy`
- Total runtime
0. Original riscv-to-apply.next branch (commit ID: deb0ff0)
- Total execution time: ~383 sec.
1. Cherry pick PATCH 4 to riscv-to-apply.next branch (commit ID: deb0ff0)
- Total execution time: ~375 sec.
2. Cherry pick PATCH 4+5+6 to riscv-to-apply.next branch (commit ID:
deb0ff0)
- Total execution time: ~342 sec.
- Perf report (cycles)
0. Original riscv-to-apply.next branch (commit ID: deb0ff0)
# Samples: 385K of event 'cycles:u'
# Event count (approx.): 1411574690539
#
# Children Self Command Shared Object Symbol
# ........ ........ ............ .......................
............................................
#
47.54% 31.35% qemu-riscv64 qemu-riscv64 [.]
vext_ldst_us
25.60% 0.03% qemu-riscv64 qemu-riscv64 [.]
helper_vse8_v
18.87% 13.29% qemu-riscv64 qemu-riscv64 [.]
cpu_ldb_mmu
17.40% 0.04% qemu-riscv64 qemu-riscv64 [.]
helper_vle8_v
17.39% 15.71% qemu-riscv64 qemu-riscv64 [.]
cpu_stb_mmu
17.17% 17.17% qemu-riscv64 qemu-riscv64 [.]
qemu_plugin_vcpu_mem_cb
12.25% 0.00% qemu-riscv64 qemu-riscv64 [.]
helper_stb_mmu (inlined)
8.18% 4.08% qemu-riscv64 qemu-riscv64 [.] lde_b
8.17% 8.17% qemu-riscv64 qemu-riscv64 [.]
cpu_mmu_lookup
7.45% 0.00% qemu-riscv64 qemu-riscv64 [.]
plugin_load_cb (inlined)
7.32% 0.00% qemu-riscv64 qemu-riscv64 [.]
do_st1_mmu (inlined)
6.79% 0.00% qemu-riscv64 [unknown] [.]
0x00000000000000ff
5.91% 0.00% qemu-riscv64 qemu-riscv64 [.]
adjust_addr (inlined)
5.70% 0.00% qemu-riscv64 qemu-riscv64 [.]
do_ld1_mmu (inlined)
5.58% 0.00% qemu-riscv64 [unknown] [.]
0x00000000000001fe
5.23% 4.25% qemu-riscv64 qemu-riscv64 [.]
cpu_ldsb_data_ra
4.93% 0.00% qemu-riscv64 qemu-riscv64 [.]
get_memop (inlined)
4.11% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_ldub_data_ra (inlined)
4.11% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_ldub_mmuidx_ra (inlined)
2.88% 2.88% qemu-riscv64 qemu-riscv64 [.]
cpu_stb_data_ra
2.88% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_stb_mmuidx_ra (inlined)
2.75% 0.00% qemu-riscv64 qemu-riscv64 [.] stb_p
(inlined)
2.66% 0.00% qemu-riscv64 qemu-riscv64 [.]
get_alignment_bits (inlined)
1.79% 0.00% qemu-riscv64 [unknown] [.]
0x00000000e40203bf
1.68% 0.00% qemu-riscv64 qemu-riscv64 [.]
plugin_store_cb (inlined)
1.60% 0.00% qemu-riscv64 [unknown] [.]
0x00000000e403733f
1.13% 0.00% qemu-riscv64 qemu-riscv64 [.] ldub_p
(inlined)
0.73% 0.73% qemu-riscv64 qemu-riscv64 [.] ste_b
0.53% 0.21% qemu-riscv64 qemu-riscv64 [.]
helper_lookup_tb_ptr
1. Cherry pick PATCH 4 to riscv-to-apply.next branch (commit ID: deb0ff0)
# Samples: 378K of event 'cycles:u'
# Event count (approx.): 1381912775966
#
# Children Self Command Shared Object Symbol
# ........ ........ ............ .......................
.......................................
#
63.30% 29.62% qemu-riscv64 qemu-riscv64 [.]
vext_ldst_us
30.77% 0.04% qemu-riscv64 qemu-riscv64 [.]
helper_vle8_v
28.59% 0.02% qemu-riscv64 qemu-riscv64 [.]
helper_vse8_v
22.78% 22.78% qemu-riscv64 qemu-riscv64 [.]
qemu_plugin_vcpu_mem_cb
21.40% 5.26% qemu-riscv64 qemu-riscv64 [.] lde_b
20.69% 10.40% qemu-riscv64 qemu-riscv64 [.]
cpu_ldb_mmu
20.06% 3.91% qemu-riscv64 qemu-riscv64 [.]
cpu_ldsb_data_ra
19.16% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_ldub_data_ra (inlined)
19.16% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_ldub_mmuidx_ra (inlined)
12.65% 9.36% qemu-riscv64 qemu-riscv64 [.]
cpu_stb_mmu
8.60% 0.00% qemu-riscv64 qemu-riscv64 [.]
plugin_load_cb (inlined)
6.73% 6.73% qemu-riscv64 qemu-riscv64 [.]
do_ld1_mmu.constprop.23
6.73% 0.00% qemu-riscv64 qemu-riscv64 [.]
do_ld1_mmu (inlined)
6.31% 0.00% qemu-riscv64 qemu-riscv64 [.]
helper_stb_mmu (inlined)
6.20% 6.20% qemu-riscv64 qemu-riscv64 [.] do_st1_mmu
6.20% 0.00% qemu-riscv64 qemu-riscv64 [.]
adjust_addr (inlined)
5.49% 0.00% qemu-riscv64 [unknown] [.]
0x00000000000001fe
3.82% 0.00% qemu-riscv64 qemu-riscv64 [.]
plugin_store_cb (inlined)
3.01% 0.00% qemu-riscv64 qemu-riscv64 [.] ldub_p
(inlined)
2.94% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_mmu_lookup (inlined)
2.94% 0.00% qemu-riscv64 qemu-riscv64 [.]
get_alignment_bits (inlined)
2.91% 0.00% qemu-riscv64 qemu-riscv64 [.]
clear_helper_retaddr (inlined)
2.91% 2.91% qemu-riscv64 qemu-riscv64 [.] ste_b
2.90% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_mmu_lookup (inlined)
2.90% 0.00% qemu-riscv64 qemu-riscv64 [.]
get_alignment_bits (inlined)
0.59% 0.24% qemu-riscv64 qemu-riscv64 [.]
helper_lookup_tb_ptr
2. Cherry pick PATCH 4+5+6 to riscv-to-apply.next branch (commit ID:
deb0ff0)
# Samples: 343K of event 'cycles:u'
# Event count (approx.): 1259748868940
#
# Children Self Command Shared Object Symbol
# ........ ........ ............ .......................
.......................................
#
64.16% 35.81% qemu-riscv64 qemu-riscv64 [.]
vext_ldst_us
30.96% 0.02% qemu-riscv64 qemu-riscv64 [.]
helper_vse8_v
27.44% 0.05% qemu-riscv64 qemu-riscv64 [.]
helper_vle8_v
18.37% 18.37% qemu-riscv64 qemu-riscv64 [.]
qemu_plugin_vcpu_mem_cb
17.33% 7.45% qemu-riscv64 qemu-riscv64 [.]
cpu_ldsb_data_ra
14.90% 5.02% qemu-riscv64 qemu-riscv64 [.] lde_b
14.83% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_ldub_data_ra (inlined)
14.83% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_ldub_mmuidx_ra (inlined)
14.15% 10.33% qemu-riscv64 qemu-riscv64 [.]
cpu_stb_mmu
11.25% 9.62% qemu-riscv64 qemu-riscv64 [.]
cpu_ldb_mmu
7.22% 0.00% qemu-riscv64 qemu-riscv64 [.]
adjust_addr (inlined)
6.99% 6.99% qemu-riscv64 qemu-riscv64 [.]
helper_stb_mmu
6.99% 0.00% qemu-riscv64 qemu-riscv64 [.]
do_st1_mmu (inlined)
5.96% 0.00% qemu-riscv64 qemu-riscv64 [.]
do_ld1_mmu (inlined)
5.95% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_mmu_lookup (inlined)
5.30% 0.00% qemu-riscv64 [unknown] [.]
0x00000000000001fe
4.18% 0.00% qemu-riscv64 qemu-riscv64 [.]
plugin_store_cb (inlined)
3.22% 0.00% qemu-riscv64 qemu-riscv64 [.]
cpu_mmu_lookup (inlined)
3.22% 0.00% qemu-riscv64 qemu-riscv64 [.]
get_alignment_bits (inlined)
3.22% 3.22% qemu-riscv64 qemu-riscv64 [.] ste_b
3.19% 0.00% qemu-riscv64 qemu-riscv64 [.]
get_alignment_bits (inlined)
3.16% 0.00% qemu-riscv64 qemu-riscv64 [.]
clear_helper_retaddr (inlined)
2.76% 0.00% qemu-riscv64 qemu-riscv64 [.] g2h
(inlined)
2.76% 0.00% qemu-riscv64 qemu-riscv64 [.]
g2h_untagged (inlined)
2.30% 0.00% qemu-riscv64 qemu-riscv64 [.]
env_cpu (inlined)
2.21% 0.00% qemu-riscv64 qemu-riscv64 [.]
plugin_load_cb (inlined)
0.99% 0.00% qemu-riscv64 qemu-riscv64 [.]
env_cpu (inlined)
I agree that these functions are large functions to inline and I didn't
test these patches on all combinations (different guest architecture +
different host architecture).
So I think that we can drop these three patches until we can make sure
that these patches can get benefit on all combinations without side effect.
I'll focus on avoiding over-use of the full out-of-line load/store
routines for the next version.
Thanks for the suggestion and question,
Max
[1]
https://inbox.sourceware.org/libc-alpha/20230504074851.38763-1-hau.hsu@sifive.com
On 2024/2/16 4:10 AM, Richard Henderson wrote:
> On 2/15/24 09:28, Max Chou wrote:
>> Signed-off-by: Max Chou <max.chou@sifive.com>
>> ---
>> accel/tcg/user-exec.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
>> index 68b252cb8e8..c5453810eee 100644
>> --- a/accel/tcg/user-exec.c
>> +++ b/accel/tcg/user-exec.c
>> @@ -942,8 +942,11 @@ void page_reset_target_data(target_ulong start,
>> target_ulong last) { }
>> /* The system-mode versions of these helpers are in cputlb.c. */
>> -static void *cpu_mmu_lookup(CPUState *cpu, vaddr addr,
>> - MemOp mop, uintptr_t ra, MMUAccessType
>> type)
>> +static inline QEMU_ALWAYS_INLINE void *cpu_mmu_lookup(CPUState *cpu,
>> + vaddr addr,
>> + MemOp mop,
>> + uintptr_t ra,
>> + MMUAccessType type)
>> {
>> int a_bits = get_alignment_bits(mop);
>> void *ret;
>
> This is a large function. Why does it need to be inlined?
> For this and the next two patches I require evidence, because I don't
> believe you are attacking the problem correctly.
>
>
> r~
next prev parent reply other threads:[~2024-02-17 17:28 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-15 19:28 [RFC PATCH 0/6] Improve the performance of RISC-V vector unit-stride ld/st instructions Max Chou
2024-02-15 19:28 ` [RFC PATCH 1/6] target/riscv: Seperate vector segment " Max Chou
2024-02-15 19:28 ` [RFC PATCH 2/6] accel/tcg: Avoid uncessary call overhead from qemu_plugin_vcpu_mem_cb Max Chou
2024-02-15 20:03 ` Richard Henderson
2024-02-17 9:08 ` Max Chou
2024-02-15 20:21 ` Daniel Henrique Barboza
2024-02-17 9:45 ` Max Chou
2024-02-15 19:28 ` [RFC PATCH 3/6] target/riscv: Inline vext_ldst_us and coressponding function for performance Max Chou
2024-02-15 20:09 ` Richard Henderson
2024-02-15 21:11 ` Daniel Henrique Barboza
2024-02-17 10:10 ` Max Chou
2024-02-15 19:28 ` [RFC PATCH 4/6] accel/tcg: Inline cpu_mmu_lookup function Max Chou
2024-02-15 20:10 ` Richard Henderson
2024-02-17 17:27 ` Max Chou [this message]
2024-02-15 19:28 ` [RFC PATCH 5/6] accel/tcg: Inline do_ld1_mmu function Max Chou
2024-02-15 20:12 ` Richard Henderson
2024-02-15 19:28 ` [RFC PATCH 6/6] accel/tcg: Inline do_st1_mmu function Max Chou
2024-02-15 20:24 ` [RFC PATCH 0/6] Improve the performance of RISC-V vector unit-stride ld/st instructions Richard Henderson
2024-02-17 9:52 ` Max Chou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=79830f29-485c-4755-1ba7-57aa76ccc9d7@sifive.com \
--to=max.chou@sifive.com \
--cc=dbarboza@ventanamicro.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-riscv@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=riku.voipio@iki.fi \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).