* [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads
@ 2022-08-02  6:38 Juan Quintela
From: Juan Quintela @ 2022-08-02  6:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

Hi

In this version:

- rebase to latest upstream
- convert the multifd-zero-pages property into a main-zero-page capability, because:
  * libvirt handles capabilities much more easily
  * capabilities are off by default, so I had to change the name
  * this way one can check whether zero_page is enabled or not (see the
    example after this list)
- minor changes here and there
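
For reference, the capability can be toggled and inspected from the HMP
monitor like this (just a sketch: the "main-zero-page" name is the one
used in this cover letter and may change in later revisions):

    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_capability main-zero-page on
    (qemu) info migrate_capabilities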

Please review, Juan.

[v6]
In this version:
- document what protects each field in MultiFDRecv/SendParams
- calculate page_size once when we start the migration, and store it in
  a field
- Same for page_count.
- rebase to latest
- minor improvements here and there
- test on huge memory machines

Command line for all the tests:

gdb -q --ex "run" --args $QEMU \
	-name guest=$NAME,debug-threads=on \
	-m 16G \
	-smp 6 \
	-machine q35,accel=kvm,usb=off,dump-guest-core=off \
	-boot strict=on \
	-cpu host \
	-no-hpet \
	-rtc base=utc,driftfix=slew \
	-global kvm-pit.lost_tick_policy=delay \
	-global ICH9-LPC.disable_s3=1 \
	-global ICH9-LPC.disable_s4=1 \
	-device pcie-root-port,id=root.1,chassis=1,addr=0x2.0,multifunction=on \
	-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
	-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
	-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
	-device pcie-root-port,id=root.5,chassis=5,addr=0x2.4 \
	-device pcie-root-port,id=root.6,chassis=6,addr=0x2.5 \
	-device pcie-root-port,id=root.7,chassis=7,addr=0x2.6 \
	-device pcie-root-port,id=root.8,chassis=8,addr=0x2.7 \
	-blockdev driver=file,node-name=storage0,filename=$FILE,auto-read-only=true,discard=unmap \
	-blockdev driver=qcow2,node-name=format0,read-only=false,file=storage0 \
	-device virtio-blk-pci,id=virtio-disk0,drive=format0,bootindex=1,bus=root.1 \
	-netdev tap,id=hostnet0,vhost=on,script=/etc/kvm-ifup,downscript=/etc/kvm-ifdown \
	-device virtio-net-pci,id=net0,netdev=hostnet0,mac=$MAC,bus=root.2 \
	-device virtio-serial-pci,id=virtio-serial0,bus=root.3 \
	-device virtio-balloon-pci,id=balloon0,bus=root.4 \
	$GRAPHICS \
	$CONSOLE \
	-device virtconsole,id=console0,chardev=charconsole0 \
	-uuid 9d3be7da-e1ff-41a0-ac39-8b2e04de2c19 \
	-nodefaults \
	-msg timestamp=on \
	-no-user-config \
	$MONITOR \
	$TRACE \
	-global migration.x-multifd=on \
	-global migration.multifd-channels=16 \
	-global migration.x-max-bandwidth=$BANDWIDTH

Tests have been done on a single machine over localhost; I didn't have two
machines with 4TB of RAM for testing.

Tests were done on a 12TB RAM machine.  Guests were running with 16GB, 1TB
and 4TB of RAM.
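
The migrations themselves can be driven from the HMP monitor; a typical
localhost run looks roughly like this (the port number is only
illustrative):

    # destination QEMU is started with the extra option:
    #   -incoming tcp:127.0.0.1:4444
    # then, on the source monitor:
    (qemu) migrate -d tcp:127.0.0.1:4444
    (qemu) info migrate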

tests run with:
- upstream multifd
- multifd + zero page
- precopy (only some of them)

tests done:
- idle clean guest (just booted guest)
- idle dirty guest (run a program to dirty all memory)
- test with stress (4 threads each dirtying 1GB RAM)

Executive summary (times in milliseconds)

16GB guest
                Precopy            upstream          zero page
                Time    Downtime   Time    Downtime  Time    Downtime
clean idle      1548     93         1359   48         866    167
dirty idle     16222    220         2092   371       1870    258
busy 4GB       doesn't converge    31000   308       1604    371

In the dirty idle case there is some weirdness on the precopy side: I
tried several times and it always took too much time.  It should be
faster.

In the busy 4GB case, precopy doesn't converge (expected), and without
zero page detection multifd is at the limit: it _almost_ doesn't
converge, taking 187 iterations to finish.
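
A rough back-of-envelope from the detailed numbers below shows why: the
upstream run moves about 295540640 kbytes (~280 GiB) for a 16GB guest at
~76 Gbps, i.e. the dirty working set has to be re-sent again and again
(187 dirty sync rounds) before the remaining dirty pages fit in the
downtime budget, while the zero page run finishes in 4 dirty sync rounds
because the live_migration thread is no longer the bottleneck.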

1TB
                Precopy            upstream          zero page
                Time    Downtime   Time    Downtime  Time    Downtime
clean idle     83174    381        72075   345       52966   273
dirty idle                        104587   381       75601   269
busy 2GB                           79912   345       58953   348

I only tried precopy for the clean idle case with 1TB.  Notice that it is
already significantly slower.  With 1TB of RAM, zero page is clearly
superior in all tests.

4TB
                upstream          zero page
                Time    Downtime  Time    Downtime
clean idle      317054  552       215567  500
dirty idle      357581  553       317428  744

The busy case here is similar to the 1TB guest; it just takes much more time.

In conclusion, zero page detection on the migration threads ranges from
a bit faster to much faster than anything else.

I add here the output of info migrate and perf for all the migration
rounds.  The important bit that I found is that once we introduce zero
page detection in the multifd threads, migration spends all its time
copying pages, which is where it needs to be, instead of waiting for
buffer_zero or similar.
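
To make the idea concrete, here is a minimal, self-contained C sketch of
the per-packet work each multifd send thread does once zero page
detection moves there: scan the pages of a packet and split them into
normal pages (whose data goes on the wire) and zero pages (only their
indexes are sent).  This is illustrative code, not the patch itself; the
real series operates on MultiFDSendParams and uses QEMU's
buffer_is_zero().

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Illustrative stand-in for QEMU's buffer_is_zero(). */
    static bool page_is_zero(const uint8_t *page, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (page[i]) {
                return false;
            }
        }
        return true;
    }

    /* Toy packet: the send thread partitions its pages into "normal"
     * (data is transmitted) and "zero" (only the index is transmitted,
     * the receiver clears the page if needed). */
    struct packet {
        size_t normal[16], zero[16];
        size_t n_normal, n_zero;
    };

    static void scan_pages(const uint8_t *pages, size_t n_pages,
                           struct packet *p)
    {
        p->n_normal = p->n_zero = 0;
        for (size_t i = 0; i < n_pages; i++) {
            if (page_is_zero(pages + i * PAGE_SIZE, PAGE_SIZE)) {
                p->zero[p->n_zero++] = i;
            } else {
                p->normal[p->n_normal++] = i;
            }
        }
    }

    int main(void)
    {
        static uint8_t pages[4 * PAGE_SIZE];            /* all zero */
        memset(pages + 2 * PAGE_SIZE, 0xab, PAGE_SIZE); /* one dirty page */

        struct packet p;
        scan_pages(pages, 4, &p);
        printf("normal pages: %zu, zero pages: %zu\n", p.n_normal, p.n_zero);
        return 0;
    }

The point of the series is that this scan now runs in parallel in the 16
send threads instead of serially in the live_migration thread, which is
exactly what the perf profiles below show.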

Upstream
--------

16GB test

idle

precopy

Migration status: completed
total time: 1548 ms
downtime: 93 ms
setup: 16 ms
transferred ram: 624798 kbytes
throughput: 3343.01 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 4048839 pages
skipped: 0 pages
normal: 147016 pages
normal bytes: 588064 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 651825
precopy ram: 498490 kbytes
downtime ram: 126307 kbytes

  41.76%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
  14.68%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   9.53%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   5.72%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   3.89%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.50%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.45%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.87%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   1.28%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.03%  live_migration   qemu-system-x86_64       [.] find_next_bit
   0.95%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.95%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.68%  live_migration   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.67%  live_migration   qemu-system-x86_64       [.] kvm_log_clear
   0.56%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.51%  live_migration   qemu-system-x86_64       [.] qemu_put_byte
   0.43%  live_migration   [kernel.kallsyms]        [k] copy_page
   0.38%  live_migration   qemu-system-x86_64       [.] get_ptr_rcu_reader
   0.36%  live_migration   qemu-system-x86_64       [.] save_page_header
   0.33%  live_migration   [kernel.kallsyms]        [k] __memcg_kmem_charge_page
   0.33%  live_migration   qemu-system-x86_64       [.] runstate_is_running

upstream

Migration status: completed
total time: 1359 ms
downtime: 48 ms
setup: 35 ms
transferred ram: 603701 kbytes
throughput: 3737.66 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 4053362 pages
skipped: 0 pages
normal: 141517 pages
normal bytes: 566068 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 568076 kbytes
pages-per-second: 2039403
precopy ram: 35624 kbytes
downtime ram: 1 kbytes

  36.03%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   9.32%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   5.18%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   4.15%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.60%  live_migration   [kernel.kallsyms]        [k] copy_page
   2.30%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.24%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.96%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   1.30%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.12%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.00%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.94%  live_migration   qemu-system-x86_64       [.] find_next_bit
   0.93%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.91%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.88%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.88%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   0.81%  live_migration   qemu-system-x86_64       [.] kvm_log_clear
   0.81%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.79%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.75%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.72%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   0.70%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.70%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.70%  qemu-system-x86  [kernel.kallsyms]        [k] perf_event_alloc
   0.69%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.68%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.67%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.66%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.64%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.63%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.63%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.60%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.53%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.47%  live_migration   qemu-system-x86_64       [.] qemu_put_byte

zero page

Migration status: completed
total time: 866 ms
downtime: 167 ms
setup: 42 ms
transferred ram: 14627983 kbytes
throughput: 145431.53 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 4024050 pages
skipped: 0 pages
normal: 143374 pages
normal bytes: 573496 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 14627983 kbytes
pages-per-second: 4786693
precopy ram: 11033626 kbytes
downtime ram: 3594356 kbytes

   6.84%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   4.06%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   3.46%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.39%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.59%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.50%  multifdsend_3    qemu-system-x86_64       [.] buffer_zero_avx512
   1.48%  multifdsend_10   qemu-system-x86_64       [.] buffer_zero_avx512
   1.32%  multifdsend_12   qemu-system-x86_64       [.] buffer_zero_avx512
   1.29%  multifdsend_1    qemu-system-x86_64       [.] buffer_zero_avx512
   1.25%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.24%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.20%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.20%  multifdsend_13   qemu-system-x86_64       [.] buffer_zero_avx512
   1.18%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   1.16%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.13%  live_migration   qemu-system-x86_64       [.] multifd_queue_page
   1.08%  multifdsend_0    qemu-system-x86_64       [.] buffer_zero_avx512
   1.06%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.94%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.92%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.91%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.90%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string

16GB guest

dirty

precopy

Migration status: completed
total time: 16222 ms
downtime: 220 ms
setup: 18 ms
transferred ram: 15927448 kbytes
throughput: 8052.38 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 222804 pages
skipped: 0 pages
normal: 3973611 pages
normal bytes: 15894444 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 241728
precopy ram: 15670253 kbytes
downtime ram: 257194 kbytes

  38.22%  live_migration   [kernel.kallsyms]        [k] copy_page
  38.04%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.55%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   2.45%  live_migration   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   1.43%  live_migration   [kernel.kallsyms]        [k] free_pcp_prepare
   1.01%  live_migration   [kernel.kallsyms]        [k] _copy_from_iter
   0.79%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   0.79%  live_migration   [kernel.kallsyms]        [k] __list_del_entry_valid
   0.68%  live_migration   [kernel.kallsyms]        [k] check_new_pages
   0.64%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   0.49%  live_migration   [kernel.kallsyms]        [k] skb_release_data
   0.39%  live_migration   [kernel.kallsyms]        [k] __skb_clone
   0.36%  live_migration   [kernel.kallsyms]        [k] total_mapcount
   0.34%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.32%  live_migration   [kernel.kallsyms]        [k] __dev_queue_xmit
   0.29%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   0.29%  live_migration   [kernel.kallsyms]        [k] __alloc_skb
   0.27%  live_migration   [kernel.kallsyms]        [k] __ip_queue_xmit
   0.26%  live_migration   [kernel.kallsyms]        [k] copy_user_generic_unrolled
   0.26%  live_migration   [kernel.kallsyms]        [k] __tcp_transmit_skb
   0.24%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   0.24%  live_migration   [kernel.kallsyms]        [k] skb_page_frag_refill

upstream

Migration status: completed
total time: 2092 ms
downtime: 371 ms
setup: 39 ms
transferred ram: 15929157 kbytes
throughput: 63562.98 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 224436 pages
skipped: 0 pages
normal: 3971430 pages
normal bytes: 15885720 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 15927184 kbytes
pages-per-second: 2441771
precopy ram: 1798 kbytes
downtime ram: 174 kbytes

  5.23%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   4.93%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.92%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.84%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.56%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.55%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.53%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.48%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.43%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.43%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.33%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.21%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.19%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.13%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.01%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.86%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.83%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.90%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   0.70%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.69%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   0.62%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   0.37%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.29%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   0.27%  live_migration   qemu-system-x86_64       [.] multifd_send_pages

zero page

Migration status: completed
total time: 1870 ms
downtime: 258 ms
setup: 36 ms
transferred ram: 16998097 kbytes
throughput: 75927.79 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 222485 pages
skipped: 0 pages
normal: 3915115 pages
normal bytes: 15660460 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 16998097 kbytes
pages-per-second: 2555169
precopy ram: 13929973 kbytes
downtime ram: 3068124 kbytes

   4.66%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.60%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.49%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.39%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.36%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.21%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.20%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.18%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.17%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.97%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.96%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.89%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.73%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.68%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.44%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.52%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   2.09%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   1.03%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   0.97%  multifdsend_3    [kernel.kallsyms]        [k] copy_page
   0.94%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   0.79%  live_migration   qemu-system-x86_64       [.] qemu_mutex_lock_impl
   0.73%  multifdsend_11   [kernel.kallsyms]        [k] copy_page
   0.70%  live_migration   qemu-system-x86_64       [.] qemu_mutex_unlock_impl
   0.45%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.41%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable

16GB guest

stress --vm 4 --vm-bytes 1G --vm-keep

precopy

Doesn't converge

upstream

Migration status: completed
total time: 31800 ms
downtime: 308 ms
setup: 40 ms
transferred ram: 295540640 kbytes
throughput: 76230.23 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 3006674 pages
skipped: 0 pages
normal: 73686367 pages
normal bytes: 294745468 kbytes
dirty sync count: 187
page size: 4 kbytes
multifd bytes: 295514209 kbytes
pages-per-second: 2118000
precopy ram: 26430 kbytes

  7.79%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   3.86%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.83%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.79%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.72%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.46%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.44%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.38%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.32%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.31%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.22%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.21%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.19%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.07%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.95%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.95%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.77%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.78%  live_migration   [kernel.kallsyms]        [k] kvm_set_pfn_dirty
   1.65%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   0.68%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.62%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.46%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   0.41%  live_migration   [kernel.kallsyms]        [k] __handle_changed_spte
   0.40%  live_migration   [kernel.kallsyms]        [k] pfn_valid.part.0
   0.37%  live_migration   qemu-system-x86_64       [.] kvm_log_clear
   0.29%  CPU 2/KVM        [kernel.kallsyms]        [k] copy_page
   0.27%  live_migration   [kernel.kallsyms]        [k] clear_dirty_pt_masked
   0.27%  CPU 1/KVM        [kernel.kallsyms]        [k] copy_page
   0.26%  live_migration   [kernel.kallsyms]        [k] tdp_iter_next
   0.25%  CPU 1/KVM        [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.24%  CPU 1/KVM        [kernel.kallsyms]        [k] mark_page_dirty_in_slot.part.0
   0.24%  CPU 2/KVM        [kernel.kallsyms]        [k] mark_page_dirty_in_slot.part.0

Zero page

Migration status: completed
total time: 1604 ms
downtime: 371 ms
setup: 32 ms
transferred ram: 20591268 kbytes
throughput: 107307.14 mbps
remaining ram: 0 kbytes
total ram: 16777992 kbytes
duplicate: 2984825 pages
skipped: 0 pages
normal: 2213496 pages
normal bytes: 8853984 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 20591268 kbytes
pages-per-second: 4659200
precopy ram: 15722803 kbytes
downtime ram: 4868465 kbytes

   3.21%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.92%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.86%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.81%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.80%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.79%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.78%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.73%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.73%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.69%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.62%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.60%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.59%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.58%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.55%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.38%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.44%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   1.41%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   1.37%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   0.80%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   0.78%  CPU 4/KVM        [kernel.kallsyms]        [k] _raw_read_lock
   0.78%  CPU 2/KVM        [kernel.kallsyms]        [k] _raw_read_lock
   0.77%  CPU 4/KVM        [kernel.kallsyms]        [k] tdp_mmu_map_handle_target_level
   0.77%  CPU 2/KVM        [kernel.kallsyms]        [k] tdp_mmu_map_handle_target_level
   0.76%  CPU 5/KVM        [kernel.kallsyms]        [k] tdp_mmu_map_handle_target_level
   0.75%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   0.74%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.73%  CPU 5/KVM        [kernel.kallsyms]        [k] _raw_read_lock
   0.67%  CPU 0/KVM        [kernel.kallsyms]        [k] copy_page
   0.62%  CPU 0/KVM        [kernel.kallsyms]        [k] tdp_mmu_map_handle_target_level
   0.62%  live_migration   qemu-system-x86_64       [.] qemu_mutex_lock_impl
   0.61%  CPU 0/KVM        [kernel.kallsyms]        [k] _raw_read_lock
   0.60%  CPU 2/KVM        [kernel.kallsyms]        [k] mark_page_dirty_in_slot.part.0
   0.58%  CPU 5/KVM        [kernel.kallsyms]        [k] mark_page_dirty_in_slot.part.0
   0.54%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.53%  CPU 4/KVM        [kernel.kallsyms]        [k] mark_page_dirty_in_slot.part.0
   0.52%  CPU 0/KVM        [kernel.kallsyms]        [k] mark_page_dirty_in_slot.part.0
   0.49%  live_migration   [kernel.kallsyms]        [k] kvm_set_pfn_dirty

1TB guest

precopy

Migration status: completed
total time: 83147 ms
downtime: 381 ms
setup: 265 ms
transferred ram: 19565544 kbytes
throughput: 1933.88 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 264135334 pages
skipped: 0 pages
normal: 4302604 pages
normal bytes: 17210416 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 412882
precopy ram: 19085615 kbytes
downtime ram: 479929 kbytes

  43.50%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
  11.27%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   8.33%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   7.47%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   4.41%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   3.42%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   3.06%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.62%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.78%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.43%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.13%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   1.12%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.70%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.51%  live_migration   qemu-system-x86_64       [.] qemu_put_byte
   0.49%  live_migration   qemu-system-x86_64       [.] save_page_header
   0.48%  live_migration   qemu-system-x86_64       [.] qemu_put_be64
   0.40%  live_migration   qemu-system-x86_64       [.] migrate_postcopy_ram
   0.40%  live_migration   qemu-system-x86_64       [.] runstate_is_running
   0.35%  live_migration   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.32%  live_migration   qemu-system-x86_64       [.] get_ptr_rcu_reader
   0.30%  live_migration   qemu-system-x86_64       [.] qemu_file_rate_limit
   0.30%  live_migration   qemu-system-x86_64       [.] migrate_use_xbzrle
   0.27%  live_migration   [kernel.kallsyms]        [k] __memcg_kmem_charge_page
   0.26%  live_migration   qemu-system-x86_64       [.] migrate_use_compression
   0.25%  live_migration   qemu-system-x86_64       [.] kvm_log_clear
   0.25%  live_migration   qemu-system-x86_64       [.] qemu_file_get_error

upstream

Migration status: completed
total time: 72075 ms
downtime: 345 ms
setup: 287 ms
transferred ram: 19601046 kbytes
throughput: 2236.79 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 264134669 pages
normal: 4301611 pages
normal bytes: 17206444 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 17279539 kbytes
pages-per-second: 2458584
precopy ram: 2321505 kbytes
downtime ram: 1 kbytes

 39.09%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
  10.85%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   6.92%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   4.41%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.87%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.63%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   2.54%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.70%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.31%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.11%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   1.05%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.80%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.79%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.78%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.78%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.76%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.75%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.75%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.73%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.73%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.72%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.72%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.71%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.71%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.69%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.66%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.65%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.63%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.53%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.48%  live_migration   qemu-system-x86_64       [.] qemu_put_byte
   0.44%  live_migration   qemu-system-x86_64       [.] save_page_header
   0.44%  live_migration   qemu-system-x86_64       [.] qemu_put_be64
   0.39%  live_migration   qemu-system-x86_64       [.] migrate_postcopy_ram
   0.36%  live_migration   qemu-system-x86_64       [.] runstate_is_running
   0.33%  live_migration   qemu-system-x86_64       [.] get_ptr_rcu_reader
   0.28%  live_migration   [kernel.kallsyms]        [k] __memcg_kmem_charge_page
   0.27%  live_migration   qemu-system-x86_64       [.] migrate_use_compression
   0.26%  live_migration   qemu-system-x86_64       [.] qemu_file_rate_limit
   0.26%  live_migration   qemu-system-x86_64       [.] migrate_use_xbzrle
   0.24%  live_migration   qemu-system-x86_64       [.] qemu_file_get_error
   0.21%  live_migration   qemu-system-x86_64       [.] kvm_log_clear
   0.21%  live_migration   qemu-system-x86_64       [.] ram_transferred_add
   0.20%  live_migration   [kernel.kallsyms]        [k] try_charge_memcg
   0.19%  live_migration   qemu-system-x86_64       [.] ram_control_save_page
   0.18%  live_migration   qemu-system-x86_64       [.] buffer_is_zero
   0.18%  live_migration   qemu-system-x86_64       [.] cpu_physical_memory_set_dirty_lebitmap
   0.12%  live_migration   qemu-system-x86_64       [.] qemu_ram_pagesize
   0.11%  live_migration   [kernel.kallsyms]        [k] sync_regs
   0.11%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   0.11%  live_migration   [kernel.kallsyms]        [k] clear_page_erms
   0.11%  live_migration   [kernel.kallsyms]        [k] kernel_init_free_pages.part.0
   0.11%  live_migration   qemu-system-x86_64       [.] migrate_background_snapshot
   0.10%  live_migration   qemu-system-x86_64       [.] migrate_release_ram
   0.10%  live_migration   [kernel.kallsyms]        [k] pte_alloc_one
   0.10%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   0.10%  live_migration   [kernel.kallsyms]        [k] native_irq_return_iret
   0.08%  live_migration   [kernel.kallsyms]        [k] kvm_clear_dirty_log_protect
   0.07%  qemu-system-x86  [kernel.kallsyms]        [k] free_pcp_prepare
   0.06%  qemu-system-x86  [kernel.kallsyms]        [k] __free_pages
   0.06%  live_migration   [kernel.kallsyms]        [k] tdp_iter_next
   0.05%  live_migration   qemu-system-x86_64       [.] cpu_physical_memory_sync_dirty_bitmap.con
   0.05%  live_migration   [kernel.kallsyms]        [k] __list_del_entry_valid
   0.05%  live_migration   [kernel.kallsyms]        [k] _raw_spin_lock_irqsave
   0.05%  multifdsend_2    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.05%  multifdsend_11   [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.05%  live_migration   [vdso]                   [.] 0x00000000000006f5
   0.05%  multifdsend_15   [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_1    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_13   [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_4    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_8    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   0.04%  multifdsend_0    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_9    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_14   [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  live_migration   [kernel.kallsyms]        [k] kvm_arch_mmu_enable_log_dirty_pt_masked
   0.04%  live_migration   [kernel.kallsyms]        [k] obj_cgroup_charge_pages
   0.04%  multifdsend_7    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_12   [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_5    [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  multifdsend_10   [kernel.kallsyms]        [k] native_queued_spin_lock_slowpath.part.0
   0.04%  live_migration   [kernel.kallsyms]        [k] _raw_spin_lock
   0.04%  live_migration   qemu-system-x86_64       [.] qemu_mutex_unlock_impl

1TB idle, zero page

Migration status: completed
total time: 52966 ms
downtime: 409 ms
setup: 273 ms
transferred ram: 879229325 kbytes
throughput: 136690.83 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 262093359 pages
skipped: 0 pages
normal: 4266123 pages
normal bytes: 17064492 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 879229317 kbytes
pages-per-second: 4024470
precopy ram: 874888589 kbytes
downtime ram: 4340735 kbytes

  14.42%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   2.97%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.56%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   2.50%  live_migration   qemu-system-x86_64       [.] multifd_queue_page
   2.30%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.17%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.12%  multifdsend_14   qemu-system-x86_64       [.] buffer_zero_avx512
   1.09%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   1.08%  multifdsend_15   qemu-system-x86_64       [.] buffer_zero_avx512
   1.07%  multifdsend_11   qemu-system-x86_64       [.] buffer_zero_avx512
   1.03%  multifdsend_1    qemu-system-x86_64       [.] buffer_zero_avx512
   1.03%  multifdsend_0    qemu-system-x86_64       [.] buffer_zero_avx512
   1.03%  multifdsend_7    qemu-system-x86_64       [.] buffer_zero_avx512
   1.03%  multifdsend_4    qemu-system-x86_64       [.] buffer_zero_avx512
   1.02%  multifdsend_2    qemu-system-x86_64       [.] buffer_zero_avx512
   1.02%  multifdsend_10   qemu-system-x86_64       [.] buffer_zero_avx512
   1.02%  multifdsend_9    qemu-system-x86_64       [.] buffer_zero_avx512
   1.02%  multifdsend_8    qemu-system-x86_64       [.] buffer_zero_avx512
   1.01%  multifdsend_6    qemu-system-x86_64       [.] buffer_zero_avx512
   1.00%  multifdsend_5    qemu-system-x86_64       [.] buffer_zero_avx512
   0.99%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   0.98%  multifdsend_13   qemu-system-x86_64       [.] buffer_zero_avx512
   0.98%  multifdsend_3    qemu-system-x86_64       [.] buffer_zero_avx512
   0.93%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.93%  multifdsend_12   qemu-system-x86_64       [.] buffer_zero_avx512
   0.89%  live_migration   [kernel.kallsyms]        [k] futex_wake
   0.83%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   0.70%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.69%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string

1TB: stress --vm 4 --vm-bytes 512M

Wait until the load in the guest reaches 3 before doing the migration

upstream

Migration status: completed
total time: 79912 ms
downtime: 345 ms
setup: 300 ms
transferred ram: 23723877 kbytes
throughput: 2441.21 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 263616778 pages
normal: 5330059 pages
normal bytes: 21320236 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 21406921 kbytes
pages-per-second: 2301580
precopy ram: 2316947 kbytes
downtime ram: 9 kbytes

  38.87%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   9.14%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   5.84%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   3.80%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.41%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.14%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   2.10%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.44%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.17%  live_migration   qemu-system-x86_64       [.] find_next_bit
   0.95%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.91%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.89%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.88%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.87%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.84%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.84%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.80%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.79%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.79%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.78%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.78%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.78%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.77%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.76%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.75%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.74%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.70%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.66%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.58%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.45%  live_migration   qemu-system-x86_64       [.] kvm_log_clear

zero page

Migration status: completed
total time: 58953 ms
downtime: 373 ms
setup: 348 ms
transferred ram: 972143021 kbytes
throughput: 135889.41 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 261357013 pages
skipped: 0 pages
normal: 5293916 pages
normal bytes: 21175664 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 972143012 kbytes
pages-per-second: 3699692
precopy ram: 968625243 kbytes
downtime ram: 3517778 kbytes

 12.91%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   2.85%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.16%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   2.05%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   1.17%  live_migration   qemu-system-x86_64       [.] multifd_queue_page
   1.13%  multifdsend_4    qemu-system-x86_64       [.] buffer_zero_avx512
   1.12%  multifdsend_1    qemu-system-x86_64       [.] buffer_zero_avx512
   1.08%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.07%  multifdsend_14   qemu-system-x86_64       [.] buffer_zero_avx512
   1.07%  multifdsend_15   qemu-system-x86_64       [.] buffer_zero_avx512
   1.06%  multifdsend_2    qemu-system-x86_64       [.] buffer_zero_avx512
   1.06%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   1.06%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   1.04%  multifdsend_9    qemu-system-x86_64       [.] buffer_zero_avx512
   1.04%  multifdsend_0    qemu-system-x86_64       [.] buffer_zero_avx512
   1.04%  multifdsend_3    qemu-system-x86_64       [.] buffer_zero_avx512
   1.03%  multifdsend_11   qemu-system-x86_64       [.] buffer_zero_avx512
   1.01%  multifdsend_5    qemu-system-x86_64       [.] buffer_zero_avx512
   0.99%  multifdsend_7    qemu-system-x86_64       [.] buffer_zero_avx512
   0.98%  multifdsend_6    qemu-system-x86_64       [.] buffer_zero_avx512
   0.98%  multifdsend_8    qemu-system-x86_64       [.] buffer_zero_avx512
   0.95%  multifdsend_13   qemu-system-x86_64       [.] buffer_zero_avx512
   0.94%  multifdsend_12   qemu-system-x86_64       [.] buffer_zero_avx512
   0.92%  multifdsend_10   qemu-system-x86_64       [.] buffer_zero_avx512
   0.89%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   0.85%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.84%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.84%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.81%  live_migration   libc.so.6                [.] __pthread_mutex_lock

1TB: stress --vm 4 --vm-bytes 1024M

upstream

Migration status: completed
total time: 79302 ms
downtime: 315 ms
setup: 307 ms
transferred ram: 30307307 kbytes
throughput: 3142.99 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 263089198 pages
skipped: 0 pages
normal: 6972933 pages
normal bytes: 27891732 kbytes
dirty sync count: 7
page size: 4 kbytes
multifd bytes: 27994987 kbytes
pages-per-second: 1875902
precopy ram: 2312314 kbytes
downtime ram: 4 kbytes

  35.46%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   9.27%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   6.02%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   3.68%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.64%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   2.51%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.31%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   1.46%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.23%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.05%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.03%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.01%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.01%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.01%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.00%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.99%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.99%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.99%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.96%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.95%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.93%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.91%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.90%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.87%  live_migration   qemu-system-x86_64       [.] kvm_log_clear
   0.87%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.82%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.82%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.65%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.58%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.47%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string

zero_page

900GB dirty + idle

mig_mon mm_dirty -m 10000 -p once

upstream

Migration status: completed
total time: 104587 ms
downtime: 381 ms
setup: 311 ms
transferred ram: 943318066 kbytes
throughput: 74107.80 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 33298094 pages
skipped: 0 pages
normal: 235142522 pages
normal bytes: 940570088 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 943025391 kbytes
pages-per-second: 3331126
precopy ram: 292673 kbytes
downtime ram: 1 kbytes

  7.71%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   4.55%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.48%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.36%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.36%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.31%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.29%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.27%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.23%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.17%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.06%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.94%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.89%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.59%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.25%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.12%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   2.72%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.54%  live_migration   [kernel.kallsyms]        [k] copy_page
   1.39%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   0.86%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.50%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.49%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   0.26%  multifdsend_7    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.25%  multifdsend_4    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.25%  multifdsend_10   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.25%  multifdsend_9    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.25%  multifdsend_15   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.24%  multifdsend_12   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.23%  multifdsend_5    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.23%  multifdsend_0    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.23%  multifdsend_3    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.21%  multifdsend_14   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.18%  live_migration   qemu-system-x86_64       [.] find_next_bit

zero page

Migration status: completed
total time: 75601 ms
downtime: 427 ms
setup: 269 ms
transferred ram: 1083999214 kbytes
throughput: 117879.85 mbps
remaining ram: 0 kbytes
total ram: 1073742600 kbytes
duplicate: 32991750 pages
skipped: 0 pages
normal: 232638485 pages
normal bytes: 930553940 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 1083999202 kbytes
pages-per-second: 3669333
precopy ram: 1080197079 kbytes
downtime ram: 3802134 kbytes

   4.41%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.38%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.37%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.32%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.29%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.29%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.28%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.27%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.16%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.09%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.07%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.59%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   1.59%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   1.39%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   0.80%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   0.65%  multifdsend_14   [kernel.kallsyms]        [k] copy_page
   0.63%  multifdsend_1    [kernel.kallsyms]        [k] copy_page
   0.58%  live_migration   qemu-system-x86_64       [.] qemu_mutex_lock_impl
   0.48%  live_migration   qemu-system-x86_64       [.] qemu_mutex_unlock_impl
   0.40%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.29%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.26%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable

4TB idle

upstream

Migration status: completed
total time: 317054 ms
downtime: 552 ms
setup: 1045 ms
transferred ram: 77208692 kbytes
throughput: 2001.52 mbps
remaining ram: 0 kbytes
total ram: 4294968072 kbytes
duplicate: 1056844269 pages
skipped: 0 pages
normal: 16904683 pages
normal bytes: 67618732 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 67919974 kbytes
pages-per-second: 3477766
precopy ram: 9288715 kbytes
downtime ram: 2 kbytes

 44.27%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
  10.21%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   6.58%  live_migration   qemu-system-x86_64       [.] add_to_iovec
   4.25%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.70%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
   2.43%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   2.34%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   1.59%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
   1.30%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.08%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   0.98%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.78%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.74%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.70%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.68%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.67%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.66%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.66%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.64%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.62%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.61%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
   0.56%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.55%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.54%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.52%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.52%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.52%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.51%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.49%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.45%  live_migration   qemu-system-x86_64       [.] qemu_put_byte
   0.42%  live_migration   qemu-system-x86_64       [.] save_page_header
   0.41%  live_migration   qemu-system-x86_64       [.] qemu_put_be64
   0.35%  live_migration   qemu-system-x86_64       [.] migrate_postcopy_ram

zero_page

Migration status: completed
total time: 215567 ms
downtime: 500 ms
setup: 1040 ms
transferred ram: 3587151463 kbytes
throughput: 136980.19 mbps
remaining ram: 0 kbytes
total ram: 4294968072 kbytes
duplicate: 1048466740 pages
skipped: 0 pages
normal: 16747893 pages
normal bytes: 66991572 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 3587151430 kbytes
pages-per-second: 4104960
precopy ram: 3583004863 kbytes
downtime ram: 4146599 kbytes

  15.49%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   3.20%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   2.67%  live_migration   qemu-system-x86_64       [.] multifd_queue_page
   2.33%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   2.19%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   1.19%  live_migration   qemu-system-x86_64       [.] find_next_bit
   1.18%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
   1.14%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   1.02%  multifdsend_10   qemu-system-x86_64       [.] buffer_zero_avx512
   1.01%  multifdsend_9    qemu-system-x86_64       [.] buffer_zero_avx512
   1.01%  multifdsend_8    qemu-system-x86_64       [.] buffer_zero_avx512
   1.00%  multifdsend_5    qemu-system-x86_64       [.] buffer_zero_avx512
   1.00%  multifdsend_3    qemu-system-x86_64       [.] buffer_zero_avx512
   1.00%  multifdsend_15   qemu-system-x86_64       [.] buffer_zero_avx512
   0.99%  multifdsend_2    qemu-system-x86_64       [.] buffer_zero_avx512
   0.99%  multifdsend_6    qemu-system-x86_64       [.] buffer_zero_avx512
   0.99%  multifdsend_14   qemu-system-x86_64       [.] buffer_zero_avx512
   0.99%  multifdsend_0    qemu-system-x86_64       [.] buffer_zero_avx512
   0.98%  multifdsend_13   qemu-system-x86_64       [.] buffer_zero_avx512
   0.97%  multifdsend_1    qemu-system-x86_64       [.] buffer_zero_avx512
   0.97%  multifdsend_7    qemu-system-x86_64       [.] buffer_zero_avx512
   0.96%  live_migration   [kernel.kallsyms]        [k] futex_wake
   0.96%  multifdsend_11   qemu-system-x86_64       [.] buffer_zero_avx512
   0.93%  multifdsend_4    qemu-system-x86_64       [.] buffer_zero_avx512
   0.88%  multifdsend_12   qemu-system-x86_64       [.] buffer_zero_avx512
   0.81%  live_migration   [kernel.kallsyms]        [k] send_call_function_single_ipi
   0.71%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
   0.63%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string

4TB dirty + idle

    mig_mon mm_dirty -m 3900000 -p once

upstream

Migration status: completed
total time: 357581 ms
downtime: 553 ms
setup: 1295 ms
transferred ram: 4080035248 kbytes
throughput: 93811.30 mbps
remaining ram: 0 kbytes
total ram: 4294968072 kbytes
duplicate: 56507728 pages
skipped: 0 pages
normal: 1017239053 pages
normal bytes: 4068956212 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 4079538545 kbytes
pages-per-second: 3610116
precopy ram: 496701 kbytes
downtime ram: 2 kbytes

   5.07%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
   4.99%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.99%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.97%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.96%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.95%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.91%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.65%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.56%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.33%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.16%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.83%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.79%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.75%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.73%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.58%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   0.95%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   0.88%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
   0.36%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
   0.32%  multifdsend_4    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.30%  multifdsend_5    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.30%  multifdsend_2    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.30%  multifdsend_0    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.30%  multifdsend_9    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.30%  multifdsend_7    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.30%  multifdsend_10   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.26%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
   0.22%  multifdsend_8    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.22%  multifdsend_11   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.19%  multifdsend_13   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.19%  multifdsend_3    [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.17%  multifdsend_12   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.15%  multifdsend_14   [kernel.kallsyms]        [k] tcp_sendmsg_locked
   0.14%  multifdsend_10   [kernel.kallsyms]        [k] _copy_from_iter

zero_page

Migration status: completed
total time: 317428 ms
downtime: 744 ms
setup: 1192 ms
transferred ram: 4340691359 kbytes
throughput: 112444.34 mbps
remaining ram: 0 kbytes
total ram: 4294968072 kbytes
duplicate: 55993692 pages
normal: 1005801180 pages
normal bytes: 4023204720 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 4340691312 kbytes
pages-per-second: 3417846
precopy ram: 4336921795 kbytes
downtime ram: 3769564 kbytes

   4.38%  multifdsend_5    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.38%  multifdsend_10   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.37%  multifdsend_11   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.34%  multifdsend_3    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.29%  multifdsend_4    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.28%  multifdsend_9    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.27%  multifdsend_12   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.26%  multifdsend_1    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.23%  multifdsend_13   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.18%  multifdsend_6    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   4.18%  multifdsend_2    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.90%  multifdsend_0    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.86%  multifdsend_14   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.84%  multifdsend_7    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.73%  multifdsend_8    [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   3.73%  multifdsend_15   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
   1.59%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
   1.45%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
   1.28%  live_migration   libc.so.6                [.] __pthread_mutex_lock
   1.02%  multifdsend_8    [kernel.kallsyms]        [k] copy_page
   0.96%  multifdsend_15   [kernel.kallsyms]        [k] copy_page
   0.83%  multifdsend_14   [kernel.kallsyms]        [k] copy_page
   0.81%  multifdsend_7    [kernel.kallsyms]        [k] copy_page
   0.75%  multifdsend_0    [kernel.kallsyms]        [k] copy_page
   0.69%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
   0.48%  live_migration   qemu-system-x86_64       [.] qemu_mutex_unlock_impl
   0.48%  live_migration   qemu-system-x86_64       [.] qemu_mutex_lock_impl

[v5]

In this version:
- Rebase to latest
- Address all comments
- statistics about zero pages are now correct (or at least much better than before)
- changed how we calculate the amount of transferred ram
- numbers; who doesn't like numbers?

Everything has been checked with a guest launched with a command like
the following.  Migration runs through localhost.  I will send numbers
from real hardware as soon as I get access to the machines that have
it (I checked with previous versions already, but not this one).

[removed example]

Please review, Juan.

[v4]
In this version
- Rebase to latest
- Address all comments from previous versions
- code cleanup

Please review.

[v2]
This is a rebase against the latest master.

The reason for the resend is to configure git-publish properly, in the
hope that this time git-publish sends all the patches.

Please, review.

[v1]
Since Friday's version:
- More cleanups on the code
- Remove repeated calls to qemu_target_page_size()
- Establish normal pages and zero pages
- detect zero pages on the multifd threads
- send zero pages through the multifd channels.
- reviews by Richard addressed.

It passes migration-test, so it should be perfect O:+)

ToDo for next version:
- check the version changes
  I need 6.2 to be out to check against 7.0.
  That code doesn't exist yet for that reason.
- Send measurements of the differences

Please, review.

[

Friday version that just created a single writev instead of
write+writev.

]

Right now, multifd does a write() for the header and a writev() for
each group of pages.  Simplify it so we send the header as another
member of the IOV.

Once there, I got several simplifications:
* is_zero_range() was used only once, just use its body.
* same with is_zero_page().
* Be consistent and use the offset inside the ramblock everywhere.
* Now that we have the offsets of the ramblock, we can drop the iov.
* Now that nothing uses iovs except the NOCOMP method, move the iovs
  from pages to methods.
* Now we can use iovs with a single field for zlib/zstd.
* The send_write() method is the same in all the implementations, so
  use it directly.
* Now, we can use a single writev() to write everything (see the
  sketch below).
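
A rough illustration of where this ends up (a sketch, not the literal
patch; error handling omitted, and p->page_size is the cached page
size that patch 01 of this series introduces): the packet header takes
iov[0] and the pages follow it, so a single qio_channel_writev_all()
replaces the old write() + writev() pair.

    /* sketch: header as the first IOV entry, the pages after it */
    p->iov[0].iov_base = p->packet;
    p->iov[0].iov_len = p->packet_len;
    for (int i = 0; i < p->normal_num; i++) {
        p->iov[i + 1].iov_base = p->pages->block->host + p->normal[i];
        p->iov[i + 1].iov_len = p->page_size;
    }
    ret = qio_channel_writev_all(p->c, p->iov, p->normal_num + 1,
                                 &local_err);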

ToDo: Move zero page detection to the multifd threads.

With RAM sizes in the terabytes, zero page detection takes too much
time on the main thread.

The last patch of the series removes the detection of zero pages from
the main thread for multifd.  In the next posting of the series, I
will add the detection of zero pages and their transmission over the
multifd channels.
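
For reference, the per-thread detection this is heading towards is
conceptually just the loop below (a sketch, assuming it runs in
multifd_send_thread(); p->zero[] and p->zero_num are the fields added
later in this series):

    /* sketch: classify the queued pages inside the multifd send thread */
    for (int i = 0; i < p->pages->num; i++) {
        uint8_t *page = p->pages->block->host + p->pages->offset[i];

        if (buffer_is_zero(page, p->page_size)) {
            /* zero page: record its offset, there is no data to send */
            p->zero[p->zero_num++] = p->pages->offset[i];
        } else {
            /* normal page: it gets added to the IOV and written out */
            p->normal[p->normal_num++] = p->pages->offset[i];
        }
    }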

Please review.

Later, Juan.

Juan Quintela (12):
  multifd: Create page_size fields into both MultiFD{Recv,Send}Params
  multifd: Create page_count fields into both MultiFD{Recv,Send}Params
  migration: Export ram_transferred_ram()
  multifd: Count the number of bytes sent correctly
  migration: Make ram_save_target_page() a pointer
  multifd: Make flags field thread local
  multifd: Prepare to send a packet without the mutex held
  multifd: Add capability to enable/disable zero_page
  migration: Export ram_release_page()
  multifd: Support for zero pages transmission
  multifd: Zero pages transmission
  So we use multifd to transmit zero pages.

 qapi/migration.json      |   8 ++-
 migration/migration.h    |   1 +
 migration/multifd.h      |  44 ++++++++++--
 migration/ram.h          |   3 +
 hw/core/machine.c        |   1 +
 migration/migration.c    |  14 +++-
 migration/multifd-zlib.c |  14 ++--
 migration/multifd-zstd.c |  12 ++--
 migration/multifd.c      | 140 ++++++++++++++++++++++++++++-----------
 migration/ram.c          |  50 +++++++++++---
 migration/trace-events   |   8 +--
 11 files changed, 221 insertions(+), 74 deletions(-)

-- 
2.37.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv, Send}Params
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
@ 2022-08-02  6:38 ` Juan Quintela
  2022-08-11  8:10   ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv,Send}Params Leonardo Brás
  2022-08-02  6:38 ` [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv, Send}Params Juan Quintela
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

We were calling qemu_target_page_size() left and right.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h      |  4 ++++
 migration/multifd-zlib.c | 14 ++++++--------
 migration/multifd-zstd.c | 12 +++++-------
 migration/multifd.c      | 18 ++++++++----------
 4 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 519f498643..86fb9982b3 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -80,6 +80,8 @@ typedef struct {
     bool registered_yank;
     /* packet allocated len */
     uint32_t packet_len;
+    /* guest page size */
+    uint32_t page_size;
     /* multifd flags for sending ram */
     int write_flags;
 
@@ -143,6 +145,8 @@ typedef struct {
     QIOChannel *c;
     /* packet allocated len */
     uint32_t packet_len;
+    /* guest page size */
+    uint32_t page_size;
 
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 18213a9513..37770248e1 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -116,7 +116,6 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     struct zlib_data *z = p->data;
-    size_t page_size = qemu_target_page_size();
     z_stream *zs = &z->zs;
     uint32_t out_size = 0;
     int ret;
@@ -135,8 +134,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
          * with compression. zlib does not guarantee that this is safe,
          * therefore copy the page before calling deflate().
          */
-        memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
-        zs->avail_in = page_size;
+        memcpy(z->buf, p->pages->block->host + p->normal[i], p->page_size);
+        zs->avail_in = p->page_size;
         zs->next_in = z->buf;
 
         zs->avail_out = available;
@@ -242,12 +241,11 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
 static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     struct zlib_data *z = p->data;
-    size_t page_size = qemu_target_page_size();
     z_stream *zs = &z->zs;
     uint32_t in_size = p->next_packet_size;
     /* we measure the change of total_out */
     uint32_t out_size = zs->total_out;
-    uint32_t expected_size = p->normal_num * page_size;
+    uint32_t expected_size = p->normal_num * p->page_size;
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
     int ret;
     int i;
@@ -274,7 +272,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
             flush = Z_SYNC_FLUSH;
         }
 
-        zs->avail_out = page_size;
+        zs->avail_out = p->page_size;
         zs->next_out = p->host + p->normal[i];
 
         /*
@@ -288,8 +286,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
         do {
             ret = inflate(zs, flush);
         } while (ret == Z_OK && zs->avail_in
-                             && (zs->total_out - start) < page_size);
-        if (ret == Z_OK && (zs->total_out - start) < page_size) {
+                             && (zs->total_out - start) < p->page_size);
+        if (ret == Z_OK && (zs->total_out - start) < p->page_size) {
             error_setg(errp, "multifd %u: inflate generated too few output",
                        p->id);
             return -1;
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index d788d309f2..f4a8e1ed1f 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -113,7 +113,6 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     struct zstd_data *z = p->data;
-    size_t page_size = qemu_target_page_size();
     int ret;
     uint32_t i;
 
@@ -128,7 +127,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
             flush = ZSTD_e_flush;
         }
         z->in.src = p->pages->block->host + p->normal[i];
-        z->in.size = page_size;
+        z->in.size = p->page_size;
         z->in.pos = 0;
 
         /*
@@ -241,8 +240,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t in_size = p->next_packet_size;
     uint32_t out_size = 0;
-    size_t page_size = qemu_target_page_size();
-    uint32_t expected_size = p->normal_num * page_size;
+    uint32_t expected_size = p->normal_num * p->page_size;
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
     struct zstd_data *z = p->data;
     int ret;
@@ -265,7 +263,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 
     for (i = 0; i < p->normal_num; i++) {
         z->out.dst = p->host + p->normal[i];
-        z->out.size = page_size;
+        z->out.size = p->page_size;
         z->out.pos = 0;
 
         /*
@@ -279,8 +277,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
         do {
             ret = ZSTD_decompressStream(z->zds, &z->out, &z->in);
         } while (ret > 0 && (z->in.size - z->in.pos > 0)
-                         && (z->out.pos < page_size));
-        if (ret > 0 && (z->out.pos < page_size)) {
+                         && (z->out.pos < p->page_size));
+        if (ret > 0 && (z->out.pos < p->page_size)) {
             error_setg(errp, "multifd %u: decompressStream buffer too small",
                        p->id);
             return -1;
diff --git a/migration/multifd.c b/migration/multifd.c
index 586ddc9d65..d2070c9cee 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -87,15 +87,14 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     MultiFDPages_t *pages = p->pages;
-    size_t page_size = qemu_target_page_size();
 
     for (int i = 0; i < p->normal_num; i++) {
         p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
-        p->iov[p->iovs_num].iov_len = page_size;
+        p->iov[p->iovs_num].iov_len = p->page_size;
         p->iovs_num++;
     }
 
-    p->next_packet_size = p->normal_num * page_size;
+    p->next_packet_size = p->normal_num * p->page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
     return 0;
 }
@@ -139,7 +138,6 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
 static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
-    size_t page_size = qemu_target_page_size();
 
     if (flags != MULTIFD_FLAG_NOCOMP) {
         error_setg(errp, "multifd %u: flags received %x flags expected %x",
@@ -148,7 +146,7 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
     }
     for (int i = 0; i < p->normal_num; i++) {
         p->iov[i].iov_base = p->host + p->normal[i];
-        p->iov[i].iov_len = page_size;
+        p->iov[i].iov_len = p->page_size;
     }
     return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
 }
@@ -281,8 +279,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
     MultiFDPacket_t *packet = p->packet;
-    size_t page_size = qemu_target_page_size();
-    uint32_t page_count = MULTIFD_PACKET_SIZE / page_size;
+    uint32_t page_count = MULTIFD_PACKET_SIZE / p->page_size;
     RAMBlock *block;
     int i;
 
@@ -344,7 +341,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     for (i = 0; i < p->normal_num; i++) {
         uint64_t offset = be64_to_cpu(packet->offset[i]);
 
-        if (offset > (block->used_length - page_size)) {
+        if (offset > (block->used_length - p->page_size)) {
             error_setg(errp, "multifd: offset too long %" PRIu64
                        " (max " RAM_ADDR_FMT ")",
                        offset, block->used_length);
@@ -433,8 +430,7 @@ static int multifd_send_pages(QEMUFile *f)
     p->packet_num = multifd_send_state->packet_num++;
     multifd_send_state->pages = p->pages;
     p->pages = pages;
-    transferred = ((uint64_t) pages->num) * qemu_target_page_size()
-                + p->packet_len;
+    transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
     qemu_file_acct_rate_limit(f, transferred);
     ram_counters.multifd_bytes += transferred;
     ram_counters.transferred += transferred;
@@ -939,6 +935,7 @@ int multifd_save_setup(Error **errp)
         /* We need one extra place for the packet header */
         p->iov = g_new0(struct iovec, page_count + 1);
         p->normal = g_new0(ram_addr_t, page_count);
+        p->page_size = qemu_target_page_size();
 
         if (migrate_use_zero_copy_send()) {
             p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
@@ -1186,6 +1183,7 @@ int multifd_load_setup(Error **errp)
         p->name = g_strdup_printf("multifdrecv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
+        p->page_size = qemu_target_page_size();
     }
 
     for (i = 0; i < thread_count; i++) {
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv, Send}Params
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
  2022-08-02  6:38 ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv, Send}Params Juan Quintela
@ 2022-08-02  6:38 ` Juan Quintela
  2022-08-11  8:10   ` [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv,Send}Params Leonardo Brás
  2022-08-02  6:38 ` [PATCH v7 03/12] migration: Export ram_transferred_ram() Juan Quintela
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

We were recalculating it left and right.  We plan to change those
values in later patches.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h | 4 ++++
 migration/multifd.c | 7 ++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 86fb9982b3..e2802a9ce2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -82,6 +82,8 @@ typedef struct {
     uint32_t packet_len;
     /* guest page size */
     uint32_t page_size;
+    /* number of pages in a full packet */
+    uint32_t page_count;
     /* multifd flags for sending ram */
     int write_flags;
 
@@ -147,6 +149,8 @@ typedef struct {
     uint32_t packet_len;
     /* guest page size */
     uint32_t page_size;
+    /* number of pages in a full packet */
+    uint32_t page_count;
 
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
diff --git a/migration/multifd.c b/migration/multifd.c
index d2070c9cee..aa3808a6f4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -279,7 +279,6 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
     MultiFDPacket_t *packet = p->packet;
-    uint32_t page_count = MULTIFD_PACKET_SIZE / p->page_size;
     RAMBlock *block;
     int i;
 
@@ -306,10 +305,10 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
      * If we received a packet that is 100 times bigger than expected
      * just stop migration.  It is a magic number.
      */
-    if (packet->pages_alloc > page_count) {
+    if (packet->pages_alloc > p->page_count) {
         error_setg(errp, "multifd: received packet "
                    "with size %u and expected a size of %u",
-                   packet->pages_alloc, page_count) ;
+                   packet->pages_alloc, p->page_count) ;
         return -1;
     }
 
@@ -936,6 +935,7 @@ int multifd_save_setup(Error **errp)
         p->iov = g_new0(struct iovec, page_count + 1);
         p->normal = g_new0(ram_addr_t, page_count);
         p->page_size = qemu_target_page_size();
+        p->page_count = page_count;
 
         if (migrate_use_zero_copy_send()) {
             p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
@@ -1183,6 +1183,7 @@ int multifd_load_setup(Error **errp)
         p->name = g_strdup_printf("multifdrecv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
+        p->page_count = page_count;
         p->page_size = qemu_target_page_size();
     }
 
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 03/12] migration: Export ram_transferred_ram()
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
  2022-08-02  6:38 ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv, Send}Params Juan Quintela
  2022-08-02  6:38 ` [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv, Send}Params Juan Quintela
@ 2022-08-02  6:38 ` Juan Quintela
  2022-08-11  8:11   ` Leonardo Brás
  2022-08-02  6:38 ` [PATCH v7 04/12] multifd: Count the number of bytes sent correctly Juan Quintela
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost,
	David Edmondson

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.h | 2 ++
 migration/ram.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.h b/migration/ram.h
index c7af65ac74..e844966f69 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -65,6 +65,8 @@ int ram_load_postcopy(QEMUFile *f, int channel);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
+void ram_transferred_add(uint64_t bytes);
+
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
diff --git a/migration/ram.c b/migration/ram.c
index b94669ba5d..85d89d61ac 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -422,7 +422,7 @@ uint64_t ram_bytes_remaining(void)
 
 MigrationStats ram_counters;
 
-static void ram_transferred_add(uint64_t bytes)
+void ram_transferred_add(uint64_t bytes)
 {
     if (runstate_is_running()) {
         ram_counters.precopy_bytes += bytes;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 04/12] multifd: Count the number of bytes sent correctly
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (2 preceding siblings ...)
  2022-08-02  6:38 ` [PATCH v7 03/12] migration: Export ram_transferred_ram() Juan Quintela
@ 2022-08-02  6:38 ` Juan Quintela
  2022-08-11  8:11   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer Juan Quintela
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

Current code assumes that all pages are whole.  That is already not
true for compression, for example.  Fix it by creating a new field
->sent_bytes that accounts for it.

All ram_counters are used only from the migration thread, so we have
two options:
- take a mutex and fill everything in when we send it (not only
ram_counters, also qemu_file->xfer_bytes).
- Create a local variable that tracks how much has been sent
through each channel.  When we push another packet, we "add" the
previous stats.

I chose option two because it needs fewer changes overall.  In the
previous code we increased the counters and then sent the data.  The
current code goes the other way around: it sends the data and, after
the fact, updates the counters.  Notice that each channel can hold at
most half a megabyte of uncounted data, so it is not very important.
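
In other words (a sketch of option two, not the literal hunks below):
the channel thread only accumulates, and the migration thread flushes
the counter the next time it grabs that channel.

    /* channel thread, p->mutex held, after the packet has been sent */
    p->sent_bytes += p->packet_len + p->next_packet_size;

    /* migration thread, in multifd_send_pages(), p->mutex held */
    ram_transferred_add(p->sent_bytes);
    ram_counters.multifd_bytes += p->sent_bytes;
    qemu_file_acct_rate_limit(f, p->sent_bytes);
    p->sent_bytes = 0;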

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h |  2 ++
 migration/multifd.c | 14 ++++++--------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index e2802a9ce2..36f899c56f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -102,6 +102,8 @@ typedef struct {
     uint32_t flags;
     /* global number of generated multifd packets */
     uint64_t packet_num;
+    /* How many bytes have we sent on the last packet */
+    uint64_t sent_bytes;
     /* thread has work to do */
     int pending_job;
     /* array of pages to sent.
diff --git a/migration/multifd.c b/migration/multifd.c
index aa3808a6f4..e25b529235 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -394,7 +394,6 @@ static int multifd_send_pages(QEMUFile *f)
     static int next_channel;
     MultiFDSendParams *p = NULL; /* make happy gcc */
     MultiFDPages_t *pages = multifd_send_state->pages;
-    uint64_t transferred;
 
     if (qatomic_read(&multifd_send_state->exiting)) {
         return -1;
@@ -429,10 +428,10 @@ static int multifd_send_pages(QEMUFile *f)
     p->packet_num = multifd_send_state->packet_num++;
     multifd_send_state->pages = p->pages;
     p->pages = pages;
-    transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
-    qemu_file_acct_rate_limit(f, transferred);
-    ram_counters.multifd_bytes += transferred;
-    ram_counters.transferred += transferred;
+    ram_transferred_add(p->sent_bytes);
+    ram_counters.multifd_bytes += p->sent_bytes;
+    qemu_file_acct_rate_limit(f, p->sent_bytes);
+    p->sent_bytes = 0;
     qemu_mutex_unlock(&p->mutex);
     qemu_sem_post(&p->sem);
 
@@ -605,9 +604,6 @@ int multifd_send_sync_main(QEMUFile *f)
         p->packet_num = multifd_send_state->packet_num++;
         p->flags |= MULTIFD_FLAG_SYNC;
         p->pending_job++;
-        qemu_file_acct_rate_limit(f, p->packet_len);
-        ram_counters.multifd_bytes += p->packet_len;
-        ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
 
@@ -714,6 +710,8 @@ static void *multifd_send_thread(void *opaque)
             }
 
             qemu_mutex_lock(&p->mutex);
+            p->sent_bytes += p->packet_len;;
+            p->sent_bytes += p->next_packet_size;
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
 
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (3 preceding siblings ...)
  2022-08-02  6:38 ` [PATCH v7 04/12] multifd: Count the number of bytes sent correctly Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-08-11  8:11   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 06/12] multifd: Make flags field thread local Juan Quintela
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

We are going to create a new function for multifd later in the series.
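
As a hypothetical illustration of where this is heading (the multifd
hook name below is a placeholder, not necessarily what the later patch
uses), setup can then simply pick the implementation:

    /* sketch: select the per-page save routine at setup time */
    if (migrate_use_multifd() && !migrate_use_main_zero_page()) {
        (*rsp)->ram_save_target_page = ram_save_target_page_multifd;
    } else {
        (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
    }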

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 85d89d61ac..499d9b2a90 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -310,6 +310,9 @@ typedef struct {
     bool preempted;
 } PostcopyPreemptState;
 
+typedef struct RAMState RAMState;
+typedef struct PageSearchStatus PageSearchStatus;
+
 /* State of RAM for migration */
 struct RAMState {
     /* QEMUFile used for this migration */
@@ -372,8 +375,9 @@ struct RAMState {
      * is enabled.
      */
     unsigned int postcopy_channel;
+
+    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
 };
-typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
@@ -2255,14 +2259,14 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
 }
 
 /**
- * ram_save_target_page: save one target page
+ * ram_save_target_page_legacy: save one target page
  *
  * Returns the number of pages written
  *
  * @rs: current RAM state
  * @pss: data about the page we want to send
  */
-static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
+static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 {
     RAMBlock *block = pss->block;
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
@@ -2469,7 +2473,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
 
         /* Check the pages is dirty and if it is send it */
         if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
-            tmppages = ram_save_target_page(rs, pss);
+            tmppages = rs->ram_save_target_page(rs, pss);
             if (tmppages < 0) {
                 return tmppages;
             }
@@ -3223,6 +3227,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
+    (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
     ret =  multifd_send_sync_main(f);
     if (ret < 0) {
         return ret;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 06/12] multifd: Make flags field thread local
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (4 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-08-11  9:04   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held Juan Quintela
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

Use of flags with respect to locking was inconsistent.  For the
sending side:
- it was set to 0 with the mutex held on the multifd channel.
- MULTIFD_FLAG_SYNC was set with the mutex held on the migration thread.
- Everything else was done without the mutex held on the multifd channel.

On the reception side, it is not used on the migration thread, only on
the multifd channel threads.

So we move it to the multifd channel thread's local variables, and we
introduce a new bool sync_needed on the send side to pass that
information.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h | 10 ++++++----
 migration/multifd.c | 23 +++++++++++++----------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 36f899c56f..a67cefc0a2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -98,12 +98,12 @@ typedef struct {
     bool running;
     /* should this thread finish */
     bool quit;
-    /* multifd flags for each packet */
-    uint32_t flags;
     /* global number of generated multifd packets */
     uint64_t packet_num;
     /* How many bytes have we sent on the last packet */
     uint64_t sent_bytes;
+    /* Do we need to do an iteration sync */
+    bool sync_needed;
     /* thread has work to do */
     int pending_job;
     /* array of pages to sent.
@@ -117,6 +117,8 @@ typedef struct {
 
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for each packet */
+    uint32_t flags;
     /* size of the next packet that contains pages */
     uint32_t next_packet_size;
     /* packets sent through this channel */
@@ -163,8 +165,6 @@ typedef struct {
     bool running;
     /* should this thread finish */
     bool quit;
-    /* multifd flags for each packet */
-    uint32_t flags;
     /* global number of generated multifd packets */
     uint64_t packet_num;
 
@@ -172,6 +172,8 @@ typedef struct {
 
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for each packet */
+    uint32_t flags;
     /* size of the next packet that contains pages */
     uint32_t next_packet_size;
     /* packets sent through this channel */
diff --git a/migration/multifd.c b/migration/multifd.c
index e25b529235..09a40a9135 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -602,7 +602,7 @@ int multifd_send_sync_main(QEMUFile *f)
         }
 
         p->packet_num = multifd_send_state->packet_num++;
-        p->flags |= MULTIFD_FLAG_SYNC;
+        p->sync_needed = true;
         p->pending_job++;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
@@ -658,7 +658,11 @@ static void *multifd_send_thread(void *opaque)
 
         if (p->pending_job) {
             uint64_t packet_num = p->packet_num;
-            uint32_t flags = p->flags;
+            p->flags = 0;
+            if (p->sync_needed) {
+                p->flags |= MULTIFD_FLAG_SYNC;
+                p->sync_needed = false;
+            }
             p->normal_num = 0;
 
             if (use_zero_copy_send) {
@@ -680,14 +684,13 @@ static void *multifd_send_thread(void *opaque)
                 }
             }
             multifd_send_fill_packet(p);
-            p->flags = 0;
             p->num_packets++;
             p->total_normal_pages += p->normal_num;
             p->pages->num = 0;
             p->pages->block = NULL;
             qemu_mutex_unlock(&p->mutex);
 
-            trace_multifd_send(p->id, packet_num, p->normal_num, flags,
+            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
                                p->next_packet_size);
 
             if (use_zero_copy_send) {
@@ -715,7 +718,7 @@ static void *multifd_send_thread(void *opaque)
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
 
-            if (flags & MULTIFD_FLAG_SYNC) {
+            if (p->flags & MULTIFD_FLAG_SYNC) {
                 qemu_sem_post(&p->sem_sync);
             }
             qemu_sem_post(&multifd_send_state->channels_ready);
@@ -1090,7 +1093,7 @@ static void *multifd_recv_thread(void *opaque)
     rcu_register_thread();
 
     while (true) {
-        uint32_t flags;
+        bool sync_needed = false;
 
         if (p->quit) {
             break;
@@ -1112,11 +1115,11 @@ static void *multifd_recv_thread(void *opaque)
             break;
         }
 
-        flags = p->flags;
+        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
+                           p->next_packet_size);
+        sync_needed = p->flags & MULTIFD_FLAG_SYNC;
         /* recv methods don't know how to handle the SYNC flag */
         p->flags &= ~MULTIFD_FLAG_SYNC;
-        trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags,
-                           p->next_packet_size);
         p->num_packets++;
         p->total_normal_pages += p->normal_num;
         qemu_mutex_unlock(&p->mutex);
@@ -1128,7 +1131,7 @@ static void *multifd_recv_thread(void *opaque)
             }
         }
 
-        if (flags & MULTIFD_FLAG_SYNC) {
+        if (sync_needed) {
             qemu_sem_post(&multifd_recv_state->sem_sync);
             qemu_sem_wait(&p->sem_sync);
         }
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (5 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 06/12] multifd: Make flags field thread local Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-08-11  9:16   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page Juan Quintela
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

We do the send_prepare() and the filling of the packet header without
the mutex held.  It will help a lot for compression and, later in the
series, for zero pages.

Notice that we can use p->pages without holding p->mutex because
p->pending_job == 1.
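
The resulting loop is roughly shaped like this (a sketch; error
handling and the zero-copy path omitted).  The mutex only guards the
shared state, while the expensive work runs unlocked because
pending_job != 0 marks this thread as the owner of p->pages:

    qemu_mutex_lock(&p->mutex);
    if (p->pending_job) {
        p->flags = 0;
        if (p->sync_needed) {
            p->flags |= MULTIFD_FLAG_SYNC;
            p->sync_needed = false;
        }
        qemu_mutex_unlock(&p->mutex);

        /* unlocked: this thread owns p->pages while pending_job != 0 */
        multifd_send_state->ops->send_prepare(p, &local_err);
        multifd_send_fill_packet(p);
        qio_channel_writev_all(p->c, p->iov, p->iovs_num, &local_err);

        qemu_mutex_lock(&p->mutex);
        p->sent_bytes += p->packet_len + p->next_packet_size;
        p->pending_job--;
    }
    qemu_mutex_unlock(&p->mutex);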

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h |  2 ++
 migration/multifd.c | 11 ++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index a67cefc0a2..cd389d18d2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -109,7 +109,9 @@ typedef struct {
     /* array of pages to sent.
      * The owner of 'pages' depends of 'pending_job' value:
      * pending_job == 0 -> migration_thread can use it.
+     *                     No need for mutex lock.
      * pending_job != 0 -> multifd_channel can use it.
+     *                     No need for mutex lock.
      */
     MultiFDPages_t *pages;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 09a40a9135..68fc9f8e88 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -663,6 +663,8 @@ static void *multifd_send_thread(void *opaque)
                 p->flags |= MULTIFD_FLAG_SYNC;
                 p->sync_needed = false;
             }
+            qemu_mutex_unlock(&p->mutex);
+
             p->normal_num = 0;
 
             if (use_zero_copy_send) {
@@ -684,11 +686,6 @@ static void *multifd_send_thread(void *opaque)
                 }
             }
             multifd_send_fill_packet(p);
-            p->num_packets++;
-            p->total_normal_pages += p->normal_num;
-            p->pages->num = 0;
-            p->pages->block = NULL;
-            qemu_mutex_unlock(&p->mutex);
 
             trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
                                p->next_packet_size);
@@ -713,6 +710,10 @@ static void *multifd_send_thread(void *opaque)
             }
 
             qemu_mutex_lock(&p->mutex);
+            p->num_packets++;
+            p->total_normal_pages += p->normal_num;
+            p->pages->num = 0;
+            p->pages->block = NULL;
             p->sent_bytes += p->packet_len;;
             p->sent_bytes += p->next_packet_size;
             p->pending_job--;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (6 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-08-11  9:29   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 09/12] migration: Export ram_release_page() Juan Quintela
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

We have to keep it (zero page detection on the main thread) enabled by
default until we introduce the new code.

Signed-off-by: Juan Quintela <quintela@redhat.com>

---

Changed it to a capability.  As capabilities are off by default, I had
to change MULTIFD_ZERO_PAGE to MAIN_ZERO_PAGE, so it is false by
default and true for older machine types.
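
For example (assuming the capability lands under this name), forcing
the old main-thread detection back on from the monitor would be:

    (qemu) migrate_set_capability main-zero-page on

or the QMP equivalent:

    { "execute": "migrate-set-capabilities",
      "arguments": { "capabilities": [
        { "capability": "main-zero-page", "state": true } ] } }
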
---
 qapi/migration.json   |  8 +++++++-
 migration/migration.h |  1 +
 hw/core/machine.c     |  1 +
 migration/migration.c | 16 +++++++++++++++-
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 81185d4311..dc981236ff 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -472,12 +472,18 @@
 #                  Requires that QEMU be permitted to use locked memory
 #                  for guest RAM pages.
 #                  (since 7.1)
+#
 # @postcopy-preempt: If enabled, the migration process will allow postcopy
 #                    requests to preempt precopy stream, so postcopy requests
 #                    will be handled faster.  This is a performance feature and
 #                    should not affect the correctness of postcopy migration.
 #                    (since 7.1)
 #
+# @main-zero-page: If enabled, the detection of zero pages will be
+#                  done on the main thread.  Otherwise it is done on
+#                  the multifd threads.
+#                  (since 7.1)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -492,7 +498,7 @@
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'main-zero-page'] }
 
 ##
 # @MigrationCapabilityStatus:
diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaa..58b245b138 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -415,6 +415,7 @@ int migrate_multifd_channels(void);
 MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
+bool migrate_use_main_zero_page(void);
 
 #ifdef CONFIG_LINUX
 bool migrate_use_zero_copy_send(void);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index a673302cce..2624b75ab4 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -43,6 +43,7 @@
 GlobalProperty hw_compat_7_0[] = {
     { "arm-gicv3-common", "force-8-bit-prio", "on" },
     { "nvme-ns", "eui64-default", "on"},
+    { "migration", "main-zero-page", "true" },
 };
 const size_t hw_compat_7_0_len = G_N_ELEMENTS(hw_compat_7_0);
 
diff --git a/migration/migration.c b/migration/migration.c
index e03f698a3c..ce3e5cc0cd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -164,7 +164,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
     MIGRATION_CAPABILITY_XBZRLE,
     MIGRATION_CAPABILITY_X_COLO,
     MIGRATION_CAPABILITY_VALIDATE_UUID,
-    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
+    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
+    MIGRATION_CAPABILITY_MAIN_ZERO_PAGE);
 
 /* When we add fault tolerance, we could have several
    migrations at once.  For now we don't need to add
@@ -2592,6 +2593,17 @@ bool migrate_use_multifd(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIFD];
 }
 
+bool migrate_use_main_zero_page(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    // We will enable this when we add the right code.
+    // return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
+    return true;
+}
+
 bool migrate_pause_before_switchover(void)
 {
     MigrationState *s;
@@ -4406,6 +4418,8 @@ static Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-zero-copy-send",
             MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+    DEFINE_PROP_MIG_CAP("main-zero-page",
+            MIGRATION_CAPABILITY_MAIN_ZERO_PAGE),
 
     DEFINE_PROP_END_OF_LIST(),
 };
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 09/12] migration: Export ram_release_page()
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (7 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-08-11  9:31   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 10/12] multifd: Support for zero pages transmission Juan Quintela
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.h | 1 +
 migration/ram.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.h b/migration/ram.h
index e844966f69..038d52f49f 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,6 +66,7 @@ int ram_load_postcopy(QEMUFile *f, int channel);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
 void ram_transferred_add(uint64_t bytes);
+void ram_release_page(const char *rbname, uint64_t offset);
 
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
diff --git a/migration/ram.c b/migration/ram.c
index 499d9b2a90..291ba5c0ed 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1238,7 +1238,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
     }
 }
 
-static void ram_release_page(const char *rbname, uint64_t offset)
+void ram_release_page(const char *rbname, uint64_t offset)
 {
     if (!migrate_release_ram() || !migration_in_postcopy()) {
         return;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 10/12] multifd: Support for zero pages transmission
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (8 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 09/12] migration: Export ram_release_page() Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-09-02 13:27   ` Leonardo Brás
  2022-10-25  9:10   ` chuang xu
  2022-08-02  6:39 ` [PATCH v7 11/12] multifd: Zero " Juan Quintela
  2022-08-02  6:39 ` [PATCH v7 12/12] So we use multifd to transmit zero pages Juan Quintela
  11 siblings, 2 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

This patch adds the counters and related fields.  The logic will be
added in the following patch.

Signed-off-by: Juan Quintela <quintela@redhat.com>

---

Added counters for duplicated/non-duplicated pages.
Removed Reviewed-by from David.
Added total_zero_pages.
---
 migration/multifd.h    | 17 ++++++++++++++++-
 migration/multifd.c    | 36 +++++++++++++++++++++++++++++-------
 migration/ram.c        |  2 --
 migration/trace-events |  8 ++++----
 4 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index cd389d18d2..a1b852200d 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -47,7 +47,10 @@ typedef struct {
     /* size of the next packet that contains pages */
     uint32_t next_packet_size;
     uint64_t packet_num;
-    uint64_t unused[4];    /* Reserved for future use */
+    /* zero pages */
+    uint32_t zero_pages;
+    uint32_t unused32[1];    /* Reserved for future use */
+    uint64_t unused64[3];    /* Reserved for future use */
     char ramblock[256];
     uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
@@ -127,6 +130,8 @@ typedef struct {
     uint64_t num_packets;
     /* non zero pages sent through this channel */
     uint64_t total_normal_pages;
+    /* zero pages sent through this channel */
+    uint64_t total_zero_pages;
     /* buffers to send */
     struct iovec *iov;
     /* number of iovs used */
@@ -135,6 +140,10 @@ typedef struct {
     ram_addr_t *normal;
     /* num of non zero pages */
     uint32_t normal_num;
+    /* Pages that are  zero */
+    ram_addr_t *zero;
+    /* num of zero pages */
+    uint32_t zero_num;
     /* used for compression methods */
     void *data;
 }  MultiFDSendParams;
@@ -184,12 +193,18 @@ typedef struct {
     uint8_t *host;
     /* non zero pages recv through this channel */
     uint64_t total_normal_pages;
+    /* zero pages recv through this channel */
+    uint64_t total_zero_pages;
     /* buffers to recv */
     struct iovec *iov;
     /* Pages that are not zero */
     ram_addr_t *normal;
     /* num of non zero pages */
     uint32_t normal_num;
+    /* Pages that are  zero */
+    ram_addr_t *zero;
+    /* num of zero pages */
+    uint32_t zero_num;
     /* used for de-compression methods */
     void *data;
 } MultiFDRecvParams;
diff --git a/migration/multifd.c b/migration/multifd.c
index 68fc9f8e88..4473d9f834 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -263,6 +263,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
     packet->normal_pages = cpu_to_be32(p->normal_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
     packet->packet_num = cpu_to_be64(p->packet_num);
+    packet->zero_pages = cpu_to_be32(p->zero_num);
 
     if (p->pages->block) {
         strncpy(packet->ramblock, p->pages->block->idstr, 256);
@@ -323,7 +324,15 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->next_packet_size = be32_to_cpu(packet->next_packet_size);
     p->packet_num = be64_to_cpu(packet->packet_num);
 
-    if (p->normal_num == 0) {
+    p->zero_num = be32_to_cpu(packet->zero_pages);
+    if (p->zero_num > packet->pages_alloc - p->normal_num) {
+        error_setg(errp, "multifd: received packet "
+                   "with %u zero pages and expected maximum pages are %u",
+                   p->zero_num, packet->pages_alloc - p->normal_num) ;
+        return -1;
+    }
+
+    if (p->normal_num == 0 && p->zero_num == 0) {
         return 0;
     }
 
@@ -432,6 +441,8 @@ static int multifd_send_pages(QEMUFile *f)
     ram_counters.multifd_bytes += p->sent_bytes;
     qemu_file_acct_rate_limit(f, p->sent_bytes);
     p->sent_bytes = 0;
+    ram_counters.normal += p->normal_num;
+    ram_counters.duplicate += p->zero_num;
     qemu_mutex_unlock(&p->mutex);
     qemu_sem_post(&p->sem);
 
@@ -545,6 +556,8 @@ void multifd_save_cleanup(void)
         p->iov = NULL;
         g_free(p->normal);
         p->normal = NULL;
+        g_free(p->zero);
+        p->zero = NULL;
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
@@ -666,6 +679,7 @@ static void *multifd_send_thread(void *opaque)
             qemu_mutex_unlock(&p->mutex);
 
             p->normal_num = 0;
+            p->zero_num = 0;
 
             if (use_zero_copy_send) {
                 p->iovs_num = 0;
@@ -687,8 +701,8 @@ static void *multifd_send_thread(void *opaque)
             }
             multifd_send_fill_packet(p);
 
-            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
-                               p->next_packet_size);
+            trace_multifd_send(p->id, packet_num, p->normal_num, p->zero_num,
+                               p->flags, p->next_packet_size);
 
             if (use_zero_copy_send) {
                 /* Send header first, without zerocopy */
@@ -712,6 +726,7 @@ static void *multifd_send_thread(void *opaque)
             qemu_mutex_lock(&p->mutex);
             p->num_packets++;
             p->total_normal_pages += p->normal_num;
+            p->total_zero_pages += p->zero_num;
             p->pages->num = 0;
             p->pages->block = NULL;
             p->sent_bytes += p->packet_len;;
@@ -753,7 +768,8 @@ out:
     qemu_mutex_unlock(&p->mutex);
 
     rcu_unregister_thread();
-    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
+    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages,
+                                  p->total_zero_pages);
 
     return NULL;
 }
@@ -938,6 +954,7 @@ int multifd_save_setup(Error **errp)
         p->normal = g_new0(ram_addr_t, page_count);
         p->page_size = qemu_target_page_size();
         p->page_count = page_count;
+        p->zero = g_new0(ram_addr_t, page_count);
 
         if (migrate_use_zero_copy_send()) {
             p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
@@ -1046,6 +1063,8 @@ int multifd_load_cleanup(Error **errp)
         p->iov = NULL;
         g_free(p->normal);
         p->normal = NULL;
+        g_free(p->zero);
+        p->zero = NULL;
         multifd_recv_state->ops->recv_cleanup(p);
     }
     qemu_sem_destroy(&multifd_recv_state->sem_sync);
@@ -1116,13 +1135,14 @@ static void *multifd_recv_thread(void *opaque)
             break;
         }
 
-        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
-                           p->next_packet_size);
+        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
+                           p->flags, p->next_packet_size);
         sync_needed = p->flags & MULTIFD_FLAG_SYNC;
         /* recv methods don't know how to handle the SYNC flag */
         p->flags &= ~MULTIFD_FLAG_SYNC;
         p->num_packets++;
         p->total_normal_pages += p->normal_num;
+        p->total_zero_pages += p->zero_num;
         qemu_mutex_unlock(&p->mutex);
 
         if (p->normal_num) {
@@ -1147,7 +1167,8 @@ static void *multifd_recv_thread(void *opaque)
     qemu_mutex_unlock(&p->mutex);
 
     rcu_unregister_thread();
-    trace_multifd_recv_thread_end(p->id, p->num_packets, p->total_normal_pages);
+    trace_multifd_recv_thread_end(p->id, p->num_packets, p->total_normal_pages,
+                                  p->total_zero_pages);
 
     return NULL;
 }
@@ -1187,6 +1208,7 @@ int multifd_load_setup(Error **errp)
         p->normal = g_new0(ram_addr_t, page_count);
         p->page_count = page_count;
         p->page_size = qemu_target_page_size();
+        p->zero = g_new0(ram_addr_t, page_count);
     }
 
     for (i = 0; i < thread_count; i++) {
diff --git a/migration/ram.c b/migration/ram.c
index 291ba5c0ed..2af70f517a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1412,8 +1412,6 @@ static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
     if (multifd_queue_page(rs->f, block, offset) < 0) {
         return -1;
     }
-    ram_counters.normal++;
-
     return 1;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index a34afe7b85..d34aec177c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -120,21 +120,21 @@ postcopy_preempt_reset_channel(void) ""
 
 # multifd.c
 multifd_new_send_channel_async(uint8_t id) "channel %u"
-multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
+multifd_recv(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t zero, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_recv_new_channel(uint8_t id) "channel %u"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %u"
 multifd_recv_sync_main_wait(uint8_t id) "channel %u"
 multifd_recv_terminate_threads(bool error) "error %d"
-multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
+multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " zero pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%u"
-multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u flags 0x%x next packet size %u"
+multifd_send(uint8_t id, uint64_t packet_num, uint32_t normalpages, uint32_t zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
 multifd_send_terminate_threads(bool error) "error %d"
-multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
+multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64 " zero pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
 multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 11/12] multifd: Zero pages transmission
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (9 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 10/12] multifd: Support for zero pages transmission Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-09-02 13:27   ` Leonardo Brás
  2022-08-02  6:39 ` [PATCH v7 12/12] So we use multifd to transmit zero pages Juan Quintela
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

This implements the zero page detection and handling.

Signed-off-by: Juan Quintela <quintela@redhat.com>

---

Add comment for offset (dave)
Use local variables for offset/block to have shorter lines
---
 migration/multifd.h |  5 +++++
 migration/multifd.c | 41 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index a1b852200d..5931de6f86 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -52,6 +52,11 @@ typedef struct {
     uint32_t unused32[1];    /* Reserved for future use */
     uint64_t unused64[3];    /* Reserved for future use */
     char ramblock[256];
+    /*
+     * This array contains the pointers to:
+     *  - normal pages (initial normal_pages entries)
+     *  - zero pages (following zero_pages entries)
+     */
     uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 4473d9f834..89811619d8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "sysemu/sysemu.h"
@@ -275,6 +276,12 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
         packet->offset[i] = cpu_to_be64(temp);
     }
+    for (i = 0; i < p->zero_num; i++) {
+        /* there are architectures where ram_addr_t is 32 bit */
+        uint64_t temp = p->zero[i];
+
+        packet->offset[p->normal_num + i] = cpu_to_be64(temp);
+    }
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -358,6 +365,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->normal[i] = offset;
     }
 
+    for (i = 0; i < p->zero_num; i++) {
+        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
+
+        if (offset > (block->used_length - p->page_size)) {
+            error_setg(errp, "multifd: offset too long %" PRIu64
+                       " (max " RAM_ADDR_FMT ")",
+                       offset, block->used_length);
+            return -1;
+        }
+        p->zero[i] = offset;
+    }
+
     return 0;
 }
 
@@ -648,6 +667,8 @@ static void *multifd_send_thread(void *opaque)
 {
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
+    /* qemu older than 7.0 don't understand zero page on multifd channel */
+    bool use_zero_page = migrate_use_multifd_zero_page();
     int ret = 0;
     bool use_zero_copy_send = migrate_use_zero_copy_send();
 
@@ -670,6 +691,7 @@ static void *multifd_send_thread(void *opaque)
         qemu_mutex_lock(&p->mutex);
 
         if (p->pending_job) {
+            RAMBlock *rb = p->pages->block;
             uint64_t packet_num = p->packet_num;
             p->flags = 0;
             if (p->sync_needed) {
@@ -688,8 +710,16 @@ static void *multifd_send_thread(void *opaque)
             }
 
             for (int i = 0; i < p->pages->num; i++) {
-                p->normal[p->normal_num] = p->pages->offset[i];
-                p->normal_num++;
+                uint64_t offset = p->pages->offset[i];
+                if (use_zero_page &&
+                    buffer_is_zero(rb->host + offset, p->page_size)) {
+                    p->zero[p->zero_num] = offset;
+                    p->zero_num++;
+                    ram_release_page(rb->idstr, offset);
+                } else {
+                    p->normal[p->normal_num] = offset;
+                    p->normal_num++;
+                }
             }
 
             if (p->normal_num) {
@@ -1152,6 +1182,13 @@ static void *multifd_recv_thread(void *opaque)
             }
         }
 
+        for (int i = 0; i < p->zero_num; i++) {
+            void *page = p->host + p->zero[i];
+            if (!buffer_is_zero(page, p->page_size)) {
+                memset(page, 0, p->page_size);
+            }
+        }
+
         if (sync_needed) {
             qemu_sem_post(&multifd_recv_state->sem_sync);
             qemu_sem_wait(&p->sem_sync);
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 12/12] So we use multifd to transmit zero pages.
  2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
                   ` (10 preceding siblings ...)
  2022-08-02  6:39 ` [PATCH v7 11/12] multifd: Zero " Juan Quintela
@ 2022-08-02  6:39 ` Juan Quintela
  2022-09-02 13:27   ` Leonardo Brás
  11 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-02  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Leonardo Bras, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Juan Quintela, Markus Armbruster, Eduardo Habkost

Signed-off-by: Juan Quintela <quintela@redhat.com>

---

- Check zero_page property before using new code (Dave)
---
 migration/migration.c |  4 +---
 migration/multifd.c   |  6 +++---
 migration/ram.c       | 33 ++++++++++++++++++++++++++++++++-
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ce3e5cc0cd..13842f6803 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2599,9 +2599,7 @@ bool migrate_use_main_zero_page(void)
 
     s = migrate_get_current();
 
-    // We will enable this when we add the right code.
-    // return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
-    return true;
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
 }
 
 bool migrate_pause_before_switchover(void)
diff --git a/migration/multifd.c b/migration/multifd.c
index 89811619d8..54acdc004c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -667,8 +667,8 @@ static void *multifd_send_thread(void *opaque)
 {
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
-    /* qemu older than 7.0 don't understand zero page on multifd channel */
-    bool use_zero_page = migrate_use_multifd_zero_page();
+    /* older qemu don't understand zero page on multifd channel */
+    bool use_multifd_zero_page = !migrate_use_main_zero_page();
     int ret = 0;
     bool use_zero_copy_send = migrate_use_zero_copy_send();
 
@@ -711,7 +711,7 @@ static void *multifd_send_thread(void *opaque)
 
             for (int i = 0; i < p->pages->num; i++) {
                 uint64_t offset = p->pages->offset[i];
-                if (use_zero_page &&
+                if (use_multifd_zero_page &&
                     buffer_is_zero(rb->host + offset, p->page_size)) {
                     p->zero[p->zero_num] = offset;
                     p->zero_num++;
diff --git a/migration/ram.c b/migration/ram.c
index 2af70f517a..26e60b9cc1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2428,6 +2428,32 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
     }
 }
 
+/**
+ * ram_save_target_page_multifd: save one target page
+ *
+ * Returns the number of pages written
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+    int res;
+
+    if (!migration_in_postcopy()) {
+        return ram_save_multifd_page(rs, block, offset);
+    }
+
+    res = save_zero_page(rs, block, offset);
+    if (res > 0) {
+        return res;
+    }
+
+    return ram_save_page(rs, pss);
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -3225,7 +3251,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
+    if (migrate_use_multifd() && !migrate_use_main_zero_page()) {
+        (*rsp)->ram_save_target_page = ram_save_target_page_multifd;
+    } else {
+        (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
+    }
+
     ret =  multifd_send_sync_main(f);
     if (ret < 0) {
         return ret;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv,Send}Params
  2022-08-02  6:38 ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv, Send}Params Juan Quintela
@ 2022-08-11  8:10   ` Leonardo Brás
  2022-08-13 15:41     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  8:10 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

Hello Juan,

On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
> We were calling qemu_target_page_size() left and right.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

IMHO it looks like a good idea to bring that info inside the multifd parameters.

> ---
>  migration/multifd.h      |  4 ++++
>  migration/multifd-zlib.c | 14 ++++++--------
>  migration/multifd-zstd.c | 12 +++++-------
>  migration/multifd.c      | 18 ++++++++----------
>  4 files changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 519f498643..86fb9982b3 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -80,6 +80,8 @@ typedef struct {
>      bool registered_yank;
>      /* packet allocated len */
>      uint32_t packet_len;
> +    /* guest page size */
> +    uint32_t page_size;
>      /* multifd flags for sending ram */
>      int write_flags;
>  
> @@ -143,6 +145,8 @@ typedef struct {
>      QIOChannel *c;
>      /* packet allocated len */
>      uint32_t packet_len;
> +    /* guest page size */
> +    uint32_t page_size;
>  
>      /* syncs main thread and channels */
>      QemuSemaphore sem_sync;
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 18213a9513..37770248e1 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -116,7 +116,6 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
>  static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
>      struct zlib_data *z = p->data;
> -    size_t page_size = qemu_target_page_size();
>      z_stream *zs = &z->zs;
>      uint32_t out_size = 0;
>      int ret;
> @@ -135,8 +134,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>           * with compression. zlib does not guarantee that this is safe,
>           * therefore copy the page before calling deflate().
>           */
> -        memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
> -        zs->avail_in = page_size;
> +        memcpy(z->buf, p->pages->block->host + p->normal[i], p->page_size);
> +        zs->avail_in = p->page_size;
>          zs->next_in = z->buf;
>  
>          zs->avail_out = available;
> @@ -242,12 +241,11 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
>  static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>  {
>      struct zlib_data *z = p->data;
> -    size_t page_size = qemu_target_page_size();
>      z_stream *zs = &z->zs;
>      uint32_t in_size = p->next_packet_size;
>      /* we measure the change of total_out */
>      uint32_t out_size = zs->total_out;
> -    uint32_t expected_size = p->normal_num * page_size;
> +    uint32_t expected_size = p->normal_num * p->page_size;
>      uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>      int ret;
>      int i;
> @@ -274,7 +272,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>              flush = Z_SYNC_FLUSH;
>          }
>  
> -        zs->avail_out = page_size;
> +        zs->avail_out = p->page_size;
>          zs->next_out = p->host + p->normal[i];
>  
>          /*
> @@ -288,8 +286,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>          do {
>              ret = inflate(zs, flush);
>          } while (ret == Z_OK && zs->avail_in
> -                             && (zs->total_out - start) < page_size);
> -        if (ret == Z_OK && (zs->total_out - start) < page_size) {
> +                             && (zs->total_out - start) < p->page_size);
> +        if (ret == Z_OK && (zs->total_out - start) < p->page_size) {
>              error_setg(errp, "multifd %u: inflate generated too few output",
>                         p->id);
>              return -1;
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index d788d309f2..f4a8e1ed1f 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -113,7 +113,6 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
>  static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
>      struct zstd_data *z = p->data;
> -    size_t page_size = qemu_target_page_size();
>      int ret;
>      uint32_t i;
>  
> @@ -128,7 +127,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>              flush = ZSTD_e_flush;
>          }
>          z->in.src = p->pages->block->host + p->normal[i];
> -        z->in.size = page_size;
> +        z->in.size = p->page_size;
>          z->in.pos = 0;
>  
>          /*
> @@ -241,8 +240,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>  {
>      uint32_t in_size = p->next_packet_size;
>      uint32_t out_size = 0;
> -    size_t page_size = qemu_target_page_size();
> -    uint32_t expected_size = p->normal_num * page_size;
> +    uint32_t expected_size = p->normal_num * p->page_size;
>      uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>      struct zstd_data *z = p->data;
>      int ret;
> @@ -265,7 +263,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>  
>      for (i = 0; i < p->normal_num; i++) {
>          z->out.dst = p->host + p->normal[i];
> -        z->out.size = page_size;
> +        z->out.size = p->page_size;
>          z->out.pos = 0;
>  
>          /*
> @@ -279,8 +277,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>          do {
>              ret = ZSTD_decompressStream(z->zds, &z->out, &z->in);
>          } while (ret > 0 && (z->in.size - z->in.pos > 0)
> -                         && (z->out.pos < page_size));
> -        if (ret > 0 && (z->out.pos < page_size)) {
> +                         && (z->out.pos < p->page_size));
> +        if (ret > 0 && (z->out.pos < p->page_size)) {
>              error_setg(errp, "multifd %u: decompressStream buffer too small",
>                         p->id);
>              return -1;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 586ddc9d65..d2070c9cee 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -87,15 +87,14 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
>  static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
>      MultiFDPages_t *pages = p->pages;
> -    size_t page_size = qemu_target_page_size();
>  
>      for (int i = 0; i < p->normal_num; i++) {
>          p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
> -        p->iov[p->iovs_num].iov_len = page_size;
> +        p->iov[p->iovs_num].iov_len = p->page_size;
>          p->iovs_num++;
>      }
>  
> -    p->next_packet_size = p->normal_num * page_size;
> +    p->next_packet_size = p->normal_num * p->page_size;
>      p->flags |= MULTIFD_FLAG_NOCOMP;
>      return 0;
>  }
> @@ -139,7 +138,6 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
>  static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>  {
>      uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
> -    size_t page_size = qemu_target_page_size();
>  
>      if (flags != MULTIFD_FLAG_NOCOMP) {
>          error_setg(errp, "multifd %u: flags received %x flags expected %x",
> @@ -148,7 +146,7 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>      }
>      for (int i = 0; i < p->normal_num; i++) {
>          p->iov[i].iov_base = p->host + p->normal[i];
> -        p->iov[i].iov_len = page_size;
> +        p->iov[i].iov_len = p->page_size;
>      }
>      return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>  }
> @@ -281,8 +279,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
>  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>  {
>      MultiFDPacket_t *packet = p->packet;
> -    size_t page_size = qemu_target_page_size();
> -    uint32_t page_count = MULTIFD_PACKET_SIZE / page_size;
> +    uint32_t page_count = MULTIFD_PACKET_SIZE / p->page_size;
>      RAMBlock *block;
>      int i;
>  
> @@ -344,7 +341,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>      for (i = 0; i < p->normal_num; i++) {
>          uint64_t offset = be64_to_cpu(packet->offset[i]);
>  
> -        if (offset > (block->used_length - page_size)) {
> +        if (offset > (block->used_length - p->page_size)) {
>              error_setg(errp, "multifd: offset too long %" PRIu64
>                         " (max " RAM_ADDR_FMT ")",
>                         offset, block->used_length);
> @@ -433,8 +430,7 @@ static int multifd_send_pages(QEMUFile *f)
>      p->packet_num = multifd_send_state->packet_num++;
>      multifd_send_state->pages = p->pages;
>      p->pages = pages;
> -    transferred = ((uint64_t) pages->num) * qemu_target_page_size()
> -                + p->packet_len;
> +    transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
>      qemu_file_acct_rate_limit(f, transferred);
>      ram_counters.multifd_bytes += transferred;
>      ram_counters.transferred += transferred;
> @@ -939,6 +935,7 @@ int multifd_save_setup(Error **errp)
>          /* We need one extra place for the packet header */
>          p->iov = g_new0(struct iovec, page_count + 1);
>          p->normal = g_new0(ram_addr_t, page_count);
> +        p->page_size = qemu_target_page_size();
>  
>          if (migrate_use_zero_copy_send()) {
>              p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
> @@ -1186,6 +1183,7 @@ int multifd_load_setup(Error **errp)
>          p->name = g_strdup_printf("multifdrecv_%d", i);
>          p->iov = g_new0(struct iovec, page_count);
>          p->normal = g_new0(ram_addr_t, page_count);
> +        p->page_size = qemu_target_page_size();
>      }
>  
>      for (i = 0; i < thread_count; i++) {


IIUC this info should never change after being assigned, and is the same on every
multifd channel param.

I wonder if it would be interesting to have a common area for this kind of info,
which could be referenced by every multifd channel parameter.
Or maybe too much trouble?
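
Just to make the idea concrete, here is a minimal sketch of what I mean
(the names are invented; qemu_target_page_size() and MULTIFD_PACKET_SIZE are
the ones the patch already uses):

typedef struct {
    /* guest page size, fixed once migration starts */
    uint32_t page_size;
    /* number of pages in a full packet */
    uint32_t page_count;
} MultiFDCommonInfo;

static MultiFDCommonInfo multifd_common;

/* called once from multifd_save_setup()/multifd_load_setup() */
static void multifd_common_init(void)
{
    multifd_common.page_size = qemu_target_page_size();
    multifd_common.page_count = MULTIFD_PACKET_SIZE / multifd_common.page_size;
}

Each MultiFD{Send,Recv}Params could then reference multifd_common instead of
carrying its own copy of these values.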

Anyway, FWIW:
Reviewed-by: Leonardo Bras <leobras@redhat.com>

Best regards, 
Leo



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv,Send}Params
  2022-08-02  6:38 ` [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv, Send}Params Juan Quintela
@ 2022-08-11  8:10   ` Leonardo Brás
  0 siblings, 0 replies; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  8:10 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
> We were recalculating it left and right.  We plan to change those
> values in later patches.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/multifd.h | 4 ++++
>  migration/multifd.c | 7 ++++---
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 86fb9982b3..e2802a9ce2 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -82,6 +82,8 @@ typedef struct {
>      uint32_t packet_len;
>      /* guest page size */
>      uint32_t page_size;
> +    /* number of pages in a full packet */
> +    uint32_t page_count;
>      /* multifd flags for sending ram */
>      int write_flags;
>  
> @@ -147,6 +149,8 @@ typedef struct {
>      uint32_t packet_len;
>      /* guest page size */
>      uint32_t page_size;
> +    /* number of pages in a full packet */
> +    uint32_t page_count;
>  
>      /* syncs main thread and channels */
>      QemuSemaphore sem_sync;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index d2070c9cee..aa3808a6f4 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -279,7 +279,6 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
>  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>  {
>      MultiFDPacket_t *packet = p->packet;
> -    uint32_t page_count = MULTIFD_PACKET_SIZE / p->page_size;
>      RAMBlock *block;
>      int i;
>  
> @@ -306,10 +305,10 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>       * If we received a packet that is 100 times bigger than expected
>       * just stop migration.  It is a magic number.
>       */
> -    if (packet->pages_alloc > page_count) {
> +    if (packet->pages_alloc > p->page_count) {
>          error_setg(errp, "multifd: received packet "
>                     "with size %u and expected a size of %u",
> -                   packet->pages_alloc, page_count) ;
> +                   packet->pages_alloc, p->page_count) ;
>          return -1;
>      }
>  
> @@ -936,6 +935,7 @@ int multifd_save_setup(Error **errp)
>          p->iov = g_new0(struct iovec, page_count + 1);
>          p->normal = g_new0(ram_addr_t, page_count);
>          p->page_size = qemu_target_page_size();
> +        p->page_count = page_count;
>  
>          if (migrate_use_zero_copy_send()) {
>              p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
> @@ -1183,6 +1183,7 @@ int multifd_load_setup(Error **errp)
>          p->name = g_strdup_printf("multifdrecv_%d", i);
>          p->iov = g_new0(struct iovec, page_count);
>          p->normal = g_new0(ram_addr_t, page_count);
> +        p->page_count = page_count;
>          p->page_size = qemu_target_page_size();
>      }
>  

Same comment as Patch [1/12] here.

FWIW:
Reviewed-by: Leonardo Bras <leobras@redhat.com>

Best regards, 
Leo



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 03/12] migration: Export ram_transferred_ram()
  2022-08-02  6:38 ` [PATCH v7 03/12] migration: Export ram_transferred_ram() Juan Quintela
@ 2022-08-11  8:11   ` Leonardo Brás
  2022-08-13 15:36     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  8:11 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost, David Edmondson

On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Edmondson <david.edmondson@oracle.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Is this doubled Signed-off-by intentional?

> ---
>  migration/ram.h | 2 ++
>  migration/ram.c | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.h b/migration/ram.h
> index c7af65ac74..e844966f69 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -65,6 +65,8 @@ int ram_load_postcopy(QEMUFile *f, int channel);
>  
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  
> +void ram_transferred_add(uint64_t bytes);
> +
>  int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
>  bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
>  void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
> diff --git a/migration/ram.c b/migration/ram.c
> index b94669ba5d..85d89d61ac 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -422,7 +422,7 @@ uint64_t ram_bytes_remaining(void)
>  
>  MigrationStats ram_counters;
>  
> -static void ram_transferred_add(uint64_t bytes)
> +void ram_transferred_add(uint64_t bytes)
>  {
>      if (runstate_is_running()) {
>          ram_counters.precopy_bytes += bytes;

Other than that, FWIW: 
Reviewed-by: Leonardo Bras <leobras@redhat.com>


Best regards, 
Leo



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 04/12] multifd: Count the number of bytes sent correctly
  2022-08-02  6:38 ` [PATCH v7 04/12] multifd: Count the number of bytes sent correctly Juan Quintela
@ 2022-08-11  8:11   ` Leonardo Brás
  2022-08-19  9:35     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  8:11 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
> Current code assumes that all pages are whole.  That is not true for
> example for compression already.  Fix it by creating a new field
> ->sent_bytes that includes it.
> 
> All ram_counters are used only from the migration thread, so we have
> two options:
> - put a mutex and fill everything when we sent it (not only
> ram_counters, also qemu_file->xfer_bytes).
> - Create a local variable that implements how much has been sent
> through each channel.  And when we push another packet, we "add" the
> previous stats.
> 
> I chose option two due to fewer changes overall.  In the previous code we
> increased transferred and then we sent.  The current code goes the other
> way around.  It sends the data, and after the fact, it updates the
> counters.  Notice that each channel can have a maximum of half a
> megabyte of data without counting, so it is not very important.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/multifd.h |  2 ++
>  migration/multifd.c | 14 ++++++--------
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index e2802a9ce2..36f899c56f 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -102,6 +102,8 @@ typedef struct {
>      uint32_t flags;
>      /* global number of generated multifd packets */
>      uint64_t packet_num;
> +    /* How many bytes have we sent on the last packet */
> +    uint64_t sent_bytes;
>      /* thread has work to do */
>      int pending_job;
>      /* array of pages to sent.
> diff --git a/migration/multifd.c b/migration/multifd.c
> index aa3808a6f4..e25b529235 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -394,7 +394,6 @@ static int multifd_send_pages(QEMUFile *f)
>      static int next_channel;
>      MultiFDSendParams *p = NULL; /* make happy gcc */
>      MultiFDPages_t *pages = multifd_send_state->pages;
> -    uint64_t transferred;
>  
>      if (qatomic_read(&multifd_send_state->exiting)) {
>          return -1;
> @@ -429,10 +428,10 @@ static int multifd_send_pages(QEMUFile *f)
>      p->packet_num = multifd_send_state->packet_num++;
>      multifd_send_state->pages = p->pages;
>      p->pages = pages;
> -    transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
> -    qemu_file_acct_rate_limit(f, transferred);
> -    ram_counters.multifd_bytes += transferred;
> -    ram_counters.transferred += transferred;
> +    ram_transferred_add(p->sent_bytes);
> +    ram_counters.multifd_bytes += p->sent_bytes;

I'm wondering if we could avoid having this last line by having
ram_transferred_add() to include:

if (migrate_use_multifd()) {
    ram_counters.multifd_bytes += bytes;
}

But I am not sure if other usages from ram_transferred_add() could interfere.
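
For reference, a sketch of what I have in mind (assumed, not what the patch
does); the possible interference is exactly that ram_transferred_add() is also
called for data going through the main channel, which (if I read it right)
would then bump multifd_bytes too once multifd is enabled:

void ram_transferred_add(uint64_t bytes)
{
    if (runstate_is_running()) {
        ram_counters.precopy_bytes += bytes;
    }
    /* ... existing accounting ... */
    ram_counters.transferred += bytes;
    /* the suggested addition */
    if (migrate_use_multifd()) {
        ram_counters.multifd_bytes += bytes;
    }
}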


> +    qemu_file_acct_rate_limit(f, p->sent_bytes);
> +    p->sent_bytes = 0;
>      qemu_mutex_unlock(&p->mutex);
>      qemu_sem_post(&p->sem);
>  
> @@ -605,9 +604,6 @@ int multifd_send_sync_main(QEMUFile *f)
>          p->packet_num = multifd_send_state->packet_num++;
>          p->flags |= MULTIFD_FLAG_SYNC;
>          p->pending_job++;
> -        qemu_file_acct_rate_limit(f, p->packet_len);
> -        ram_counters.multifd_bytes += p->packet_len;
> -        ram_counters.transferred += p->packet_len;
>          qemu_mutex_unlock(&p->mutex);
>          qemu_sem_post(&p->sem);
>  
> @@ -714,6 +710,8 @@ static void *multifd_send_thread(void *opaque)
>              }
>  
>              qemu_mutex_lock(&p->mutex);
> +            p->sent_bytes += p->packet_len;;

Double semicolon.

> +            p->sent_bytes += p->next_packet_size;
>              p->pending_job--;
>              qemu_mutex_unlock(&p->mutex);
>  

IIUC, it changes how rate-limiting and ram counters perceive how many bytes have
been sent, by counting actual bytes instead of page multiples. This should
reflect what has actually been sent (in terms of rate limiting).

I'm wondering if having ram_counters.transferred reflect actual bytes,
instead of the number of pages * pagesize, will confuse any user (or management
code) in any way.
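
Just to put numbers on it (all values invented): for a zlib-compressed packet
of 128 4KiB pages that compresses down to ~60KiB, the two accountings differ
roughly like this:

/* illustrative only, values invented */
uint64_t old_acct = 128 * 4096 + p->packet_len;           /* ~512KiB + header */
uint64_t new_acct = p->next_packet_size + p->packet_len;  /* ~60KiB + header */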

Other than that:
Reviewed-by: Leonardo Bras <leobras@redhat.com>

Best regards, 
Leo



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer
  2022-08-02  6:39 ` [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer Juan Quintela
@ 2022-08-11  8:11   ` Leonardo Brás
  2022-08-19  9:51     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  8:11 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> We are going to create a new function for multifd latest in the series.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Double Signed-off-by again.

> ---
>  migration/ram.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 85d89d61ac..499d9b2a90 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -310,6 +310,9 @@ typedef struct {
>      bool preempted;
>  } PostcopyPreemptState;
>  
> +typedef struct RAMState RAMState;
> +typedef struct PageSearchStatus PageSearchStatus;
> +
>  /* State of RAM for migration */
>  struct RAMState {
>      /* QEMUFile used for this migration */
> @@ -372,8 +375,9 @@ struct RAMState {
>       * is enabled.
>       */
>      unsigned int postcopy_channel;
> +
> +    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
>  };
> -typedef struct RAMState RAMState;
>  
>  static RAMState *ram_state;
>  
> @@ -2255,14 +2259,14 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
>  }
>  
>  /**
> - * ram_save_target_page: save one target page
> + * ram_save_target_page_legacy: save one target page
>   *
>   * Returns the number of pages written
>   *
>   * @rs: current RAM state
>   * @pss: data about the page we want to send
>   */
> -static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
> +static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>  {
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> @@ -2469,7 +2473,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>  
>          /* Check the pages is dirty and if it is send it */
>          if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> -            tmppages = ram_save_target_page(rs, pss);
> +            tmppages = rs->ram_save_target_page(rs, pss);
>              if (tmppages < 0) {
>                  return tmppages;
>              }
> @@ -3223,6 +3227,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>  
> +    (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
>      ret =  multifd_send_sync_main(f);
>      if (ret < 0) {
>          return ret;


So, IIUC:
- Rename ram_save_target_page -> ram_save_target_page_legacy
- Add a function pointer to RAMState (or a callback)
- Assign function pointer = ram_save_target_page_legacy at setup
- Replace ram_save_target_page() by indirect function call using above pointer.

I could see no issue in this, so I believe it works fine.

The only thing that concerns me is the name RAMState.
IMHO, a struct named RAMState is supposed to just reflect the state of RAM (or,
according to this struct's comments, the state of RAM for migration). Having a
function pointer here that saves a page seems counterintuitive, since it does
not reflect the state of RAM.

Maybe we could rename the struct, or even better, create another struct that
could look something like this:

struct RAMMigration {
    RAMState state;
    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
    /* Other callbacks or further info.*/
}

What do you think about it?

Best regards, 
Leo



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 06/12] multifd: Make flags field thread local
  2022-08-02  6:39 ` [PATCH v7 06/12] multifd: Make flags field thread local Juan Quintela
@ 2022-08-11  9:04   ` Leonardo Brás
  2022-08-19 10:03     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  9:04 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> Use of flags with respect to locking was inconsistent.  For the
> sending side:
> - it was set to 0 with mutex held on the multifd channel.
> - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread.
> - Everything else was done without the mutex held on the multifd channel.
> 
> On the reception side, it is not used on the migration thread, only on
> the multifd channels threads.
> 
> So we move it to the multifd channels thread only variables, and we
> introduce a new bool sync_needed on the send side to pass that information.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/multifd.h | 10 ++++++----
>  migration/multifd.c | 23 +++++++++++++----------
>  2 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 36f899c56f..a67cefc0a2 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -98,12 +98,12 @@ typedef struct {

Just noticed that having no name on the 'typedef struct' line makes it harder to
understand what is going on.

MultiFDSendParams

>      bool running;
>      /* should this thread finish */
>      bool quit;
> -    /* multifd flags for each packet */
> -    uint32_t flags;
>      /* global number of generated multifd packets */
>      uint64_t packet_num;
>      /* How many bytes have we sent on the last packet */
>      uint64_t sent_bytes;
> +    /* Do we need to do an iteration sync */
> +    bool sync_needed;
>      /* thread has work to do */
>      int pending_job;
>      /* array of pages to sent.
> @@ -117,6 +117,8 @@ typedef struct {
>  
>      /* pointer to the packet */
>      MultiFDPacket_t *packet;
> +    /* multifd flags for each packet */
> +    uint32_t flags;
>      /* size of the next packet that contains pages */
>      uint32_t next_packet_size;
>      /* packets sent through this channel */

MultiFDRecvParams

> @@ -163,8 +165,6 @@ typedef struct {
>      bool running;
>      /* should this thread finish */
>      bool quit;
> -    /* multifd flags for each packet */
> -    uint32_t flags;
>      /* global number of generated multifd packets */
>      uint64_t packet_num;
>  
> @@ -172,6 +172,8 @@ typedef struct {
>  
>      /* pointer to the packet */
>      MultiFDPacket_t *packet;
> +    /* multifd flags for each packet */
> +    uint32_t flags;
>      /* size of the next packet that contains pages */
>      uint32_t next_packet_size;
>      /* packets sent through this channel */

So, IIUC, the struct member flags got moved down (same struct) to an area
described as thread-local, meaning it does not need locking.

Interesting, I hadn't noticed these different areas in the same struct.
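
For my own reference, the resulting split looks roughly like this (heavily
trimmed sketch; field list and exact order omitted):

typedef struct {
    /* ... fields above here are protected by mutex ... */
    uint64_t packet_num;
    bool sync_needed;
    int pending_job;
    /* ... fields below are only touched by the channel thread, no locking ... */
    MultiFDPacket_t *packet;
    uint32_t flags;
    uint32_t next_packet_size;
} MultiFDSendParams;   /* trimmed sketch */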

> diff --git a/migration/multifd.c b/migration/multifd.c
> index e25b529235..09a40a9135 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -602,7 +602,7 @@ int multifd_send_sync_main(QEMUFile *f)
>          }
>  
>          p->packet_num = multifd_send_state->packet_num++;
> -        p->flags |= MULTIFD_FLAG_SYNC;
> +        p->sync_needed = true;
>          p->pending_job++;
>          qemu_mutex_unlock(&p->mutex);
>          qemu_sem_post(&p->sem);
> @@ -658,7 +658,11 @@ static void *multifd_send_thread(void *opaque)
>  
>          if (p->pending_job) {
>              uint64_t packet_num = p->packet_num;
> -            uint32_t flags = p->flags;
> +            p->flags = 0;
> +            if (p->sync_needed) {
> +                p->flags |= MULTIFD_FLAG_SYNC;
> +                p->sync_needed = false;
> +            }

Any particular reason for doing p->flags = 0 and then p->flags |= MULTIFD_FLAG_SYNC?

[1] Couldn't it be done without the |=, since p->flags is already set to zero
just before? (becoming "p->flags = MULTIFD_FLAG_SYNC")
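
i.e. the alternative in [1] would read (just to show what I mean, not the
patch):

            p->flags = 0;
            if (p->sync_needed) {
                p->flags = MULTIFD_FLAG_SYNC;   /* plain assignment, no |= needed */
                p->sync_needed = false;
            }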


>              p->normal_num = 0;
>  
>              if (use_zero_copy_send) {
> @@ -680,14 +684,13 @@ static void *multifd_send_thread(void *opaque)
>                  }
>              }
>              multifd_send_fill_packet(p);
> -            p->flags = 0;
>              p->num_packets++;
>              p->total_normal_pages += p->normal_num;
>              p->pages->num = 0;
>              p->pages->block = NULL;
>              qemu_mutex_unlock(&p->mutex);
>  
> -            trace_multifd_send(p->id, packet_num, p->normal_num, flags,
> +            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
>                                 p->next_packet_size);
>  
>              if (use_zero_copy_send) {
> @@ -715,7 +718,7 @@ static void *multifd_send_thread(void *opaque)
>              p->pending_job--;
>              qemu_mutex_unlock(&p->mutex);
>  
> -            if (flags & MULTIFD_FLAG_SYNC) {
> +            if (p->flags & MULTIFD_FLAG_SYNC) {
>                  qemu_sem_post(&p->sem_sync);
>              }
>              qemu_sem_post(&multifd_send_state->channels_ready);

IIUC it uses p->sync_needed to keep the sync info, instead of the previous flags
local var, and thus it can set p->flags = 0 earlier. Seems to not change any
behavior AFAICS.



> @@ -1090,7 +1093,7 @@ static void *multifd_recv_thread(void *opaque)
>      rcu_register_thread();
>  
>      while (true) {
> -        uint32_t flags;
> +        bool sync_needed = false;
>  
>          if (p->quit) {
>              break;
> @@ -1112,11 +1115,11 @@ static void *multifd_recv_thread(void *opaque)
>              break;
>          }
>  
> -        flags = p->flags;
> +        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
> +                           p->next_packet_size);
> +        sync_needed = p->flags & MULTIFD_FLAG_SYNC;
>          /* recv methods don't know how to handle the SYNC flag */
>          p->flags &= ~MULTIFD_FLAG_SYNC;
> -        trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags,
> -                           p->next_packet_size);
>          p->num_packets++;
>          p->total_normal_pages += p->normal_num;
>          qemu_mutex_unlock(&p->mutex);
> @@ -1128,7 +1131,7 @@ static void *multifd_recv_thread(void *opaque)
>              }
>          }
>  
> -        if (flags & MULTIFD_FLAG_SYNC) {
> +        if (sync_needed) {
>              qemu_sem_post(&multifd_recv_state->sem_sync);
>              qemu_sem_wait(&p->sem_sync);
>          }

Ok, IIUC this part should have the same behavior as before, but using a bool
instead of a u32.

Other than question [1], LGTM. 

FWIW:
Reviewed-by: Leonardo Bras <leobras@redhat.com>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held
  2022-08-02  6:39 ` [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held Juan Quintela
@ 2022-08-11  9:16   ` Leonardo Brás
  2022-08-19 11:32     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  9:16 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> We do the send_prepare() and the fill of the head packet without the
> mutex held.  It will help a lot for compression and later in the
> series for zero pages.
> 
> Notice that we can use p->pages without holding p->mutex because
> p->pending_job == 1.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/multifd.h |  2 ++
>  migration/multifd.c | 11 ++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index a67cefc0a2..cd389d18d2 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -109,7 +109,9 @@ typedef struct {
>      /* array of pages to sent.
>       * The owner of 'pages' depends of 'pending_job' value:
>       * pending_job == 0 -> migration_thread can use it.
> +     *                     No need for mutex lock.
>       * pending_job != 0 -> multifd_channel can use it.
> +     *                     No need for mutex lock.
>       */
>      MultiFDPages_t *pages;
>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 09a40a9135..68fc9f8e88 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -663,6 +663,8 @@ static void *multifd_send_thread(void *opaque)
>                  p->flags |= MULTIFD_FLAG_SYNC;
>                  p->sync_needed = false;
>              }
> +            qemu_mutex_unlock(&p->mutex);
> +

If it unlocks here, we will have unprotected:
for (int i = 0; i < p->pages->num; i++) {
    p->normal[p->normal_num] = p->pages->offset[i];
    p->normal_num++;
}

And p->pages seems to be in the mutex-protected area.
Is that ok?

Also, under that we have:
            if (p->normal_num) {
                ret = multifd_send_state->ops->send_prepare(p, &local_err);
                if (ret != 0) {
                    qemu_mutex_unlock(&p->mutex);
                    break;
                }
            }

Calling mutex_unlock() here, even though the unlock already happened before,
could that cause any issue?
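
In other words, if the mutex is already dropped at this point, I would expect
the error path to stop unlocking, something like (a sketch of my concern, not a
proposed fix):

            if (p->normal_num) {
                ret = multifd_send_state->ops->send_prepare(p, &local_err);
                if (ret != 0) {
                    break;    /* the mutex is not held here any more */
                }
            }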


>              p->normal_num = 0;
>  
>              if (use_zero_copy_send) {
> @@ -684,11 +686,6 @@ static void *multifd_send_thread(void *opaque)
>                  }
>              }
>              multifd_send_fill_packet(p);
> -            p->num_packets++;
> -            p->total_normal_pages += p->normal_num;
> -            p->pages->num = 0;
> -            p->pages->block = NULL;
> -            qemu_mutex_unlock(&p->mutex);
>  
>              trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
>                                 p->next_packet_size);
> @@ -713,6 +710,10 @@ static void *multifd_send_thread(void *opaque)
>              }
>  
>              qemu_mutex_lock(&p->mutex);
> +            p->num_packets++;
> +            p->total_normal_pages += p->normal_num;
> +            p->pages->num = 0;
> +            p->pages->block = NULL;
>              p->sent_bytes += p->packet_len;;
>              p->sent_bytes += p->next_packet_size;
>              p->pending_job--;

Not used in the interval, this part seems ok.

Best regards,
Leo



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page
  2022-08-02  6:39 ` [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page Juan Quintela
@ 2022-08-11  9:29   ` Leonardo Brás
  2022-08-19 11:36     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  9:29 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> We have to enable it by default until we introduce the new code.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> ---
> 
> Change it to a capability.  As capabilities are off by default, I have
> to change MULTIFD_ZERO_PAGE to MAIN_ZERO_PAGE, so it is false by
> default, and true for older versions.

IIUC, the idea of a capability is to introduce some new feature to the code,
and let users enable or disable it.

If it introduces a new capability, it is not very intuitive that it will
always be true for older versions, and false for new ones.

I would suggest adding it as MULTIFD_ZERO_PAGE, and leaving it disabled for now.
When the full feature gets introduced, the capability could be enabled by
default, if desired.

What do you think?
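
i.e. something along these lines (MULTIFD_ZERO_PAGE is hypothetical here, just
to show the shape of the accessor):

bool migrate_use_multifd_zero_page(void)
{
    MigrationState *s = migrate_get_current();

    return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIFD_ZERO_PAGE];
}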

Best regards,
Leo


> ---
>  qapi/migration.json   |  8 +++++++-
>  migration/migration.h |  1 +
>  hw/core/machine.c     |  1 +
>  migration/migration.c | 16 +++++++++++++++-
>  4 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 81185d4311..dc981236ff 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -472,12 +472,18 @@
>  #                  Requires that QEMU be permitted to use locked memory
>  #                  for guest RAM pages.
>  #                  (since 7.1)
> +#
>  # @postcopy-preempt: If enabled, the migration process will allow postcopy
>  #                    requests to preempt precopy stream, so postcopy requests
>  #                    will be handled faster.  This is a performance feature and
>  #                    should not affect the correctness of postcopy migration.
>  #                    (since 7.1)
>  #
> +# @main-zero-page: If enabled, the detection of zero pages will be
> +#                  done on the main thread.  Otherwise it is done on
> +#                  the multifd threads.
> +#                  (since 7.1)
> +#
>  # Features:
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>  #
> @@ -492,7 +498,7 @@
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
> -           'zero-copy-send', 'postcopy-preempt'] }
> +           'zero-copy-send', 'postcopy-preempt', 'main-zero-page'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> diff --git a/migration/migration.h b/migration/migration.h
> index cdad8aceaa..58b245b138 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -415,6 +415,7 @@ int migrate_multifd_channels(void);
>  MultiFDCompression migrate_multifd_compression(void);
>  int migrate_multifd_zlib_level(void);
>  int migrate_multifd_zstd_level(void);
> +bool migrate_use_main_zero_page(void);
>  
>  #ifdef CONFIG_LINUX
>  bool migrate_use_zero_copy_send(void);
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index a673302cce..2624b75ab4 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -43,6 +43,7 @@
>  GlobalProperty hw_compat_7_0[] = {
>      { "arm-gicv3-common", "force-8-bit-prio", "on" },
>      { "nvme-ns", "eui64-default", "on"},
> +    { "migration", "main-zero-page", "true" },
>  };
>  const size_t hw_compat_7_0_len = G_N_ELEMENTS(hw_compat_7_0);
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index e03f698a3c..ce3e5cc0cd 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -164,7 +164,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
>      MIGRATION_CAPABILITY_XBZRLE,
>      MIGRATION_CAPABILITY_X_COLO,
>      MIGRATION_CAPABILITY_VALIDATE_UUID,
> -    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
> +    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
> +    MIGRATION_CAPABILITY_MAIN_ZERO_PAGE);
>  
>  /* When we add fault tolerance, we could have several
>     migrations at once.  For now we don't need to add
> @@ -2592,6 +2593,17 @@ bool migrate_use_multifd(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIFD];
>  }
>  
> +bool migrate_use_main_zero_page(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    // We will enable this when we add the right code.
> +    // return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
> +    return true;
> +}
> +
>  bool migrate_pause_before_switchover(void)
>  {
>      MigrationState *s;
> @@ -4406,6 +4418,8 @@ static Property migration_properties[] = {
>      DEFINE_PROP_MIG_CAP("x-zero-copy-send",
>              MIGRATION_CAPABILITY_ZERO_COPY_SEND),
>  #endif
> +    DEFINE_PROP_MIG_CAP("main-zero-page",
> +            MIGRATION_CAPABILITY_MAIN_ZERO_PAGE),
>  
>      DEFINE_PROP_END_OF_LIST(),
>  };



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 09/12] migration: Export ram_release_page()
  2022-08-02  6:39 ` [PATCH v7 09/12] migration: Export ram_release_page() Juan Quintela
@ 2022-08-11  9:31   ` Leonardo Brás
  0 siblings, 0 replies; 43+ messages in thread
From: Leonardo Brás @ 2022-08-11  9:31 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/ram.h | 1 +
>  migration/ram.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.h b/migration/ram.h
> index e844966f69..038d52f49f 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -66,6 +66,7 @@ int ram_load_postcopy(QEMUFile *f, int channel);
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  
>  void ram_transferred_add(uint64_t bytes);
> +void ram_release_page(const char *rbname, uint64_t offset);
>  
>  int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
>  bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
> diff --git a/migration/ram.c b/migration/ram.c
> index 499d9b2a90..291ba5c0ed 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1238,7 +1238,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
>      }
>  }
>  
> -static void ram_release_page(const char *rbname, uint64_t offset)
> +void ram_release_page(const char *rbname, uint64_t offset)
>  {
>      if (!migrate_release_ram() || !migration_in_postcopy()) {
>          return;

LGTM. FWIW:
Reviewed-by: Leonardo Bras <leobras@redhat.com>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 03/12] migration: Export ram_transferred_ram()
  2022-08-11  8:11   ` Leonardo Brás
@ 2022-08-13 15:36     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-13 15:36 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost, David Edmondson

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Reviewed-by: David Edmondson <david.edmondson@oracle.com>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> Is this doubled Signed-off-by intentional?

It is .git/hooks/prepare-commit-msg for you.

Adding --if-exists doNothing, and we will see how it breaks in (other)
very subtle ways.


>
> Other than that, FWIW: 
> Reviewed-by: Leonardo Bras <leobras@redhat.com>

Thanks, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv,Send}Params
  2022-08-11  8:10   ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv,Send}Params Leonardo Brás
@ 2022-08-13 15:41     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-13 15:41 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> Hello Juan,
>
> On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
>> We were calling qemu_target_page_size() left and right.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> IMHO looks a good idea to bring that info inside the multifd parameters.

Thanks.

[...]

> IIUC this info should never change after assigned, and is the same on every
> multifd channel param. 

registered_yank?  Perhaps, I have to look.

packet_len, page_size, page_count and write_flags: They never really change.

But on the other hand, we are only "wasting" 16 bytes per channel.

> I wonder if it would be interesting to have a common area for this kind of info,
> which could be referenced by every multifd channel parameter.
> Or maybe too much trouble?

I will take a look in the future.  The bigger problem that I can think of
is that we are already passing the MultiFD{Send,Recv}Params to each
function, so having them globally would mean either a global variable or
adding a pointer (8 bytes) to each params struct, so I am not sure it is a
good idea with the current amount of constants.

> Anyway, FWIW:
> Reviewed-by: Leonardo Bras <leobras@redhat.com>

thanks, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 04/12] multifd: Count the number of bytes sent correctly
  2022-08-11  8:11   ` Leonardo Brás
@ 2022-08-19  9:35     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-19  9:35 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:38 +0200, Juan Quintela wrote:
>> Current code assumes that all pages are whole.  That is not true, for
>> example, for compression already.  Fix it by creating a new field
>> ->sent_bytes that includes it.
>> 
>> All ram_counters are used only from the migration thread, so we have
>> two options:
>> - put a mutex and fill everything when we sent it (not only
>> ram_counters, also qemu_file->xfer_bytes).
>> - Create a local variable that implements how much has been sent
>> through each channel.  And when we push another packet, we "add" the
>> previous stats.
>> 
>> I chose option two due to fewer changes overall.  In the previous code we
>> increase transferred and then we send.  Current code goes the other
>> way around.  It sends the data, and after the fact, it updates the
>> counters.  Notice that each channel can have a maximum of half a
>> megabyte of data not yet counted, so it is not very important.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  migration/multifd.h |  2 ++
>>  migration/multifd.c | 14 ++++++--------
>>  2 files changed, 8 insertions(+), 8 deletions(-)
>> 
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index e2802a9ce2..36f899c56f 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -102,6 +102,8 @@ typedef struct {
>>      uint32_t flags;
>>      /* global number of generated multifd packets */
>>      uint64_t packet_num;
>> +    /* How many bytes have we sent on the last packet */
>> +    uint64_t sent_bytes;
>>      /* thread has work to do */
>>      int pending_job;
>>      /* array of pages to sent.
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index aa3808a6f4..e25b529235 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -394,7 +394,6 @@ static int multifd_send_pages(QEMUFile *f)
>>      static int next_channel;
>>      MultiFDSendParams *p = NULL; /* make happy gcc */
>>      MultiFDPages_t *pages = multifd_send_state->pages;
>> -    uint64_t transferred;
>>  
>>      if (qatomic_read(&multifd_send_state->exiting)) {
>>          return -1;
>> @@ -429,10 +428,10 @@ static int multifd_send_pages(QEMUFile *f)
>>      p->packet_num = multifd_send_state->packet_num++;
>>      multifd_send_state->pages = p->pages;
>>      p->pages = pages;
>> -    transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
>> -    qemu_file_acct_rate_limit(f, transferred);
>> -    ram_counters.multifd_bytes += transferred;
>> -    ram_counters.transferred += transferred;
>> +    ram_transferred_add(p->sent_bytes);
>> +    ram_counters.multifd_bytes += p->sent_bytes;
>
> I'm wondering if we could avoid having this last line by having
> ram_transferred_add() to include:
>
> if (migrate_use_multifd()) {
>     ram_counters.multifd_bytes += bytes;
> }
>
> But I am not sure if other usages from ram_transferred_add() could interfere.

I prefer not to, because ram_transferred_add() is also used for non-multifd code.

> Double semicolon.

Fixed, thanks.

>> +            p->sent_bytes += p->next_packet_size;
>>              p->pending_job--;
>>              qemu_mutex_unlock(&p->mutex);
>>  
>
> IIUC, it changes how rate-limiting and ram counters perceive how many bytes have
> been sent, by counting actual bytes instead of page multiples. This should
> reflect what has actually been sent (in terms of rate limiting).
>
> I'm wondering if having ram_counters.transferred reflect actual bytes,
> instead of the number of pages * pagesize, will cause any user (or management
> code) to be confused in any way.

It shouldn't, because we already have things that don't send the data
as page multiples:
- any compression code
- xbzrle

so I think we are right here.

> Other than that:
> Reviewed-by: Leonardo Bras <leobras@redhat.com>

Thanks, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer
  2022-08-11  8:11   ` Leonardo Brás
@ 2022-08-19  9:51     ` Juan Quintela
  2022-08-20  7:14       ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-19  9:51 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> We are going to create a new function for multifd latest in the series.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> Double Signed-off-by again.
>
>> ---
>>  migration/ram.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 85d89d61ac..499d9b2a90 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -310,6 +310,9 @@ typedef struct {
>>      bool preempted;
>>  } PostcopyPreemptState;
>>  
>> +typedef struct RAMState RAMState;
>> +typedef struct PageSearchStatus PageSearchStatus;
>> +
>>  /* State of RAM for migration */
>>  struct RAMState {
>>      /* QEMUFile used for this migration */
>> @@ -372,8 +375,9 @@ struct RAMState {
>>       * is enabled.
>>       */
>>      unsigned int postcopy_channel;
>> +
>> +    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
>>  };
>> -typedef struct RAMState RAMState;
>>  
>>  static RAMState *ram_state;
>>  
>> @@ -2255,14 +2259,14 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
>>  }
>>  
>>  /**
>> - * ram_save_target_page: save one target page
>> + * ram_save_target_page_legacy: save one target page
>>   *
>>   * Returns the number of pages written
>>   *
>>   * @rs: current RAM state
>>   * @pss: data about the page we want to send
>>   */
>> -static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>> +static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>>  {
>>      RAMBlock *block = pss->block;
>>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>> @@ -2469,7 +2473,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>>  
>>          /* Check the pages is dirty and if it is send it */
>>          if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>> -            tmppages = ram_save_target_page(rs, pss);
>> +            tmppages = rs->ram_save_target_page(rs, pss);
>>              if (tmppages < 0) {
>>                  return tmppages;
>>              }
>> @@ -3223,6 +3227,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>>  
>> +    (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
>>      ret =  multifd_send_sync_main(f);
>>      if (ret < 0) {
>>          return ret;
>
>
> So, IIUC:
> - Rename ram_save_target_page -> ram_save_target_page_legacy
> - Add a function pointer to RAMState (or a callback)
> - Assign function pointer = ram_save_target_page_legacy at setup
> - Replace ram_save_target_page() by indirect function call using above pointer.
>
> I could see no issue in this, so I believe it works fine.
>
> The only thing that concerns me is the name RAMState.

Every device state is set up in RAMState.

> IMHO, a struct named RAMState is supposed to just reflect the state of RAM (or,
> according to this struct's comments, the state of RAM for migration). Having a
> function pointer here that saves a page seems counterintuitive, since it does
> not reflect the state of RAM.

The big problem with adding another struct is that we would have to
change all the callers, or add yet another global variable.  Both are
bad ideas in my humble opinion.

> Maybe we could rename the struct, or even better, create another struct that
> could look something like this:
>
> struct RAMMigration {
>     RAMState state;
>     int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
>     /* Other callbacks or further info.*/
> }
>
> What do you think about it?

Really this depends on configuration, i.e. what is set up for QEMU
migration.  I think this is the easiest way to do it.  We can add a new
struct, but it makes everything much more complicated:

- the value that we receive in ram_save_setup() is a RAMState
- We would have to change all the callers from:
  * ram_save_iterate()
  * ram_find_and_save_block()
  * ram_save_host_page()

So I think it is quite a bit of churn for not a lot of gain.

Later, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 06/12] multifd: Make flags field thread local
  2022-08-11  9:04   ` Leonardo Brás
@ 2022-08-19 10:03     ` Juan Quintela
  2022-08-20  7:24       ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-19 10:03 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> Use of flags with respect to locking was inconsistent.  For the
>> sending side:
>> - it was set to 0 with mutex held on the multifd channel.
>> - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread.
>> - Everything else was done without the mutex held on the multifd channel.
>> 
>> On the reception side, it is not used on the migration thread, only on
>> the multifd channels threads.
>> 
>> So we move it to the multifd channels thread only variables, and we
>> introduce a new bool sync_needed on the send side to pass that information.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  migration/multifd.h | 10 ++++++----
>>  migration/multifd.c | 23 +++++++++++++----------
>>  2 files changed, 19 insertions(+), 14 deletions(-)
>> 
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index 36f899c56f..a67cefc0a2 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -98,12 +98,12 @@ typedef struct {
>
> Just noticed having no name in 'typedef struct' line makes it harder to
> understand what is going on. 

It is a common idiom in QEMU.  The principal reason is that if you don't
want anyone to use "struct MultiFDSendParams", only MultiFDSendParams, the
best way to achieve that is to declare it this way.
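
To make the point concrete, here is a tiny standalone sketch of the idiom
(the names are only illustrative, not the real QEMU definitions):

/* anonymous struct: only the typedef name exists */
typedef struct {
    int id;
} SendParams;

/* named struct: both spellings compile */
typedef struct RecvParams {
    int id;
} RecvParams;

int main(void)
{
    SendParams s = { .id = 1 };          /* OK */
    /* struct SendParams t;  -- would not compile: no such tag */
    struct RecvParams r = { .id = 2 };   /* OK: the tag exists too */
    RecvParams r2 = r;                   /* OK */
    return s.id + r.id + r2.id;
}

With the anonymous form, any attempt to spell out a struct tag is a compile
error, which is exactly the restriction described above.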

>> @@ -172,6 +172,8 @@ typedef struct {
>>  
>>      /* pointer to the packet */
>>      MultiFDPacket_t *packet;
>> +    /* multifd flags for each packet */
>> +    uint32_t flags;
>>      /* size of the next packet that contains pages */
>>      uint32_t next_packet_size;
>>      /* packets sent through this channel */
>
> So, IIUC, the struct member flags got moved down (same struct) to an area
> described as thread-local, meaning it does not need locking.
>
> Interesting, I hadn't noticed these different areas in the same struct.

It changed in upstream in the last two weeks or so (it has been in
this patchset for several months).


>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index e25b529235..09a40a9135 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -602,7 +602,7 @@ int multifd_send_sync_main(QEMUFile *f)
>>          }
>>  
>>          p->packet_num = multifd_send_state->packet_num++;
>> -        p->flags |= MULTIFD_FLAG_SYNC;
>> +        p->sync_needed = true;
>>          p->pending_job++;
>>          qemu_mutex_unlock(&p->mutex);
>>          qemu_sem_post(&p->sem);
>> @@ -658,7 +658,11 @@ static void *multifd_send_thread(void *opaque)
>>  
>>          if (p->pending_job) {
>>              uint64_t packet_num = p->packet_num;
>> -            uint32_t flags = p->flags;
>> +            p->flags = 0;
>> +            if (p->sync_needed) {
>> +                p->flags |= MULTIFD_FLAG_SYNC;
>> +                p->sync_needed = false;
>> +            }
>
> Any particular reason why doing p->flags = 0, then p->flags |= MULTIFD_FLAG_SYNC
> ?

It is a bitmap field, and if anything is added in the future, we need to
be able to set it.  I agree that when there is only one flag, it seems "weird".

> [1] Couldn't it be done without the |= , since it's already being set to zero
> before? (becoming "p->flags = MULTIFD_FLAG_SYNC" )

As said, it is easier to modify later, and also easier if we want to set
a flag by default.

I agree that it is a matter of style/taste.

>>              p->normal_num = 0;
>>  
>>              if (use_zero_copy_send) {
>> @@ -680,14 +684,13 @@ static void *multifd_send_thread(void *opaque)
>>                  }
>>              }
>>              multifd_send_fill_packet(p);
>> -            p->flags = 0;
>>              p->num_packets++;
>>              p->total_normal_pages += p->normal_num;
>>              p->pages->num = 0;
>>              p->pages->block = NULL;
>>              qemu_mutex_unlock(&p->mutex);
>>  
>> -            trace_multifd_send(p->id, packet_num, p->normal_num, flags,
>> +            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
>>                                 p->next_packet_size);
>>  
>>              if (use_zero_copy_send) {
>> @@ -715,7 +718,7 @@ static void *multifd_send_thread(void *opaque)
>>              p->pending_job--;
>>              qemu_mutex_unlock(&p->mutex);
>>  
>> -            if (flags & MULTIFD_FLAG_SYNC) {
>> +            if (p->flags & MULTIFD_FLAG_SYNC) {
>>                  qemu_sem_post(&p->sem_sync);
>>              }
>>              qemu_sem_post(&multifd_send_state->channels_ready);
>
> IIUC it uses p->sync_needed to keep the sync info, instead of the previous flags
> local var, and thus it can set p->flags = 0 earlier. Seems to not change any
> behavior AFAICS.

The protection of the global flags field was wrong.  That is the reason
I decided to change it to sync_needed.

The problem was that at some point we were still sending a packet (that
shouldn't have had the SYNC flag enabled), but we received a
multifd_send_sync_main() and it got enabled anyway.  The easiest way that
I found to fix it was this.

The problem was difficult to detect; that is the reason I changed it
this way.

>> -        if (flags & MULTIFD_FLAG_SYNC) {
>> +        if (sync_needed) {
>>              qemu_sem_post(&multifd_recv_state->sem_sync);
>>              qemu_sem_wait(&p->sem_sync);
>>          }
>
> Ok, IIUC this part should have the same behavior as before, but using a bool
> instead of an u32.

I changed it to make sure that we only check the flags at the
beginning of the function, with the lock taken.

>
> FWIW:
> Reviewed-by: Leonardo Bras <leobras@redhat.com>

Thanks, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held
  2022-08-11  9:16   ` Leonardo Brás
@ 2022-08-19 11:32     ` Juan Quintela
  2022-08-20  7:27       ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 43+ messages in thread
From: Juan Quintela @ 2022-08-19 11:32 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> We do the send_prepare() and the fill of the head packet without the
>> mutex held.  It will help a lot for compression and later in the
>> series for zero pages.
>> 
>> Notice that we can use p->pages without holding p->mutex because
>> p->pending_job == 1.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  migration/multifd.h |  2 ++
>>  migration/multifd.c | 11 ++++++-----
>>  2 files changed, 8 insertions(+), 5 deletions(-)
>> 
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index a67cefc0a2..cd389d18d2 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -109,7 +109,9 @@ typedef struct {
>>      /* array of pages to sent.
>>       * The owner of 'pages' depends of 'pending_job' value:
>>       * pending_job == 0 -> migration_thread can use it.
>> +     *                     No need for mutex lock.
>>       * pending_job != 0 -> multifd_channel can use it.
>> +     *                     No need for mutex lock.
>>       */
>>      MultiFDPages_t *pages;
>>  
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 09a40a9135..68fc9f8e88 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -663,6 +663,8 @@ static void *multifd_send_thread(void *opaque)
>>                  p->flags |= MULTIFD_FLAG_SYNC;
>>                  p->sync_needed = false;
>>              }
>> +            qemu_mutex_unlock(&p->mutex);
>> +
>
> If it unlocks here, we will have unprotected:
> for (int i = 0; i < p->pages->num; i++) {
>     p->normal[p->normal_num] = p->pages->offset[i];
>     p->normal_num++;
> }
>
> And p->pages seems to be in the mutex-protected area.
> Should it be ok?

From the documentation:

    /* array of pages to sent.
     * The owner of 'pages' depends of 'pending_job' value:
     * pending_job == 0 -> migration_thread can use it.
     *                     No need for mutex lock.
     * pending_job != 0 -> multifd_channel can use it.
     *                     No need for mutex lock.
     */
    MultiFDPages_t *pages;

So, it is right.

> Also, under that we have:
>             if (p->normal_num) {
>                 ret = multifd_send_state->ops->send_prepare(p, &local_err);
>                 if (ret != 0) {
>                     qemu_mutex_unlock(&p->mutex);
>                     break;
>                 }
>             }
>
> Calling mutex_unlock() here, even though the unlock already happened before,
> could cause any issue?

Good catch.  Never got an error there.

Removing that bit.

> Best regards,


Thanks, Juan.
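
As a self-contained sketch of the ownership rule quoted above (pending_job,
guarded by the mutex, decides which thread currently owns pages, and pages
itself is then touched without the lock), here is a toy pthread model.  The
Channel type and everything in it are stand-ins, not the real multifd code:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int pending_job;   /* protected by mutex */
    bool quit;         /* protected by mutex */
    int pages[16];     /* owner: main thread if pending_job == 0, else channel */
    int num;
} Channel;

static void *channel_thread(void *opaque)
{
    Channel *p = opaque;

    pthread_mutex_lock(&p->mutex);
    while (!p->quit) {
        if (p->pending_job) {
            pthread_mutex_unlock(&p->mutex);

            /* pending_job != 0: this thread owns pages, no lock needed */
            long sum = 0;
            for (int i = 0; i < p->num; i++) {
                sum += p->pages[i];
            }
            printf("channel sent %d pages, sum %ld\n", p->num, sum);

            pthread_mutex_lock(&p->mutex);
            p->pending_job--;              /* hand ownership back */
            pthread_cond_signal(&p->cond);
        } else {
            pthread_cond_wait(&p->cond, &p->mutex);
        }
    }
    pthread_mutex_unlock(&p->mutex);
    return NULL;
}

int main(void)
{
    Channel p = { .num = 0 };
    pthread_t th;

    pthread_mutex_init(&p.mutex, NULL);
    pthread_cond_init(&p.cond, NULL);
    pthread_create(&th, NULL, channel_thread, &p);

    /* pending_job == 0: the main (migration) thread owns pages, no lock */
    for (int i = 0; i < 16; i++) {
        p.pages[i] = i;
    }
    p.num = 16;

    pthread_mutex_lock(&p.mutex);
    p.pending_job++;                       /* hand ownership to the channel */
    pthread_cond_signal(&p.cond);
    while (p.pending_job) {                /* wait until it is handed back */
        pthread_cond_wait(&p.cond, &p.mutex);
    }
    p.quit = true;
    pthread_cond_signal(&p.cond);
    pthread_mutex_unlock(&p.mutex);

    pthread_join(th, NULL);
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.mutex);
    return 0;
}

The mutex never protects the page data itself; it only protects the
pending_job handoff, which is what the comment in multifd.h describes.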



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page
  2022-08-11  9:29   ` Leonardo Brás
@ 2022-08-19 11:36     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-19 11:36 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> We have to enable it by default until we introduce the new code.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> 
>> ---
>> 
>> Change it to a capability.  As capabilities are off by default, I have
>> to change MULTIFD_ZERO_PAGE to MAIN_ZERO_PAGE, so it is false by
>> default, and true for older versions.
>
> IIUC, the idea of a capability is to introduce some new features to the code,
> and let users enable or disable it. 

All capabilities are false by default.
If we change the capability to be true by default, we need to teach
libvirt new tricks.

> If it introduces a new capability, it is not very intuitive to think that it will be
> always true for older versions, and false for new ones.

It doesn't need to be intuitive, it just needs to be documented correctly.
I think that is done, no?

> I would suggest adding it as MULTIFD_ZERO_PAGE, and leave it disabled for now.
> When the full feature gets introduced, the capability could be enabled by
> default, if desired.

I had it that way before it was a capability, but then the output of
"info migrate_capabilities"

showed everything false and this one true, which makes one wonder _why_
this one is true.

So I decided to rename it, and make it true by default.

> What do you think?

I prefer it this way.  Why?

Because at some point in the future, we will remove the code that
implements the capability, and the capability itself.  So the idea is
that we don't want to use it for old code.

Later, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer
  2022-08-19  9:51     ` Juan Quintela
@ 2022-08-20  7:14       ` Leonardo Bras Soares Passos
  2022-08-22 21:35         ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-08-20  7:14 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

On Fri, Aug 19, 2022 at 6:52 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Brás <leobras@redhat.com> wrote:
> > On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> >> We are going to create a new function for multifd latest in the series.
> >>
> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
> >> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
> >
> > Double Signed-off-by again.
> >
> >> ---
> >>  migration/ram.c | 13 +++++++++----
> >>  1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index 85d89d61ac..499d9b2a90 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -310,6 +310,9 @@ typedef struct {
> >>      bool preempted;
> >>  } PostcopyPreemptState;
> >>
> >> +typedef struct RAMState RAMState;
> >> +typedef struct PageSearchStatus PageSearchStatus;
> >> +
> >>  /* State of RAM for migration */
> >>  struct RAMState {
> >>      /* QEMUFile used for this migration */
> >> @@ -372,8 +375,9 @@ struct RAMState {
> >>       * is enabled.
> >>       */
> >>      unsigned int postcopy_channel;
> >> +
> >> +    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
> >>  };
> >> -typedef struct RAMState RAMState;
> >>
> >>  static RAMState *ram_state;
> >>
> >> @@ -2255,14 +2259,14 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
> >>  }
> >>
> >>  /**
> >> - * ram_save_target_page: save one target page
> >> + * ram_save_target_page_legacy: save one target page
> >>   *
> >>   * Returns the number of pages written
> >>   *
> >>   * @rs: current RAM state
> >>   * @pss: data about the page we want to send
> >>   */
> >> -static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
> >> +static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >>  {
> >>      RAMBlock *block = pss->block;
> >>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> >> @@ -2469,7 +2473,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> >>
> >>          /* Check the pages is dirty and if it is send it */
> >>          if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> >> -            tmppages = ram_save_target_page(rs, pss);
> >> +            tmppages = rs->ram_save_target_page(rs, pss);
> >>              if (tmppages < 0) {
> >>                  return tmppages;
> >>              }
> >> @@ -3223,6 +3227,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
> >>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
> >>
> >> +    (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
> >>      ret =  multifd_send_sync_main(f);
> >>      if (ret < 0) {
> >>          return ret;
> >
> >
> > So, IIUC:
> > - Rename ram_save_target_page -> ram_save_target_page_legacy
> > - Add a function pointer to RAMState (or a callback)
> > - Assign function pointer = ram_save_target_page_legacy at setup
> > - Replace ram_save_target_page() by indirect function call using above pointer.
> >
> > I could see no issue in this, so I belive it works fine.
> >
> > The only thing that concerns me is the name RAMState.
>
> Every device state is setup in RAMState.
>
> > IMHO, a struct named RAMState is supposed to just reflect the state of ram (or
> > according to this struct's comments, the state of RAM for migration. Having a
> > function pointer here that saves a page seems counterintuitive, since it does
> > not reflect the state of RAM.
>
> The big problem for adding another struct is that we would have to
> change all the callers, or yet another global variable.  Both are bad
> idea in my humble opinion.
>
> > Maybe we could rename the struct, or even better, create another struct that
> > could look something like this:
> >
> > struct RAMMigration {
> >     RAMState state;
> >     int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
> >     /* Other callbacks or further info.*/
> > }
> >
> > What do you think about it?
>
> Really this depends on configuration.  What is setup for qemu
> migration.  I think this is the easiest way to do it, we can add a new
> struct, but it gets everything much more complicated:
>
> - the value that we receive in ram_save_setup() is a RAMState
> - We would have to change all the callers form
>   * ram_save_iterate()
>   * ram_find_and_save_block()
>   * ram_save_host_page()

Maybe RAMState could be part of a bigger struct, and we could use
something like a container_of().
So whenever you want to use it, it would be available.

What about that?
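
For illustration, a self-contained sketch of what that could look like.  The
RAMMigration wrapper and the trivial bodies here are made up for the example
(PageSearchStatus is dropped for brevity), and container_of() is open-coded so
the snippet builds outside QEMU:

#include <stddef.h>
#include <stdio.h>

/* stand-in for QEMU's container_of() */
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

typedef struct RAMState {
    int pages_sent;
} RAMState;

typedef struct RAMMigration {
    RAMState state;                            /* embedded, not a pointer */
    int (*ram_save_target_page)(RAMState *rs); /* callback lives here */
} RAMMigration;

static int ram_save_target_page_legacy(RAMState *rs)
{
    rs->pages_sent++;
    return 1;
}

int main(void)
{
    RAMMigration mig = { .ram_save_target_page = ram_save_target_page_legacy };
    RAMState *rs = &mig.state;    /* existing callers keep passing RAMState */

    /* wherever the callback is needed, recover the wrapper from rs */
    RAMMigration *m = container_of(rs, RAMMigration, state);
    int pages = m->ram_save_target_page(rs);

    printf("pages=%d pages_sent=%d\n", pages, rs->pages_sent);
    return 0;
}

The existing callers would keep their RAMState argument; only the places that
need the callback would pay for the container_of().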

>
> So I think it is quite a bit of churn for not a lot of gain.
>
> Later, Juan.
>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 06/12] multifd: Make flags field thread local
  2022-08-19 10:03     ` Juan Quintela
@ 2022-08-20  7:24       ` Leonardo Bras Soares Passos
  2022-08-23 13:00         ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-08-20  7:24 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

On Fri, Aug 19, 2022 at 7:03 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Brás <leobras@redhat.com> wrote:
> > On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> >> Use of flags with respect to locking was incensistant.  For the
> >> sending side:
> >> - it was set to 0 with mutex held on the multifd channel.
> >> - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread.
> >> - Everything else was done without the mutex held on the multifd channel.
> >>
> >> On the reception side, it is not used on the migration thread, only on
> >> the multifd channels threads.
> >>
> >> So we move it to the multifd channels thread only variables, and we
> >> introduce a new bool sync_needed on the send side to pass that information.
> >>
> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
> >> ---
> >>  migration/multifd.h | 10 ++++++----
> >>  migration/multifd.c | 23 +++++++++++++----------
> >>  2 files changed, 19 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/migration/multifd.h b/migration/multifd.h
> >> index 36f899c56f..a67cefc0a2 100644
> >> --- a/migration/multifd.h
> >> +++ b/migration/multifd.h
> >> @@ -98,12 +98,12 @@ typedef struct {
> >
> > Just noticed having no name in 'typedef struct' line makes it harder to
> > understand what is going on.
>
> It is common idiom in QEMU.  The principal reason is that if you don't
> want anyone to use "struct MultiFDSendParams" but MultiFDSendParams, the
> best way to achieve that is to do it this way.

I agree, but a comment after the typedef could help reviewing. Something like

typedef struct { /* MultiFDSendParams */
...
} MultiFDSendParams

Becomes this in diff:

diff --git a/migration/multifd.h b/migration/multifd.h
index 134e6a7f19..93bb3a7f4a 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -90,6 +90,7 @@ typedef struct { /* MultiFDSendParams */
[...]


>
> >> @@ -172,6 +172,8 @@ typedef struct {
> >>
> >>      /* pointer to the packet */
> >>      MultiFDPacket_t *packet;
> >> +    /* multifd flags for each packet */
> >> +    uint32_t flags;
> >>      /* size of the next packet that contains pages */
> >>      uint32_t next_packet_size;
> >>      /* packets sent through this channel */
> >
> > So, IIUC, the struct member flags got moved down (same struct) to an area
> > described as thread-local, meaning it does not need locking.
> >
> > Interesting, I haven't noticed this different areas in the same struct.
>
> It has changed in the last two weeks or so in upstream (it has been on
> this patchset for several months.)

Nice :)

>
>
> >
> >> diff --git a/migration/multifd.c b/migration/multifd.c
> >> index e25b529235..09a40a9135 100644
> >> --- a/migration/multifd.c
> >> +++ b/migration/multifd.c
> >> @@ -602,7 +602,7 @@ int multifd_send_sync_main(QEMUFile *f)
> >>          }
> >>
> >>          p->packet_num = multifd_send_state->packet_num++;
> >> -        p->flags |= MULTIFD_FLAG_SYNC;
> >> +        p->sync_needed = true;
> >>          p->pending_job++;
> >>          qemu_mutex_unlock(&p->mutex);
> >>          qemu_sem_post(&p->sem);
> >> @@ -658,7 +658,11 @@ static void *multifd_send_thread(void *opaque)
> >>
> >>          if (p->pending_job) {
> >>              uint64_t packet_num = p->packet_num;
> >> -            uint32_t flags = p->flags;
> >> +            p->flags = 0;
> >> +            if (p->sync_needed) {
> >> +                p->flags |= MULTIFD_FLAG_SYNC;
> >> +                p->sync_needed = false;
> >> +            }
> >
> > Any particular reason why doing p->flags = 0, then p->flags |= MULTIFD_FLAG_SYNC
> > ?
>
> It is a bitmap field, and if there is anything on the future, we need to
> set it.  I agree that when there is only one flag, it seems "weird".
>
> > [1] Couldn't it be done without the |= , since it's already being set to zero
> > before? (becoming "p->flags = MULTIFD_FLAG_SYNC" )
>
> As said, easier to modify later, and also easier if we want to setup a
> flag by default.

Yeah, I agree. It makes sense now.

Thanks

>
> I agree that it is a matter of style/taste.
>
> >>              p->normal_num = 0;
> >>
> >>              if (use_zero_copy_send) {
> >> @@ -680,14 +684,13 @@ static void *multifd_send_thread(void *opaque)
> >>                  }
> >>              }
> >>              multifd_send_fill_packet(p);
> >> -            p->flags = 0;
> >>              p->num_packets++;
> >>              p->total_normal_pages += p->normal_num;
> >>              p->pages->num = 0;
> >>              p->pages->block = NULL;
> >>              qemu_mutex_unlock(&p->mutex);
> >>
> >> -            trace_multifd_send(p->id, packet_num, p->normal_num, flags,
> >> +            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
> >>                                 p->next_packet_size);
> >>
> >>              if (use_zero_copy_send) {
> >> @@ -715,7 +718,7 @@ static void *multifd_send_thread(void *opaque)
> >>              p->pending_job--;
> >>              qemu_mutex_unlock(&p->mutex);
> >>
> >> -            if (flags & MULTIFD_FLAG_SYNC) {
> >> +            if (p->flags & MULTIFD_FLAG_SYNC) {
> >>                  qemu_sem_post(&p->sem_sync);
> >>              }
> >>              qemu_sem_post(&multifd_send_state->channels_ready);
> >
> > IIUC it uses p->sync_needed to keep the sync info, instead of the previous flags
> > local var, and thus it can set p->flags = 0 earlier. Seems to not change any
> > behavior AFAICS.
>
> The protection of the global flags was being wrong.  That is the reason
> that I decided to change it to the sync_needed.
>
> The problem was that at some point we were still sending a packet (that
> shouldn't have the SYNC flag enabled), but we received a
> multifd_main_sync() and it got enabled anyways.  The easier way that I
> found te fix it was this way.
>
> Problem was difficult to detect, that is the reason that I change it
> this way.

Oh, I see.

>
> >> -        if (flags & MULTIFD_FLAG_SYNC) {
> >> +        if (sync_needed) {
> >>              qemu_sem_post(&multifd_recv_state->sem_sync);
> >>              qemu_sem_wait(&p->sem_sync);
> >>          }
> >
> > Ok, IIUC this part should have the same behavior as before, but using a bool
> > instead of an u32.
>
> I changed it to make sure that we only checked the flags at the
> beggining of the function, with the lock taken.

Thanks for sharing!

Best regards,
Leo

>
> >
> > FWIW:
> > Reviewed-by: Leonardo Bras <leobras@redhat.com>
>
> Thanks, Juan.
>



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held
  2022-08-19 11:32     ` Juan Quintela
@ 2022-08-20  7:27       ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 43+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-08-20  7:27 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

On Fri, Aug 19, 2022 at 8:32 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Brás <leobras@redhat.com> wrote:
> > On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> >> We do the send_prepare() and the fill of the head packet without the
> >> mutex held.  It will help a lot for compression and later in the
> >> series for zero pages.
> >>
> >> Notice that we can use p->pages without holding p->mutex because
> >> p->pending_job == 1.
> >>
> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
> >> ---
> >>  migration/multifd.h |  2 ++
> >>  migration/multifd.c | 11 ++++++-----
> >>  2 files changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/migration/multifd.h b/migration/multifd.h
> >> index a67cefc0a2..cd389d18d2 100644
> >> --- a/migration/multifd.h
> >> +++ b/migration/multifd.h
> >> @@ -109,7 +109,9 @@ typedef struct {
> >>      /* array of pages to sent.
> >>       * The owner of 'pages' depends of 'pending_job' value:
> >>       * pending_job == 0 -> migration_thread can use it.
> >> +     *                     No need for mutex lock.
> >>       * pending_job != 0 -> multifd_channel can use it.
> >> +     *                     No need for mutex lock.
> >>       */
> >>      MultiFDPages_t *pages;
> >>
> >> diff --git a/migration/multifd.c b/migration/multifd.c
> >> index 09a40a9135..68fc9f8e88 100644
> >> --- a/migration/multifd.c
> >> +++ b/migration/multifd.c
> >> @@ -663,6 +663,8 @@ static void *multifd_send_thread(void *opaque)
> >>                  p->flags |= MULTIFD_FLAG_SYNC;
> >>                  p->sync_needed = false;
> >>              }
> >> +            qemu_mutex_unlock(&p->mutex);
> >> +
> >
> > If it unlocks here, we will have unprotected:
> > for (int i = 0; i < p->pages->num; i++) {
> >     p->normal[p->normal_num] = p->pages->offset[i];
> >     p->normal_num++;
> > }
> >
> > And p->pages seems to be in the mutex-protected area.
> > Should it be ok?
>
> From the documentation:
>
>     /* array of pages to sent.
>      * The owner of 'pages' depends of 'pending_job' value:
>      * pending_job == 0 -> migration_thread can use it.
>      *                     No need for mutex lock.
>      * pending_job != 0 -> multifd_channel can use it.
>      *                     No need for mutex lock.
>      */
>     MultiFDPages_t *pages;
>
> So, it is right.

Oh, right. I missed that part earlier.

>
> > Also, under that we have:
> >             if (p->normal_num) {
> >                 ret = multifd_send_state->ops->send_prepare(p, &local_err);
> >                 if (ret != 0) {
> >                     qemu_mutex_unlock(&p->mutex);
> >                     break;
> >                 }
> >             }
> >
> > Calling mutex_unlock() here, even though the unlock already happened before,
> > could cause any issue?
>
> Good catch.  Never got an error there.
>
> Removing that bit.

Thanks!

Best regards,
Leo

>
> > Best regards,
>
>
> Thanks, Juan.
>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer
  2022-08-20  7:14       ` Leonardo Bras Soares Passos
@ 2022-08-22 21:35         ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-22 21:35 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> On Fri, Aug 19, 2022 at 6:52 AM Juan Quintela <quintela@redhat.com> wrote:
>>
>> - the value that we receive in ram_save_setup() is a RAMState
>> - We would have to change all the callers form
>>   * ram_save_iterate()
>>   * ram_find_and_save_block()
>>   * ram_save_host_page()
>
> Maybe RAMState could be part of a bigger struct, and we could use
> something like a container_of().
> So whenever you want to use it, it would be available.
>
> What about that?

New struct it is:

typedef struct {
           int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
} MigrationOps;

And go from there.
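
A rough standalone sketch of how that could be wired up.  The struct bodies,
calloc() and printf() below are toy stand-ins for the real QEMU types and
allocation helpers; only the shape of the indirection matters:

#include <stdio.h>
#include <stdlib.h>

typedef struct RAMState { int pages_sent; } RAMState;
typedef struct PageSearchStatus { unsigned long page; } PageSearchStatus;

typedef struct {
    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
} MigrationOps;

static MigrationOps *migration_ops;

static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
{
    (void)pss;
    rs->pages_sent++;
    return 1;
}

int main(void)
{
    RAMState rs = { 0 };
    PageSearchStatus pss = { .page = 42 };

    /* ram_save_setup() would pick the implementation once */
    migration_ops = calloc(1, sizeof(*migration_ops));
    migration_ops->ram_save_target_page = ram_save_target_page_legacy;

    /* ram_save_host_page() then always calls through the pointer */
    int tmppages = migration_ops->ram_save_target_page(&rs, &pss);

    printf("tmppages=%d pages_sent=%d\n", tmppages, rs.pages_sent);
    free(migration_ops);
    return 0;
}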

Later, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 06/12] multifd: Make flags field thread local
  2022-08-20  7:24       ` Leonardo Bras Soares Passos
@ 2022-08-23 13:00         ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-08-23 13:00 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> On Fri, Aug 19, 2022 at 7:03 AM Juan Quintela <quintela@redhat.com> wrote:
>>
>> Leonardo Brás <leobras@redhat.com> wrote:
>> > On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> >> Use of flags with respect to locking was incensistant.  For the
>> >> sending side:
>> >> - it was set to 0 with mutex held on the multifd channel.
>> >> - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread.
>> >> - Everything else was done without the mutex held on the multifd channel.
>> >>
>> >> On the reception side, it is not used on the migration thread, only on
>> >> the multifd channels threads.
>> >>
>> >> So we move it to the multifd channels thread only variables, and we
>> >> introduce a new bool sync_needed on the send side to pass that information.
>> >>
>> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> >> ---
>> >>  migration/multifd.h | 10 ++++++----
>> >>  migration/multifd.c | 23 +++++++++++++----------
>> >>  2 files changed, 19 insertions(+), 14 deletions(-)
>> >>
>> >> diff --git a/migration/multifd.h b/migration/multifd.h
>> >> index 36f899c56f..a67cefc0a2 100644
>> >> --- a/migration/multifd.h
>> >> +++ b/migration/multifd.h
>> >> @@ -98,12 +98,12 @@ typedef struct {
>> >
>> > Just noticed having no name in 'typedef struct' line makes it harder to
>> > understand what is going on.
>>
>> It is common idiom in QEMU.  The principal reason is that if you don't
>> want anyone to use "struct MultiFDSendParams" but MultiFDSendParams, the
>> best way to achieve that is to do it this way.
>
> I agree, but a comment after the typedef could help reviewing. Something like
>
> typedef struct { /* MultiFDSendParams */
> ...
> } MultiFDSendParams

You have a point here.  I am not putting a comment, though: I am putting the real thing (naming the struct itself).

Thanks, Juan.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 10/12] multifd: Support for zero pages transmission
  2022-08-02  6:39 ` [PATCH v7 10/12] multifd: Support for zero pages transmission Juan Quintela
@ 2022-09-02 13:27   ` Leonardo Brás
  2022-11-14 12:09     ` Juan Quintela
  2022-10-25  9:10   ` chuang xu
  1 sibling, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-09-02 13:27 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> This patch adds counters and similar.  Logic will be added on the
> following patch.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> ---
> 
> Added counters for duplicated/non duplicated pages.
> Removed reviewed by from David.
> Add total_zero_pages
> ---
>  migration/multifd.h    | 17 ++++++++++++++++-
>  migration/multifd.c    | 36 +++++++++++++++++++++++++++++-------
>  migration/ram.c        |  2 --
>  migration/trace-events |  8 ++++----
>  4 files changed, 49 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index cd389d18d2..a1b852200d 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -47,7 +47,10 @@ typedef struct {
>      /* size of the next packet that contains pages */
>      uint32_t next_packet_size;
>      uint64_t packet_num;
> -    uint64_t unused[4];    /* Reserved for future use */
> +    /* zero pages */
> +    uint32_t zero_pages;
> +    uint32_t unused32[1];    /* Reserved for future use */
> +    uint64_t unused64[3];    /* Reserved for future use */
>      char ramblock[256];
>      uint64_t offset[];
>  } __attribute__((packed)) MultiFDPacket_t;
> @@ -127,6 +130,8 @@ typedef struct {
>      uint64_t num_packets;
>      /* non zero pages sent through this channel */
>      uint64_t total_normal_pages;
> +    /* zero pages sent through this channel */
> +    uint64_t total_zero_pages;
>      /* buffers to send */
>      struct iovec *iov;
>      /* number of iovs used */
> @@ -135,6 +140,10 @@ typedef struct {
>      ram_addr_t *normal;
>      /* num of non zero pages */
>      uint32_t normal_num;
> +    /* Pages that are  zero */
> +    ram_addr_t *zero;
> +    /* num of zero pages */
> +    uint32_t zero_num;

More of an organizational viewpoint:
I can't see total_zero_pages, zero[] and zero_num as multifd "Parameters".
But OTOH there is other data like this in the struct for keeping migration
status, so not an issue.

>      /* used for compression methods */
>      void *data;
>  }  MultiFDSendParams;
> @@ -184,12 +193,18 @@ typedef struct {
>      uint8_t *host;
>      /* non zero pages recv through this channel */
>      uint64_t total_normal_pages;
> +    /* zero pages recv through this channel */
> +    uint64_t total_zero_pages;
>      /* buffers to recv */
>      struct iovec *iov;
>      /* Pages that are not zero */
>      ram_addr_t *normal;
>      /* num of non zero pages */
>      uint32_t normal_num;
> +    /* Pages that are  zero */
> +    ram_addr_t *zero;
> +    /* num of zero pages */
> +    uint32_t zero_num;
>      /* used for de-compression methods */
>      void *data;
>  } MultiFDRecvParams;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 68fc9f8e88..4473d9f834 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -263,6 +263,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
>      packet->normal_pages = cpu_to_be32(p->normal_num);
>      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>      packet->packet_num = cpu_to_be64(p->packet_num);
> +    packet->zero_pages = cpu_to_be32(p->zero_num);
>  
>      if (p->pages->block) {
>          strncpy(packet->ramblock, p->pages->block->idstr, 256);
> @@ -323,7 +324,15 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>      p->next_packet_size = be32_to_cpu(packet->next_packet_size);
>      p->packet_num = be64_to_cpu(packet->packet_num);
>  
> -    if (p->normal_num == 0) {
> +    p->zero_num = be32_to_cpu(packet->zero_pages);
> +    if (p->zero_num > packet->pages_alloc - p->normal_num) {
> +        error_setg(errp, "multifd: received packet "
> +                   "with %u zero pages and expected maximum pages are %u",
> +                   p->zero_num, packet->pages_alloc - p->normal_num) ;
> +        return -1;
> +    }
> +
> +    if (p->normal_num == 0 && p->zero_num == 0) {
>          return 0;
>      }
>  
> @@ -432,6 +441,8 @@ static int multifd_send_pages(QEMUFile *f)
>      ram_counters.multifd_bytes += p->sent_bytes;
>      qemu_file_acct_rate_limit(f, p->sent_bytes);
>      p->sent_bytes = 0;
> +    ram_counters.normal += p->normal_num;
> +    ram_counters.duplicate += p->zero_num;
>      qemu_mutex_unlock(&p->mutex);
>      qemu_sem_post(&p->sem);
>  
> @@ -545,6 +556,8 @@ void multifd_save_cleanup(void)
>          p->iov = NULL;
>          g_free(p->normal);
>          p->normal = NULL;
> +        g_free(p->zero);
> +        p->zero = NULL;
>          multifd_send_state->ops->send_cleanup(p, &local_err);
>          if (local_err) {
>              migrate_set_error(migrate_get_current(), local_err);
> @@ -666,6 +679,7 @@ static void *multifd_send_thread(void *opaque)
>              qemu_mutex_unlock(&p->mutex);
>  
>              p->normal_num = 0;
> +            p->zero_num = 0;
>  
>              if (use_zero_copy_send) {
>                  p->iovs_num = 0;
> @@ -687,8 +701,8 @@ static void *multifd_send_thread(void *opaque)
>              }
>              multifd_send_fill_packet(p);
>  
> -            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
> -                               p->next_packet_size);
> +            trace_multifd_send(p->id, packet_num, p->normal_num, p->zero_num,
> +                               p->flags, p->next_packet_size);
>  
>              if (use_zero_copy_send) {
>                  /* Send header first, without zerocopy */
> @@ -712,6 +726,7 @@ static void *multifd_send_thread(void *opaque)
>              qemu_mutex_lock(&p->mutex);
>              p->num_packets++;
>              p->total_normal_pages += p->normal_num;
> +            p->total_zero_pages += p->zero_num;

I can see it getting declared, incremented and used. But where is it initialized
to zero? I mean, should it not have 'p->total_normal_pages = 0;' somewhere in
setup?

(I understand multifd_save_setup() allocates multifd_send_state->params with
g_new0(), but other variables are zeroed there, like p->pending_job and
p->write_flags, so why not?)

>              p->pages->num = 0;
>              p->pages->block = NULL;
>              p->sent_bytes += p->packet_len;;
> @@ -753,7 +768,8 @@ out:
>      qemu_mutex_unlock(&p->mutex);
>  
>      rcu_unregister_thread();
> -    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
> +    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages,
> +                                  p->total_zero_pages);
>  
>      return NULL;
>  }
> @@ -938,6 +954,7 @@ int multifd_save_setup(Error **errp)
>          p->normal = g_new0(ram_addr_t, page_count);
>          p->page_size = qemu_target_page_size();
>          p->page_count = page_count;
> +        p->zero = g_new0(ram_addr_t, page_count);
>  
>          if (migrate_use_zero_copy_send()) {
>              p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
> @@ -1046,6 +1063,8 @@ int multifd_load_cleanup(Error **errp)
>          p->iov = NULL;
>          g_free(p->normal);
>          p->normal = NULL;
> +        g_free(p->zero);
> +        p->zero = NULL;
>          multifd_recv_state->ops->recv_cleanup(p);
>      }
>      qemu_sem_destroy(&multifd_recv_state->sem_sync);
> @@ -1116,13 +1135,14 @@ static void *multifd_recv_thread(void *opaque)
>              break;
>          }
>  
> -        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
> -                           p->next_packet_size);
> +        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
> +                           p->flags, p->next_packet_size);
>          sync_needed = p->flags & MULTIFD_FLAG_SYNC;
>          /* recv methods don't know how to handle the SYNC flag */
>          p->flags &= ~MULTIFD_FLAG_SYNC;
>          p->num_packets++;
>          p->total_normal_pages += p->normal_num;
> +        p->total_normal_pages += p->zero_num;
>          qemu_mutex_unlock(&p->mutex);
>  
>          if (p->normal_num) {
> @@ -1147,7 +1167,8 @@ static void *multifd_recv_thread(void *opaque)
>      qemu_mutex_unlock(&p->mutex);
>  
>      rcu_unregister_thread();
> -    trace_multifd_recv_thread_end(p->id, p->num_packets, p->total_normal_pages);
> +    trace_multifd_recv_thread_end(p->id, p->num_packets, p->total_normal_pages,
> +                                  p->total_zero_pages);
>  
>      return NULL;
>  }
> @@ -1187,6 +1208,7 @@ int multifd_load_setup(Error **errp)
>          p->normal = g_new0(ram_addr_t, page_count);
>          p->page_count = page_count;
>          p->page_size = qemu_target_page_size();
> +        p->zero = g_new0(ram_addr_t, page_count);
>      }
>  
>      for (i = 0; i < thread_count; i++) {
> diff --git a/migration/ram.c b/migration/ram.c
> index 291ba5c0ed..2af70f517a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1412,8 +1412,6 @@ static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
>      if (multifd_queue_page(rs->f, block, offset) < 0) {
>          return -1;
>      }
> -    ram_counters.normal++;
> -
>      return 1;
>  }
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index a34afe7b85..d34aec177c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -120,21 +120,21 @@ postcopy_preempt_reset_channel(void) ""
>  
>  # multifd.c
>  multifd_new_send_channel_async(uint8_t id) "channel %u"
> -multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
> +multifd_recv(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t zero, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
>  multifd_recv_new_channel(uint8_t id) "channel %u"
>  multifd_recv_sync_main(long packet_num) "packet num %ld"
>  multifd_recv_sync_main_signal(uint8_t id) "channel %u"
>  multifd_recv_sync_main_wait(uint8_t id) "channel %u"
>  multifd_recv_terminate_threads(bool error) "error %d"
> -multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
> +multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " zero pages %" PRIu64
>  multifd_recv_thread_start(uint8_t id) "%u"
> -multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u flags 0x%x next packet size %u"
> +multifd_send(uint8_t id, uint64_t packet_num, uint32_t normalpages, uint32_t zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
>  multifd_send_error(uint8_t id) "channel %u"
>  multifd_send_sync_main(long packet_num) "packet num %ld"
>  multifd_send_sync_main_signal(uint8_t id) "channel %u"
>  multifd_send_sync_main_wait(uint8_t id) "channel %u"
>  multifd_send_terminate_threads(bool error) "error %d"
> -multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
> +multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64 " zero pages %"  PRIu64
>  multifd_send_thread_start(uint8_t id) "%u"
>  multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
>  multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"




* Re: [PATCH v7 11/12] multifd: Zero pages transmission
  2022-08-02  6:39 ` [PATCH v7 11/12] multifd: Zero " Juan Quintela
@ 2022-09-02 13:27   ` Leonardo Brás
  2022-11-14 12:20     ` Juan Quintela
  2022-11-14 12:27     ` Juan Quintela
  0 siblings, 2 replies; 43+ messages in thread
From: Leonardo Brás @ 2022-09-02 13:27 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> This implements the zero page detection and handling.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> ---
> 
> Add comment for offset (dave)
> Use local variables for offset/block to have shorter lines
> ---
>  migration/multifd.h |  5 +++++
>  migration/multifd.c | 41 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index a1b852200d..5931de6f86 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -52,6 +52,11 @@ typedef struct {
>      uint32_t unused32[1];    /* Reserved for future use */
>      uint64_t unused64[3];    /* Reserved for future use */
>      char ramblock[256];
> +    /*
> +     * This array contains the pointers to:
> +     *  - normal pages (initial normal_pages entries)
> +     *  - zero pages (following zero_pages entries)
> +     */
>      uint64_t offset[];
>  } __attribute__((packed)) MultiFDPacket_t;
>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 4473d9f834..89811619d8 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qemu/rcu.h"
>  #include "exec/target_page.h"
>  #include "sysemu/sysemu.h"
> @@ -275,6 +276,12 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
>  
>          packet->offset[i] = cpu_to_be64(temp);
>      }
> +    for (i = 0; i < p->zero_num; i++) {
> +        /* there are architectures where ram_addr_t is 32 bit */
> +        uint64_t temp = p->zero[i];
> +
> +        packet->offset[p->normal_num + i] = cpu_to_be64(temp);
> +    }
>  }
>  
>  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> @@ -358,6 +365,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          p->normal[i] = offset;
>      }
>  
> +    for (i = 0; i < p->zero_num; i++) {
> +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> +
> +        if (offset > (block->used_length - p->page_size)) {
> +            error_setg(errp, "multifd: offset too long %" PRIu64
> +                       " (max " RAM_ADDR_FMT ")",
> +                       offset, block->used_length);
> +            return -1;
> +        }
> +        p->zero[i] = offset;
> +    }
> +
>      return 0;
>  }

IIUC ram_addr_t is supposed to be the address size for the architecture, mainly
being 32 or 64 bits. So packet->offset[i] is always u64, while p->zero[i] is
possibly u32 or u64.

Since both local variables and packet->offset[i] are 64-bit, there is no issue.

But on 'p->zero[i] = offset' we can have 'u32 = u64', and this should raise a
warning (or am I missing something?).
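
As a standalone sketch (not QEMU code; ram_addr_t is faked here as a 32-bit
typedef), this is the narrowing being discussed.  With the usual -Wall/-Wextra
flags the implicit uint64_t-to-uint32_t assignment compiles silently, and only
-Wconversion (which QEMU does not appear to enable) reports the possible value
change:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* stand-in for a build where ram_addr_t is only 32 bits wide */
typedef uint32_t ram_addr_t;

int main(void)
{
    uint64_t offset = UINT64_C(0x100000004);  /* an offset above 4 GiB */
    ram_addr_t narrowed = offset;             /* silent truncation; only
                                                 -Wconversion warns about it */

    printf("offset=0x%" PRIx64 " narrowed=0x%" PRIx32 "\n", offset, narrowed);
    return 0;
}

So the compiler accepts the assignment quietly even when the high bits get
dropped, which would explain why no warning shows up in practice.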
  

>  
> @@ -648,6 +667,8 @@ static void *multifd_send_thread(void *opaque)
>  {
>      MultiFDSendParams *p = opaque;
>      Error *local_err = NULL;
> +    /* qemu older than 7.0 don't understand zero page on multifd channel */
> +    bool use_zero_page = migrate_use_multifd_zero_page();
>      int ret = 0;
>      bool use_zero_copy_send = migrate_use_zero_copy_send();
>  
> @@ -670,6 +691,7 @@ static void *multifd_send_thread(void *opaque)
>          qemu_mutex_lock(&p->mutex);
>  
>          if (p->pending_job) {
> +            RAMBlock *rb = p->pages->block;
>              uint64_t packet_num = p->packet_num;
>              p->flags = 0;
>              if (p->sync_needed) {
> @@ -688,8 +710,16 @@ static void *multifd_send_thread(void *opaque)
>              }
>  
>              for (int i = 0; i < p->pages->num; i++) {
> -                p->normal[p->normal_num] = p->pages->offset[i];
> -                p->normal_num++;
> +                uint64_t offset = p->pages->offset[i];
> +                if (use_zero_page &&
> +                    buffer_is_zero(rb->host + offset, p->page_size)) {
> +                    p->zero[p->zero_num] = offset;

Same here.

> +                    p->zero_num++;
> +                    ram_release_page(rb->idstr, offset);
> +                } else {
> +                    p->normal[p->normal_num] = offset;

Same here? (p->normal[i] can also be u32)

> +                    p->normal_num++;
> +                }
>              }
>  
>              if (p->normal_num) {
> @@ -1152,6 +1182,13 @@ static void *multifd_recv_thread(void *opaque)
>              }
>          }
>  
> +        for (int i = 0; i < p->zero_num; i++) {
> +            void *page = p->host + p->zero[i];
> +            if (!buffer_is_zero(page, p->page_size)) {
> +                memset(page, 0, p->page_size);
> +            }
> +        }
> +
>          if (sync_needed) {
>              qemu_sem_post(&multifd_recv_state->sem_sync);
>              qemu_sem_wait(&p->sem_sync);




* Re: [PATCH v7 12/12] So we use multifd to transmit zero pages.
  2022-08-02  6:39 ` [PATCH v7 12/12] So we use multifd to transmit zero pages Juan Quintela
@ 2022-09-02 13:27   ` Leonardo Brás
  2022-11-14 12:30     ` Juan Quintela
  0 siblings, 1 reply; 43+ messages in thread
From: Leonardo Brás @ 2022-09-02 13:27 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu, Eric Blake,
	Philippe Mathieu-Daudé, Yanan Wang, Markus Armbruster,
	Eduardo Habkost

On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> ---
> 
> - Check zero_page property before using new code (Dave)
> ---
>  migration/migration.c |  4 +---
>  migration/multifd.c   |  6 +++---
>  migration/ram.c       | 33 ++++++++++++++++++++++++++++++++-
>  3 files changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ce3e5cc0cd..13842f6803 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2599,9 +2599,7 @@ bool migrate_use_main_zero_page(void)
>  
>      s = migrate_get_current();
>  
> -    // We will enable this when we add the right code.
> -    // return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
> -    return true;
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
>  }
>  
>  bool migrate_pause_before_switchover(void)
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 89811619d8..54acdc004c 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -667,8 +667,8 @@ static void *multifd_send_thread(void *opaque)
>  {
>      MultiFDSendParams *p = opaque;
>      Error *local_err = NULL;
> -    /* qemu older than 7.0 don't understand zero page on multifd channel */
> -    bool use_zero_page = migrate_use_multifd_zero_page();
> +    /* older qemu don't understand zero page on multifd channel */
> +    bool use_multifd_zero_page = !migrate_use_main_zero_page();

I understand that "use_main_zero_page", which is introduced as a new capability,
is in fact the old behavior, and the new feature is introduced when this
capability is disabled.

But it sure looks weird reading:
 use_multifd_zero_page = !migrate_use_main_zero_page();

This series is fresh in my mind, but it took a few seconds to see that this is
actually not a typo. 

>      int ret = 0;
>      bool use_zero_copy_send = migrate_use_zero_copy_send();
>  
> @@ -711,7 +711,7 @@ static void *multifd_send_thread(void *opaque)
>  
>              for (int i = 0; i < p->pages->num; i++) {
>                  uint64_t offset = p->pages->offset[i];
> -                if (use_zero_page &&
> +                if (use_multifd_zero_page &&
>                      buffer_is_zero(rb->host + offset, p->page_size)) {
>                      p->zero[p->zero_num] = offset;
>                      p->zero_num++;
> diff --git a/migration/ram.c b/migration/ram.c
> index 2af70f517a..26e60b9cc1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2428,6 +2428,32 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
>      }
>  }
>  
> +/**
> + * ram_save_target_page_multifd: save one target page
> + *
> + * Returns the number of pages written
> + *
> + * @rs: current RAM state
> + * @pss: data about the page we want to send
> + */
> +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> +{
> +    RAMBlock *block = pss->block;
> +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> +    int res;
> +
> +    if (!migration_in_postcopy()) {
> +        return ram_save_multifd_page(rs, block, offset);
> +    }
> +
> +    res = save_zero_page(rs, block, offset);
> +    if (res > 0) {
> +        return res;
> +    }
> +
> +    return ram_save_page(rs, pss);
> +}
> +
>  /**
>   * ram_save_host_page: save a whole host page
>   *
> @@ -3225,7 +3251,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>  
> -    (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
> +    if (migrate_use_multifd() && !migrate_use_main_zero_page()) {
> +        (*rsp)->ram_save_target_page = ram_save_target_page_multifd;
> +    } else {
> +        (*rsp)->ram_save_target_page = ram_save_target_page_legacy;
> +    }
> +
>      ret =  multifd_send_sync_main(f);
>      if (ret < 0) {
>          return ret;

The rest LGTM.

FWIW:
Reviewed-by: Leonardo Bras <leobras@redhat.com>





* Re: [PATCH v7 10/12] multifd: Support for zero pages transmission
  2022-08-02  6:39 ` [PATCH v7 10/12] multifd: Support for zero pages transmission Juan Quintela
  2022-09-02 13:27   ` Leonardo Brás
@ 2022-10-25  9:10   ` chuang xu
  2022-11-14 12:10     ` Juan Quintela
  1 sibling, 1 reply; 43+ messages in thread
From: chuang xu @ 2022-10-25  9:10 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert,
	Leonardo Bras, Peter Xu, Eric Blake, Philippe Mathieu-Daudé,
	Yanan Wang, Markus Armbruster, Eduardo Habkost


On 2022/8/2 2:39 PM, Juan Quintela wrote:
> This patch adds counters and similar.  Logic will be added on the
> following patch.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> ---
>
> Added counters for duplicated/non duplicated pages.
> Removed reviewed by from David.
> Add total_zero_pages
> ---
>   migration/multifd.h    | 17 ++++++++++++++++-
>   migration/multifd.c    | 36 +++++++++++++++++++++++++++++-------
>   migration/ram.c        |  2 --
>   migration/trace-events |  8 ++++----
>   4 files changed, 49 insertions(+), 14 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index cd389d18d2..a1b852200d 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -47,7 +47,10 @@ typedef struct {
>       /* size of the next packet that contains pages */
>       uint32_t next_packet_size;
>       uint64_t packet_num;
> -    uint64_t unused[4];    /* Reserved for future use */
> +    /* zero pages */
> +    uint32_t zero_pages;
> +    uint32_t unused32[1];    /* Reserved for future use */
> +    uint64_t unused64[3];    /* Reserved for future use */
>       char ramblock[256];
>       uint64_t offset[];
>   } __attribute__((packed)) MultiFDPacket_t;
> @@ -127,6 +130,8 @@ typedef struct {
>       uint64_t num_packets;
>       /* non zero pages sent through this channel */
>       uint64_t total_normal_pages;
> +    /* zero pages sent through this channel */
> +    uint64_t total_zero_pages;
>       /* buffers to send */
>       struct iovec *iov;
>       /* number of iovs used */
> @@ -135,6 +140,10 @@ typedef struct {
>       ram_addr_t *normal;
>       /* num of non zero pages */
>       uint32_t normal_num;
> +    /* Pages that are  zero */
> +    ram_addr_t *zero;
> +    /* num of zero pages */
> +    uint32_t zero_num;
>       /* used for compression methods */
>       void *data;
>   }  MultiFDSendParams;
> @@ -184,12 +193,18 @@ typedef struct {
>       uint8_t *host;
>       /* non zero pages recv through this channel */
>       uint64_t total_normal_pages;
> +    /* zero pages recv through this channel */
> +    uint64_t total_zero_pages;
>       /* buffers to recv */
>       struct iovec *iov;
>       /* Pages that are not zero */
>       ram_addr_t *normal;
>       /* num of non zero pages */
>       uint32_t normal_num;
> +    /* Pages that are  zero */
> +    ram_addr_t *zero;
> +    /* num of zero pages */
> +    uint32_t zero_num;
>       /* used for de-compression methods */
>       void *data;
>   } MultiFDRecvParams;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 68fc9f8e88..4473d9f834 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -263,6 +263,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
>       packet->normal_pages = cpu_to_be32(p->normal_num);
>       packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>       packet->packet_num = cpu_to_be64(p->packet_num);
> +    packet->zero_pages = cpu_to_be32(p->zero_num);
>   
>       if (p->pages->block) {
>           strncpy(packet->ramblock, p->pages->block->idstr, 256);
> @@ -323,7 +324,15 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>       p->next_packet_size = be32_to_cpu(packet->next_packet_size);
>       p->packet_num = be64_to_cpu(packet->packet_num);
>   
> -    if (p->normal_num == 0) {
> +    p->zero_num = be32_to_cpu(packet->zero_pages);
> +    if (p->zero_num > packet->pages_alloc - p->normal_num) {
> +        error_setg(errp, "multifd: received packet "
> +                   "with %u zero pages and expected maximum pages are %u",
> +                   p->zero_num, packet->pages_alloc - p->normal_num) ;
> +        return -1;
> +    }
> +
> +    if (p->normal_num == 0 && p->zero_num == 0) {
>           return 0;
>       }
>   
> @@ -432,6 +441,8 @@ static int multifd_send_pages(QEMUFile *f)
>       ram_counters.multifd_bytes += p->sent_bytes;
>       qemu_file_acct_rate_limit(f, p->sent_bytes);
>       p->sent_bytes = 0;
> +    ram_counters.normal += p->normal_num;
> +    ram_counters.duplicate += p->zero_num;
>       qemu_mutex_unlock(&p->mutex);
>       qemu_sem_post(&p->sem);
>   
> @@ -545,6 +556,8 @@ void multifd_save_cleanup(void)
>           p->iov = NULL;
>           g_free(p->normal);
>           p->normal = NULL;
> +        g_free(p->zero);
> +        p->zero = NULL;
>           multifd_send_state->ops->send_cleanup(p, &local_err);
>           if (local_err) {
>               migrate_set_error(migrate_get_current(), local_err);
> @@ -666,6 +679,7 @@ static void *multifd_send_thread(void *opaque)
>               qemu_mutex_unlock(&p->mutex);
>   
>               p->normal_num = 0;
> +            p->zero_num = 0;
>   
>               if (use_zero_copy_send) {
>                   p->iovs_num = 0;
> @@ -687,8 +701,8 @@ static void *multifd_send_thread(void *opaque)
>               }
>               multifd_send_fill_packet(p);
>   
> -            trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,
> -                               p->next_packet_size);
> +            trace_multifd_send(p->id, packet_num, p->normal_num, p->zero_num,
> +                               p->flags, p->next_packet_size);
>   
>               if (use_zero_copy_send) {
>                   /* Send header first, without zerocopy */
> @@ -712,6 +726,7 @@ static void *multifd_send_thread(void *opaque)
>               qemu_mutex_lock(&p->mutex);
>               p->num_packets++;
>               p->total_normal_pages += p->normal_num;
> +            p->total_zero_pages += p->zero_num;
>               p->pages->num = 0;
>               p->pages->block = NULL;
>               p->sent_bytes += p->packet_len;;
> @@ -753,7 +768,8 @@ out:
>       qemu_mutex_unlock(&p->mutex);
>   
>       rcu_unregister_thread();
> -    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
> +    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages,
> +                                  p->total_zero_pages);
>   
>       return NULL;
>   }
> @@ -938,6 +954,7 @@ int multifd_save_setup(Error **errp)
>           p->normal = g_new0(ram_addr_t, page_count);
>           p->page_size = qemu_target_page_size();
>           p->page_count = page_count;
> +        p->zero = g_new0(ram_addr_t, page_count);
>   
>           if (migrate_use_zero_copy_send()) {
>               p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
> @@ -1046,6 +1063,8 @@ int multifd_load_cleanup(Error **errp)
>           p->iov = NULL;
>           g_free(p->normal);
>           p->normal = NULL;
> +        g_free(p->zero);
> +        p->zero = NULL;
>           multifd_recv_state->ops->recv_cleanup(p);
>       }
>       qemu_sem_destroy(&multifd_recv_state->sem_sync);
> @@ -1116,13 +1135,14 @@ static void *multifd_recv_thread(void *opaque)
>               break;
>           }
>   
> -        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
> -                           p->next_packet_size);
> +        trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
> +                           p->flags, p->next_packet_size);
>           sync_needed = p->flags & MULTIFD_FLAG_SYNC;
>           /* recv methods don't know how to handle the SYNC flag */
>           p->flags &= ~MULTIFD_FLAG_SYNC;
>           p->num_packets++;
>           p->total_normal_pages += p->normal_num;
> +        p->total_normal_pages += p->zero_num;

Hi, Juan:

If I understand correctly, it should be "p->total_zero_pages += p->zero_num;".

By the way, this patch seems to greatly improve the performance of zero
page checking, but there has been no new update in the past two months.
When will it be merged into master?




* Re: [PATCH v7 10/12] multifd: Support for zero pages transmission
  2022-09-02 13:27   ` Leonardo Brás
@ 2022-11-14 12:09     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-11-14 12:09 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:

...

>> @@ -712,6 +726,7 @@ static void *multifd_send_thread(void *opaque)
>>              qemu_mutex_lock(&p->mutex);
>>              p->num_packets++;
>>              p->total_normal_pages += p->normal_num;
>> +            p->total_zero_pages += p->zero_num;
>
> I can see it getting declared, incremented and used. But where is it initialized
> in zero? I mean, should it not have 'p->total_normal_pages = 0;' somewhere in
> setup?

int multifd_save_setup(Error **errp)
{
    ....

    thread_count = migrate_multifd_channels();
    multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
    multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);

You can see here that we set up everything to zero.  We only need to
explicitly initialize whatever is not zero.


> (I understand multifd_save_setup() allocates a multifd_send_state->params with
> g_new0(), but other variables are zeroed there, like p->pending_job and 
> p->write_flags, so why not?)   

Hmm, I think it is better to do it the other way around: remove the
explicit initializations to zero.  That way we only set whatever is
not zero.
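
A tiny standalone sketch of that convention (plain GLib with a reduced,
made-up struct, not the real MultiFDSendParams): g_new0() hands the memory
back zero-filled, so only fields whose initial value is non-zero need an
explicit assignment during setup:

#include <assert.h>
#include <stdint.h>
#include <glib.h>

/* reduced stand-in for MultiFDSendParams, just to show the pattern */
typedef struct {
    uint64_t total_normal_pages;
    uint64_t total_zero_pages;
    int pending_job;
} Params;

int main(void)
{
    Params *params = g_new0(Params, 16);       /* whole array is zero-filled */

    assert(params[3].total_zero_pages == 0);   /* no explicit "= 0" needed */
    params[3].pending_job = 1;                 /* only non-zero state is set */

    g_free(params);
    return 0;
}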


Thanks, Juan.




* Re: [PATCH v7 10/12] multifd: Support for zero pages transmission
  2022-10-25  9:10   ` chuang xu
@ 2022-11-14 12:10     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-11-14 12:10 UTC (permalink / raw)
  To: chuang xu
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert,
	Leonardo Bras, Peter Xu, Eric Blake, Philippe Mathieu-Daudé,
	Yanan Wang, Markus Armbruster, Eduardo Habkost

chuang xu <xuchuangxclwt@bytedance.com> wrote:
> On 2022/8/2 2:39 PM, Juan Quintela wrote:
>> This patch adds counters and similar.  Logic will be added on the
>> following patch.
>>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>>

>>           sync_needed = p->flags & MULTIFD_FLAG_SYNC;
>>           /* recv methods don't know how to handle the SYNC flag */
>>           p->flags &= ~MULTIFD_FLAG_SYNC;
>>           p->num_packets++;
>>           p->total_normal_pages += p->normal_num;
>> +        p->total_normal_pages += p->zero_num;
>
> Hi, Juan:
>
> If I understand correctly, it should be "p->total_zero_pages +=
> p->zero_num; ".

Very good catch. Thanks.

That is what rebases do to you.

> By the way, This patch seems to greatly improve the performance of
> zero page checking,  but it seems that there has been no new update in
> the past two months. I want to know when it will be merged into
> master?

I am resending right now.

Later, Juan.




* Re: [PATCH v7 11/12] multifd: Zero pages transmission
  2022-09-02 13:27   ` Leonardo Brás
@ 2022-11-14 12:20     ` Juan Quintela
  2022-11-14 12:27     ` Juan Quintela
  1 sibling, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-11-14 12:20 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> This implements the zero page detection and handling.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>

>> @@ -358,6 +365,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>          p->normal[i] = offset;
>>      }
>>  
>> +    for (i = 0; i < p->zero_num; i++) {
>> +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
>> +
>> +        if (offset > (block->used_length - p->page_size)) {
>> +            error_setg(errp, "multifd: offset too long %" PRIu64
>> +                       " (max " RAM_ADDR_FMT ")",
>> +                       offset, block->used_length);
>> +            return -1;
>> +        }
>> +        p->zero[i] = offset;
>> +    }
>> +
>>      return 0;
>>  }
>
> IIUC ram_addr_t is supposed to be the address size for the architecture, mainly
> being 32 or 64 bits. So packet->offset[i] is always u64, and p->zero[i] possibly
> being u32 or u64.
>
> Since both local variables and packet->offset[i] are 64-bit, there is no issue.
>
> But on 'p->zero[i] = offset' we can have 'u32 = u64', and this should raise a
> warning (or am I missing something?).

I don't really know what to do here.
The problem is only theoretical (in the long, long past we supported
migrating between different architectures, but we aren't testing that
anymore).

And because it was a pain in the ass, we define it as:

/* address in the RAM (different from a physical address) */
#if defined(CONFIG_XEN_BACKEND)
typedef uint64_t ram_addr_t;
#  define RAM_ADDR_MAX UINT64_MAX
#  define RAM_ADDR_FMT "%" PRIx64
#else
typedef uintptr_t ram_addr_t;
#  define RAM_ADDR_MAX UINTPTR_MAX
#  define RAM_ADDR_FMT "%" PRIxPTR
#endif

So I am pretty sure that almost nothing uses 32bits for it now (I
haven't checked lately, but I guess that nobody is really using/testing
xen on 32 bits).

I don't really know.  But it could only happen when you are migrating
from 64-bit Xen to 32-bit Xen, and I don't really know if that even works.

I will give it a try to change normal/zero to u64.

Thanks, Juan.




* Re: [PATCH v7 11/12] multifd: Zero pages transmission
  2022-09-02 13:27   ` Leonardo Brás
  2022-11-14 12:20     ` Juan Quintela
@ 2022-11-14 12:27     ` Juan Quintela
  1 sibling, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-11-14 12:27 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> This implements the zero page detection and handling.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>

Hi

On further investigation, I see why it can't be a problem.


>>  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> @@ -358,6 +365,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>          p->normal[i] = offset;
>>      }
>>  
>> +    for (i = 0; i < p->zero_num; i++) {
>> +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
>> +
>> +        if (offset > (block->used_length - p->page_size)) {

We check here that we are inside the RAM block.  You can't have an
offset bigger than 32 bits when you only have 32 bits of RAM.


>> @@ -688,8 +710,16 @@ static void *multifd_send_thread(void *opaque)
>>              }
>>  
>>              for (int i = 0; i < p->pages->num; i++) {
>> -                p->normal[p->normal_num] = p->pages->offset[i];
>> -                p->normal_num++;
>> +                uint64_t offset = p->pages->offset[i];

We are reading the offset here.
p->pages->offset is ram_addr_t, so no problem here.

>> +                if (use_zero_page &&
>> +                    buffer_is_zero(rb->host + offset, p->page_size)) {
>> +                    p->zero[p->zero_num] = offset;
>
> Same here.

This and the next case are exactly the same; we are doing:

ram_addr_t offset1;
uint64_t foo = offset1;
ram_addr_t offset2 = foo;

So it should be right.  Everything is unsigned here.
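
Put as a standalone sketch (assumed 32-bit ram_addr_t and a hypothetical
store_offset() helper, not the QEMU code): the used_length bound is what
makes the later narrowing value-preserving, because any offset that would
not fit is refused before it is ever stored:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t ram_addr_t;   /* the 32-bit case under discussion */

/* shaped after the check in multifd_recv_unfill_packet(): once the offset
 * passes the used_length bound it necessarily fits in ram_addr_t */
static bool store_offset(uint64_t offset, ram_addr_t used_length,
                         uint32_t page_size, ram_addr_t *out)
{
    if (offset > (uint64_t)(used_length - page_size)) {
        return false;            /* rejected before any narrowing happens */
    }
    *out = (ram_addr_t)offset;   /* lossless: offset <= used_length < 2^32 */
    return true;
}

int main(void)
{
    ram_addr_t stored = 0;
    /* a 3 GiB block of 4 KiB pages: an offset above 4 GiB gets rejected */
    bool ok = store_offset(UINT64_C(0x100000004), 3u << 30, 4096, &stored);

    printf("accepted=%d stored=0x%" PRIx32 "\n", ok, stored);
    return 0;
}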

>> +                    p->zero_num++;
>> +                    ram_release_page(rb->idstr, offset);
>> +                } else {
>> +                    p->normal[p->normal_num] = offset;
>
> Same here? (p->normal[i] can also be u32)

Thanks, Juan.




* Re: [PATCH v7 12/12] So we use multifd to transmit zero pages.
  2022-09-02 13:27   ` Leonardo Brás
@ 2022-11-14 12:30     ` Juan Quintela
  0 siblings, 0 replies; 43+ messages in thread
From: Juan Quintela @ 2022-11-14 12:30 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: qemu-devel, Marcel Apfelbaum, Dr. David Alan Gilbert, Peter Xu,
	Eric Blake, Philippe Mathieu-Daudé, Yanan Wang,
	Markus Armbruster, Eduardo Habkost

Leonardo Brás <leobras@redhat.com> wrote:
> On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela <quintela@redhat.com>

>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 89811619d8..54acdc004c 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -667,8 +667,8 @@ static void *multifd_send_thread(void *opaque)
>>  {
>>      MultiFDSendParams *p = opaque;
>>      Error *local_err = NULL;
>> -    /* qemu older than 7.0 don't understand zero page on multifd channel */
>> -    bool use_zero_page = migrate_use_multifd_zero_page();
>> +    /* older qemu don't understand zero page on multifd channel */
>> +    bool use_multifd_zero_page = !migrate_use_main_zero_page();
>
> I understand that "use_main_zero_page", which is introduced as a new capability,
> is in fact the old behavior, and the new feature is introduced when this
> capability is disabled.
>
> But it sure looks weird reading:
>  use_multifd_zero_page = !migrate_use_main_zero_page();
>
> This series is fresh in my mind, but it took a few seconds to see that this is
> actually not a typo. 

We can't have it both ways.

All other capabilities are false by default.  And libvirt assumes they
are false.  So either we are willing to change the expectations, or we need
to do it this way.

In previous versions, I had the capability named the other way around,
and I changed it due to this.

Thanks, Juan.




end of thread, other threads:[~2022-11-15  1:37 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
2022-08-02  6:38 [PATCH v7 00/12] Migration: Transmit and detect zero pages in the multifd threads Juan Quintela
2022-08-02  6:38 ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv, Send}Params Juan Quintela
2022-08-11  8:10   ` [PATCH v7 01/12] multifd: Create page_size fields into both MultiFD{Recv,Send}Params Leonardo Brás
2022-08-13 15:41     ` Juan Quintela
2022-08-02  6:38 ` [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv, Send}Params Juan Quintela
2022-08-11  8:10   ` [PATCH v7 02/12] multifd: Create page_count fields into both MultiFD{Recv,Send}Params Leonardo Brás
2022-08-02  6:38 ` [PATCH v7 03/12] migration: Export ram_transferred_ram() Juan Quintela
2022-08-11  8:11   ` Leonardo Brás
2022-08-13 15:36     ` Juan Quintela
2022-08-02  6:38 ` [PATCH v7 04/12] multifd: Count the number of bytes sent correctly Juan Quintela
2022-08-11  8:11   ` Leonardo Brás
2022-08-19  9:35     ` Juan Quintela
2022-08-02  6:39 ` [PATCH v7 05/12] migration: Make ram_save_target_page() a pointer Juan Quintela
2022-08-11  8:11   ` Leonardo Brás
2022-08-19  9:51     ` Juan Quintela
2022-08-20  7:14       ` Leonardo Bras Soares Passos
2022-08-22 21:35         ` Juan Quintela
2022-08-02  6:39 ` [PATCH v7 06/12] multifd: Make flags field thread local Juan Quintela
2022-08-11  9:04   ` Leonardo Brás
2022-08-19 10:03     ` Juan Quintela
2022-08-20  7:24       ` Leonardo Bras Soares Passos
2022-08-23 13:00         ` Juan Quintela
2022-08-02  6:39 ` [PATCH v7 07/12] multifd: Prepare to send a packet without the mutex held Juan Quintela
2022-08-11  9:16   ` Leonardo Brás
2022-08-19 11:32     ` Juan Quintela
2022-08-20  7:27       ` Leonardo Bras Soares Passos
2022-08-02  6:39 ` [PATCH v7 08/12] multifd: Add capability to enable/disable zero_page Juan Quintela
2022-08-11  9:29   ` Leonardo Brás
2022-08-19 11:36     ` Juan Quintela
2022-08-02  6:39 ` [PATCH v7 09/12] migration: Export ram_release_page() Juan Quintela
2022-08-11  9:31   ` Leonardo Brás
2022-08-02  6:39 ` [PATCH v7 10/12] multifd: Support for zero pages transmission Juan Quintela
2022-09-02 13:27   ` Leonardo Brás
2022-11-14 12:09     ` Juan Quintela
2022-10-25  9:10   ` chuang xu
2022-11-14 12:10     ` Juan Quintela
2022-08-02  6:39 ` [PATCH v7 11/12] multifd: Zero " Juan Quintela
2022-09-02 13:27   ` Leonardo Brás
2022-11-14 12:20     ` Juan Quintela
2022-11-14 12:27     ` Juan Quintela
2022-08-02  6:39 ` [PATCH v7 12/12] So we use multifd to transmit zero pages Juan Quintela
2022-09-02 13:27   ` Leonardo Brás
2022-11-14 12:30     ` Juan Quintela
