From: David Hildenbrand <david@redhat.com>
To: Christian Borntraeger <borntraeger@linux.ibm.com>,
	linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	kvm@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v1 2/2] s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests
Date: Fri, 22 Mar 2024 18:08:08 +0100
Message-ID: <ac0e3000-eb04-4f13-9eaf-fe1eaa2f5497@redhat.com>
In-Reply-To: <ed0f05de-0e17-41ec-85b2-be8603b0556a@linux.ibm.com>

On 22.03.24 11:22, Christian Borntraeger wrote:
> 
> 
> On 21.03.24 at 22:59, David Hildenbrand wrote:
>> commit fa41ba0d08de ("s390/mm: avoid empty zero pages for KVM guests to
>> avoid postcopy hangs") introduced an undesired side effect when combined
>> with memory ballooning and VM migration: memory that is part of the
>> inflated memory balloon still consumes memory.
>>
>> Assume we have a 100 GiB VM and inflate the balloon to 40 GiB; the VM
>> then consumes ~60 GiB of memory. If we now trigger a VM migration,
>> hypervisors like QEMU will read all VM memory. As s390x does not support
>> the shared zeropage, we end up allocating memory for all the
>> previously-inflated memory that was part of the memory balloon: 40 GiB.
>> So we might easily (unexpectedly) crash the VM on the migration source.
>>
>> Even worse, hypervisors like QEMU optimize zeropage migration to not
>> consume memory on the migration destination: when migrating a
>> "page full of zeroes", they check whether the target memory on the
>> destination is already zero (by reading it) and avoid the write so
>> that no memory is allocated. However, on s390x that read also
>> allocates memory, so on the migration destination we likewise end up
>> allocating all previously-inflated memory that was part of the
>> memory balloon.
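
For illustration, the destination-side optimization described above is
conceptually something like the following (simplified sketch, not
actual QEMU code; buffer_is_zero() here is a naive stand-in for the
real detection):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Naive zero check; note that reading dst is exactly what
     * faults a page in on s390x when no shared zeropage exists. */
    static bool buffer_is_zero(const char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            if (buf[i])
                return false;
        return true;
    }

    /* Place an incoming "page full of zeroes" at dst: skip the
     * write if the destination already reads back as zero. */
    static void migrate_zero_page(char *dst, size_t page_size)
    {
        if (!buffer_is_zero(dst, page_size))
            memset(dst, 0, page_size);
    }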
>>
>> This is especially bad if actual memory overcommit was not desired, when
>> memory ballooning is used for dynamic VM memory resizing, setting aside
>> some memory during boot that can be added later on demand. Alternatives
>> like virtio-mem that would avoid this issue are not yet available on
>> s390x.
>>
>> There could be ways to optimize some cases in user space: before reading
>> memory in an anonymous private mapping on the migration source, check via
>> /proc/self/pagemap if anything is already populated. Similarly check on
>> the migration destination before reading. While that would avoid
>> populating tables full of shared zeropages on all architectures, it's
>> harder to get right and performant, and requires user space changes.
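
Such a pagemap check could look roughly like this (untested sketch;
error handling is mostly omitted and the helper name is made up):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    /* /proc/self/pagemap has one 64-bit entry per page:
     * bit 63 = present, bit 62 = swapped; either one means
     * the page is already populated. */
    static int page_is_populated(int pagemap_fd, uintptr_t addr,
                                 long page_size)
    {
        uint64_t entry;
        off_t off = (addr / page_size) * sizeof(entry);

        if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
            return -1;
        return !!(entry & ((1ULL << 63) | (1ULL << 62)));
    }

with pagemap_fd coming from open("/proc/self/pagemap", O_RDONLY); the
migration source would then only read pages that report as populated.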
>>
>> Further, with postcopy live migration we must place a page, so there,
>> "avoid touching memory to avoid allocating memory" is not really
>> possible. (Note that previously we would have falsely inserted
>> shared zeropages into processes via UFFDIO_ZEROPAGE even where
>> mm_forbids_zeropage() would have actually forbidden it.)
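
For reference, placing a page of zeroes during postcopy goes through
UFFDIO_ZEROPAGE; a rough sketch of the userfaultfd side (error
handling omitted):

    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>

    /* Resolve a postcopy fault at addr with a page of zeroes; the
     * kernel chooses the backing, and with mm_forbids_zeropage()
     * it must not be the shared zeropage. */
    static int place_zeropage(int uffd, unsigned long addr,
                              unsigned long page_size)
    {
        struct uffdio_zeropage zp = {
            .range = { .start = addr, .len = page_size },
        };
        return ioctl(uffd, UFFDIO_ZEROPAGE, &zp);
    }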
>>
>> PV is currently incompatible with memory ballooning, and in the common
>> case, KVM guests don't make use of storage keys. Instead of zapping
>> zeropages when enabling storage keys / PV, which turned out to be
>> problematic in the past, let's do exactly what we do with KSM pages:
>> trigger unsharing faults to replace the shared zeropages with proper
>> anonymous folios.
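
Conceptually, the unsharing boils down to something like the following
(heavily simplified sketch; locking, retries and error handling are
omitted, see the actual patch for the real thing):

    #include <linux/mm.h>
    #include <linux/pagewalk.h>

    /* Stop the page-table walk at the first shared zeropage. */
    static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr,
                                       unsigned long end,
                                       struct mm_walk *walk)
    {
        if (pte_present(*pte) && is_zero_pfn(pte_pfn(*pte))) {
            *(unsigned long *)walk->private = addr;
            return 1; /* found one; fault it outside the walk */
        }
        return 0;
    }

    static const struct mm_walk_ops find_zeropage_ops = {
        .pte_entry = find_zeropage_pte_entry,
    };

    /* With the mmap lock held: each unshare fault replaces one
     * shared zeropage with a fresh anonymous folio. */
    while (walk_page_range(mm, addr, TASK_SIZE,
                           &find_zeropage_ops, &addr) > 0)
        handle_mm_fault(find_vma(mm, addr), addr,
                        FAULT_FLAG_UNSHARE, NULL);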
>>
>> What about added latency when enabling storage keys? Having a lot of
>> zeropages in applicable environments (PV, legacy guests, unittests) is
>> unexpected. Further, KSM could already unshare the zeropages today,
>> and unmerging KSM pages when enabling storage keys would unshare the
>> KSM-placed zeropages in the same way, resulting in the same latency.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Nice work. Looks good to me and indeed it fixes the memory
> over-consumption that you mentioned.

Thanks for the very fast review and test!

> 
> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> (can also be seen with virsh managedsave; virsh start)
> 
> I guess it's too invasive for stable, but I would say it is a real fix.

Should we add a Fixes: tag? I refrained from doing so, treating this 
more like an optimization that restores the intended behavior, at 
least as long as the VM does not use storage keys.

-- 
Cheers,

David / dhildenb



