From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <81628606-ca9b-866f-5e71-91001e856871@suse.cz>
Date: Mon, 27 Nov 2023 12:13:59 +0100
Subject: Re: [PATCH v13 17/35] KVM: Add transparent hugepage support for dedicated guest memory
To: Paolo Bonzini, Sean Christopherson
Cc: Xiaoyao Li, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
 Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Viro,
 Christian Brauner, "Matthew Wilcox (Oracle)", Andrew Morton,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xu Yilun, Chao Peng,
 Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, David Matlack, Yu Zhang,
 Isaku Yamahata, Mickaël Salaün, Vishal Annapurve, Ackerley Tng,
 Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang,
 Liam Merwick, Isaku Yamahata, "Kirill A . Shutemov"
References: <20231027182217.3615211-18-seanjc@google.com>
 <7c0844d8-6f97-4904-a140-abeabeb552c1@intel.com>
 <92ba7ddd-2bc8-4a8d-bd67-d6614b21914f@intel.com>
 <4ca2253d-276f-43c5-8e9f-0ded5d5b2779@redhat.com>
From: Vlastimil Babka
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/2/23 16:46, Paolo Bonzini wrote:
> On Thu, Nov 2, 2023 at 4:38 PM Sean Christopherson wrote:
>> Actually, looking at this again, there's not actually a hard dependency on THP.
>> A THP-enabled kernel _probably_ gives a higher probability of using hugepages,
>> but mostly because THP selects COMPACTION, and I suppose because using THP for
>> other allocations reduces overall fragmentation.
>
> Yes, that's why I didn't even bother enabling it unless THP is
> enabled, but it makes even more sense to just try.
>
>> So rather than honor KVM_GUEST_MEMFD_ALLOW_HUGEPAGE iff THP is enabled, I think
>> we should do the below (I verified KVM can create hugepages with THP=n). We'll
>> need another capability, but (a) we probably should have that anyways and (b) it
>> provides a cleaner path to adding PUD-sized hugepage support in the future.
>
> I wonder if we need KVM_CAP_GUEST_MEMFD_HUGEPAGE_PMD_SIZE though. This
> should be a generic kernel API and in fact the sizes are available in
> a not-so-friendly format in /sys/kernel/mm/hugepages.
>
> We should just add /sys/kernel/mm/hugepages/sizes that contains
> "2097152 1073741824" on x86 (only the former if 1G pages are not
> supported).
>
> Plus: is this the best API if we need something else for 1G pages?
>
> Let's drop *this* patch and proceed incrementally. (Again, this is
> what I want to do with this final review: identify places that are
> still sticky, and don't let them block the rest).
>
> Coincidentally we have an open spot next week at plumbers. Let's
> extend Fuad's section to cover more guestmem work.

Hi,

was there any outcome wrt this one? Based on my experience with THP, it
would be best if userspace didn't have to opt in, nor care about the
supported size. If the given size is unaligned, provide a mix of large pages
up to an aligned size, and for the rest fall back to base pages, which should
be better than -EINVAL on creation (is it possible with the current
implementation? I'd hope so?). A way to opt out of huge pages could be
useful, although there's always the risk of some initial troubles resulting
in various online sources cargo-cult recommending to opt out forever.

Vlastimil
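
The suggestion above, huge pages up to the aligned size and base pages for the rest, can be sketched in a few lines of userspace C. This is only an illustration of the arithmetic, not the guest_memfd implementation: the helper `split_size()` and the struct `page_mix` are hypothetical names, and the sizes assume x86-64 (4 KiB base pages and the 2097152-byte PMD size quoted earlier in the thread), with the region starting at a huge-page-aligned offset.

```c
#include <stdint.h>

/* Assumed x86-64 sizes, for illustration only. */
#define BASE_PAGE_SIZE 4096ULL
#define PMD_PAGE_SIZE  2097152ULL

/* Hypothetical result type: how a request would be backed. */
struct page_mix {
    uint64_t huge_pages; /* PMD-sized pages covering the aligned part */
    uint64_t base_pages; /* base pages covering the unaligned tail */
};

/*
 * Split a requested size into a huge-page-aligned prefix and a tail
 * backed by base pages, instead of rejecting an unaligned size.
 * Assumes the region starts at a PMD-aligned offset.
 */
static struct page_mix split_size(uint64_t size)
{
    struct page_mix mix;
    uint64_t aligned = size - (size % PMD_PAGE_SIZE);
    uint64_t tail = size - aligned;

    mix.huge_pages = aligned / PMD_PAGE_SIZE;
    /* Round the tail up to whole base pages. */
    mix.base_pages = (tail + BASE_PAGE_SIZE - 1) / BASE_PAGE_SIZE;
    return mix;
}
```

Under these assumptions, a 5 MiB request (`split_size(5ULL * 1024 * 1024)`) comes out as 2 huge pages plus 256 base pages, where a strict alignment check would have returned -EINVAL.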