From: Alexey Kardashevskiy
Message-ID: <55A5AE0F.7040207@ozlabs.ru>
Date: Wed, 15 Jul 2015 10:49:19 +1000
In-Reply-To: <1436912893.1391.484.camel@redhat.com>
References: <1436799381-16150-1-git-send-email-aik@ozlabs.ru>
 <1436799381-16150-2-git-send-email-aik@ozlabs.ru>
 <1436814815.1391.388.camel@redhat.com>
 <55A4B306.6040402@ozlabs.ru>
 <1436912893.1391.484.camel@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH qemu v2 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
To: Alex Williamson
Cc: Michael Roth, qemu-ppc@nongnu.org, qemu-devel@nongnu.org, David Gibson

On 07/15/2015 08:28 AM, Alex Williamson wrote:
> On Tue, 2015-07-14 at 16:58 +1000, Alexey Kardashevskiy wrote:
>> On 07/14/2015 05:13 AM, Alex Williamson wrote:
>>> On Tue, 2015-07-14 at 00:56 +1000, Alexey Kardashevskiy wrote:
>>>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>>>> a real host page size:
>>>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>>>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>>>
>>>> This finishes the transition by:
>>>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>>>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>>>> - removing the bitfield length for offsets in VFIOQuirk::data as
>>>>   qemu_real_host_page_mask is not a macro
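
To make the mechanical part of the change concrete, the intent in
hw/vfio/common.c is roughly the following (a sketch only, not the exact
hunk; vfio_get_section_bounds() is just an illustrative helper and does
not exist in the tree):

static void vfio_get_section_bounds(MemoryRegionSection *section,
                                    hwaddr *iova, hwaddr *end)
{
    /* was: TARGET_PAGE_ALIGN() and TARGET_PAGE_MASK (4K on most targets) */
    *iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
    *end = (section->offset_within_address_space +
            int128_get64(section->size)) & qemu_real_host_page_mask;
}

i.e. the listener clips sections to what the host MMU/IOMMU can actually
map rather than to the 4K target page size.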
>>> Can we assume that none of the changes to quirks have actually been
>>> tested?
>>
>> No, why? :) I tried it on one of the NVIDIAs I've got here -
>> VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
>> The driver was from NVIDIA (not nouveau) and the test was "acos" (some
>> basic CUDA test).
>
> That's only one of a handful or more quirks.  The VGA related quirks are
> all for backdoors in the bootstrap process and MMIO access to config
> space, things that I would not expect to see on power.  So power
> probably isn't a useful host to test these things.
>
>>> I don't really support them being bundled in here since they
>>> really aren't related to what you're doing.
>>
>> This makes sense, I'll move them to a separate patch and add a note about
>> how it helps on a 64k-pages host.
>
>>> For DMA we generally want
>>> to be host IOMMU page aligned,
>>
>> Do all known IOMMUs use a constant page size? An IOMMU memory region does
>> not have an IOMMU page size/mask; I wanted to add it there but I am not
>> sure it is generic enough.
>
> AMD supports nearly any power-of-2 size (>=4k), Intel supports 4k +
> optionally 2M and 1G.  The vfio type1 iommu driver looks for physically
> contiguous ranges to give to the hardware iommu driver, which makes use
> of whatever optimal page size it can.  Therefore, we really don't care
> what the hardware page size is beyond the assumption that it supports
> 4k.  When hugepages are used by the VM, we expect the iommu will
> automatically make use of them if supported.  A non-VM vfio userspace
> driver might care a little bit more about the supported page sizes, I
> imagine.

Oh. Cooler than us (P8).

>>> which we can generally assume is the same
>>> as host page aligned,
>>
>> They are almost never the same on sPAPR for 32bit windows...
>
>>> but quirks are simply I/O regions, so I think they
>>> ought to continue to be target page aligned.
>>
>> Without s/TARGET_PAGE_MASK/qemu_real_host_page_mask/,
>> &vfio_nvidia_88000_quirk fails+exits in kvm_set_phys_mem() as the size of
>> the section is 0x88000. It still works with x-mmap=false (or TCG, I
>> suppose) though.
>
> Think about what this is doing to the guest.  There's a 4k window
> (because PCIe extended config space is 4k) at offset 0x88000 of the MMIO
> BAR that allows access to PCI config space of the device.  With 4k pages
> this all aligns quite nicely and only config space accesses are trapped.
> With 64k pages, you're trapping everything from 0x80000 to 0x8ffff.  We
> have no idea what else might live in that space and what kind of
> performance impact it'll cause to the operation of the device.

If the alternative is not to work at all, then working slower (potentially)
is ok.
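
Just to put numbers on it (a standalone illustration, not code from the
tree; the 64K page size and the window offset/size are taken from the
discussion above):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* What aligning the 4K quirk window at BAR offset 0x88000 to 64K host
 * pages does to the trapped range: the whole 0x80000..0x8ffff span is
 * trapped instead of just the 4K config space mirror. */
int main(void)
{
    uint64_t offset = 0x88000, size = 0x1000;       /* the quirk window */
    uint64_t mask = ~(uint64_t)(64 * 1024 - 1);     /* 64K host pages   */
    uint64_t start = offset & mask;                 /* 0x80000          */
    uint64_t end = (offset + size + ~mask) & mask;  /* 0x90000          */

    printf("trapped 0x%" PRIx64 "..0x%" PRIx64 " (%" PRIu64 " KiB)\n",
           start, end - 1, (end - start) / 1024);
    return 0;
}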
> If you don't know that you need it and can't meet the mapping criteria,
> don't enable it.  BTW, this quirk is for GeForce, not Quadro.  The region
> seems not to be used by Quadro drivers and we can't programmatically tell
> the difference between Quadro and GeForce hardware, so we leave it
> enabled for either on x86.

What kind of driver exploits this? If it is Windows-only, then I can safely
drop all of these hacks. Or is it the adapter's BIOS? Or the host BIOS?

> Maybe these really should be real_host_page_size, but I can't really
> picture how an underlying 64k page size host gets imposed on a guest
> that's only aware of a 4k page size.  For instance, what prevents a 4k
> guest from mapping PCI BARs with 4k alignment?  Multiple BARs for
> different devices fit within a 64k host page.  MMU mappings would seem
> to have similar issues.  These quirks need to be on the granularity at
> which we take a target page fault, which I imagine is the same as the
> real_host_page_size.  I'd expect that unless you're going to support
> consumer graphics or crappy realtek NICs, none of these quirks are
> relevant to you.
>
>>>> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
>>>> the minimum page size which IOMMU regions may be using and at the
>>>> moment memory regions do not carry the actual page size.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy
>>>> ---
>>>>
>>>> In reality DMA windows are always a lot bigger than a single 4K page
>>>> and aligned to 32/64MB, so maybe only use qemu_real_host_page_mask
>>>> here?
>>>
>>> I don't understand what this is asking either.  While the bulk of memory
>>> is going to be mapped in larger chunks, we do occasionally see 4k
>>> mappings on x86, particularly in some of the legacy low memory areas.
>>
>> The question was not about individual mappings - these are handled by an
>> iommu memory region notifier; here we are dealing with DMA windows which
>> are always megabytes, but nothing really prevents/prohibits a guest from
>> requesting a 4K _window_ (with a single TCE entry). Whether we want to
>> support such small windows or not - that was the question.
>
> But you're using the same memory listener that does deal with 4k
> mappings.  What optimization would you hope to achieve by assuming only
> larger mappings that compensates for the lack of generality?

Thanks, I gave up here already - I reworked this part in v3 :) Thanks for
the review and education.


-- 
Alexey