From: Alexey Kardashevskiy
Message-ID: <55A5AE0F.7040207@ozlabs.ru>
Date: Wed, 15 Jul 2015 10:49:19 +1000
In-Reply-To: <1436912893.1391.484.camel@redhat.com>
References: <1436799381-16150-1-git-send-email-aik@ozlabs.ru>
 <1436799381-16150-2-git-send-email-aik@ozlabs.ru>
 <1436814815.1391.388.camel@redhat.com>
 <55A4B306.6040402@ozlabs.ru>
 <1436912893.1391.484.camel@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH qemu v2 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
To: Alex Williamson
Cc: Michael Roth, qemu-ppc@nongnu.org, qemu-devel@nongnu.org, David Gibson

On 07/15/2015 08:28 AM, Alex Williamson wrote:
> On Tue, 2015-07-14 at 16:58 +1000, Alexey Kardashevskiy wrote:
>> On 07/14/2015 05:13 AM, Alex Williamson wrote:
>>> On Tue, 2015-07-14 at 00:56 +1000, Alexey Kardashevskiy wrote:
>>>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>>>> a real host page size:
>>>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>>>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>>>
>>>> This finishes the transition by:
>>>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>>>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>>>> - removing the bitfield length for offsets in VFIOQuirk::data as
>>>>   qemu_real_host_page_mask is not a macro
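
To make the mechanical part of the change concrete, the intent in
hw/vfio/common.c is roughly the following (a sketch only, not the exact
hunk; vfio_get_section_bounds() is just an illustrative helper and does
not exist in the tree):

static void vfio_get_section_bounds(MemoryRegionSection *section,
                                    hwaddr *iova, hwaddr *end)
{
    /* was: TARGET_PAGE_ALIGN() and TARGET_PAGE_MASK (4K on most targets) */
    *iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
    *end = (section->offset_within_address_space +
            int128_get64(section->size)) & qemu_real_host_page_mask;
}

i.e. the listener clips sections to what the host MMU/IOMMU can actually
map rather than to the 4K target page size.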
>>> Can we assume that none of the changes to quirks have actually been
>>> tested?
>>
>> No, why? :) I tried it on one of the NVIDIAs I've got here -
>> VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
>> The driver was from NVIDIA (not nouveau) and the test was "acos" (some
>> basic CUDA test).
>
> That's only one of a handful or more quirks.  The VGA related quirks are
> all for backdoors in the bootstrap process and MMIO access to config
> space, things that I would not expect to see on power.  So power
> probably isn't a useful host to test these things.
>
>>> I don't really support them being bundled in here since they
>>> really aren't related to what you're doing.
>>
>> This makes sense, I'll move them to a separate patch and add a note about
>> how it helps on a 64k-pages host.
>
>>> For DMA we generally want
>>> to be host IOMMU page aligned,
>>
>> Do all known IOMMUs use a constant page size? An IOMMU memory region does
>> not have an IOMMU page size/mask; I wanted to add it there but I am not
>> sure it is generic enough.
>
> AMD supports nearly any power-of-2 size (>=4k), Intel supports 4k +
> optionally 2M and 1G.  The vfio type1 iommu driver looks for physically
> contiguous ranges to give to the hardware iommu driver, which makes use
> of whatever optimal page size it can.  Therefore, we really don't care
> what the hardware page size is beyond the assumption that it supports
> 4k.  When hugepages are used by the VM, we expect the iommu will
> automatically make use of them if supported.  A non-VM vfio userspace
> driver might care a little bit more about the supported page sizes, I
> imagine.

Oh. Cooler than us (P8).

>>> which we can generally assume is the same
>>> as host page aligned,
>>
>> They are almost never the same on sPAPR for 32bit windows...
>
>>> but quirks are simply I/O regions, so I think they
>>> ought to continue to be target page aligned.
>>
>> Without s/TARGET_PAGE_MASK/qemu_real_host_page_mask/,
>> &vfio_nvidia_88000_quirk fails+exits in kvm_set_phys_mem() as the size of
>> the section is 0x88000. It still works with x-mmap=false (or TCG, I
>> suppose) though.
>
> Think about what this is doing to the guest.  There's a 4k window
> (because PCIe extended config space is 4k) at offset 0x88000 of the MMIO
> BAR that allows access to PCI config space of the device.  With 4k pages
> this all aligns quite nicely and only config space accesses are trapped.
> With 64k pages, you're trapping everything from 0x80000 to 0x8ffff.  We
> have no idea what else might live in that space and what kind of
> performance impact it'll cause to the operation of the device.

If the alternative is not to work at all, then working slower (potentially)
is ok.
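
Just to put numbers on it (a standalone illustration, not code from the
tree; the 64K page size and the window offset/size are taken from the
discussion above):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* What aligning the 4K quirk window at BAR offset 0x88000 to 64K host
 * pages does to the trapped range: the whole 0x80000..0x8ffff span is
 * trapped instead of just the 4K config space mirror. */
int main(void)
{
    uint64_t offset = 0x88000, size = 0x1000;       /* the quirk window */
    uint64_t mask = ~(uint64_t)(64 * 1024 - 1);     /* 64K host pages   */
    uint64_t start = offset & mask;                 /* 0x80000          */
    uint64_t end = (offset + size + ~mask) & mask;  /* 0x90000          */

    printf("trapped 0x%" PRIx64 "..0x%" PRIx64 " (%" PRIu64 " KiB)\n",
           start, end - 1, (end - start) / 1024);
    return 0;
}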
> If you don't know that you need it and can't meet the mapping criteria,
> don't enable it.  BTW, this quirk is for GeForce, not Quadro.  The region
> seems not to be used by Quadro drivers and we can't programmatically tell
> the difference between Quadro and GeForce hardware, so we leave it
> enabled for either on x86.

What kind of driver exploits this? If it is Windows-only, then I can safely
drop all of these hacks. Or is it the adapter's BIOS? Or the host BIOS?

> Maybe these really should be real_host_page_size, but I can't really
> picture how an underlying 64k page size host gets imposed on a guest
> that's only aware of a 4k page size.  For instance, what prevents a 4k
> guest from mapping PCI BARs with 4k alignment?  Multiple BARs for
> different devices fit within a 64k host page.  MMU mappings would seem
> to have similar issues.  These quirks need to be on the granularity at
> which we take a target page fault, which I imagine is the same as the
> real_host_page_size.  I'd expect that unless you're going to support
> consumer graphics or crappy realtek NICs, none of these quirks are
> relevant to you.
>
>>>> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
>>>> the minimum page size which IOMMU regions may be using and at the
>>>> moment memory regions do not carry the actual page size.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy
>>>> ---
>>>>
>>>> In reality DMA windows are always a lot bigger than a single 4K page
>>>> and aligned to 32/64MB, so maybe only use qemu_real_host_page_mask
>>>> here?
>>>
>>> I don't understand what this is asking either.  While the bulk of memory
>>> is going to be mapped in larger chunks, we do occasionally see 4k
>>> mappings on x86, particularly in some of the legacy low memory areas.
>>
>> The question was not about individual mappings - these are handled by an
>> iommu memory region notifier; here we are dealing with DMA windows which
>> are always megabytes, but nothing really prevents/prohibits a guest from
>> requesting a 4K _window_ (with a single TCE entry). Whether we want to
>> support such small windows or not - that was the question.
>
> But you're using the same memory listener that does deal with 4k
> mappings.  What optimization would you hope to achieve by assuming only
> larger mappings that compensates for the lack of generality?

Thanks, I gave up here already - I reworked this part in v3 :) Thanks for
the review and education.


-- 
Alexey