From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [RFC PATCH qemu v2 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
Date: Wed, 15 Jul 2015 10:49:19 +1000	[thread overview]
Message-ID: <55A5AE0F.7040207@ozlabs.ru> (raw)
In-Reply-To: <1436912893.1391.484.camel@redhat.com>

On 07/15/2015 08:28 AM, Alex Williamson wrote:
> On Tue, 2015-07-14 at 16:58 +1000, Alexey Kardashevskiy wrote:
>> On 07/14/2015 05:13 AM, Alex Williamson wrote:
>>> On Tue, 2015-07-14 at 00:56 +1000, Alexey Kardashevskiy wrote:
>>>> These commits started the switch from TARGET_PAGE_MASK (hardcoded as 4K)
>>>> to the real host page size:
>>>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>>>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>>>
>>>> This patch finishes the transition by:
>>>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>>>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>>>> - removing bitfield length for offsets in VFIOQuirk::data as
>>>> qemu_real_host_page_mask is not a macro
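
For anyone who has not looked at the listener code recently, the substitution
boils down to this shape of change (an illustrative sketch only, not an actual
hunk from the patch):

    /* before: align to the hardcoded 4K target page */
    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);

    /* after: align to whatever page size the host really uses (64K here) */
    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
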
>>>
>>> Can we assume that none of the changes to quirks have actually been
>>> tested?
>>
>> No, why? :) I tried it on one of the NVIDIA cards I have here -
>> VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
>> The driver was from NVIDIA (not nouveau) and the test was "acos" (some
>> basic CUDA test).
>
> That's only one of a handful or more quirks.  The VGA related quirks are
> all for backdoors in the bootstrap process and MMIO access to config
> space, things that I would not expect to see on power.  So power
> probably isn't a useful host to test these things.
>
>>> I don't really support them being bundled in here since they
>>> really aren't related to what you're doing.
>>
>> This makes sense, I'll move them to a separate patch and add a note on how
>> it helps on a 64k-page host.
>>
>>> For DMA we generally want
>>> to be host IOMMU page aligned,
>>
>> Do all known IOMMUs use a constant page size? An IOMMU memory region does
>> not carry an IOMMU page size/mask; I wanted to add one there but I am not
>> sure whether it would be generic enough.
>
> AMD supports nearly any power-of-2 size (>=4k), Intel supports 4k +
> optionally 2M and 1G.  The vfio type1 iommu driver looks for physically
> contiguous ranges to give to the hardware iommu driver, which makes use
> of whatever optimal page size it can.  Therefore, we really don't care
> what the hardware page size is beyond the assumption that it supports
> 4k.  When hugepages are used by the VM, we expect the iommu will
> automatically make use of them if supported.  A non-VM vfio userspace
> driver might care a little bit more about the supported page sizes, I
> imagine.

Oh. Cooler than us (p8).
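
Speaking of userspace drivers that do care about the supported page sizes,
this is roughly how one would ask the type1 container (a sketch only;
container_fd is assumed to be an already opened and configured
/dev/vfio/vfio container):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static void print_iommu_pgsizes(int container_fd)
    {
        struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

        if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info) == 0 &&
            (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
            /* bitmap of IOMMU page sizes the container supports */
            printf("iova_pgsizes = 0x%llx\n",
                   (unsigned long long)info.iova_pgsizes);
        }
    }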

>>> which we can generally assume is the same
>>> as host page aligned,
>>
>> They are almost never the same on sPAPR for 32bit windows...
>>
>>> but quirks are simply I/O regions, so I think they
>>> ought to continue to be target page aligned.
>>
>> Without s/TARGET_PAGE_MASK/qemu_real_host_page_mask/,
>> &vfio_nvidia_88000_quirk fails+exits in kvm_set_phys_mem() as the size of
>> the section is 0x88000. It still works with x-mmap=false (or TCG, I suppose)
>> though.
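
The arithmetic of the failure, for reference (my numbers, not from a trace):

    quirk window: offset 0x88000, size 0x1000 (the 4K config space mirror)
    with 64K host pages:
        0x88000 & qemu_real_host_page_mask     = 0x80000
        REAL_HOST_PAGE_ALIGN(0x88000 + 0x1000) = 0x90000
    so the host-page-aligned quirk covers 0x80000..0x8ffff, and the unaligned
    0x88000 boundary is what kvm_set_phys_mem() chokes on with x-mmap=on.
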
>
> Think about what this is doing to the guest.  There's a 4k window
> (because PCIe extended config space is 4k) at offset 0x88000 of the MMIO
> BAR that allows access to PCI config space of the device.  With 4k pages
> this all aligns quite nicely and only config space accesses are trapped.
> With 64k pages, you're trapping everything from 0x80000 to 0x8ffff.  We
> have no idea what else might live in that space and what kind of
> performance impact it'll cause to the operation of the device.  If you

If the alternative is not working at all, then working slower (potentially) is ok.


> don't know that you need it and can't meet the mapping criteria, don't
> enable it.  BTW, this quirk is for GeForce, not Quadro.  The region
> seems not to be used by Quadro drivers and we can't programmatically tell
> the difference between Quadro and GeForce hardware, so we leave it
> enabled for either on x86.

What kind of driver uses this? If it is Windows-only, then I can safely
drop all of these hacks.

Or is it the adapter's BIOS? Or the host BIOS?

>
> Maybe these really should be real_host_page_size, but I can't really
> picture how an underlying 64k page size host gets imposed on a guest
> that's only aware of a 4k page size.  For instance, what prevents a 4k
> guest from mapping PCI BARs with 4k alignment?  Multiple BARs for
> different devices fit within a 64k host page.  MMU mappings would seem
> to have similar issues.  These quirks need to be on the granularity we
> take a target page fault, which I imagine is the same as the
> real_host_page_size.  I'd expect that unless you're going to support
> consumer graphics or crappy Realtek NICs, none of these quirks are
> relevant to you.
>
>>>> This keeps using TARGET_PAGE_MASK for IOMMU regions though, as that is
>>>> the minimum page size an IOMMU region may use, and at the moment
>>>> memory regions do not carry the actual page size.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>
>>>> In reality DMA windows are always a lot bigger than a single 4K page
>>>> and aligned to 32/64MB - maybe just use qemu_real_host_page_mask here?
>>>
>>> I don't understand what this is asking either.  While the bulk of memory
>>> is going to be mapped in larger chunks, we do occasionally see 4k
>>> mappings on x86, particularly in some of the legacy low memory areas.
>>
>>
>> The question was not about individual mappings - these are handled by an
>> IOMMU memory region notifier; here we are dealing with DMA windows, which
>> are always megabytes, but nothing really prevents a guest from
>> requesting a 4K _window_ (with a single TCE entry). Whether we want to
>> support such small windows or not - that was the question.
>
> But you're using the same memory listener that does deal with 4k
> mappings.  What optimization would you hope to achieve by assuming only
> larger mappings that compensates for the lack of generality?  Thanks,

I gave up here already - I reworked this part in v3 :)

Thanks for the review and education.


-- 
Alexey


Thread overview: 12+ messages
2015-07-13 14:56 [Qemu-devel] [RFC PATCH qemu v2 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alexey Kardashevskiy
2015-07-13 14:56 ` [Qemu-devel] [RFC PATCH qemu v2 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask Alexey Kardashevskiy
2015-07-13 19:13   ` Alex Williamson
2015-07-14  6:58     ` Alexey Kardashevskiy
2015-07-14 12:17       ` Alexey Kardashevskiy
2015-07-14 22:28       ` Alex Williamson
2015-07-15  0:49         ` Alexey Kardashevskiy [this message]
2015-07-13 14:56 ` [Qemu-devel] [RFC PATCH qemu v2 2/5] vfio: Skip PCI BARs in memory listener Alexey Kardashevskiy
2015-07-13 14:56 ` [Qemu-devel] [RFC PATCH qemu v2 3/5] vfio: Store IOMMU type in container Alexey Kardashevskiy
2015-07-13 14:56 ` [Qemu-devel] [RFC PATCH qemu v2 4/5] vfio: Refactor memory listener to accommodate more IOMMU types Alexey Kardashevskiy
2015-07-13 14:56 ` [Qemu-devel] [RFC PATCH qemu v2 5/5] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
2015-07-13 19:37 ` [Qemu-devel] [RFC PATCH qemu v2 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alex Williamson
