All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Malcolm Crossley <malcolm.crossley@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	"Zhang, Yu C" <yu.c.zhang@intel.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>
Subject: Re: Xen PV IOMMU interface draft D
Date: Wed, 2 Mar 2016 06:54:43 +0000	[thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D15F7D82BE@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: 56BB0C5A.7030306@citrix.com

Hi, Malcolm,

Not sure whether I missed your reply or not, but failed to find it in my
archive. Could you help re-post if you already did so? Sorry that my
comments might be a bit late which didn't catch previous draft discussions,
but some of below questions are really important to help us understand
how this new interface works with XenGT...

Thanks
Kevin

> From: Tian, Kevin
> Sent: Thursday, February 18, 2016 4:21 PM
> 
> > From: Malcolm Crossley [mailto:malcolm.crossley@citrix.com]
> > Sent: Wednesday, February 10, 2016 6:09 PM
> 
> As Konrad commented, it's better to add this doc as 1st patch in your series
> then it's easier to review it with other patches together. Also it's always
> good to include such design doc in the repo.
> 
> Other comments embedded.
> 
> [...]
> >
> > Clarification of GFN and BFN fields for different guest types
> > -------------------------------------------------------------
> >
> [...]
> > Bus Frame Numbers (BFN) refer to the address presented on the physical bus
> > before being translated by the IOMMU.
> >
> > Diagram below details memory accesses originating from physical device.
> >
> >     Physical Device
> >           |
> >         (BFN)
> >           |
> > 	   IOMMU-PT
> >           |
> >         (MFN)
> >           |
> >          RAM
> 
> Curious what IOMMU-'PT' means here?
> 
> [...]
> > General principles for PV IOMMU interface
> > =========================================
> >
> > There are two different usage models for the BFN address space of a calling
> > guest based upon the two purposes specified in the section above.
> >
> > A calling guest may use their BFN address space for only one of the purposes
> > detailed above and so the PV IOMMU interface has a subop per usage model.
> > Furthermore, the IOMMU mapping of foreign domains memory is more complex than
> > IOMMU mapping local domain memory and seperating the subops allows for the
> > complexity to be split in the implementation.
> >
> > The PV IOMMU design allows the calling domain to control it's BFN memory map.
> > Thus the design also assigns the responsiblity of ensuring a BFN address
> > mapped for local domain memory mappings are not reused for foreign domain
> > memory mappings without an explict unmap of BFN address first. This simplifies
> > the usage of the API and the extra overhead for the calling domains should be
> > minimal as they should be already tracking the BFN address space usage already.
> 
> It might be clearer if you can add a separate section for BFN itself, i.e.
> how it is managed/allocated in different scenarios. I know most info is
> already provided in this text, but not centralized so far. :-)
> 
> >
> >
> > Emulator usage of PV IOMMU interface
> > ====================================
> 
> I'd suggest moving this and later sections to behind basic API introduction.
> Otherwise insufficient background on so many API references at this point.
> 
> >
> > Emulators which require bus address mapping of guest RAM must first determine if
> > it's possible for the domain to control the bus addresses themselves.
> >
> > A IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If this
> > flag is set then the emulator may specify the BFN address it wishes guest RAM to
> > be mapped to via the IOMMUOP_map_foreign_page subop.  If the flag is not set
> > then the emulator must use BFN addresses supplied by the Xen via the
> > IOMMUOP_lookup_foreign_page.
> 
> IOMMU_QUERY_map_cap is a bit confusing here. Above paragraph is about
> whether emulator is allowed to allocate/specify BFN itself. However this
> capability name is more read as whether the calling domain can map foreign
> pages which is actually true regardless of how BFN is allocated.
> 
> >
> > Operating systems which use the IOMMUOP_map_page subop are expected to provide
> a
> > common interface for emulators to use. Otherwise emulators will not be aware
> > of existing BFN mappings created by operating system and will get failed
> > subops due to conflicts in the BFN address space for the domain.
> 
> Do you mean that emulator needs to detect whether OS is using
> IOMMUOP_map_page? If yes, then emulator calls a common interface
> provided by OS. If not, then emulator just directly invoke raw IOMMUOP
> itself. I'm not certain whether there is common mechanism to detect
> this so far. Could you elaborate your thought here?
> 
> >
> > Emulators should unmap unused GFN mappings as often as possible using
> > IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
> > quickly and efficiently.
> 
> Following earlier analysis then this only applies when OS doesn't use IOMMUOP.
> Otherwise emulator needs call a 'OS common interface' right?
> 
> >
> > Emulators should conform to the ballooning behaviour described section
> > "IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that guest
> > domains are able to effectively balloon out and in memory.
> >
> > Emulators must unmap any active BFN mappings when they shutdown.
> >
> > IOMMUOP_*_foreign_page interactions with guest domain ballooning
> >
> =====================================================
> > ===========
> >
> > Guest domains can balloon out a set of GFN mappings at any time and render the
> > BFN to GFN mapping invalid.
> >
> > When a BFN to GFN mapping becomes invalid, Xen will issue a buffered I/O request
> > of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with the now invalid
> > BFN address in the data field. If the buffered I/O request ring is full then a
> > standard (synchronous) I/O request of type IOREQ_TYPE_INVALIDATE will be issued
> > to the affected IOREQ server the with just invalidated BFN address in the data
> > field.
> >
> > The BFN mappings cannot be simply unmapped at the point of the balloon hypercall
> > otherwise a malicious guest could specifically balloon out an in use GFN address
> > in use by an emulator and trigger IOMMU faults for the domains with BFN
> > mappings.
> 
> Is it a real problem? Today for PCI passthru, what will happen if guest programs
> assigned device with a bad GPA which is not mapped in IOMMU? I think IOMMU
> fault should be fine, and we can just leverage existing IOMMU fault handling after
> the fault is triggered.
> 
> >
> > For hosts with no IOMMU support: The affected emulator(s) must specifically
> > issue a IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so that
> > the references to the underlying MFN are removed and the MFN can be freed back
> > to the Xen memory allocator.
> >
> > For hosts with IOMMU support:
> > If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
> > IOMMUOP_map_foreign_page then the affected affected emulator(s) must
> > specifically issue a IOMMUOP_unmap_foreign_page subop for the now invalid BFN
> > address so that the references to the underlying MFN are removed.
> >
> > If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
> > IOMMUOP_map_foreign_page subop for all emulators with mappings of that GFN then
> > the BFN mapping will be swapped to point at a scratch MFN page and all BFN
> > references to the invalid MFN will be removed by Xen after the BFN mapping has
> > been updated to point at the scratch MFN page.
> 
> I don't understand why for 'swap' case you don't need emulator to do
> explicit unmap. You can think 'noswap' (page-A to invalid) as a special
> example of 'swap' (page-A to scratch page), since they both move
> away from page-A reference. If there is a reason that emulator needs
> to do some cleanup internally before dropping the reference, does
> 'swap_mfn' breaks that situation then?
> 
> >
> > The rationale for swapping the BFN mapping to point at scratch pages is to
> > enable guest domains to balloon quickly without requiring hypercall(s) from
> > emulators.
> >
> > Not all BFN mappings can be swapped without potentially causing problems for the
> > hardware itself (command rings etc.) so the IOMMUOP_swap_mfn flag is used to
> > allow per BFN control of Xen ballooning behaviour.
> 
> Who will judge whether a BFN mapping can be swapped then?
> 
> [...]
> > Xen PV IOMMU hypercall interface
> > --------------------------------
> > A two argument hypercall interface (do_iommu_op).
> >
> >     ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)
> >
> > First argument, guest handle pointer to array of `struct pv_iommu_op`
> >
> > Second argument, unsigned integer count of `struct pv_iommu_op` elements in array.
> >
> > Definition of `struct pv_iommu_op`:
> >
> >     struct pv_iommu_op {
> >
> >         uint16_t subop_id;
> >         uint16_t flags;
> >         int32_t status;
> >
> >         union {
> >             struct {
> >                 uint64_t bfn;
> >                 uint64_t gfn;
> >             } map_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >             } unmap_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >                 uint64_t gfn;
> >                 uint16_t domid;
> >                 ioservid_t ioserver;
> >             } map_foreign_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >                 uint64_t gfn;
> >                 uint16_t domid;
> >                 ioservid_t ioserver;
> >             } lookup_foreign_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >                 ioservid_t ioserver;
> >             } unmap_foreign_page;
> >         } u;
> >     };
> 
> Do we really need such ioserver ID here? Could it be simple
> as looping all ioreq servers with INVALIDATE notifications?
> 
> 
> [...]
> >
> > IOMMUOP_map_page
> > ----------------------
> > This subop uses `struct map_page` part of the `struct pv_iommu_op`.
> >
> > If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> > allowed to map all GFNs except for Xen owned MFNs else the hardware
> > domain will only be allowed to map GFNs which it owns.
> 
> "map all GFNs" -> "map all MFNs" since you use "except for Xen owned MFNs"
> later. Since you have a capability called IOMMU_QUERY_map_all_mfns, should
> you add such condition in above description?
> 
> >
> > If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> > allowed to map all GFNs without taking a reference to the MFN backing the GFN
> > by setting the IOMMU_MAP_OP_no_ref_cnt flag.
> 
> could you elaborate when no_ref_cnt is required?
> 
> [...]
> >
> > IOMMUOP_unmap_page
> > ------------------
> > This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.
> >
> > The subop usage of the `struct pv_iommu_op` and `struct unmap_page` fields
> > are detailed below:
> >
> > --------------------------------------------------------------------
> > Field          Purpose
> > -----          -----------------------------------------------------
> > `bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF
> >
> > `flags`        [in] Flags for signalling page order of unmap operation
> >
> > `status`       [out] Mapping status of this unmap operation, 0 indicates success
> > --------------------------------------------------------------------
> >
> > Defined bits for flags field:
> >
> > Name                        Bit                Definition
> > ----                       -----      ----------------------------------
> > IOMMU_UNMAP_OP_remove_m2b    0        Wildcard M2B mapping removed for
> >                                       lookup_foreign_page use
> 
> Is it explicitly required? Should it be implicit as long as a valid M2B entry existing?
> 
> 
> [...]
> > IOMMUOP_map_foreign_page
> > ------------------------
> > This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.
> >
> > It is not valid to use a domid representing the calling domain.
> 
> Then what's being used here to represent the calling domain?
> 
> >
> > The hypercall will only succeed if calling domain has sufficient privilege over
> > the specified domid.
> 
> How is this privilege check being done? Is there existing mechanism, or something
> new to add?
> 
> >
> > The M2B mechanism is an MFN to (BFN,domid,ioserver) tuple.
> >
> > Each successful subop will add to the M2B if there was not an existing identical
> > M2B entry.
> >
> > Every new M2B entry will take a reference to the MFN backing the GFN.
> >
> > All the following conditions are required to be true for PV IOMMU map_foreign
> > subop to succeed:
> >
> > 1. IOMMU detected and supported by Xen
> > 2. The domain has IOMMU controlled hardware allocated to it
> > 3. The domain is the hardware_domain and the following Xen IOMMU options are
> >    NOT enabled: dom0-passthrough
> 
> 4. the domain has sufficient privilege over the specified domid;
> 
> [...]
> >
> > IOMMU_lookup_foreign_page
> > -------------------------
> > This subop uses `struct lookup_foreign_page` part of the `struct pv_iommu_op`.
> >
> > This subop lookups up a BFN mapping for a ioserver + gfn + target domid
> > combination.
> >
> > The hypercall will only succeed if calling domain has sufficient privilege over
> > the specified domid.
> >
> > If a 1:1 mapping exists of BFN to MFN then a M2B entry is added and a
> > reference is taken to the underlying MFN. If an existing mapping is present
> 
> Then when will this very reference be dropped?
> 
> > then the BFN is returned and no additional reference's will be taken to the
> > underlying MFN.
> >
> > A 1:1 mapping will exist if there is no IOMMU support or if the PV hardware
> > domain was booted in dom0-relaxed mode or in dom0-passthrough mode.
> 
> what about hardware domain using IOMMUOPS in the meantime? In that
> case, from your earlier description it's hardware domain to manage BFN
> addr space, while here 1:1 mapping is some hard assumption in hypervisor,
> so two things together may conflict. There needs to be a mechanism
> that once Xen sees any explicit BFN passed from hardware domain, then
> such 1:1 mapping scheme should be disabled.
> 
> >
> > If there is no IOMMU support then the MFN is returned in the BFN field (that is
> > the only valid bus address for the GFN + domid combination).
> >
> 
> [...]
> >
> > Linux kernel architecture
> > =========================
> >
> > The Linux kernel will use the PV-IOMMU hypercalls to map its PFN address
> > space into the IOMMU. It will map the PFNs to the IOMMU address space using
> > a 1:1 mapping, it does this by programming a BFN to GFN mapping which matches
> > the PFN to GFN mapping.
> >
> > The native SWIOTLB will be used to handle devices which cannot DMA to all of
> > the kernel's PFN address space.
> >
> > An interface shall be provided for emulator usage of IOMMUOP_*_foreign_page
> > subops which will allow the Linux kernel to centrally manage that domain's BFN
> > resource and ensure there are no unexpected conflicts.
> 
> One open here. When IOMMU is enabled, there is supposed to be a
> IOVA space created in Linux kernel. How does this BFN space play
> with that one?
> 
> Thanks
> Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

      parent reply	other threads:[~2016-03-02  6:54 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-12 16:43 [RFC] Xen PV IOMMU interface draft B Malcolm Crossley
2015-06-16 13:19 ` Jan Beulich
2015-06-16 14:47   ` Malcolm Crossley
2015-06-16 15:56     ` Jan Beulich
2015-06-17 12:48 ` Yu, Zhang
2015-06-17 13:34   ` Jan Beulich
2015-06-17 13:44   ` Malcolm Crossley
2015-06-26 10:23 ` Xen PV IOMMU interface draft C Malcolm Crossley
2015-06-26 11:03   ` Ian Campbell
2015-06-29 14:40     ` Konrad Rzeszutek Wilk
2015-06-29 14:52       ` Ian Campbell
2015-06-29 15:05         ` Malcolm Crossley
2015-06-29 15:24         ` David Vrabel
2015-06-29 15:36           ` Ian Campbell
2015-07-10 19:32   ` Konrad Rzeszutek Wilk
2016-02-10 10:09   ` Xen PV IOMMU interface draft D Malcolm Crossley
2016-02-18  8:21     ` Tian, Kevin
2016-02-23 16:17     ` Jan Beulich
2016-02-23 16:22       ` Malcolm Crossley
2016-03-02  6:54     ` Tian, Kevin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AADFC41AFE54684AB9EE6CBC0274A5D15F7D82BE@SHSMSX101.ccr.corp.intel.com \
    --to=kevin.tian@intel.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=Paul.Durrant@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=malcolm.crossley@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    --cc=yu.c.zhang@intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.