Re: [RFC] Xen PV IOMMU interface draft B


From: Malcolm Crossley <malcolm.crossley@citrix.com>
To: "Yu, Zhang" <yu.c.zhang@linux.intel.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Kevin Tian <kevin.tian@intel.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [RFC] Xen PV IOMMU interface draft B
Date: Wed, 17 Jun 2015 14:44:22 +0100
Message-ID: <558179B6.3060601@citrix.com>
In-Reply-To: <55816CAC.7090104@linux.intel.com>

On 17/06/15 13:48, Yu, Zhang wrote:
> Hi Malcolm,
> 
>   Thank you very much for accommodating our XenGT requirements in your
> design. Following are some XenGT-related questions. :)
> 
> On 6/13/2015 12:43 AM, Malcolm Crossley wrote:
>> Hi All,
<snip>
>>
>> IOMMUOP_map_foreign_page
>> ----------------
>> This subop uses the `struct map_foreign_page` part of `struct pv_iommu_op`.
>>
>> It is not valid to use a domid that represents the calling domain.
>>
>> The hypercall will only succeed if the calling domain has sufficient privilege over
>> the specified domid.
>>
>> If there is no IOMMU support then the MFN is returned in the BFN field (that is
>> the only valid bus address for the GFN + domid combination).
>>
>> If there is IOMMU support, then the specified BFN is returned for the GFN + domid
>> combination.
>>
>> The M2B mechanism maps an MFN to a (BFN, domid, ioserver) tuple.
>>
>> Each successful subop will add to the M2B if there is not already an identical
>> M2B entry.
>>
>> Every new M2B entry will take a reference to the MFN backing the GFN.
>>
>> All the following conditions are required to be true for the PV IOMMU
>> map_foreign_page subop to succeed:
>>
>> 1. IOMMU detected and supported by Xen
>> 2. The domain has IOMMU controlled hardware allocated to it
>> 3. The domain is a hardware_domain and the following Xen IOMMU options are
>>     NOT enabled: dom0-passthrough
> What if the IOMMU is enabled and runs in the default mode, which 1:1 maps all memory except that owned
> by Xen?

Good question. A PV IOMMU aware guest will know the 1:1 map exists and can use
IOMMUOP_unmap_page to remove any mappings which would conflict with its planned BFN mappings.

For a PV IOMMU unaware guest, I think IOMMUOP_lookup_foreign_page should be used instead. This
will allow the IOSERVER to register interest in the domid + GFN it is using and allow ballooning to
be used.


FYI, the 1:1 map on PV guests will be set up without taking a reference to the MFN; otherwise,
unaware PV guests would be unable to create page tables.

>>
>>
>> This subop's usage of the `struct pv_iommu_op` and `struct map_foreign_page`
>> fields is detailed below:
>>
>> --------------------------------------------------------------------
>> Field          Purpose
>> -----          -----------------------------------------------------
>> `domid`        [in] The domain ID for which the gfn field applies
>>
>> `ioserver`     [in] IOREQ server id associated with mapping
>>
>> `bfn`          [in] Bus address frame number for gfn address
>>
>> `gfn`          [in] Guest address frame number
>>
>> `flags`        [in] Details the status of the BFN mapping
>>
>> `status`       [out] status of this subop, 0 indicates success
>> --------------------------------------------------------------------
>>
>> Defined bits for flags field:
>>
>> Name                         Bit                Definition
>> ----                        -----      ----------------------------------
>> IOMMUOP_readable              0        BFN IOMMU mapping is readable
>> IOMMUOP_writeable             1        BFN IOMMU mapping is writeable
>> IOMMUOP_swap_mfn              2        BFN IOMMU mapping can be safely
>>                                         swapped to scratch page
>> Reserved for future use      3-9       Reserved flag bits should be 0
>> IOMMU_page_order            10-15      Returns maximum possible page order for
>>                                         all other IOMMUOP subops
>>
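A minimal C sketch of packing and unpacking the flags field above, assuming a 16-bit field (the table stops at bit 15). The macro names follow the table, but the header they would live in and the helper function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout taken from the flags table; a 16-bit field is assumed. */
#define IOMMUOP_readable        (1u << 0)
#define IOMMUOP_writeable       (1u << 1)
#define IOMMUOP_swap_mfn        (1u << 2)
#define IOMMUOP_reserved_mask   (0x7Fu << 3)   /* bits 3-9 must be 0 */
#define IOMMU_page_order_shift  10
#define IOMMU_page_order_mask   (0x3Fu << IOMMU_page_order_shift) /* bits 10-15 */

/* True if no reserved bit is set. */
static inline bool iommuop_flags_valid(uint16_t flags)
{
    return (flags & IOMMUOP_reserved_mask) == 0;
}

/* Extract the page order carried in bits 10-15. */
static inline unsigned int iommuop_page_order(uint16_t flags)
{
    return (flags & IOMMU_page_order_mask) >> IOMMU_page_order_shift;
}

/* Build input flags for a map request (page order bits left at 0). */
static inline uint16_t iommuop_map_flags(bool readable, bool writeable, bool swap_ok)
{
    uint16_t f = 0;
    if (readable)
        f |= IOMMUOP_readable;
    if (writeable)
        f |= IOMMUOP_writeable;
    if (swap_ok)
        f |= IOMMUOP_swap_mfn;
    return f;
}
```

Rejecting any set reserved bit up front keeps bits 3-9 free for future use without old callers accidentally setting them.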
>> Defined values for map_foreign_page subop status field:
>>
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> 0            subop successfully returned
>> -EIO         IOMMU unit returned an error when attempting to map BFN to GFN.
>> -EPERM       Calling domain does not have sufficient privilege over domid
>> -EPERM       GFN could not be mapped because the GFN belongs to Xen.
>> -EPERM       domid maps to DOMID_SELF
>> -EACCES      BFN address conflicts with RMRR regions for devices attached to
>>               DOMID_SELF
>> -ENODEV      Provided ioserver id is not valid
>> -ENXIO       Provided domid is not valid
>> -ENXIO       Provided GFN address is not valid
>> -ENOSPC      Page order is too large for either BFN, GFN or IOMMU unit
>>
>> IOMMUOP_lookup_foreign_page
>> ----------------
>> This subop uses the `struct lookup_foreign_page` part of `struct pv_iommu_op`.
>>
>> If the BFN is specified as an input parameter and there is no IOMMU support
>> for the calling domain, then an error will be returned.
>>
>> It is the calling domain's responsibility to ensure there are no conflicts.
>>
>> The hypercall will only succeed if the calling domain has sufficient privilege over
>> the specified domid.
>>
>> If there is no IOMMU support then the MFN is returned in the BFN field (that is
>> the only valid bus address for the GFN + domid combination).
> Similarly, what if the IOMMU is enabled and runs in the default mode,
> which 1:1 maps all memory except that owned by Xen? Will an MFN be returned?
> Or should we use the query/map ops instead of the lookup op in this
> situation?

The lookup will return the BFN which is 1:1 mapped to the MFN.

Only the hardware domain will have precreated BFN mappings of other domains' memory.

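So the logic for the hardware domain (dom0) could look like this:

1. Use the P2M to translate the GFN to an MFN.
2. Use the M2B to look up the BFN for that MFN.
3. If the M2B lookup fails, check whether the MFN is still mapped 1:1 (BFN == MFN);
   if so, return that BFN, otherwise return -ENOENT.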
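A minimal C sketch of that lookup order, assuming flat arrays stand in for Xen's real P2M and M2B structures and that the hardware domain's record of which 1:1 BFNs it has unmapped is available. The table layouts and function names are hypothetical, and returning -ENXIO for an unknown GFN is an assumption borrowed from the map_foreign_page status table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t xfn_t;   /* a frame number: GFN, MFN or BFN */
typedef uint16_t domid_t;

/* Hypothetical flat tables standing in for Xen's P2M and M2B. */
struct p2m_entry { domid_t domid; xfn_t gfn, mfn; };
struct m2b_entry { xfn_t mfn, bfn; domid_t domid; };

struct hwdom_view {
    const struct p2m_entry *p2m; size_t n_p2m;
    const struct m2b_entry *m2b; size_t n_m2b;
    /* BFNs the hardware domain removed from its 1:1 map (e.g. via unmap_page). */
    const xfn_t *unmapped_1to1; size_t n_unmapped;
};

static bool is_1to1_mapped(const struct hwdom_view *v, xfn_t frame)
{
    for (size_t i = 0; i < v->n_unmapped; i++)
        if (v->unmapped_1to1[i] == frame)
            return false;
    return true;
}

int hwdom_lookup_foreign_page(const struct hwdom_view *v, domid_t domid,
                              xfn_t gfn, xfn_t *bfn)
{
    xfn_t mfn;
    size_t i;

    /* Step 1: P2M translates (domid, GFN) to the backing MFN. */
    for (i = 0; i < v->n_p2m; i++)
        if (v->p2m[i].domid == domid && v->p2m[i].gfn == gfn)
            break;
    if (i == v->n_p2m)
        return -ENXIO;               /* assumed error code */
    mfn = v->p2m[i].mfn;

    /* Step 2: an explicit M2B entry takes precedence. */
    for (i = 0; i < v->n_m2b; i++)
        if (v->m2b[i].mfn == mfn && v->m2b[i].domid == domid) {
            *bfn = v->m2b[i].bfn;
            return 0;
        }

    /* Step 3: fall back to the 1:1 map (BFN == MFN) if it is still present. */
    if (is_1to1_mapped(v, mfn)) {
        *bfn = mfn;
        return 0;
    }
    return -ENOENT;
}
```

Checking the M2B before the 1:1 map means an explicit mapping always wins, even when the identity mapping for that MFN is still in place.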


>>
>> Each successful subop will add to the M2B if there was not an existing identical
>> M2B entry.
>>
>> Every new M2B entry will take a reference to the MFN backing the GFN.
>>
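A minimal sketch of the M2B bookkeeping described here, assuming a fixed-size array and a plain counter per frame in place of Xen's real page reference counting. All names and sizes are hypothetical; the point is only that an identical (MFN, BFN, domid, ioserver) tuple is never added twice, so it holds at most one MFN reference.

```c
#include <stddef.h>
#include <stdint.h>

#define M2B_MAX   64     /* toy table capacity */
#define NR_FRAMES 1024   /* toy machine size in frames */

/* One M2B tuple: MFN -> (BFN, domid, ioserver). */
struct m2b_entry {
    uint64_t mfn, bfn;
    uint16_t domid, ioserver;
};

struct m2b_table {
    struct m2b_entry e[M2B_MAX];
    size_t n;
    unsigned int page_refs[NR_FRAMES];  /* stand-in for per-page refcounts */
};

static long m2b_find(const struct m2b_table *t, const struct m2b_entry *k)
{
    for (size_t i = 0; i < t->n; i++) {
        const struct m2b_entry *x = &t->e[i];
        if (x->mfn == k->mfn && x->bfn == k->bfn &&
            x->domid == k->domid && x->ioserver == k->ioserver)
            return (long)i;
    }
    return -1;
}

/* 1: new entry added and MFN referenced; 0: identical entry already present;
 * -1: table full or MFN out of range. */
int m2b_add(struct m2b_table *t, struct m2b_entry k)
{
    if (m2b_find(t, &k) >= 0)
        return 0;
    if (t->n == M2B_MAX || k.mfn >= NR_FRAMES)
        return -1;
    t->e[t->n++] = k;
    t->page_refs[k.mfn]++;     /* each new entry pins the MFN */
    return 1;
}

/* Drop one tuple and its MFN reference (e.g. on IOMMUOP_unmap_foreign_page). */
int m2b_remove(struct m2b_table *t, struct m2b_entry k)
{
    long i = m2b_find(t, &k);
    if (i < 0)
        return 0;
    t->page_refs[k.mfn]--;
    t->e[i] = t->e[--t->n];    /* order is irrelevant: swap in the last entry */
    return 1;
}
```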
<snip>
>>
>> IOMMUOP_*_foreign_page interactions with guest domain ballooning
>> ================================================================
>>
>> Guest domains can balloon out a set of GFN mappings at any time and render the
>> BFN to GFN mapping invalid.
>>
>> When a BFN to GFN mapping becomes invalid, Xen will issue a buffered IO request
>> of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with the now invalid
>> BFN address in the data field. If the buffered IO request ring is full, then a
>> standard (synchronous) IO request of type IOREQ_TYPE_INVALIDATE will be issued
>> to the affected IOREQ server with the just-invalidated BFN address in the data
>> field.
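A toy model of the delivery rule above (buffered first, synchronous only when the buffered ring is full), assuming a fixed 4-slot ring with free-running producer/consumer indices. The struct names and the REQ_INVALIDATE value are placeholders, not Xen's ioreq ABI.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_SLOTS      4   /* toy buffered-ring size */
#define REQ_INVALIDATE 1   /* placeholder, not Xen's IOREQ_TYPE_INVALIDATE value */

struct toy_req {
    uint32_t type;
    uint64_t data;         /* the invalidated BFN */
};

struct toy_ioreq_server {
    struct toy_req ring[BUF_SLOTS];
    uint32_t prod, cons;   /* free-running indices; prod - cons = entries in use */
    struct toy_req sync_req;
    unsigned int sync_count;
};

static bool buf_push(struct toy_ioreq_server *s, struct toy_req r)
{
    if (s->prod - s->cons == BUF_SLOTS)
        return false;      /* ring full */
    s->ring[s->prod % BUF_SLOTS] = r;
    s->prod++;
    return true;
}

/* Returns true if queued on the buffered ring, false if the synchronous
 * fallback was used. */
bool send_bfn_invalidate(struct toy_ioreq_server *s, uint64_t bfn)
{
    struct toy_req r = { REQ_INVALIDATE, bfn };

    if (buf_push(s, r))
        return true;
    /* Ring full: issue a synchronous request instead. A real synchronous
     * request is waited on until the server completes it; the toy only
     * records it. */
    s->sync_req = r;
    s->sync_count++;
    return false;
}
```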
>>
>> The BFN mappings cannot simply be unmapped at the point of the balloon hypercall;
>> otherwise, a malicious guest could deliberately balloon out a GFN that is in use
>> by an emulator and trigger IOMMU faults for the domains with BFN mappings.
>>
>> For hosts with no IOMMU support: the affected emulator(s) must explicitly
>> issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so that
>> the references to the underlying MFN are removed and the MFN can be freed back
>> to the Xen memory allocator.
> I do not quite understand this. With no IOMMU support, these BFNs are
> supplied by the hypervisor. So why not let the hypervisor do this unmap and
> notify the calling domain?

We need the emulators to do the unmap so that they can ensure that the hardware is not actively using
the BFN (the same as the MFN in this case); otherwise, Xen may allocate that MFN to another guest and
that guest will have its memory corrupted.

Another way to think about it is that a malicious guest could set up a long-running DMA to its RAM
and then deliberately balloon out that RAM whilst the DMA is running. The only way to secure that
scenario is to not let the ballooned-out RAM be reused until the emulator confirms it is safe to do so.

The IOMMUOP_swap_mfn optimisation has been added to allow Xen to drop references safely.
Unfortunately, it requires the IOMMU to be enabled.
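A minimal sketch of the emulator-side ordering on a host without an IOMMU, assuming hypothetical helpers: stop the device's DMA to the BFN first, and only then issue IOMMUOP_unmap_foreign_page so Xen can drop the MFN reference. Both helpers are stand-ins that only update local bookkeeping; neither makes a real hypercall, and none of this is XenGT or QEMU code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 16

/* Hypothetical emulator bookkeeping. */
struct emu_state {
    uint64_t dma_bfn[MAX_TRACKED];    /* BFNs targeted by in-flight DMA */
    size_t n_dma;
    uint64_t unmapped[MAX_TRACKED];   /* log of unmap requests sent to Xen */
    size_t n_unmapped;
};

/* Stand-in for quiescing the device: a real emulator would stop the engine
 * and wait for outstanding DMA to this BFN to complete. */
static void quiesce_dma_to(struct emu_state *e, uint64_t bfn)
{
    size_t w = 0;
    for (size_t i = 0; i < e->n_dma; i++)
        if (e->dma_bfn[i] != bfn)
            e->dma_bfn[w++] = e->dma_bfn[i];
    e->n_dma = w;
}

/* Stand-in for issuing IOMMUOP_unmap_foreign_page; only logs the BFN. */
static int unmap_foreign_page(struct emu_state *e, uint64_t bfn)
{
    if (e->n_unmapped == MAX_TRACKED)
        return -1;
    e->unmapped[e->n_unmapped++] = bfn;
    return 0;
}

/* Hardware must stop using the BFN *before* the unmap drops the MFN
 * reference and lets Xen hand the page to another guest. */
int handle_bfn_invalidate(struct emu_state *e, uint64_t bfn)
{
    quiesce_dma_to(e, bfn);
    return unmap_foreign_page(e, bfn);
}
```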

>>
<snip>
>> Emulator usage of PV IOMMU interface
>> ====================================
>>
>> Emulators which require bus address mapping of guest RAM must first determine
>> whether it is possible for the domain to control the bus addresses itself.
>>
>> An IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If this
>> flag is set, then the emulator may specify the BFN address it wishes guest RAM to
>> be mapped to via the IOMMUOP_map_foreign_page subop. If the flag is not set,
>> then the emulator must use BFN addresses supplied by Xen via the
>> IOMMUOP_lookup_foreign_page subop.
>>
>> Operating systems which use the IOMMUOP_map_page subop are expected to provide a
>> common interface for emulators.
> 
> According to our previous internal discussions, my understanding of
> the usage is this:
> 1> PV IOMMU has an interface in dom0's kernel to do the query/map/lookup
> all at once, which also includes the BFN allocation algorithm.
> 2> When the XenGT emulator tries to construct a shadow PTE, we can just call
> your interface, which returns a BFN regardless.
> 
> However, the above description seems to imply that the XenGT device model
> needs to do the query/lookup/map by itself?

The above description is intended to cover emulators which may run in their own domain (a stub domain).

> Besides, could you please give more detailed information about this
> 'common interface'? :)

I will try to include more details in the next draft.

My current thinking is to reuse the "struct pv_iommu_op" array of ops and just implement a common
function for requesting a BFN mapping. The common function will fill in the subop field for the caller.
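A rough sketch of what such a common function might do, assuming a simplified, flattened struct pv_iommu_op and invented numeric values for the subop IDs and IOMMU_QUERY_map_cap. It only fills in the op from the previously queried capabilities; submitting the array via the hypercall is left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder values: the draft names these subops and the capability
 * flag, but these numeric encodings are invented for the sketch. */
enum {
    IOMMUOP_query_caps = 1,
    IOMMUOP_map_foreign_page,
    IOMMUOP_lookup_foreign_page,
};
#define IOMMU_QUERY_map_cap (1u << 0)

/* Simplified, flattened stand-in for struct pv_iommu_op. */
struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t  status;
    uint16_t domid, ioserver;
    uint64_t bfn, gfn;
};

/* Fill in one op for a BFN request, picking the subop from the caps
 * previously returned by IOMMUOP_query_caps. */
void pv_iommu_prepare_bfn_request(struct pv_iommu_op *op, uint32_t caps,
                                  uint16_t domid, uint16_t ioserver,
                                  uint64_t gfn, uint64_t wanted_bfn,
                                  uint16_t flags)
{
    memset(op, 0, sizeof(*op));
    op->domid = domid;
    op->ioserver = ioserver;
    op->gfn = gfn;
    op->flags = flags;
    if (caps & IOMMU_QUERY_map_cap) {
        /* Caller controls the bus address. */
        op->subop_id = IOMMUOP_map_foreign_page;
        op->bfn = wanted_bfn;
    } else {
        /* Xen supplies the BFN (the MFN when there is no IOMMU). */
        op->subop_id = IOMMUOP_lookup_foreign_page;
    }
}
```

Keeping the choice of subop in one place means a stub-domain emulator and an in-kernel user would share the same map-versus-lookup decision.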

Thanks for your feedback and please trim your replies as Jan suggested. It makes it much easier to
find and reply to your inline comments.

> 
> Thanks
> Yu

