* [Draft D] Xen on ARM vITS Handling
@ 2015-06-04 13:54 Ian Campbell
  2015-06-04 17:55 ` Julien Grall
  2015-06-05  6:07 ` Vijay Kilari
  0 siblings, 2 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-04 13:54 UTC (permalink / raw)
  To: xen-devel@lists.xen.org
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

Draft D follows. Also at:
http://xenbits.xen.org/people/ianc/vits/draftD.{pdf,html}

In this iteration I took a step back and tried to simplify a lot of
things, in an attempt to get closer to something we can agree on as a
first cut that is achievable for 4.6, since I felt we were getting
bogged down in the complexity of trying to do everything at once. These
assumptions/short comings can be addressed in future iterations of the
code.

Please let me know what you think.

Perhaps it would be useful to have a short IRC meeting on #xenarm to
iron out the last details? If people could let me know their
availability and timezones I can try and set something up.

If people prefer we could use a slot in the Monthly technical call[0],
which is next week but I think IRC is probably better suited?

[0] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00460.html

Ian.

-----8>------------

% Xen on ARM vITS Handling
% Ian Campbell <ian.campbell@citrix.com>
% Draft D

# Changelog

## Since Draft C

* _Major_ rework, in an attempt to simplify everything into something
  more likely to be achievable for 4.6.
    * Made some simplifying assumptions.
    * Reduced the scope of some support.
    * Command emulation is now mostly trivial.
    * Expanded detail on host setup, allowing other assumptions to be
      made during emulation.
* Many other things lost in the noise of the above.

## Since Draft B

* Details of command translation (thanks to Julien and Vijay)
* Added background on LPI Translation and Pending tables
* Added background on Collections
* Settled on `N:N` scheme for vITS:pITS mapping.
* Rejigged section nesting a bit.
* Since we now think translation should be cheap, settled on
  translation at scheduling time.
* Lazy `INVALL` and `SYNC`

## Since Draft A

* Added discussion of when/where command translation occurs.
* Contention on scheduler lock, suggestion to use SOFTIRQ.
* Handling of domain shutdown.
* More detailed discussion of multiple vs single vits pros/cons.

# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
interrupts from devices into an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. For full details of the
ITS see the "GIC Architecture Specification".

## Locality-specific Peripheral Interrupts (`LPI`)

This is a new class of message signalled interrupts introduced in
GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.

The number of LPIs supported by an ITS is exposed via
`GITS_TYPER.IDbits` (as number of bits - 1); it may be up to
2^32. _Note_: This field also contains the number of Event IDs
supported by the ITS.

### LPI Configuration Table

Each LPI has an associated configuration byte in the LPI Configuration
Table (managed via the GIC Redistributor and placed at
`GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:

* The LPI's priority;
* Whether the LPI is enabled or disabled.

Software updates the Configuration Table directly but must then issue
an invalidate command (per-device `INV` ITS command, global `INVALL`
ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
to become visible (possibly requiring an ITS `SYNC` command to ensure
completion of the `INV` or `INVALL`). Note that it is valid for an
implementation to reread the configuration table at any time (IOW it
is _not_ guaranteed that a change to the LPI Configuration Table won't
be visible until an invalidate is issued).
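As a concrete sketch of the configuration byte layout (per the GICv3
spec the priority occupies bits [7:2] and the enable bit is bit [0];
the helper names here are invented for illustration only):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the LPI Configuration Table byte (GICv3):
 * bit[0] = enable, bits[7:2] = priority (bit[1] is reserved).
 * Helper names are illustrative, not from any real codebase. */
#define LPI_CFG_ENABLE    0x01u
#define LPI_CFG_PRIO_MASK 0xfcu

static inline int lpi_cfg_enabled(uint8_t cfg)
{
    return cfg & LPI_CFG_ENABLE;
}

static inline uint8_t lpi_cfg_priority(uint8_t cfg)
{
    return cfg & LPI_CFG_PRIO_MASK; /* lower value = higher priority */
}

static inline uint8_t lpi_cfg_make(uint8_t priority, int enable)
{
    return (priority & LPI_CFG_PRIO_MASK) | (enable ? LPI_CFG_ENABLE : 0);
}
```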

### LPI Pending Table

Each LPI also has an associated bit in the LPI Pending Table (managed
by the GIC redistributor). This bit signals whether the LPI is pending
or not.

This region may contain out of date information and the mechanism to
synchronise is `IMPLEMENTATION DEFINED`.

## Interrupt Translation Service (`ITS`)

### Device Identifiers

Each device using the ITS is associated with a unique "Device
Identifier".

The device IDs are properties of the implementation and are typically
described via system firmware, e.g. the ACPI IORT table or via device
tree.

The number of device ids in a system depends on the implementation and
can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
to have up to 2^32 devices.

### Events

Each device can generate "Events" (called `ID` in the spec); these
correspond to possible interrupt sources in the device (e.g. MSI
offset).

The maximum number of interrupt sources is device specific. It is
usually discovered either from firmware tables (e.g. DT or ACPI) or
from bus specific mechanisms (e.g. PCI config space).

The maximum number of event IDs supported by an ITS is exposed via
`GITS_TYPER.IDbits` (as number of bits - 1); it may be up to
2^32. _Note_: This field also contains the number of `LPIs` supported
by the ITS.

### Interrupt Collections

Each interrupt is a member of an "Interrupt Collection". This allows
software to manage large numbers of physical interrupts with a small
number of commands rather than issuing one command per interrupt.

On a system with N processors, the ITS must provide at least N+1
collections.

An ITS may support some number of internal collections (indicated by
`GITS_TYPER.HCC`) and external ones which require memory provisioned
by the Operating System via a `GITS_BASERn` register.

### Target Addresses

The Target Address corresponds to a specific GIC re-distributor. The
format of this field depends on the value of the `GITS_TYPER.PTA` bit:

* 1: the base address of the re-distributor target is used
* 0: a unique processor number is used. The mapping between the
  processor affinity value (`MPIDR`) and the processor number is
  discoverable via `GICR_TYPER.ProcessorNumber`.

This value is up to the ITS implementer (`GITS_TYPER` is a read-only
register).

### Device Table

A Device Table is configured in each ITS which maps incoming device
identifiers into an ITS Interrupt Translation Table.

### Interrupt Translation Table (`ITT`) and Collection Table

An `Event` generated by a `Device` is translated into an `LPI` via a
per-Device Interrupt Translation Table. The structure of this table is
described in GIC Spec 4.9.12.

The ITS translation table maps the device id of the originating device
into a physical interrupt (`LPI`) and an Interrupt Collection.

The Collection is in turn looked up in the Collection Table to produce
a Target Address, indicating a redistributor (AKA CPU) to which the
LPI is delivered.

### OS Provisioned Memory Regions

The ITS hardware design provides mechanisms for an ITS to be provided
with various blocks of memory by the OS for ITS internal use; these
include the per-device ITT (established with `MAPD`) and memory
regions for Device Tables, Virtual Processors and Interrupt
Collections. Up to 8 such regions can be requested by the ITS and
provisioned by the OS via the `GITS_BASERn` registers.

### ITS Configuration

The ITS is configured and managed, including establishing and
configuring the Translation Tables and Collection Table, via an in
memory ring shared between the CPU and the ITS controller. The ring is
managed via the `GITS_CBASER` register and indexed by `GITS_CWRITER`
and `GITS_CREADR` registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.

Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts;

The field `GITS_CBASER.Size` encodes the number of 4KB pages, minus
one, comprising the command queue. This field is 8 bits, which means
the maximum size is 2^8 * 4KB = 1MB. Given that each command is 32
bytes, there is a maximum of 32768 commands in the queue.
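The size arithmetic can be expressed as code (helper names are
illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* GITS_CBASER.Size holds the number of 4KB pages minus one, in 8
 * bits; each ITS command occupies 32 bytes. */
static uint64_t cmdq_bytes(uint8_t size_field)
{
    return ((uint64_t)size_field + 1) * 4096;
}

static uint64_t cmdq_max_commands(uint8_t size_field)
{
    return cmdq_bytes(size_field) / 32;
}
```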

The ITS provides no specific completion notification
mechanism. Completion is monitored by a combination of a `SYNC`
command and either polling `GITS_CREADR` or notification via an
interrupt generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enumerate all device ids in the system and pick another one;
* Use a PCI BDF associated with a non-existent device function (such
  as an unused one relating to the PCI root-bridge) and translate that
  (via firmware tables) into a suitable device id;
* ???

# LPI Handling in Xen

## IRQ descriptors

Currently all SGI/PPI/SPI interrupts are covered by a single static
array of `struct irq_desc` with ~1024 entries (the maximum interrupt
number in that set of interrupt types).

The addition of LPIs in GICv3 means that the largest potential
interrupt specifier is much larger.

Therefore a second dynamically allocated array will be added to cover
the range `8192..nr_lpis`. The `irq_to_desc` function will determine
which array to use (static `0..1024` or dynamic `8192..end` lpi desc
array) based on the input irq number. Two arrays are used to avoid a
wasteful allocation covering the unused/unusable `1024..8191` range.
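A minimal sketch of this two-array lookup (the names and the contents
of `struct irq_desc` are illustrative, not Xen's real definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the two-array irq_to_desc scheme described above. */
#define NR_STATIC_IRQS 1024
#define LPI_BASE       8192

struct irq_desc { unsigned int irq; };

static struct irq_desc static_descs[NR_STATIC_IRQS]; /* SGI/PPI/SPI */
static struct irq_desc *lpi_descs;                   /* LPIs, sized at boot */
static unsigned int nr_lpis;

static void lpi_descs_init(struct irq_desc *pool, unsigned int n)
{
    lpi_descs = pool;
    nr_lpis = n;
}

static struct irq_desc *irq_to_desc(unsigned int irq)
{
    if (irq < NR_STATIC_IRQS)
        return &static_descs[irq];
    if (irq >= LPI_BASE && irq - LPI_BASE < nr_lpis)
        return &lpi_descs[irq - LPI_BASE];
    return NULL; /* the unused/unusable 1024..8191 gap, or out of range */
}
```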

## Virtual LPI interrupt injection

A physical interrupt which is routed to a guest vCPU has the
`_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
have an associated instance of `struct irq_guest` which contains the
target `struct domain` pointer and virtual interrupt number.

In Xen a virtual interrupt (either arising from a physical interrupt
or completely virtual) is ultimately injected to a VCPU using the
`vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_spi`.

This mechanism will likely need updating to handle the injection of
virtual LPIs. In particular, routing of LPIs is performed via the ITS
collections mechanism rather than via `GICD_ITARGETSRn` or
`GICD_IROUTERn`. This is discussed below (In _vITS_:_Virtual LPI injection_).

# Scope

The ITS is rather complicated, especially when combined with
virtualisation. To simplify things we initially omit the following
functionality:

- Interrupt -> vCPU -> pCPU affinity. The management of physical vs
  virtual Collections is a feature of GICv4, thus is omitted in this
  design for GICv3. Physical interrupts which occur on a pCPU where
  the target vCPU is not already resident will be forwarded (via IPI)
  to the correct pCPU for injection via the existing
  `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
  correctly).
- Clearing of the pending state of an LPI under various circumstances
  (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
  in guests seeing some perhaps spurious interrupts.
- vITS functionality will only be available on 64-bit ARM hosts,
  avoiding the need to worry about fast access to guest owned data
  structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
  hosts can be considered to have access)

XXX Can we assume that `GITS_TYPER.Devbits` will be sane,
i.e. requiring support for the full 2^32 device ids would require a
32GB device table even for native, which is improbable except on
systems with RAM measured in TB. So we can probably assume that
Devbits will be appropriate to the size of the system. _Note_: We
require per guest device tables, so size of the native Device Table is
not the only factor here.
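A back-of-envelope check of the concern above, assuming (for
illustration only) a flat device table with one 8-byte entry per
possible device ID:

```c
#include <assert.h>
#include <stdint.h>

/* Flat device table size for a given GITS_TYPER.Devbits value; the
 * 8-byte entry size is an assumption for illustration. */
static uint64_t device_table_bytes(unsigned int devbits)
{
    return (UINT64_C(1) << devbits) * 8;
}
```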

XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?
i.e. that the required ITT table size will be reasonable?

# Unresolved Issues

Various parts are marked with XXX. Most are minor, but there is one
more or less major one, which we may or may not be able to live with
for a first implementation:

1. When handling Virtual LPI Configuration Table writes we do not have
   a Device ID, so we cannot consult the virtual Device Table, ITT etc
   to determine if the LPI is actually mapped. This means that the
   physical LPI enable/disable is decoupled from the validity of the
   virtual ITT. Possibly resulting in spurious LPIs which must be
   ignored.

This issue is discussed further in the relevant places in the text,
marked with `XXX UI1`.

# pITS

## Assumptions

It is assumed that `GITS_TYPER.IDbits` is large enough that there are
sufficient LPIs available to cover the sum of the number of possible
events generated by each device in the system (that is the sum of the
actual events for each bit of hardware, rather than the notional
per-device maximum from `GITS_TYPER.Idbits`).

This assumption avoids the need to do memory allocations and interrupt
routing at run time, e.g. during command processing by allowing us to
setup everything up front.

## Driver

The physical driver will provide functions for enabling, disabling,
routing etc. of a specified interrupt, via the usual Xen APIs for
doing such things.

This will likely involve interacting with the physical ITS command
queue etc. In this document such interactions are considered internal
to the driver (i.e. we care that the API to enable an interrupt
exists, not how it is implemented).

## Device Table

The `pITS` device table will be allocated and given to the pITS at
start of day.

## Collections

The `pITS` will be configured at start of day with 1 Collection mapped
to each physical processor, using the `MAPC` command on the physical
ITS.

## Per Device Information

Each physical device in the system which can be used together with an
ITS (whether using passthrough or not) will have associated with it a
data structure:

    struct its_device {
        uintNN_t phys_device_id;
        uintNN_t virt_device_id;
        unsigned int *events;
        unsigned int nr_events;
        struct page_info *pitt;
        unsigned int nr_pitt_pages;
    };

Where:

- `phys_device_id`: The physical device ID of the physical device
- `virt_device_id`: The virtual device ID if the device is accessible
  to a domain
- `events`: An array mapping a per-device event number into a physical
  LPI.
- `nr_events`: The number of events which this device is able to
  generate.
- `pitt`, `nr_pitt_pages`: Records allocation of pages for physical
  ITT (not directly accessible).

During its lifetime this structure may be referenced by several
different mappings (e.g. physical and virtual device id maps, virtual
collection device id).

## Device Discovery/Registration and Configuration

Per device information will be discovered based on firmware tables (DT
or ACPI) and information provided by dom0 (e.g. registration via
PHYSDEVOP_pci_device_add or new custom hypercalls).

This information shall include at least:

- The Device ID of the device.
- The maximum number of Events which the device is capable of
  generating.

When a device is discovered/registered (i.e. when all necessary
information is available) then:

- `struct its_device` and the embedded `events` array will be
  allocated (the latter with `nr_events` elements).
- The `struct its_device` will be inserted into a mapping (possibly an
  R-B tree) from its physical Device ID to the `struct its_device`.
- `nr_events` physical LPIs will be allocated and recorded in the
  `events` array.
- An ITT table will be allocated for the device and the appropriate
  `MAPD` command will be issued to the physical ITS. The location will
  be recorded in `struct its_device.pitt`.
- Each Event which the device may generate will be mapped to the
  corresponding LPI in the `events` array and a collection, by issuing
  a series of `MAPVI` commands. Events will be assigned to physical
  collections in a round-robin fashion.

This setup must occur for a given device before any ITS interrupts may
be configured for the device and certainly before a device is passed
through to a guest. This implies that dom0 cannot use MSIs on a PCI
device before having called `PHYSDEVOP_pci_device_add`.
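The per-event assignment step above can be sketched as follows (the
`alloc_lpi()` allocator and all other names are illustrative
stand-ins, not real Xen interfaces):

```c
#include <assert.h>

/* Sketch of device registration: each of the device's nr_events
 * events gets a freshly allocated physical LPI (later recorded via
 * MAPVI) and a physical collection chosen round-robin. */
static unsigned int next_lpi = 8192; /* LPI IDs start at 8192 */

static unsigned int alloc_lpi(void)
{
    return next_lpi++;
}

static void assign_events(unsigned int *events, unsigned int *collections,
                          unsigned int nr_events, unsigned int nr_collections)
{
    for (unsigned int ev = 0; ev < nr_events; ev++) {
        events[ev] = alloc_lpi();              /* event -> physical LPI */
        collections[ev] = ev % nr_collections; /* round-robin collection */
    }
}
```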

# Device Assignment

Each domain will have an associated mapping from virtual device ids
into a data structure describing the physical device, including a
reference to the relevant `struct its_device`.

The number of possible device IDs may be large so a simple array or
list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
data structure. Currently we do not need to perform lookups in this
tree on any hot paths.

_Note_: In the context of virtualised device ids (especially for domU)
it may be possible to arrange for the upper bound on the number of
device IDs to be lower allowing a more efficient data structure to be
used. This is left for a future improvement.

When a device is assigned to a domain (including to domain 0) the
mapping for the new virtual device ID will be entered into the tree.

During assignment all LPIs associated with the device will be routed
to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
the `struct its_device.events` array).

# vITS

A guest domain which is allowed to use ITS functionality (i.e. has
been assigned pass-through devices which can generate MSIs) will be
presented with a virtualised ITS.

Accesses to the vITS registers will trap to Xen and be emulated and a
virtualised Command Queue will be provided.

Commands entered onto the virtual Command Queue will be translated
into physical commands, as described later in this document.

There are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests, device
management). However these are only considered here to the extent
needed for describing the vITS emulation.

## Xen interaction with guest OS provisioned vITS memory

Memory which the guest provisions to the vITS (ITT via `MAPD` or other
tables via `GITS_BASERn`) needs careful handling in Xen.

Xen cannot trust data structures contained in such memory, since the
guest can trample over them at will. Therefore Xen must either take
great care when accessing such data structures to validate their
contents, e.g. not trusting that values are within the required
limits, or it must take steps to restrict guest access to the memory
when it is provisioned. Since the data structures are simple and most
accessors need to do bounds checks anyway it is considered sufficient
to simply do the necessary checks on access.

Most data structures stored in this shared memory are accessed on the
hot interrupt injection path and must therefore be quickly accessible
from within Xen. Since we have restricted vITS support to 64-bit hosts
only, `map_domain_page` is fast enough to be used on the fly and
therefore we do not need to be concerned about unbounded amounts of
permanently mapped memory consumed by each `MAPD` command.

Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
to PA) is not necessarily so. For now we accept this, as a future
extension a sparse mapping of the guest device table in vmap space
could be considered, with limits on the total amount of vmap space which
we allow each domain to consume.

## vITS properties

The vITS implementation shall have:

- `GITS_TYPER.HCC == nr_vcpus + 1`.
- `GITS_TYPER.PTA == 0`. Target addresses are linear processor numbers.
- `GITS_TYPER.Devbits == See below`.
- `GITS_TYPER.IDbits == See below`.
- `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
  of `struct vitt` (defined below).

`GITS_TYPER.Devbits` and `GITS_TYPER.IDbits` will need to be chosen to
reflect the host and guest configurations (number of LPIs, maximum
device ID etc).

Other fields (not mentioned here) will be set to some sensible (or
mandated) value.

The `GITS_BASER0` will be setup to request sufficient memory for a
device table consisting of entries of:

    struct vdevice_table {
        uint64_t vitt_ipa;
        uint32_t vitt_size;
        uint32_t padding;
    };
    BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);

On write to `GITS_BASER0` the relevant details of the Device Table
(IPA, size, cache attributes to use when mapping) will be recorded in
`struct domain`.

All other `GITS_BASERn.Valid == 0`.

## vITS to pITS mapping

A physical system may have multiple physical ITSs.

With the simplified vITS command model presented here only a single
vITS is required.

In the future a more complex arrangement may be desired. Since the
choice of model is internal to the hypervisor/tools and is
communicated to the guest via firmware tables we are not tied to this
model as an ABI if we decide to change.

## LPI Configuration Table Virtualisation

A guest's write accesses to its LPI Configuration Table (which is just
an area of guest RAM which the guest has nominated) will be trapped to
the hypervisor, using stage 2 MMU permissions, in order for changes to
be propagated into the host interrupt configuration.

On write `bit[0]` of the written byte is the enable/disable state for
the irq and is handled thus:

    lpi = 8192 + (addr - table_base);  /* LPI IDs start at 8192 */
    if ( byte & 1 )
        enable_irq(lpi);
    else
        disable_irq(lpi);

Note that in the context of this emulation we do not have access to a
Device ID, and therefore cannot make decisions based on whether the
LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
hand and not an `event`, IOW we would need to do a _reverse_ lookup in
the ITT.

LPI priority (the remaining bits in the written byte) is currently
ignored.

## LPI Pending Table Virtualisation

XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.

## Device Table Virtualisation

The IPA, size and cacheability attributes of the guest device table
will be recorded in `struct domain` upon write to `GITS_BASER0`.

In order to lookup an entry for `device`:

    define {get,set}_vdevice_entry(domain, device, struct device_table *entry):
        offset = device*sizeof(struct vdevice_table)
        if offset > <DT size>: error

        dt_entry = <DT base IPA> + offset
        page = p2m_lookup(domain, dt_entry, p2m_ram)
        if !page: error
        /* nb: non-RAM pages, e.g. grant mappings,
         * are rejected by this lookup */

        dt_mapping = map_domain_page(page)

        if (set)
             dt_mapping[<appropriate page offset from device>] = *entry;
        else
             *entry = dt_mapping[<appropriate page offset>];

        unmap_domain_page(dt_mapping)

Since everything is based upon IPA (guest addresses) a malicious guest
can only reference its own RAM here.

## ITT Virtualisation

The location of a vITT will have been recorded in the domain's Device
Table by a `MAPD` command and is looked up as above.

The `vitt` is a `struct vitt`:

    struct vitt {
        uint16_t valid:1;
        uint16_t pad:15;
        uint16_t collection;
        uint32_t vpli;
    };
    BUILD_BUG_ON(sizeof(struct vitt) != 8);

A lookup occurs similarly to the device table lookup; the offset is
range checked against the `vitt_size` from the device table. To lookup
`event` on `device`:

    define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
        get_vdevice_entry(domain, device, &dt)

        offset = event*sizeof(struct vitt);
        if offset > dt->vitt_size: error

        vitt_entry = dt->vitt_ipa + event*sizeof(struct vitt)
        page = p2m_lookup(domain, vitt_entry, p2m_ram)
        if !page: error
        /* nb: non-RAM pages, e.g. grant mappings,
         * are rejected by this lookup */

        vitt_mapping = map_domain_page(page)

        if (set)
             vitt_mapping[<appropriate page offset from event>] = *entry;
        else
             *entry = vitt_mapping[<appropriate page offset>];

        unmap_domain_page(vitt_mapping)

Again, since this is IPA based a malicious guest can only point things
at its own RAM.

## Collection Table Virtualisation

A pointer to a dynamically allocated array `its_collections` mapping
collection ID to vcpu ID will be added to `struct domain`. The array
shall have `nr_vcpus + 1` entries and resets to ~0 (or another
explicitly invalid vcpu nr).
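A sketch of this array together with its `MAPC`-driven update and the
collection-to-vcpu translation used later (all names and the fixed
demo size are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Per-domain collection array: nr_vcpus + 1 entries, reset to an
 * explicitly invalid vcpu number, written by MAPC and consulted at
 * injection time. */
#define INVALID_VCPU  UINT16_MAX
#define DEMO_NR_VCPUS 8

struct vits_domain {
    unsigned int nr_vcpus;
    uint16_t its_collections[DEMO_NR_VCPUS + 1];
};

static void collections_reset(struct vits_domain *d)
{
    for (unsigned int i = 0; i <= d->nr_vcpus; i++)
        d->its_collections[i] = INVALID_VCPU;
}

/* MAPC emulation: record the vTA (a vcpu number), bounds checked. */
static int collection_map(struct vits_domain *d, unsigned int coll,
                          uint16_t vta)
{
    if (coll > d->nr_vcpus || vta >= d->nr_vcpus)
        return -1;
    d->its_collections[coll] = vta;
    return 0;
}

/* Collection -> vcpu translation; INVALID_VCPU if never MAPCd. */
static uint16_t collection_to_vcpu(const struct vits_domain *d,
                                   unsigned int coll)
{
    if (coll > d->nr_vcpus)
        return INVALID_VCPU;
    return d->its_collections[coll];
}
```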

## Virtual LPI injection

As discussed above the `vgic_vcpu_inject_irq` functionality will need
to be extended to cover this new case, most likely via a new
`vgic_vcpu_inject_lpi` frontend function.

`vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
interrupt number (corresponding to a vLPI) and needs to figure out
which vcpu this should map to.

To do this it must look up the Collection ID associated (via the vITS)
with that LPI.

Proposal: Add a new `its_device` field to `struct irq_guest`, a
pointer to the associated `struct its_device`. The existing `struct
irq_guest.virq` field contains the event ID (perhaps use a `union`
to give a more appropriate name) and _not_ the virtual LPI. Injection
then consists of:

        d = irq_guest->domain
        virq = irq_guest->virq
        its_device = irq_guest->its_device

        get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
        vcpu = d->its_collections[vitt.collection]
        vgic_vcpu_inject_irq(&d->vcpus[vcpu], vitt.vpli)

In the event that the ITT is not `MAPD`d, or the Event has not been
`MAPI`/`MAPVI`d, or the collection is not `MAPC`d, the interrupt is
simply ignored. Note that this can happen because LPI mapping is
decoupled from LPI enablement. In particular writes to the LPI
Configuration Table do not include a Device ID and therefore we cannot
make decisions based on the ITT.

XXX UI1: if we could find a reliable way to reenable then we could
potentially disable the LPI on error and reenable it later (taking a
spurious Xen interrupt for each possible vITS misconfiguration). IOW
if the interrupt is invalid for any of these reasons we can disable
and reenable as described:

- Not `MAPD`d -- on `MAPD` enable all associated LPIs which are enabled
  in LPI CFG Table.
- Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable LPI if enabled in
  CFG Table.
- Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
  virtual collection. A `list_head` in `struct irq_guest` implies a
  fair bit of overhead, number of LPIs per collection is the total
  number of LPIs (guest could assign all to the same
  collection). Could walk entire LPI CFG and reenable all set LPIs,
  might involve walking over several KB of memory though. Could inject
  unmapped collections to vcpu0, forcing the guest to deal with the
  spurious interrupts?

XXX Only the `MAPC` issue seems problematic. Is this a critical issue or
can we get away with it?

## Command Queue Virtualisation

The command translation/emulation in this design has been arranged to
be as cheap as possible (e.g. in many cases the actions are NOPs),
avoiding previous concerns about the length of time which an emulated
write to a `CWRITER` register may block the vcpu.

The vits will simply track its reader and writer pointers. On write
to `CWRITER` it will immediately and synchronously process all
commands in the queue and update its state accordingly.

It might be possible to implement a rudimentary form of preemption by
periodically (as determined by `hypercall_preempt_check()`) returning
to the guest without incrementing PC but with updated internal
`CREADR` state, meaning it will reexecute the write to `CWRITER` and
we can pickup where we left off for another iteration. This at least
lets us schedule other vcpus etc and prevents a monopoly.
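The retry scheme above might look like the following sketch. All
names are illustrative; `hypercall_preempt_check()` is stood in by a
simple counter, and "emulating a command" is reduced to bumping a
counter:

```c
#include <assert.h>
#include <stdint.h>

#define CMD_SIZE 32 /* each ITS command is 32 bytes */

struct vits {
    uint64_t creadr, cwriter, queue_bytes;
    unsigned int processed; /* for the demo only */
};

static int preempt_check(unsigned int done)
{
    return done >= 4; /* stand-in for hypercall_preempt_check() */
}

/* Returns 1 if preempted (the guest, with PC unchanged, will
 * re-execute the CWRITER write and we resume from the saved CREADR),
 * 0 when all commands have been processed. */
static int vits_handle_cwriter_write(struct vits *v, uint64_t new_cwriter)
{
    unsigned int done = 0;

    v->cwriter = new_cwriter;
    while (v->creadr != v->cwriter) {
        v->processed++; /* emulate one command at CREADR */
        v->creadr = (v->creadr + CMD_SIZE) % v->queue_bytes;
        if (preempt_check(++done) && v->creadr != v->cwriter)
            return 1; /* preempted: CREADR state is saved */
    }
    return 0;
}
```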

## ITS Command Translation

This section is based on the section 5.13 of GICv3 specification
(PRD03-GENC-010745 24.0) and provides concrete ideas of how this can
be interpreted for Xen.

The ITS provides 12 commands in order to manage interrupt collections,
devices and interrupts. Possible command parameters are:

- Device ID (`Device`) (called `device` in the spec).
- Event ID (`Event`) (called `ID` in the spec). This is an index into
  a device's `ITT`.
- Collection ID (`Collection`) (called `collection` in the spec)
- LPI ID (`LPI`) (called `pID` in the spec)
- Target Address (`TA`) (called `TA` in the spec)

These parameters need to be validated and translated from Virtual (`v`
prefix) to Physical (`p` prefix).

Note, we differ from the naming in the GIC spec for clarity, in
particular we use `Event` not `ID` and `LPI` not `pID` to reduce
confusion, especially when `v` and `p` suffixes are used due to
virtualisation.

### Parameter Validation / Translation

Each command contains parameters that need to be validated before any
use in Xen or before passing to the hardware.

#### Device ID (`Device`)

Corresponding ITT obtained by looking up as described above.

The physical `struct its_device` can be found by looking up in the
domain's device map.

If lookup fails or the resulting device table entry is invalid then
the Device is invalid.

#### Event ID (`Event`)

Validated against emulated `GITS_TYPER.IDbits`.

It is not necessary to translate a `vEvent`.

#### LPI (`LPI`)

Validated against emulated `GITS_TYPER.IDbits`.

It is not necessary to translate a `vLPI` into a `pLPI` since the
tables all contain `vLPI`. (Translation from `pLPI` to `vLPI` happens
via `struct irq_guest` when we receive the IRQ).

#### Interrupt Collection (`Collection`)

The `Collection` is validated against the size of the per-domain
`its_collections` array (i.e. nr_vcpus + 1) and then translated by a
simple lookup in that array.

     vcpu_nr = d->its_collections[Collection]

A result >= `nr_vcpus` is invalid (this also catches the `~0` reset
value of unmapped collections).

#### Target Address (`TA`)

This parameter is used in commands which manage collections. It is a
unique identifier per processor.

We have chosen to implement `GITS_TYPER.PTA` as 0, hence `vTA` simply
corresponds to the `vcpu_id`, so only needs bounds checking against
`nr_vcpus`.

### Commands

To be read with reference to spec for each command (which includes
error checks etc which are omitted here).

It is assumed that inputs will be bounds and validity checked as
described above, thus error handling is omitted for brevity (i.e. if
get and/or set fail then so be it). In general invalid commands are
simply ignored.

#### `MAPD`: Map a physical device to an ITT.

_Format_: `MAPD Device, Valid, ITT Address, ITT Size`.

_Spec_: 5.13.11

`MAPD` is sent with `Valid` bit set if the mapping is to be added and
reset when mapping is removed.

The domain's device table is updated with the provided information,
according to the `Valid` flag in the command:

    dt_entry.vitt_ipa = ITT Address
    dt_entry.vitt_size = ITT Size
    set_vdevice_entry(current->domain, Device, &dt_entry)

    XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.

XXX Should we validate size? Guests own fault if we run off the end of
a table later? If we did start to consider permanent mappings then we
_would_ care.

#### `MAPC`: Map an interrupt collection to a target processor

_Format_: `MAPC Collection, TA`

_Spec_: 5.13.12

The updated `vTA` (a vcpu number) is recorded in the `its_collections`
array of the domain struct:

    d->its_collections[Collection] = TA

    XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.

#### `MAPI`: Map an interrupt to an interrupt collection.

_Format_: `MAPI Device, LPI, Collection`

_Spec_: 5.13.13

After validation:

    vitt.valid = True
    vitt.collection = Collection
    vitt.vpli = LPI
    set_vitt_entry(current->domain, Device, LPI, &vitt)

    XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.

#### `MAPVI`: Map an input identifier to a physical interrupt and an interrupt collection.

_Format_: `MAPVI Device, Event, LPI, Collection`

_Spec_: 5.13.14

    vitt.valid = True
    vitt.collection = Collection
    vitt.vpli = LPI
    set_vitt_entry(current->domain, Device, Event, &vitt)

    XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.

#### `MOVI`: Redirect interrupt to an interrupt collection

_Format_: `MOVI Device, Event, Collection`

_Spec_: 5.13.15

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vitt.collection = Collection
    set_vitt_entry(current->domain, Device, Event, &vitt)

    XXX consider helper which sets field without mapping/unmapping
    twice.

This command is supposed to move any pending interrupts associated
with `Event` to the vcpu implied by the new `Collection`, which is
tricky. For now we ignore this requirement (as we do for
`GICD_IROUTERn` and `GICD_TARGETRn` for other interrupt types).
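The helper suggested by the `XXX` above might look like the following sketch, which applies a callback to a single `vitt` entry with one map/unmap instead of the pair implied by `get_vitt_entry` followed by `set_vitt_entry`. The array-backed ITT, `modify_vitt_entry` and the callbacks are hypothetical stand-ins for code using `p2m_lookup`/`map_domain_page`; the same helper would serve `DISCARD` below.

```c
#include <assert.h>
#include <stdint.h>

/* vitt entry layout from the "ITT Virtualisation" section. */
struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vpli;
};

/* Hypothetical read-modify-write helper: the ITT is modelled as a
 * plain array; the real version would do one p2m_lookup() and one
 * map_domain_page()/unmap_domain_page() pair. */
static int modify_vitt_entry(struct vitt *itt, uint32_t nr_entries,
                             uint32_t event, void (*modify)(struct vitt *))
{
    if (event >= nr_entries)
        return -1;              /* out of range: command is ignored */
    modify(&itt[event]);        /* entry updated in place, one mapping */
    return 0;
}

/* MOVI: retarget the event at a new collection. */
static uint16_t new_collection;
static void movi_cb(struct vitt *e) { e->collection = new_collection; }

/* DISCARD: invalidate the mapping. */
static void discard_cb(struct vitt *e) { e->valid = 0; }
```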

#### `DISCARD`: Discard interrupt requests

_Format_: `DISCARD Device, Event`

_Spec_: 5.13.16

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vitt.valid = False
    set_vitt_entry(current->domain, Device, Event, &vitt)

    XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.

    XXX consider helper which sets field without mapping/unmapping
    twice.

This command is supposed to clear the pending state of any associated
interrupt. This requirement is ignored (guest may see a spurious
interrupt).

#### `INV`: Clean any caches associated with interrupt

_Format_: `INV Device, Event`

_Spec_: 5.13.17

Since LPI Configuration table updates are handled synchronously in the
respective trap handler there is nothing to do here.

#### `INVALL`: Clean any caches associated with an interrupt collection

_Format_: `INVALL Collection`

_Spec_: 5.13.19

Since LPI Configuration table updates are handled synchronously there
is nothing to do here.

#### `INT`: Generate an interrupt

_Format_: `INT Device, Event`

_Spec_: 5.13.20

The `vitt` entry corresponding to `Device,Event` is looked up and then:

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vgic_vcpu_inject_lpi(current->domain, vitt.vpli)

XXX Where (Device,Event) is real, interactions with real LPIs being
delivered may need consideration: Julien had concerns about Xen's
internal IRQ state tracking. If this is a problem then we may need
changes to IRQ state tracking, or to inject as a real IRQ and let
physical IRQ injection handle it, or to write to `GICR_SETLPIR`?
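A sketch of the `INT` lookup, including a range check on the resulting vLPI which guards against a guest placing a non-LPI ID (e.g. 1023) in its ITT; `emulate_int` and the array-backed ITT are illustrative assumptions, not the real `get_vitt_entry` path.

```c
#include <assert.h>
#include <stdint.h>

#define LPI_BASE 8192u            /* LPIs occupy IDs 8192..(2^32)-1 */

struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vpli;
};

/* Returns the vLPI to inject for `INT Device, Event`, or 0 when the
 * command must be ignored: the entry is out of range, not valid, or
 * holds an ID outside the LPI range, which must not be forwarded. */
static uint32_t emulate_int(const struct vitt *itt, uint32_t nr_entries,
                            uint32_t event)
{
    if (event >= nr_entries || !itt[event].valid)
        return 0;
    if (itt[event].vpli < LPI_BASE)
        return 0;
    return itt[event].vpli;       /* passed to vgic_vcpu_inject_lpi() */
}
```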

#### `CLEAR`: Clear the pending state of an interrupt

_Format_: `CLEAR Device, Event`

_Spec_: 5.13.21

This command should clear the pending state of the LPI. We ignore this
requirement (the guest may see a spurious interrupt).

#### `SYNC`: Wait for completion of any outstanding ITS actions for collection

_Format_: `SYNC TA`

_Spec_: 5.13.22

This command can be ignored.

# GICv4 Direct Interrupt Injection

GICv4 will directly mark LPIs pending in the virtual pending table,
which is per-redistributor (i.e. per-vCPU).

LPIs will be received by the guest the same way as SPIs, i.e. trap in
IRQ mode then read `ICC_IAR1_EL1` (for GICv3).

Therefore GICv4 will not require one vITS per pITS.

# Event Channels

It has been proposed that it might be nice to inject event channels as
LPIs in the future. Whether or not that would involve any sort of vITS
is unclear, but if it did then it would likely be a separate emulation
to the vITS emulation used with a pITS and as such is not considered
further here.

# Glossary

* _MSI_: Message Signalled Interrupt
* _ITS_: Interrupt Translation Service
* _GIC_: Generic Interrupt Controller
* _LPI_: Locality-specific Peripheral Interrupt

# References

"GIC Architecture Specification" PRD03-GENC-010745 24.0.

"IO Remapping Table System Software on ARM® Platforms" ARM DEN 0049A.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-04 13:54 [Draft D] Xen on ARM vITS Handling Ian Campbell
@ 2015-06-04 17:55 ` Julien Grall
  2015-06-04 19:08   ` Julien Grall
  2015-06-05 10:24   ` Ian Campbell
  2015-06-05  6:07 ` Vijay Kilari
  1 sibling, 2 replies; 18+ messages in thread
From: Julien Grall @ 2015-06-04 17:55 UTC (permalink / raw
  To: Ian Campbell, xen-devel@lists.xen.org
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

Hi Ian,

On 04/06/15 14:54, Ian Campbell wrote:
> ### Device Identifiers
> 
> Each device using the ITS is associated with a unique "Device
> Identifier".
> 
> The device IDs are properties of the implementaiton and are typically

implementation

> described via system firmware, e.g. the ACPI IORT table or via device
> tree.
> 
> The number of device ids in a system depends on the implementation and
> can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
> to have up to 2^32 devices.

[..]

> # Scope
> 
> The ITS is rather complicated, especially when combined with
> virtualisation. To simplify things we initially omit the following
> functionality:
> 
> - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
>   virtual Collections is a feature of GICv4, thus is omitted in this
>   design for GICv3. Physical interrupts which occur on a pCPU where
>   the target vCPU is not already resident will be forwarded (via IPI)
>   to the correct pCPU for injection via the existing
>   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
>   correctly).
> - Clearing of the pending state of an LPI under various circumstances
>   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
>   in guests seeing some perhaps spurious interrupts.
> - vITS functionality will only be available on 64-bit ARM hosts,
>   avoiding the need to worry about fast access to guest owned data
>   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
>   hosts can be considered to have access)
> 
> XXX Can we assume that `GITS_TYPER.Devbits` will be sane,
> i.e. requiring support for the full 2^32 device ids would require a
> 32GB device table even for native, which is improbable except on
> systems with RAM measured in TB. So we can probably assume that
> Devbits will be appropriate to the size of the system. _Note_: We
> require per guest device tables, so size of the native Device Table is
> not the only factor here.

As we control the vBDF we can control the vDevID. If we have a single
PCI bus, the number won't be too high.

> XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?

The GITS_TYPER.IDbits of who? The physical ITS?

> i.e. that the required ITT table size will be reasonable?

> # Unresolved Issues
> 
> Various parts are marked with XXX. Most are minor, but there is one
> more or less major one, which we may or may not be able to live with
> for a first implementation:
> 
> 1. When handling Virtual LPI Configuration Table writes we do not have
>    a Device ID, so we cannot consult the virtual Device Table, ITT etc
>    to determine if the LPI is actually mapped. This means that the
>    physical LPI enable/disable is decoupled from the validity of the
>    virtual ITT. Possibly resulting in spurious LPIs which must be
>    ignored.

> This issue is discussed further in the relevant places in the text,
> marked with `XXX UI1`.
> 
> # pITS
> 
> ## Assumptions
> 
> It is assumed that `GITS_TYPER.IDbits` is large enough that there are
> sufficient LPIs available to cover the sum of the number of possible
> events generated by each device in the system (that is the sum of the
> actual events for each bit of hardware, rather than the notional
> per-device maximum from `GITS_TYPER.Idbits`).
> 
> This assumption avoids the need to do memory allocations and interrupt
> routing at run time, e.g. during command processing by allowing us to
> setup everything up front.
> 
> ## Driver
> 
> The physical driver will provide functions for enabling, disabling
> routing etc a specified interrupt, via the usual Xen APIs for doing
> such things.
> 
> This will likely involve interacting with the physical ITS command
> queue etc. In this document such interactions are considered internal
> to the driver (i.e. we care that the API to enable an interrupt
> exists, not how it is implemented).
> 
> ## Device Table
> 
> The `pITS` device table will be allocated and given to the pITS at
> start of day.

We don't really care about this. This is part of the memory provision at
initialization based on GITS_BASER*

Furthermore, the ITS may already have the table in memory and
therefore this allocation is not necessary.

> 
> ## Collections
> 
> The `pITS` will be configured at start of day with 1 Collection mapped
> to each physical processor, using the `MAPC` command on the physical
> ITS.
> 
> ## Per Device Information
> 
> Each physical device in the system which can be used together with an
> ITS (whether using passthrough or not) will have associated with it a
> data structure:
> 
>     struct its_device {
>         uintNN_t phys_device_id;
>         uintNN_t virt_device_id;
>         unsigned int *events;
>         unsigned int nr_events;
>         struct page_info *pitt;
>         unsigned int nr_pitt_pages

You need to have a pointer to the pITS associated.

>     };
> 
> Where:
> 
> - `phys_device_id`: The physical device ID of the physical device
> - `virt_device_id`: The virtual device ID if the device is accessible
>   to a domain
> - `events`: An array mapping a per-device event number into a physical
>   LPI.
> - `nr_events`: The number of events which this device is able to
>   generate.
> - `pitt`, `nr_pitt_pages`: Records allocation of pages for physical
>   ITT (not directly accessible).
> 
> During its lifetime this structure may be referenced by several
> different mappings (e.g. physical and virtual device id maps, virtual
> collection device id).
> 
> ## Device Discovery/Registration and Configuration
> 
> Per device information will be discovered based on firmware tables (DT
> or ACPI) and information provided by dom0 (e.g. registration via
> PHYSDEVOP_pci_device_add or new custom hypercalls).
> 
> This information shall include at least:
> 
> - The Device ID of the device.
> - The maximum number of Events which the device is capable of
>   generating.

Well, the maximum number of Events doesn't need to come through a new
hypercall. We can directly retrieve it via the PCI CFG space.

> 
> When a device is discovered/registered (i.e. when all necessary
> information is available) then:
> 
> - `struct its_device` and the embedded `events` array will be
>   allocated (the latter with `nr_events` elements).
> - The `struct its_device` will be inserted into a mapping (possibly an
>   R-B tree) from its physical Device ID to the `struct its`.
> - `nr_events` physical LPIs will be allocated and recorded in the
>   `events` array.
> - An ITT table will be allocated for the device and the appropriate
>   `MAPD` command will be issued to the physical ITS. The location will
>   be recorded in `struct its_device.pitt`.
> - Each Event which the device may generate will be mapped to the
>   corresponding LPI in the `events` array and a collection, by issuing
>   a series of `MAPVI` commands. Events will be assigned to physical
>   collections in a round-robin fashion.
> 
> This setup must occur for a given device before any ITS interrupts may
> be configured for the device and certainly before a device is passed
> through to a guest. This implies that dom0 cannot use MSIs on a PCI
> device before having called `PHYSDEVOP_pci_device_add`.

Sounds sensible.

> 
> # Device Assignment
> 
> Each domain will have an associated mapping from virtual device ids
> into a data structure describing the physical device, including a
> reference to the relevant `struct its_device`.
> 
> The number of possible device IDs may be large so a simple array or
> list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
> data structure. Currently we do not need to perform lookups in this
> tree on any hot paths.

Even so, the lookup would be quick.

> 
> _Note_: In the context of virtualised device ids (especially for domU)
> it may be possible to arrange for the upper bound on the number of
> device IDs to be lower allowing a more efficient data structure to be
> used. This is left for a future improvement.
> 
> When a device is assigned to a domain (including to domain 0) the
> mapping for the new virtual device ID will be entered into the tree.
> 
> During assignment all LPIs associated with the device will be routed
> to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
> the `struct its_device.events` array).
> 
> # vITS
> 
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
> 
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
> 
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
> 
> There are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are only considered here to the extent
> needed for describing the vITS emulation.
> 
> ## Xen interaction with guest OS provisioned vITS memory
> 
> Memory which the guest provisions to the vITS (ITT via `MAPD` or other
> tables via `GITS_BASERn`) needs careful handling in Xen.
> 
> Xen cannot trust data in data structures contained in such memory,
> since a guest can trample over it at will. Therefore Xen either
> must take great care when accessing data structures stored in such
> memory to validate the contents e.g. not trust that values are within
> the required limits or it must take steps to restrict guest access to
> the memory when it is provisioned. Since the data structures are
> simple and most accessors need to do bounds check anyway it is
> considered sufficient to simply do the necessary checks on access.
> 
> Most data structures stored in this shared memory are accessed on the
> hot interrupt injection path and must therefore be quickly accessbile

accessible

> from within Xen. Since we have restricted vits support to 64-bit hosts
> only `map_domain_page` is fast enough to be used on the fly and
> therefore we do not need to be concerned about unbounded amounts of
> permanently mapped memory consumed by each `MAPD` command.
> 
> Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
> to PA) is not necessarily so. For now we accept this, as a future
> extension a sparse mapping of the guest device table in vmap space
> could be considered, with limits on the total amount of vmap space which
> we allow each domain to consume.
> 
> ## vITS properties
> 
> The vITS implementation shall have:
> 
> - `GITS_TYPER.HCC == nr_vcpus + 1`.
> - `GITS_TYPER.PTA == 0`. Target addresses are linear processor numbers.
> - `GITS_TYPER.Devbits == See below`.
> - `GITS_TYPER.IDbits == See below`.
> - `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
>   of `struct vitt` (defined below).
> 
> `GITS_TYPER.Devbits` and `GITS_TYPER.Idbits` will need to be chosen to
> reflect the host and guest configurations (number of LPIs, maximum
> device ID etc).
> 
> Other fields (not mentioned here) will be set to some sensible (or
> mandated) value.
> 
> The `GITS_BASER0` will be setup to request sufficient memory for a
> device table consisting of entries of:
> 
>     struct vdevice_table {
>         uint64_t vitt_ipa;
>         uint32_t vitt_size;
>         uint32_t padding;
>     };
>     BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);
> 
> On write to `GITS_BASE0` the relevant details of the Device Table

GITS_BASER0

> (IPA, size, cache attributes to use when mapping) will be recorded in
> `struct domain`.

map_domain_page assumes that the memory is WB/WT. What happens if the
guest decides to choose a different attribute?

> 
> All other `GITS_BASERn.Valid == 0`.
> 
> ## vITS to pITS mapping
> 
> A physical system may have multiple physical ITSs.
> 
> With the simplified vits command model presented here only a single
> `vits` is required.

What about DOM0? We would have to browse the firmware table (ACPI or DT)
to replace the ITS parent.

> 
> In the future a more complex arrangement may be desired. Since the
> choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables we are not tied to this
> model as an ABI if we decide to change.
> 
> ## LPI Configuration Table Virtualisation
> 
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the host interrupt configuration.
> 
> On write `bit[0]` of the written byte is the enable/disable state for
> the irq and is handled thus:
> 
>     lpi = (addr - table_base);
>     if ( byte & 1 )
>         enable_irq(lpi);
>     else
>         disable_irq(lpi);
> 
> Note that in the context of this emulation we do not have access to a
> Device ID, and therefore cannot make decisions based on whether the
> LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
> hand and not an `event`, IOW we would need to do a _reverse_ lookup in
> the ITT.

I'm struggling to see how this would work. After enabling/disabling an
IRQ you need to send an INV, which requires the devID and the eventID.

Also, until now you haven't explained how the vLPI will be mapped to the
pLPI. If you have this mapping, you are able to get the (device, ID) pair.

> 
> LPI priority (the remaining bits in the written byte) is currently
> ignored.
> 
> ## LPI Pending Table Virtualisation
> 
> XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
> sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.
> 
> ## Device Table Virtualisation
> 
> The IPA, size and cacheability attributes of the guest device table
> will be recorded in `struct domain` upon write to `GITS_BASER0`.
>
> In order to lookup an entry for `device`:
> 
>     define {get,set}_vdevice_entry(domain, device, struct device_table *entry):
>         offset = device*sizeof(struct vdevice_table)
>         if offset > <DT size>: error
> 
>         dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
>         page = p2m_lookup(domain, dt_entry, p2m_ram)
>         if !page: error
>         /* nb: non-RAM pages, e.g. grant mappings,
>          * are rejected by this lookup */

It is allowed to pass device memory here. See Device-nGnRnE.

> 
>         dt_mapping = map_domain_page(page)
> 
>         if (set)
>              dt_mapping[<appropriate page offset from device>] = *entry;
>         else
>              *entry = dt_mapping[<appropriate page offset>];
> 
>         unmap_domain_page(dt_mapping)
> 
> Since everything is based upon IPA (guest addresses) a malicious guest
> can only reference its own RAM here.
> 
> ## ITT Virtualisation
> 
> The location of a VITS will have been recorded in the domain Device
> Table by a `MAPI` or `MAPVI` command and is looked up as above.
> 
> The `vitt` is a `struct vitt`:
> 
>     struct vitt {
>         uint16_t valid:1;
>         uint16_t pad:15;
>         uint16_t collection;
>         uint32_t vpli;
>     };
>     BUILD_BUG_ON(sizeof(struct vitt) != 8);
> 
> A lookup occurs similar to for a device table, the offset is range
> checked against the `vitt_size` from the device table. To lookup
> `event` on `device`:
> 
>     define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
>         get_vdevice_entry(domain, device, &dt)
> 
>         offset = device*sizeof(struct vitt);

s/device/event/ ?

>         if offset > dt->vitt_size: error
> 
>         vitt_entry = dt->vitt_ipa + event*sizeof(struct vitt)
>         page = p2m_lookup(domain, vitt_entry, p2m_ram)
>         if !page: error
>         /* nb: non-RAM pages, e.g. grant mappings,
>          * are rejected by this lookup */
> 
>         vitt_mapping = map_domain_page(page)
> 
>         if (set)
>              vitt_mapping[<appropriate page offset from event>] = *entry;
>         else
>              *entry = vitt_mapping[<appropriate page offset>];
> 
>         unmap_domain_page(vitt_mapping)
> 
> Again since this is IPA based a malicious guest can only point things
> to its own ram.
> 
> ## Collection Table Virtualisation
> 
> A pointer to a dynamically allocated array `its_collections` mapping
> collection ID to vcpu ID will be added to `struct domain`. The array
> shall have `nr_vcpus + 1` entries and resets to ~0 (or another
> explicitly invalid vcpu nr).
> 
> ## Virtual LPI injection
> 
> As discussed above the `vgic_vcpu_inject_irq` functionality will need
> to be extended to cover this new case, most likely via a new
> `vgic_vcpu_inject_lpi` frontend function.
> 
> `vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
> interrupt number (corresponding to a vLPI) and needs to figure out
> which vcpu this should map to.
> 
> To do this it must look up the Collection ID associated (via the vITS)
> with that LPI.
> 
> Proposal: Add a new `its_device` field to `struct irq_guest`, a
> pointer to the associated `struct its_device`. The existing `struct
> irq_guest.virq` field contains the event ID (perhaps use a `union`
> to give a more appropriate name) and _not_ the virtual LPI. Injection
> then consists of:
> 
>         d = irq_guest->domain
>         virq = irq_guest->virq
>         its_device = irq_guest->its_device
> 
>         get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
>         vcpu = d->its_collections[vitt.collection]
>         vgic_vcpu_inject_irq(d, &d->vcpus[vcpu])

Shouldn't you pass at least the vLPIs?

We would also need to ensure that the vitt.vpli is effectively an LPI.
Otherwise the guest may send an invalid value (such as 1023) which will
potentially crash the platform.

> In the event that the ITT is not `MAPD`d, or the Event has not been
> `MAPI`/`MAPVI`d or the collection is not `MAPC`d here the interrupt is
> simply ignored. Note that this can happen because LPI mapping is
> decoupled from LPI enablement. In particular writes to the LPI
> Configuration Table do not include a Device ID and therefore cannot
> make decisions based on the ITT.
> 
> XXX UI1 if we could find a reliable way to reenable then could
> potentially disable LPI on error and reenable later (taking a spurious
> Xen interrupt for each possible vits misconfiguration). IOW if the
> interrupt is invalid for each of these reasons we can disable and
> reenable as described:
> 
> - Not `MAPD`d -- on `MAPD` enable all associated LPIs which are enabled
>   in LPI CFG Table.
> - Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable LPI if enabled in
>   CFG Table.
> - Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
>   virtual collection. A `list_head` in `struct irq_guest` implies a
>   fair bit of overhead, number of LPIs per collection is the total
>   number of LPIs (guest could assign all to the same
>   collection). Could walk entire LPI CFG and reenable all set LPIs,
>   might involve walking over several KB of memory though. Could inject
>   unmapped collections to vcpu0, forcing the guest to deal with the
>   spurious interrupts?
>
> XXX Only the `MAPC` issue seems problematic. Is this a critical issue or
> can we get away with it?

This is not the only issue. With your solution, you leave open the
possibility of the guest having one vLPI for multiple (devID, ID)/LPI
pairs. Even if we validate it when the MAPI/MAPVI command is sent, the
guest could itself modify the ITT.

Furthermore, each virtual IRQ of a domain is associated with a struct
pending_irq. This structure contains an irq_desc which is a pointer to
the physical IRQ. With a crafted ITT the guest could mess up those
data structures. Although, I think we can get back on our feet after the
domain is destroyed.

After reading this draft, I think we can avoid browsing the Device
Table and the ITT. As said in the previous paragraph, the pending_irq
structure is linked to an irq_desc. In your proposal, you suggested
storing the its_device in the irq_guest (part of irq_desc). If we make
use of pending_irq->desc to store the physical descriptor, we can have
a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and AFAICT
the memory usage in Xen would be the same as the ITT/Device table
based solution.

Regards,

-- 
Julien Grall


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-04 17:55 ` Julien Grall
@ 2015-06-04 19:08   ` Julien Grall
  2015-06-05 10:24   ` Ian Campbell
  1 sibling, 0 replies; 18+ messages in thread
From: Julien Grall @ 2015-06-04 19:08 UTC (permalink / raw
  To: Julien Grall, Ian Campbell, xen-devel@lists.xen.org
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari



On 04/06/2015 18:55, Julien Grall wrote:

> After reading this draft, I think we can avoid browsing the Device
> Table and the ITT. As said in the previous paragraph, the pending_irq
> structure is linked to an irq_desc. In your proposal, you suggested
> storing the its_device in the irq_guest (part of irq_desc). If we make
> use of pending_irq->desc to store the physical descriptor, we can have
> a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and AFAICT
> the memory usage in Xen would be the same as the ITT/Device table
> based solution.

BTW, I will suggest a pseudo-code based on your draft tomorrow. It may 
be easier to understand.

-- 
Julien Grall


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-04 13:54 [Draft D] Xen on ARM vITS Handling Ian Campbell
  2015-06-04 17:55 ` Julien Grall
@ 2015-06-05  6:07 ` Vijay Kilari
  2015-06-05  9:16   ` Ian Campbell
                     ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Vijay Kilari @ 2015-06-05  6:07 UTC (permalink / raw
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Thu, Jun 4, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> Draft D follows. Also at:
> http://xenbits.xen.org/people/ianc/vits/draftD.{pdf,html}

http://xenbits.xen.org/people/ianc/vits/draftD.pdf downloads draftC not draftD

>
> In this iteration I took a step back and tried to simplify a lot of
> things, in an attempt to get closer to something we can agree on as a
> first cut that is achievable for 4.6, since I felt we were getting
> bogged down in the complexity of trying to do everything at once. These
> assumptions/short comings can be addressed in future iterations of the
> code.
>
> Please let me know what you think.
>
> Perhaps it would be useful to have a short IRC meeting on #xenarm to
> iron out the last details? If people could let me know their
> availability and timezones I can try and set something up.
>
> If people prefer we could use a slot in the Monthly technical call[0],
> which is next week but I think IRC is probably better suited?
>
> [0] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00460.html
>
> Ian.
>
> -----8>------------
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft D
>
> # Changelog
>
> ## Since Draft C
>
> * _Major_ rework, in an attempt to simplify everything into something
>   more likely to be achievable for 4.6.
>     * Made some simplifying assumptions.
>     * Reduced the scope of some support.
>     * Command emulation is now mostly trivial.
>     * Expanded detail on host setup, allowing other assumptions to be
>       made during emulation.
> * Many other things lost in the noise of the above.
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tables
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pat's mapping.
> * Rejigged section nesting a bit.
> * Since we now think translation should be cheap, settle on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. For full details of the
> ITS see the "GIC Architecture Specification".
>
> ## Locality-specific Peripheral Interrupts (`LPI`)
>
> This is a new class of message signalled interrupts introduced in
> GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.
>
> The number of LPIs supported by an ITS is exposed via
> `GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
> 2^32. _Note_: This field also contains the number of Event IDs
> supported by the ITS.
>
> ### LPI Configuration Table
>
> Each LPI has an associated configuration byte in the LPI Configuration
> Table (managed via the GIC Redistributor and placed at
> `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:
>
> * The LPI's priority;
> * Whether the LPI is enabled or disabled.
>
> Software updates the Configuration Table directly but must then issue
> an invalidate command (per-device `INV` ITS command, global `INVALL`
> ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
> to become visible (possibly requiring an ITS `SYNC` command to ensure
> completion of the `INV` or `INVALL`). Note that it is valid for an
> implementation to reread the configuration table at any time (IOW it
> is _not_ guaranteed that a change to the LPI Configuration Table won't
> be visible until an invalidate is issued).
>
> ### LPI Pending Table
>
> Each LPI also has an associated bit in the LPI Pending Table (managed
> by the GIC redistributor). This bit signals whether the LPI is pending
> or not.
>
> This region may contain out of date information and the mechanism to
> synchronise is `IMPLEMENTATION DEFINED`.
>
> ## Interrupt Translation Service (`ITS`)
>
> ### Device Identifiers
>
> Each device using the ITS is associated with a unique "Device
> Identifier".
>
> The device IDs are properties of the implementaiton and are typically
> described via system firmware, e.g. the ACPI IORT table or via device
> tree.
>
> The number of device ids in a system depends on the implementation and
> can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
> to have up to 2^32 devices.
>
> ### Events
>
> Each device can generate "Events" (called `ID` in the spec) these
> correspond to possible interrupt sources in the device (e.g. MSI
> offset).
>
> The maximum number of interrupt sources is device specific. It is
> usually discovered either from firmware tables (e.g. DT or ACPI) or
> from bus specific mechanisms (e.g. PCI config space).
>
> The maximum number of event ids supported by an ITS is exposed via
> `GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
> 2^32. _Note_: This field also contains the number of `LPIs` supported
> by the ITS.
>
> ### Interrupt Collections
>
> Each interrupt is a member of an "Interrupt Collection". This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> An ITS may support some number of internal collections (indicated by
> `GITS_TYPER.HCC`) and external ones which require memory provisioned
> by the Operating System via a `GITS_BASERn` register.
>
> ### Target Addresses
>
> The Target Address corresponds to a specific GIC re-distributor. The
> format of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> This value is up to the ITS implementer (`GITS_TYPER` is a read-only
> register).
>
> ### Device Table
>
> A Device Table is configured in each ITS which maps incoming device
> identifiers into an ITS Interrupt Translation Table.
>
> ### Interrupt Translation Table (`ITT`) and Collection Table
>
> An `Event` generated by a `Device` is translated into an `LPI` via a
> per-Device Interrupt Translation Table. The structure of this table is
> described in GIC Spec 4.9.12.
>
> The ITS translation table maps the device id of the originating device
> into a physical interrupt (`LPI`) and an Interrupt Collection.
>
> The Collection is in turn looked up in the Collection Table to produce
> a Target Address, indicating a redistributor (AKA CPU) to which the
> LPI is delivered.
>
> ### OS Provisioned Memory Regions
>
> The ITS hardware design provides mechanisms for an ITS to be provided
> with various blocks of memory by the OS for ITS internal use. These
> include the per-device ITT (established with `MAPD`) and memory
> regions for Device Tables, Virtual Processors and Interrupt
> Collections. Up to 8 such regions can be requested by the ITS and
> provisioned by the OS via the `GITS_BASERn` registers.
>
> ### ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring the Translation Tables and Collection Table, via an in
> memory ring shared between the CPU and the ITS controller. The ring is
> managed via the `GITS_CBASER` register and indexed by `GITS_CWRITER`
> and `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages, minus
> one, comprising the command queue. This field is 8 bits, which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
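As a sanity check of the arithmetic above, a standalone sketch (the helper names are illustrative, not Xen code):

```c
#include <stdint.h>

/* GITS_CBASER.Size: an 8-bit field holding the number of 4KB pages,
 * minus one, of command queue; each ITS command is 32 bytes. */
static uint64_t cbaser_queue_bytes(uint8_t size_field)
{
    return ((uint64_t)size_field + 1) * 4096;
}

static uint64_t cbaser_queue_commands(uint8_t size_field)
{
    return cbaser_queue_bytes(size_field) / 32;
}
```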
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???
>
> # LPI Handling in Xen
>
> ## IRQ descriptors
>
> Currently all SGI/PPI/SPI interrupts are covered by a single static
> array of `struct irq_desc` with ~1024 entries (the maximum interrupt
> number in that set of interrupt types).
>
> The addition of LPIs in GICv3 means that the largest potential
> interrupt specifier is much larger.
>
> Therefore a second dynamically allocated array will be added to cover
> the range `8192..nr_lpis`. The `irq_to_desc` function will determine
> which array to use (static `0..1023` or dynamic `8192..end` LPI desc
> array) based on the input irq number. Two arrays are used to avoid a
> wasteful allocation covering the unused/unusable `1024..8191` range.
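A minimal sketch of such a two-array `irq_to_desc` (sizes and names are illustrative; the real Xen signature and allocation differ, and a small static pool stands in for the dynamic array):

```c
#include <stddef.h>

#define NR_STATIC_IRQS 1024   /* SGIs/PPIs/SPIs: 0..1023 */
#define FIRST_LPI      8192

struct irq_desc { unsigned int irq; };

static struct irq_desc static_descs[NR_STATIC_IRQS];
static struct irq_desc lpi_descs[16];  /* stand-in for the dynamic array */
static unsigned int nr_lpis = 16;

/* Pick the static or dynamic array by range; the 1024..8191 hole has
 * no descriptors at all, so no memory is wasted on it. */
static struct irq_desc *irq_to_desc(unsigned int irq)
{
    if (irq < NR_STATIC_IRQS)
        return &static_descs[irq];
    if (irq >= FIRST_LPI && irq - FIRST_LPI < nr_lpis)
        return &lpi_descs[irq - FIRST_LPI];
    return NULL;
}
```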
>
> ## Virtual LPI interrupt injection
>
> A physical interrupt which is routed to a guest vCPU has the
> `_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
> have an associated instance of `struct irq_guest` which contains the
> target `struct domain` pointer and virtual interrupt number.
>
> In Xen a virtual interrupt (either arising from a physical interrupt
> or completely virtual) is ultimately injected to a VCPU using the
> `vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_spi`.
>
> This mechanism will likely need updating to handle the injection of
> virtual LPIs. In particular, rather than via `GICD_ITARGETSRn` or
> `GICD_IROUTERn`, routing of LPIs is performed via the ITS collections
> mechanism. This is discussed below (In _vITS_:_Virtual LPI injection_).
>

For 4.6 scope, can we always inject LPIs to VCPU0?

> # Scope
>
> The ITS is rather complicated, especially when combined with
> virtualisation. To simplify things we initially omit the following
> functionality:
>
> - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
>   virtual Collections is a feature of GICv4, thus is omitted in this
>   design for GICv3. Physical interrupts which occur on a pCPU where
>   the target vCPU is not already resident will be forwarded (via IPI)
>   to the correct pCPU for injection via the existing
>   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
>   correctly).
> - Clearing of the pending state of an LPI under various circumstances
>   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
>   in guests seeing some perhaps spurious interrupts.
> - vITS functionality will only be available on 64-bit ARM hosts,
>   avoiding the need to worry about fast access to guest owned data
>   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
>   hosts can be considered to have access)
>
> XXX Can we assume that `GITS_TYPER.Devbits` will be sane,
> i.e. requiring support for the full 2^32 device ids would require a
> 32GB device table even for native, which is improbable except on
> systems with RAM measured in TB. So we can probably assume that
> Devbits will be appropriate to the size of the system. _Note_: We
> require per guest device tables, so size of the native Device Table is
> not the only factor here.
>
> XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?
> i.e. that the required ITT table size will be reasonable?
>
> # Unresolved Issues
>
> Various parts are marked with XXX. Most are minor, but there is one
> more or less major one, which we may or may not be able to live with
> for a first implementation:
>
> 1. When handling Virtual LPI Configuration Table writes we do not have
>    a Device ID, so we cannot consult the virtual Device Table, ITT etc
>    to determine if the LPI is actually mapped. This means that the
>    physical LPI enable/disable is decoupled from the validity of the
>    virtual ITT. Possibly resulting in spurious LPIs which must be
>    ignored.
>
> This issue is discussed further in the relevant places in the text,
> marked with `XXX UI1`.
>
> # pITS
>
> ## Assumptions
>
> It is assumed that `GITS_TYPER.IDbits` is large enough that there are
> sufficient LPIs available to cover the sum of the number of possible
> events generated by each device in the system (that is the sum of the
> actual events for each bit of hardware, rather than the notional
> per-device maximum from `GITS_TYPER.IDbits`).
>
> This assumption avoids the need to do memory allocations and interrupt
> routing at run time (e.g. during command processing), by allowing us to
> set everything up front.
>
> ## Driver
>
> The physical driver will provide functions for enabling, disabling,
> routing etc. a specified interrupt, via the usual Xen APIs for doing
> such things.
>
> This will likely involve interacting with the physical ITS command
> queue etc. In this document such interactions are considered internal
> to the driver (i.e. we care that the API to enable an interrupt
> exists, not how it is implemented).
>
> ## Device Table
>
> The `pITS` device table will be allocated and given to the pITS at
> start of day.
>
> ## Collections
>
> The `pITS` will be configured at start of day with 1 Collection mapped
> to each physical processor, using the `MAPC` command on the physical
> ITS.
>
> ## Per Device Information
>
> Each physical device in the system which can be used together with an
> ITS (whether using passthrough or not) will have associated with it a
> data structure:
>
>     struct its_device {
>         uintNN_t phys_device_id;
>         uintNN_t virt_device_id;
>         unsigned int *events;
>         unsigned int nr_events;
>         struct page_info *pitt;
>         unsigned int nr_pitt_pages;
>     };
>
> Where:
>
> - `phys_device_id`: The physical device ID of the physical device
> - `virt_device_id`: The virtual device ID if the device is accessible
>   to a domain
> - `events`: An array mapping a per-device event number into a physical
>   LPI.
> - `nr_events`: The number of events which this device is able to
>   generate.
> - `pitt`, `nr_pitt_pages`: Records allocation of pages for physical
>   ITT (not directly accessible).
>
> During its lifetime this structure may be referenced by several
> different mappings (e.g. physical and virtual device id maps, virtual
> collection device id).
>
> ## Device Discovery/Registration and Configuration
>
> Per device information will be discovered based on firmware tables (DT
> or ACPI) and information provided by dom0 (e.g. registration via
> PHYSDEVOP_pci_device_add or new custom hypercalls).
>
> This information shall include at least:
>
> - The Device ID of the device.
> - The maximum number of Events which the device is capable of
>   generating.
>
> When a device is discovered/registered (i.e. when all necessary
> information is available) then:
>
> - `struct its_device` and the embedded `events` array will be
>   allocated (the latter with `nr_events` elements).
> - The `struct its_device` will be inserted into a mapping (possibly an
>   R-B tree) from its physical Device ID to the `struct its_device`.

   Why not a radix tree? It might be better for lookup.

> - `nr_events` physical LPIs will be allocated and recorded in the
>   `events` array.
> - An ITT table will be allocated for the device and the appropriate
>   `MAPD` command will be issued to the physical ITS. The location will
>   be recorded in `struct its_device.pitt`.
> - Each Event which the device may generate will be mapped to the
>   corresponding LPI in the `events` array and a collection, by issuing
>   a series of `MAPVI` commands. Events will be assigned to physical
>   collections in a round-robin fashion.
>
> This setup must occur for a given device before any ITS interrupts may
> be configured for the device and certainly before a device is passed
> through to a guest. This implies that dom0 cannot use MSIs on a PCI
> device before having called `PHYSDEVOP_pci_device_add`.
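The registration steps above might be sketched as follows (a toy model: `alloc_lpi`, `register_its_device` and the round-robin choice are illustrative, and the `MAPD`/`MAPVI` command issuing is reduced to comments):

```c
#include <stdint.h>
#include <stdlib.h>

struct its_device {
    uint32_t phys_device_id;
    unsigned int *events;      /* per-device event number -> physical LPI */
    unsigned int nr_events;
};

static unsigned int next_lpi = 8192;          /* toy LPI allocator */
static unsigned int alloc_lpi(void) { return next_lpi++; }

static struct its_device *register_its_device(uint32_t devid,
                                              unsigned int nr_events,
                                              unsigned int nr_cpus)
{
    struct its_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->phys_device_id = devid;
    dev->nr_events = nr_events;
    dev->events = calloc(nr_events, sizeof(*dev->events));
    /* real code would also allocate the pITT and issue MAPD here */
    for (unsigned int ev = 0; ev < nr_events; ev++) {
        dev->events[ev] = alloc_lpi();
        unsigned int coll = ev % nr_cpus;      /* round-robin collections */
        (void)coll;  /* real code: issue MAPVI(devid, ev, events[ev], coll) */
    }
    return dev;
}
```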
>
> # Device Assignment
>
> Each domain will have an associated mapping from virtual device ids
> into a data structure describing the physical device, including a
> reference to the relevant `struct its_device`.
>
> The number of possible device IDs may be large so a simple array or
> list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
> data structure. Currently we do not need to perform lookups in this
> tree on any hot paths.
>
> _Note_: In the context of virtualised device ids (especially for domU)
> it may be possible to arrange for the upper bound on the number of
> device IDs to be lower allowing a more efficient data structure to be
> used. This is left for a future improvement.
>
> When a device is assigned to a domain (including to domain 0) the
> mapping for the new virtual device ID will be entered into the tree.
>
> During assignment all LPIs associated with the device will be routed
> to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
> the `struct its_device.events` array).

 Is there any hypercall with which we can assign & route LPIs to the guest?

>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
>
> There are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are only considered here to the extent
> needed for describing the vITS emulation.
>
> ## Xen interaction with guest OS provisioned vITS memory
>
> Memory which the guest provisions to the vITS (ITT via `MAPD` or other
> tables via `GITS_BASERn`) needs careful handling in Xen.
>
> Xen cannot trust data structures contained in such memory, since a
> guest can trample over them at will. Therefore Xen must either take
> great care when accessing data structures stored in such memory to
> validate the contents (e.g. not trust that values are within the
> required limits), or it must take steps to restrict guest access to
> the memory when it is provisioned. Since the data structures are
> simple and most accessors need to do bounds checks anyway, it is
> considered sufficient to simply do the necessary checks on access.
>
> Most data structures stored in this shared memory are accessed on the
> hot interrupt injection path and must therefore be quickly accessible
> from within Xen. Since we have restricted vITS support to 64-bit hosts,
> only `map_domain_page` is fast enough to be used on the fly and
> therefore we do not need to be concerned about unbounded amounts of
> permanently mapped memory consumed by each `MAPD` command.
>
> Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
> to PA) is not necessarily so. For now we accept this, as a future
> extension a sparse mapping of the guest device table in vmap space
> could be considered, with limits on the total amount of vmap space which
> we allow each domain to consume.
>
> ## vITS properties
>
> The vITS implementation shall have:
>
> - `GITS_TYPER.HCC == nr_vcpus + 1`.
> - `GITS_TYPER.PTA == 0`. Target addresses are linear processor numbers.
> - `GITS_TYPER.Devbits == See below`.
> - `GITS_TYPER.IDbits == See below`.
> - `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
>   of `struct vitt` (defined below).
>
> `GITS_TYPER.Devbits` and `GITS_TYPER.IDbits` will need to be chosen to
> reflect the host and guest configurations (number of LPIs, maximum
> device ID etc).
>
> Other fields (not mentioned here) will be set to some sensible (or
> mandated) value.
>
> The `GITS_BASER0` will be setup to request sufficient memory for a
> device table consisting of entries of:
>
>     struct vdevice_table {
>         uint64_t vitt_ipa;
>         uint32_t vitt_size;
>         uint32_t padding;
>     };
>     BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);
>
> On write to `GITS_BASER0` the relevant details of the Device Table
> (IPA, size, cache attributes to use when mapping) will be recorded in
> `struct domain`.
>
> All other `GITS_BASERn.Valid == 0`.
>
> ## vITS to pITS mapping
>
> A physical system may have multiple physical ITSs.
>
> With the simplified vITS command model presented here only a single
> `vITS` is required.
>
> In the future a more complex arrangement may be desired. Since the
> choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables we are not tied to this
> model as an ABI if we decide to change.

Do you mean that for 4.6 a 1:N model is followed? Why not a
1:1 vITS to pITS mapping?

>
> ## LPI Configuration Table Virtualisation
>
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the host interrupt configuration.
>
> On write `bit[0]` of the written byte is the enable/disable state for
> the irq and is handled thus:
>
>     lpi = (addr - table_base);
>     if ( byte & 1 )
>         enable_irq(lpi);
>     else
>         disable_irq(lpi);
>
> Note that in the context of this emulation we do not have access to a
> Device ID, and therefore cannot make decisions based on whether the
> LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
> hand and not an `event`, IOW we would need to do a _reverse_ lookup in
> the ITT.
>
> LPI priority (the remaining bits in the written byte) is currently
> ignored.
>
> ## LPI Pending Table Virtualisation
>
> XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
> sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.
>
> ## Device Table Virtualisation
>
> The IPA, size and cacheability attributes of the guest device table
> will be recorded in `struct domain` upon write to `GITS_BASER0`.
>
> In order to lookup an entry for `device`:
>
>     define {get,set}_vdevice_entry(domain, device, struct device_table *entry):
>         offset = device*sizeof(struct vdevice_table)
>         if offset > <DT size>: error
>
>         dt_entry = <DT base IPA> + offset
>         page = p2m_lookup(domain, dt_entry, p2m_ram)
>         if !page: error
>         /* nb: non-RAM pages, e.g. grant mappings,
>          * are rejected by this lookup */
>
>         dt_mapping = map_domain_page(page)
>
>         if (set)
>              dt_mapping[<appropriate page offset from device>] = *entry;
>         else
>              *entry = dt_mapping[<appropriate page offset>];
>
>         unmap_domain_page(dt_mapping)
>
> Since everything is based upon IPA (guest addresses) a malicious guest
> can only reference its own RAM here.

   Here the device table memory allocated by the guest is used to look up
the device. Why can't we avoid using the guest memory altogether and only
emulate reads and writes of GITS_BASER0?
Let Xen manage device information using the its_device structure, based
on the MAPD command.
Am I missing any spec where it specifies the format of a Device Table
managed by the ITS hw which software can read/update directly?

>
> ## ITT Virtualisation
>
> The location of a vITT will have been recorded in the domain's Device
> Table by a `MAPD` command and is looked up as above.
>
> The `vitt` is a `struct vitt`:
>
>     struct vitt {
>         uint16_t valid:1;
>         uint16_t pad:15;
>         uint16_t collection;
>         uint32_t vlpi;
>     };
>     BUILD_BUG_ON(sizeof(struct vitt) != 8);
>
> A lookup occurs similarly to that for the device table; the offset is
> range checked against the `vitt_size` from the device table. To look up
> `event` on `device`:
>
>     define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
>         get_vdevice_entry(domain, device, &dt)
>
>         offset = event*sizeof(struct vitt);
>         if offset > dt->vitt_size: error
>
>         vitt_entry = dt->vitt_ipa + offset
>         page = p2m_lookup(domain, vitt_entry, p2m_ram)
>         if !page: error
>         /* nb: non-RAM pages, e.g. grant mappings,
>          * are rejected by this lookup */
>
>         vitt_mapping = map_domain_page(page)
>
>         if (set)
>              vitt_mapping[<appropriate page offset from event>] = *entry;
>         else
>              *entry = vitt_mapping[<appropriate page offset>];
>
>         unmap_domain_page(vitt_mapping)
>
> Again since this is IPA based a malicious guest can only point things
> to its own RAM.

Same as above comment.  Xen can manage the vlpi<=>plpi mapping
for each device. This info can also be used in enabling/disabling
of LPIs (LPI Configuration Table updates). The size of this vlpi map
can be equal to the number of vectors supported by the device.

>
> ## Collection Table Virtualisation
>
> A pointer to a dynamically allocated array `its_collections` mapping
> collection ID to vcpu ID will be added to `struct domain`. The array
> shall have `nr_vcpus + 1` entries and resets to ~0 (or another
> explicitly invalid vcpu nr).
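A sketch of that allocation and reset (names are illustrative, not real Xen code):

```c
#include <stdint.h>
#include <stdlib.h>

#define INVALID_VCPU ((uint16_t)~0)

/* Per-domain collection table: nr_vcpus + 1 entries, all reset to an
 * explicitly invalid vcpu number until MAPC'd. */
static uint16_t *alloc_its_collections(unsigned int nr_vcpus)
{
    unsigned int i, n = nr_vcpus + 1;
    uint16_t *cols = malloc(n * sizeof(*cols));

    if (cols)
        for (i = 0; i < n; i++)
            cols[i] = INVALID_VCPU;
    return cols;
}
```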
>
> ## Virtual LPI injection
>
> As discussed above the `vgic_vcpu_inject_irq` functionality will need
> to be extended to cover this new case, most likely via a new
> `vgic_vcpu_inject_lpi` frontend function.
>
> `vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
> interrupt number (corresponding to a vLPI) and needs to figure out
> which vcpu this should map to.
>
> To do this it must look up the Collection ID associated (via the vITS)
> with that LPI.
>
> Proposal: Add a new `its_device` field to `struct irq_guest`, a
> pointer to the associated `struct its_device`. The existing `struct
> irq_guest.virq` field contains the event ID (perhaps use a `union`
> to give a more appropriate name) and _not_ the virtual LPI. Injection
> then consists of:
>
>         d = irq_guest->domain
>         virq = irq_guest->virq
>         its_device = irq_guest->its_device
>
>         get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
>         vcpu = d->its_collections[vitt.collection]
>         vgic_vcpu_inject_irq(&d->vcpus[vcpu], vitt.vlpi)
>
> In the event that the ITT is not `MAPD`d, or the Event has not been
> `MAPI`/`MAPVI`d or the collection is not `MAPC`d here the interrupt is
> simply ignored. Note that this can happen because LPI mapping is
> decoupled from LPI enablement. In particular writes to the LPI
> Configuration Table do not include a Device ID and therefore cannot
> make decisions based on the ITT.
>
> XXX UI1 if we could find a reliable way to reenable then could
> potentially disable LPI on error and reenable later (taking a spurious
> Xen interrupt for each possible vits misconfiguration). IOW if the
> interrupt is invalid for each of these reasons we can disable and
> reenable as described:
>
> - Not `MAPD`d -- on `MAPD` enable all associated LPIs which are enabled
>   in LPI CFG Table.
> - Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable LPI if enabled in
>   CFG Table.
> - Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
>   virtual collection. A `list_head` in `struct irq_guest` implies a
>   fair bit of overhead, number of LPIs per collection is the total
>   number of LPIs (guest could assign all to the same
>   collection). Could walk entire LPI CFG and reenable all set LPIs,
>   might involve walking over several KB of memory though. Could inject
>   unmapped collections to vcpu0, forcing the guest to deal with the
>   spurious interrupts?
>
> XXX Only the `MAPC` issue seems problematic. Is this a critical issue or
> can we get away with it?
>
> ## Command Queue Virtualisation
>
> The command translation/emulation in this design has been arranged to
> be as cheap as possible (e.g. in many cases the actions are NOPs),
> avoiding previous concerns about the length of time which an emulated
> write to a `CWRITER` register may block the vcpu.
>
> The vits will simply track its reader and writer pointers. On write
> to `CWRITER` it will immediately and synchronously process all
> commands in the queue and update its state accordingly.

 You mean for 4.6 we don't use any completion INT or softirq approach?

>
> It might be possible to implement a rudimentary form of preemption by
> periodically (as determined by `hypercall_preempt_check()`) returning
> to the guest without incrementing PC but with updated internal
> `CREADR` state, meaning it will reexecute the write to `CWRITER` and
> we can pickup where we left off for another iteration. This at least
> lets us schedule other vcpus etc and prevents a monopoly.
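The preemptible processing loop described above could look roughly like this (all names and the stubbed preemption check are illustrative; the real emulation would decode and translate each command):

```c
#include <stdbool.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32

struct vits {
    uint64_t creadr;   /* internal read pointer (byte offset) */
    uint64_t cwriter;  /* last written CWRITER value */
    uint64_t qsize;    /* command queue size in bytes */
};

static bool hypercall_preempt_check(void) { return false; } /* stub */
static void emulate_one_command(struct vits *v) { (void)v; }

/* Returns true when all commands were consumed; false means we bailed
 * for preemption: CREADR keeps the partial progress and the guest's
 * write to CWRITER is re-executed to resume where we left off. */
static bool vits_handle_cwriter(struct vits *v, uint64_t cwriter)
{
    v->cwriter = cwriter;
    while (v->creadr != v->cwriter) {
        emulate_one_command(v);
        v->creadr = (v->creadr + ITS_CMD_SIZE) % v->qsize;
        if (hypercall_preempt_check())
            return false;
    }
    return true;
}
```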
>
> ## ITS Command Translation
>
> This section is based on the section 5.13 of GICv3 specification
> (PRD03-GENC-010745 24.0) and provides concrete ideas of how this can
> be interpreted for Xen.
>
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts. Possible command parameters are:
>
> - Device ID (`Device`) (called `device` in the spec).
> - Event ID (`Event`) (called `ID` in the spec). This is an index into
>   a device's `ITT`.
> - Collection ID (`Collection`) (called `collection` in the spec)
> - LPI ID (`LPI`) (called `pID` in the spec)
> - Target Address (`TA`) (called `TA` in the spec)
>
> These parameters need to be validated and translated from Virtual (`v`
> prefix) to Physical (`p` prefix).
>
> Note, we differ from the naming in the GIC spec for clarity, in
> particular we use `Event` not `ID` and `LPI` not `pID` to reduce
> confusion, especially when `v` and `p` prefixes are used due to
> virtualisation.
>
> ### Parameter Validation / Translation
>
> Each command contains parameters that need to be validated before any
> usage in Xen or passing to the hardware.
>
> #### Device ID (`Device`)
>
> Corresponding ITT obtained by looking up as described above.
>
> The physical `struct its_device` can be found by looking up in the
> domain's device map.
>
> If lookup fails or the resulting device table entry is invalid then
> the Device is invalid.
>
> #### Event ID (`Event`)
>
> Validated against emulated `GITS_TYPER.IDbits`.
>
> It is not necessary to translate a `vEvent`.
>
> #### LPI (`LPI`)
>
> Validated against emulated `GITS_TYPER.IDbits`.
>
> It is not necessary to translate a `vLPI` into a `pLPI` since the
> tables all contain `vLPI`. (Translation from `pLPI` to `vLPI` happens
> via `struct irq_guest` when we receive the IRQ).
>
> #### Interrupt Collection (`Collection`)
>
> The `Collection` is validated against the size of the per-domain
> `its_collections` array (i.e. nr_vcpus + 1) and then translated by a
> simple lookup in that array.
>
>      vcpu_nr = d->its_collections[Collection]
>
> A result >= `nr_vcpus` is invalid.
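As a sketch (illustrative names; not the real Xen interface):

```c
/* Validate a virtual Collection ID and translate it to a vcpu number.
 * Returns -1 for out-of-range or unmapped collections. */
static int collection_to_vcpu(const unsigned int *its_collections,
                              unsigned int nr_vcpus,
                              unsigned int collection)
{
    if (collection > nr_vcpus)        /* table has nr_vcpus + 1 entries */
        return -1;
    unsigned int vcpu = its_collections[collection];
    if (vcpu >= nr_vcpus)             /* the reset value ~0 lands here */
        return -1;
    return (int)vcpu;
}
```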
>
> #### Target Address (`TA`)
>
> This parameter is used in commands which manage collections. It is a
> unique identifier per processor.
>
> We have chosen to implement `GITS_TYPER.PTA` as 0, hence `vTA` simply
> corresponds to the `vcpu_id`, so only needs bounds checking against
> `nr_vcpus`.
>
> ### Commands
>
> To be read with reference to spec for each command (which includes
> error checks etc which are omitted here).
>
> It is assumed that inputs will be bounds and validity checked as
> described above, thus error handling is omitted for brevity (i.e. if
> get and/or set fail then so be it). In general invalid commands are
> simply ignored.
>
> #### `MAPD`: Map a physical device to an ITT.
>
> _Format_: `MAPD Device, Valid, ITT Address, ITT Size`.
>
> _Spec_: 5.13.11
>
> `MAPD` is sent with the `Valid` bit set if the mapping is to be added
> and clear when the mapping is removed.
>
> The domain's device table is updated with the provided information.
>
> The device table entry is set according to the `Valid` flag in the
> command:
>
>     dt_entry.vitt_ipa = ITT Address
>     dt_entry.vitt_size = ITT Size
>     set_vdevice_entry(current->domain, Device, &dt_entry)
>
>     XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.
>
> XXX Should we validate size? Guest's own fault if we run off the end of
> a table later? If we did start to consider permanent mappings then we
> _would_ care.
>
> #### `MAPC`: Map an interrupt collection to a target processor
>
> _Format_: `MAPC Collection, TA`
>
> _Spec_: 5.13.12
>
> The updated `vTA` (a vcpu number) is recorded in the `its_collections`
> array of the domain struct:
>
>     d->its_collections[Collection] = TA
>
>     XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.
>
> #### `MAPI`: Map an interrupt to an interrupt collection.
>
> _Format_: `MAPI Device, LPI, Collection`
>
> _Spec_: 5.13.13
>
> After validation:
>
>     vitt.valid = True
>     vitt.collection = Collection
>     vitt.vlpi = LPI
>     set_vitt_entry(current->domain, Device, LPI, &vitt)
>
>     XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.
>
> #### `MAPVI`: Map an input identifier to a physical interrupt and an interrupt collection.
>
> _Format_: `MAPVI Device, Event, LPI, Collection`
>
> _Spec_: 5.13.14
>
>     vitt.valid = True
>     vitt.collection = Collection
>     vitt.vlpi = LPI
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
>     XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.
>
> #### `MOVI`: Redirect interrupt to an interrupt collection
>
> _Format_: `MOVI Device, Event, Collection`
>
> _Spec_: 5.13.15
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vitt.collection = Collection
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
>     XXX consider helper which sets field without mapping/unmapping
>     twice.
>
> This command is supposed to move any pending interrupts associated
> with `Event` to the vcpu implied by the new `Collection`, which is
> tricky. For now we ignore this requirement (as we do for
> `GICD_IROUTERn` and `GICD_ITARGETSRn` for other interrupt types).
>
> #### `DISCARD`: Discard interrupt requests
>
> _Format_: `DISCARD Device, Event`
>
> _Spec_: 5.13.16
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vitt.valid = False
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
>     XXX UI1 Poss. handle dis/enabling pLPI, if mapping now completely (in)valid.
>
>     XXX consider helper which sets field without mapping/unmapping
>     twice.
>
> This command is supposed to clear the pending state of any associated
> interrupt. This requirement is ignored (guest may see a spurious
> interrupt).
>
> #### `INV`: Clean any caches associated with interrupt
>
> _Format_: `INV Device, Event`
>
> _Spec_: 5.13.17
>
> Since LPI Configuration table updates are handled synchronously in the
> respective trap handler there is nothing to do here.
>
> #### `INVALL`: Clean any caches associated with an interrupt collection
>
> _Format_: `INVALL Collection`
>
> _Spec_: 5.13.19
>
> Since LPI Configuration table updates are handled synchronously there
> is nothing to do here.
>
> #### `INT`: Generate an interrupt
>
> _Format_: `INT Device, Event`
>
> _Spec_: 5.13.20
>
> The `vitt` entry corresponding to `Device,Event` is looked up and then:
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vgic_vcpu_inject_lpi(current->domain, vitt.vlpi)
>
> XXX Where (Device,Event) is real may need consideration of
> interactions with real LPIs being delivered: Julien had concerns about
> Xen's internal IRQ state tracking. If this is a problem then we may need
> changes to IRQ state tracking, or to inject as a real IRQ and let
> physical IRQ injection handle it, or write to `GICR_SETLPIR`?
>
> #### `CLEAR`: Clear the pending state of an interrupt
>
> _Format_: `CLEAR Device, Event`
>
> _Spec_: 5.13.21
>
> Should clear pending state of LPI. Ignore (guest may see a spurious
> interrupt).
>
> #### `SYNC`: Wait for completion of any outstanding ITS actions for collection
>
> _Format_: `SYNC TA`
>
> _Spec_: 5.13.22
>
> This command can be ignored.
>
> # GICv4 Direct Interrupt Injection
>
> GICv4 will directly mark the LPIs pending in the virtual pending table
> which is per-redistributor (i.e per-vCPU).
>
> LPIs will be received by the guest the same way as SPIs, i.e. trap to
> IRQ mode then read `ICC_IAR1_EL1` (for GICv3).
>
> Therefore GICv4 will not require one vITS per pITS.
>
> # Event Channels
>
> It has been proposed that it might be nice to inject event channels as
> LPIs in the future. Whether or not that would involve any sort of vITS
> is unclear, but if it did then it would likely be a separate emulation
> to the vITS emulation used with a pITS and as such is not considered
> further here.
>
> # Glossary
>
> * _MSI_: Message Signalled Interrupt
> * _ITS_: Interrupt Translation Service
> * _GIC_: Generic Interrupt Controller
> * _LPI_: Locality-specific Peripheral Interrupt
>
> # References
>
> "GIC Architecture Specification" PRD03-GENC-010745 24.0.
>
> "IO Remapping Table, System Software on ARM® Platforms" ARM DEN 0049A.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05  6:07 ` Vijay Kilari
@ 2015-06-05  9:16   ` Ian Campbell
  2015-06-05  9:28   ` Julien Grall
  2015-06-05  9:49   ` Ian Campbell
  2 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-05  9:16 UTC (permalink / raw
  To: Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, 2015-06-05 at 11:37 +0530, Vijay Kilari wrote:
> On Thu, Jun 4, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > Draft D follows. Also at:
> > http://xenbits.xen.org/people/ianc/vits/draftD.{pdf,html}
> 
> http://xenbits.xen.org/people/ianc/vits/draftD.pdf downloads draftC not draftD

Fixed, not sure why it wasn't being rebuilt.

Ian.


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05  6:07 ` Vijay Kilari
  2015-06-05  9:16   ` Ian Campbell
@ 2015-06-05  9:28   ` Julien Grall
  2015-06-05  9:51     ` Ian Campbell
  2015-06-05  9:49   ` Ian Campbell
  2 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-06-05  9:28 UTC (permalink / raw
  To: Vijay Kilari, Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org



On 05/06/2015 07:07, Vijay Kilari wrote:
> On Thu, Jun 4, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>> This information shall include at least:
>>
>> - The Device ID of the device.
>> - The maximum number of Events which the device is capable of
>>    generating.
>>
>> When a device is discovered/registered (i.e. when all necessary
>> information is available) then:
>>
>> - `struct its_device` and the embedded `events` array will be
>>    allocated (the latter with `nr_events` elements).
>> - The `struct its_device` will be inserted into a mapping (possibly an
>>    R-B tree) from its physical Device ID to the `struct its`.
>
>     Why not radix tree. It might be better in look up?

The lookup in the radix tree is in O(k) where k is the size of the 
index (i.e. the number of DevID bits).

In the R-B tree, the lookup is in O(log(n)) where n is the number of 
members in the tree.

As we use an integer for the index, the R-B tree will be faster (the 
number of PCI devices per domain will never be very big).
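
To make the comparison concrete, here is a toy Python model of the two
lookup costs (purely illustrative, not Xen code; the step counts are
the node visits each structure needs):

```python
# Toy model of the complexity argument above: a bitwise radix trie
# visits one node per key bit (O(k) in the DevID width), while a
# balanced tree such as an R-B tree visits about log2(n) nodes for n
# members. Illustrative only, not Xen code.
import math

def radix_lookup_steps(devid_bits):
    # One trie level per DevID bit, however few devices are mapped.
    return devid_bits

def rb_lookup_steps(nr_devices):
    # A balanced binary tree lookup visits ceil(log2(n)) nodes.
    return max(1, math.ceil(math.log2(nr_devices)))
```

With a 32-bit DevID space but only a handful of PCI devices per domain
the R-B tree wins: 32 steps versus 5 for 32 devices.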

Regards,

-- 
Julien Grall


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05  6:07 ` Vijay Kilari
  2015-06-05  9:16   ` Ian Campbell
  2015-06-05  9:28   ` Julien Grall
@ 2015-06-05  9:49   ` Ian Campbell
  2015-06-05 12:41     ` Vijay Kilari
  2 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-05  9:49 UTC (permalink / raw
  To: Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, 2015-06-05 at 11:37 +0530, Vijay Kilari wrote:
> > ## Virtual LPI interrupt injection
> >
> > A physical interrupt which is routed to a guest vCPU has the
> > `_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
> > have an associated instance of `struct irq_guest` which contains the
> > target `struct domain` pointer and virtual interrupt number.
> >
> > In Xen a virtual interrupt (either arising from a physical interrupt
> > or completely virtual) is ultimately injected to a VCPU using the
> > `vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_spi`.
> >
> > This mechanism will likely need updating to handle the injection of
> > virtual LPIs. In particular rather than `GICD_ITARGERRn` or
> > `GICD_IROUTERn` routing of LPIs is performed via the ITS collections
> > mechanism. This is discussed below (In _vITS_:_Virtual LPI injection_).
> >
> 
> For 4.6 scope, we can inject LPIs always to VCPU0?

If that isn't going to be too upsetting for guest OSes we might be able
to get away with it. I can't remember if we did this for SPIs in the
early days also.


> > During assignment all LPIs associated with the device will be routed
> > to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
> > the `struct its_device.events` array).
> 
>  Is there any hypercall where in we can assign & route LPI's to the guest?

There's:
        /*
         * Dom0 should use these two to announce MMIO resources assigned to
         * MSI-X capable devices won't (prepare) or may (release) change.
         */
        #define PHYSDEVOP_prepare_msix          30
        #define PHYSDEVOP_release_msix          31
        struct physdev_pci_device {
            /* IN */
            uint16_t seg;
            uint8_t bus;
            uint8_t devfn;
        };
which sort of is?

But, the paragraph above is proposing to do the routing on _device_
assignment, e.g. on XEN_DOMCTL_assign_device route all LPIs which we
have preallocated to the device to the designated domain.
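
For illustration, the route-on-assignment step described above might be
modelled like this (the `struct its_device` fields are from the draft;
`route_irq_to_guest` is the existing Xen interface; the loop itself is
just a sketch):

```python
# Python model of routing every LPI of a device at assignment time, as
# described above. The its_device fields mirror the draft's struct; the
# surrounding loop and bookkeeping are illustrative only.

class ItsDevice:
    def __init__(self, phys_device_id, events):
        self.phys_device_id = phys_device_id
        self.events = list(events)   # preallocated pLPIs, one per event
        self.nr_events = len(events)

def assign_device(domain, its_device, route_irq_to_guest):
    # On XEN_DOMCTL_assign_device: route each preallocated pLPI to the
    # designated domain up front, so later MAPI/MAPVI emulation is cheap.
    routed = []
    for event, plpi in enumerate(its_device.events):
        route_irq_to_guest(domain, plpi)
        routed.append((event, plpi))
    return routed
```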

> > ## vITS to pITS mapping
> >
> > A physical system may have multiple physical ITSs.
> >
> > With the simplified vits command model presented here only a single
> > `vits` is required.
> >
> > In the future a more complex arrangement may be desired. Since the
> > choice of model is internal to the hypervisor/tools and is
> > communicated to the guest via firmware tables we are not tied to this
> > model as an ABI if we decide to change.
> 
> Do you mean that for 4.6 a 1:N model is followed? Why not a
> 1:1 vITS to pITS mapping?

In the model here there isn't really a mapping from vITS to pITS at all,
that was one of the simplifications I tried to make, since it avoids all
the stuff about scheduling etc.

> > Since everything is based upon IPA (guest addresses) a malicious guest
> > can only reference its own RAM here.
> 
>    Here device table memory allocated by the guest is used to look up
> the device. Why can't we avoid using the guest memory altogether and
> just emulate read and write of GITS_BASER0?

Emulate how? As `GITS_BASERn.Valid == 0` for all n?

> Let Xen manage device information using its_device structure based on
> MAPD command.

Between C and D I had a version which was like this, and used the
per-domain device map to lookup the its_device.

The problem with it is that the INT command can be given a DeviceID
which doesn't really exist (in order to generate completion interrupts
etc).

But a non-existent DeviceID won't have a struct its device to go with
it, hence there was no way to translate the Event into an LPI.

> am I missing any spec where in it specifies format of Device table
> managed by ITS hw which software can read/update directly?

AIUI the software is never supposed to read/update the table directly,
irrespective of whether it is technically able to. Accessing those bits
directly would be a bug and have UNPREDICTABLE results. The whole
GITS_BASERn stuff is just a mechanism to avoid the h/w having to
provision RAM specifically for the ITS by allowing it to take over
portions of the real RAM.

I think you are correct that in principle a hardware designer would be
allowed to throw some RAM into the ITS h/w block itself and not request
anything via GITS_BASERn. As I say, I had an interim version which did
this (using the per-domain device map rather than RAM in the ITS h/w
block), but this didn't handle the INT on a non-existing/invented device
case.

I decided to handle that case by moving the data structure into guest
RAM (using the mechanisms provided to do so), which I considered to have
the benefit of being reasonably simple, but it's not the only possible
answer.

As an alternative I suppose we could e.g. insert dummy struct its_device
entries into the per-domain virtual deviceid mapping table in response
to `MAPD` requests for guest invented devices. Then we would need a
mechanism to stop a guest from populating all 2^32 entries in the tree
(which would use a lot of Xen guest RAM) and ideally we would want to
avoid allocating dummy struct its_device during emulation.

Perhaps each guest could have a small pool of "invented its_devices"
allocated at start of day (small == say, 4) which could be inserted into
the device table on MAPD, once they are all used further MAPD commands
for invented devices would be ignored (this assumes that most guests
will only want one such device for completion purposes), which also
solves the problem of filling the tree completely (or does it: a
sequence of MAPD commands setting and clearing all 2^32 devices might
cause us to populate the whole tree).

Do you have any other ideas? I was trying to keep things simple and
using a guest hosted table seemed to achieve that.
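
A sketch of that start-of-day pool, in Python for illustration (pool
size, names and structure are all invented here):

```python
# Python sketch of the "invented its_device pool" idea above: each
# domain gets a small fixed pool of dummy devices for MAPD of DeviceIDs
# that have no physical backing. Names and sizes are illustrative.

POOL_SIZE = 4

class DummyDevicePool:
    def __init__(self, size=POOL_SIZE):
        self.free = [object() for _ in range(size)]  # preallocated dummies
        self.mapped = {}                             # vDevID -> dummy

    def mapd(self, vdevid, valid):
        if not valid:
            # Unmap: return the dummy to the pool so a later MAPD can
            # reuse it (this is what bounds total memory use).
            dummy = self.mapped.pop(vdevid, None)
            if dummy is not None:
                self.free.append(dummy)
            return True
        if vdevid in self.mapped:
            return True
        if not self.free:
            # Pool exhausted: ignore further MAPDs of invented devices.
            return False
        self.mapped[vdevid] = self.free.pop()
        return True
```

Note this only bounds the number of *simultaneous* dummies; as observed
above, a long map/unmap sequence could still touch many tree slots.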

> > Again since this is IPA based a malicious guest can only point things
> > to its own ram.
> 
> Same as above comment.  Xen can manage the vlpi<=>plpi mapping
> for each device. This info can also be used in enabling/disabling
> of LPIs (LPI configuration table update). The size of this vlpi table
> can be equal to the number of vectors supported by the device.

I think the vlpi is chosen by the guest, isn't it? So the mapping from
vlpi->plpi is under guest control, and not restricted to the number of
vectors provided by the device. I suppose we could maintain another tree
data structure, which might help.

> > ## Command Queue Virtualisation
> >
> > The command translation/emulation in this design has been arranged to
> > be as cheap as possible (e.g. in many cases the actions are NOPs),
> > avoiding previous concerns about the length of time which an emulated
> > write to a `CWRITER` register may block the vcpu.
> >
> > The vits will simply track its reader and writer pointers. On write
> > to `CWRITER` it will immediately and synchronously process all
> > commands in the queue and update its state accordingly.
> 
>  you mean for 4.6 we don't use any completion INT or softirq approach?

I've deliberately left the operation of the pits driver out of scope for
this iteration of the spec. So far as the vits is concerned it makes
irq_enable (or whatever) calls to the core Xen interrupt handling code,
but it doesn't then care about how it is achieved, the physical driver
owns the physical its (or multiple of them) and is free to implement
things in whichever way it deems best, in isolation from any particular
vits design.

This was one of the major simplifications made in this iteration, since
all the vits->pits scheduling etc stuff was causing us to get bogged
down.
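
For reference, the synchronous `CWRITER` emulation described in the
draft amounts to something like this model (the 32-byte command size is
per the GIC spec; the ring handling and callback names are invented):

```python
# Python model of the synchronous CWRITER emulation described above: on
# a guest write to GITS_CWRITER, consume every command between the read
# and write pointers at once, in the trap handler. read_command and
# emulate stand in for the per-command fetch/decode.

CMD_SIZE = 32  # each ITS command is 32 bytes

class VitsCmdQueue:
    def __init__(self, queue_bytes):
        self.queue_bytes = queue_bytes
        self.creadr = 0   # emulated read pointer
        self.cwriter = 0  # emulated write pointer

    def write_cwriter(self, new_cwriter, read_command, emulate):
        # Trap handler: process all pending commands synchronously,
        # then publish CREADR == CWRITER, so no completion interrupt
        # or softirq is needed.
        self.cwriter = new_cwriter
        while self.creadr != self.cwriter:
            emulate(read_command(self.creadr))
            self.creadr = (self.creadr + CMD_SIZE) % self.queue_bytes
```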

Ian.


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05  9:28   ` Julien Grall
@ 2015-06-05  9:51     ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-05  9:51 UTC (permalink / raw
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari,
	xen-devel@lists.xen.org

On Fri, 2015-06-05 at 10:28 +0100, Julien Grall wrote:
> 
> On 05/06/2015 07:07, Vijay Kilari wrote:
> > On Thu, Jun 4, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> >> This information shall include at least:
> >>
> >> - The Device ID of the device.
> >> - The maximum number of Events which the device is capable of
> >>    generating.
> >>
> >> When a device is discovered/registered (i.e. when all necessary
> >> information is available) then:
> >>
> >> - `struct its_device` and the embedded `events` array will be
> >>    allocated (the latter with `nr_events` elements).
> >> - The `struct its_device` will be inserted into a mapping (possibly an
> >>    R-B tree) from its physical Device ID to the `struct its`.
> >
> >     Why not radix tree. It might be better in look up?
> 
> The lookup in the radix tree is in O(k) where k is the size of the 
> index (i.e. the number of DevID bits).
> 
> In the R-B tree, the lookup is in O(log(n)) where n is the number of 
> members in the tree.
> 
> As we use an integer for the index, the R-B tree will be faster (the 
> number of PCI devices per domain will never be very big).

Let's try not to worry too much about the specific data structures at
this stage. The key point is it's not a flat table (potentially enormous
memory overhead) nor a simple list (bad lookup times).

Ian.


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-04 17:55 ` Julien Grall
  2015-06-04 19:08   ` Julien Grall
@ 2015-06-05 10:24   ` Ian Campbell
  2015-06-05 12:15     ` Julien Grall
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-05 10:24 UTC (permalink / raw
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari,
	xen-devel@lists.xen.org

On Thu, 2015-06-04 at 18:55 +0100, Julien Grall wrote:
> > # Scope
> > 
> > The ITS is rather complicated, especially when combined with
> > virtualisation. To simplify things we initially omit the following
> > functionality:
> > 
> > - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
> >   virtual Collections is a feature of GICv4, thus is omitted in this
> >   design for GICv3. Physical interrupts which occur on a pCPU where
> >   the target vCPU is not already resident will be forwarded (via IPI)
> >   to the correct pCPU for injection via the existing
> >   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
> >   correctly).
> > - Clearing of the pending state of an LPI under various circumstances
> >   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
> >   in guests seeing some perhaps spurious interrupts.
> > - vITS functionality will only be available on 64-bit ARM hosts,
> >   avoiding the need to worry about fast access to guest owned data
> >   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
> >   hosts can be considered to have access)
> > 
> > XXX Can we assume that `GITS_TYPER.Devbits` will be sane,
> > i.e. requiring support for the full 2^32 device ids would require a
> > 32GB device table even for native, which is improbable except on
> > systems with RAM measured in TB. So we can probably assume that
> > Devbits will be appropriate to the size of the system. _Note_: We
> > require per guest device tables, so size of the native Device Table is
> > not the only factor here.
> 
> As we control the vBDF we can control the vDevID.

Not for dom0 though, unfortunately.

> If we have a single PCI bus, the number won't be too high.

Right.

Perhaps we can make some useful simplification from assuming 1:1 for
dom0 and a reduced Devbits space controlled by us) for domU?

> > XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?
> 
> The GITS_TYPER.IDbits of who? The physical ITS?

Yes, although I think it then has an impact on what we virtualise, for
the dom0 case in particular.

> > ## Device Table
> > 
> > The `pITS` device table will be allocated and given to the pITS at
> > start of day.
> 
> We don't really care about this. This is part of the memory provision at
> initialization based on GITS_BASER*
> 
> Furthermore, the ITS may already have the table in memory and
> therefore this allocation is not necessary.


I was just trying to indicate that the physical device was setup in some
appropriate way, I've replaced this section with another sentence at the
end of the previous one:
        The physical ITS will be provisioned with whatever tables it requests
        via its `GITS_BASERn` registers.

> > ## Per Device Information
> > 
> > Each physical device in the system which can be used together with an
> > ITS (whether using passthrough or not) will have associated with it a
> > data structure:
> > 
> >     struct its_device {
> >         uintNN_t phys_device_id;
> >         uintNN_t virt_device_id;
> >         unsigned int *events;
> >         unsigned int nr_events;
> >         struct page_info *pitt;
> >         unsigned int nr_pitt_pages
> 
> You need to have a pointer to the pITS associated.

In general I suppose that is likely, so far as this particular vits spec
goes is there a specific use here which requires it?

I'll add the field, but I'll also add:

	/* Other fields relating to pITS maintenance but unrelated to vITS */

> > ## Device Discovery/Registration and Configuration
> > 
> > Per device information will be discovered based on firmware tables (DT
> > or ACPI) and information provided by dom0 (e.g. registration via
> > PHYSDEVOP_pci_device_add or new custom hypercalls).
> > 
> > This information shall include at least:
> > 
> > - The Device ID of the device.
> > - The maximum number of Events which the device is capable of
> >   generating.
> 
> Well, the maximum number of Events doesn't need to come through a new
> hypercall. We can directly retrieve it via the PCI CFG space.

I don't think it matters, the point of this paragraph is that we know
the value from somewhere of all these things before the device can be
used for its.

Anyway, I added PCI cfg space to the e.g. in the first paragraph. From
the point of view of this spec I don't really care about the details of
where the nr events comes from.

> > (IPA, size, cache attributes to use when mapping) will be recorded in
> > `struct domain`.
> 
> map_domain_page assume that the memory is WB/WT. What happen if the
> guest decides to choose a different attribute?

This is an interesting point. Although GITS_BASERn (and similar
registers) allow the cacheability of the tables to be specified by the
OS, it's not clear to me why, since the tables are essentially opaque to
the OS; why would it be accessing them at all?

IOW so long as Xen always uses the same attributes for its mappings I'm
not sure we need to worry about this. Anything odd which occurs because
the guest touches the memory with the wrong cache attribute would be
equally odd if the guest touched it with the matching one.

> 
> > 
> > All other `GITS_BASERn.Valid == 0`.
> > 
> > ## vITS to pITS mapping
> > 
> > A physical system may have multiple physical ITSs.
> > 
> > With the simplified vits command model presented here only a single
> > `vits` is required.
> 
> What about DOM0? We would have to browse the firmware table (ACPI or DT)
> to replace the ITS parent.

That's ok though, isn't it?

> > On write `bit[0]` of the written byte is the enable/disable state for
> > the irq and is handled thus:
> > 
> >     lpi = (addr - table_base);
> >     if ( byte & 1 )
> >         enable_irq(lpi);
> >     else
> >         disable_irq(lpi);
> > 
> > Note that in the context of this emulation we do not have access to a
> > Device ID, and therefore cannot make decisions based on whether the
> > LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
> > hand and not an `event`, IOW we would need to do a _reverse_ lookup in
> > the ITT.
> 
> I'm struggling to see how this would work. After enabling/disabling an
> IRQ you need to send an INV which require the devID and the eventID.

This is a detail for the pits driver, which is deliberately out of scope
here. The interface to the core Xen system is enable_irq/disable_irq and
so that is what we must use here.

I imagine the pits driver will need to store some details about this
somewhere for its own internal purposes, perhaps in struct its_device.
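
For illustration, a model of the LPI Configuration table write
emulation quoted above; only `enable_irq`/`disable_irq` are the real
interface into core Xen, the rest is a sketch:

```python
# Python model of the LPI Configuration table write emulation quoted
# above: bit[0] of the written byte toggles the interrupt; the
# remaining (priority) bits are currently ignored. One byte per LPI,
# so the table offset identifies the LPI (the draft's pseudocode
# elides the 8192 base of the LPI number space, as does this model).

def handle_cfg_write(addr, table_base, byte, enable_irq, disable_irq):
    lpi = addr - table_base
    if byte & 1:
        enable_irq(lpi)
    else:
        disable_irq(lpi)
    # byte[7:2] carry the LPI priority, currently ignored per the draft.
    return lpi
```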

> Also, until now you haven't explained how the vLPI will be mapped to the
> pLPI.

I didn't think anything here so far needed that mapping, so of course I
didn't explain it.

If some alternative scheme which you are proposing (or solution to an
outstanding issue) requires this mapping then we would have to add it.

> > LPI priority (the remaining bits in the written byte) is currently
> > ignored.
> > 
> > ## LPI Pending Table Virtualisation
> > 
> > XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
> > sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.
> > 
> > ## Device Table Virtualisation
> > 
> > The IPA, size and cacheability attributes of the guest device table
> > will be recorded in `struct domain` upon write to `GITS_BASER0`.
> >
> > In order to lookup an entry for `device`:
> > 
> >     define {get,set}_vdevice_entry(domain, device, struct device_table *entry):
> >         offset = device*sizeof(struct vdevice_table)
> >         if offset > <DT size>: error
> > 
> >         dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
> >         page = p2m_lookup(domain, dt_entry, p2m_ram)
> >         if !page: error
> >         /* nb: non-RAM pages, e.g. grant mappings,
> >          * are rejected by this lookup */
> 
> It is allowed to pass device memory here. See Device-nGnRnE.

The guest is allowed to tell us that we should use RAM with strongly
ordered etc device-like semantics, but telling us to use an actual MMIO
region is nuts, do you really think we need to support that case?

> >         d = irq_guest->domain
> >         virq = irq_guest->virq
> >         its_device = irq_guest->its_device
> > 
> >         get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
> >         vcpu = d->its_collections[vitt.collection]
> >         vgic_vcpu_inject_irq(d, &d->vcpus[vcpu])
> 
> Shouldn't you pass at least the vLPIs?

Yes, I meant

        vgic_vcpu_inject_irq(&d->vcpus[vcpu], vitt.vlpi)

> We would also need to ensure that the vitt.vlpi is effectively an LPI.
> Otherwise the guest may send an invalid value (such as 1023) which will
> potentially crash the platform.

Yes. This sort of thing is largely implicit as far as the error handling
goes, i.e. error handling should be done at all stages.

I added a specific line about this though.
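
The sort of check meant here might look like this sketch (the LPI base
of 8192 is per the GIC spec; the names and bounds plumbing are
invented):

```python
# Sketch of the vLPI sanity check discussed above: the value read back
# from the guest-writable vitt must lie in the LPI number space before
# it is injected, otherwise (e.g. vlpi == 1023) it is dropped. The
# bounds follow the GICv3 LPI range; names are illustrative.

LPI_BASE = 8192  # LPIs occupy INTIDs 8192 and up

def checked_inject(vlpi, nr_lpis, inject):
    # Reject any INTID outside [8192, 8192 + nr_lpis): SGIs/PPIs/SPIs
    # and special IDs such as 1023 must never be injected from a vitt.
    if vlpi < LPI_BASE or vlpi >= LPI_BASE + nr_lpis:
        return False  # malformed vitt entry: drop the interrupt
    inject(vlpi)
    return True
```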

> > - Not `MAPD`d -- on `MAPD` enable all associate LPIs which are enabled
> >   in LPI CFG Table.
> > - Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable LPI if enabled in
> >   CFG Table.
> > - Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
> >   virtual collection. A `list_head` in `struct irq_guest` implies a
> >   fair bit of overhead, number of LPIs per collection is the total
> >   number of LPIs (guest could assign all to the same
> >   collection). Could walk entire LPI CFG and reenable all set LPIs,
> >   might involve walking over several KB of memory though. Could inject
> >   unmapped collections to vcpu0, forcing the guest to deal with the
> >   spurious interrupts?
> >
> > XXX Only the `MAPC` issue seems problematic. Is this a critical issue or
> > can we get away with it?
> 
> This is not the only issue. With your solution, you leave open the
> possibility of the guest having one vLPI for multiple (devID, ID)/LPI
> pairs.

I'm not sure, but isn't this UNPREDICTABLE or something? The "Where:"
under Figure 19 (and most of the other related tables) suggests that the
the identifier must be unique. I'd like to find a stronger statement if
we were to rely on this though, true.

>  Even if we
> validate it when the command MAPI/MAPVI is sent, the guest could
> itself modify the ITT.
> 
> Furthermore, each virtual IRQ of a domain is associated with a struct
> pending_irq. This structure contains an irq_desc which is a pointer to
> the physical IRQ. With a crafted ITT the guest could mess up those
> data structures. Although, I think we can get back on our feet after the
> domain is destroyed.
> 
> After reading this draft, I think we can avoid browsing the Device
> Table and the ITT. As said in the previous paragraph, the pending_irq
> structure is linked to an irq_desc. In your proposal, you suggested to
> store the its_device in the irq_guest (part of irq_desc). If we make
> use of pending_irq->desc to store the physical descriptor, we can have
> a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and AFAICT
> the memory usage would be the same in Xen as with the ITT/Device-table
> based solution.

If you have a scheme along those lines that would be good. From your
next message:

> BTW, I will suggest a pseudo-code based on your draft tomorrow. It may 
> be easier to understand.

Yes, please, but please update everything which is affected, add new
sections as necessary etc so the document remains a self consistent
plausible design.

Ian.


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 10:24   ` Ian Campbell
@ 2015-06-05 12:15     ` Julien Grall
  2015-06-05 13:20       ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-06-05 12:15 UTC (permalink / raw
  To: Ian Campbell, Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari,
	xen-devel@lists.xen.org

On 05/06/15 11:24, Ian Campbell wrote:
> On Thu, 2015-06-04 at 18:55 +0100, Julien Grall wrote:
>>> # Scope
>>>
>>> The ITS is rather complicated, especially when combined with
>>> virtualisation. To simplify things we initially omit the following
>>> functionality:
>>>
>>> - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
>>>   virtual Collections is a feature of GICv4, thus is omitted in this
>>>   design for GICv3. Physical interrupts which occur on a pCPU where
>>>   the target vCPU is not already resident will be forwarded (via IPI)
>>>   to the correct pCPU for injection via the existing
>>>   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
>>>   correctly).
>>> - Clearing of the pending state of an LPI under various circumstances
>>>   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
>>>   in guests seeing some perhaps spurious interrupts.
>>> - vITS functionality will only be available on 64-bit ARM hosts,
>>>   avoiding the need to worry about fast access to guest owned data
>>>   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
>>>   hosts can be considered to have access)
>>>
>>> XXX Can we assume that `GITS_TYPER.Devbits` will be sane,
>>> i.e. requiring support for the full 2^32 device ids would require a
>>> 32GB device table even for native, which is improbable except on
>>> systems with RAM measured in TB. So we can probably assume that
>>> Devbits will be appropriate to the size of the system. _Note_: We
>>> require per guest device tables, so size of the native Device Table is
>>> not the only factor here.
>>
>> As we control the vBDF we can control the vDevID.
> 
> Not for dom0 though, unfortunately.
> 
>> If we have a single PCI bus, the number won't be too high.
> 
> Right.
> 
> Perhaps we can make some useful simplification from assuming 1:1 for
> dom0 and a reduced Devbits space controlled by us) for domU?
> 
>>> XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?
>>
>> The GITS_TYPER.IDbits of who? The physical ITS?
> 
> Yes, although I think it then has an impact on what we virtualise, for
> the dom0 case in particular.

AFAICT it only impacts DOM0. For DOMU we are able to know the maximum
number of LPIs (based on the PCI devices assigned). So we can limit the
GITS_TYPER.IDbits.

> 
>>> ## Device Table
>>>
>>> The `pITS` device table will be allocated and given to the pITS at
>>> start of day.
>>
>> We don't really care about this. This is part of the memory provision at
>> initialization based on GITS_BASER*
>>
>> Furthermore, the ITS may already have in the table in-memory and
>> therefore this allocation is not neccesary.
> 
> 
> I was just trying to indicate that the physical device was setup in some
> appropriate way, I've replaced this section with another sentence at the
> end of the previous one:
>         The physical ITS will be provisioned with whatever tables it requests
>         via its `GITS_BASERn` registers.
> 
>>> ## Per Device Information
>>>
>>> Each physical device in the system which can be used together with an
>>> ITS (whether using passthrough or not) will have associated with it a
>>> data structure:
>>>
>>>     struct its_device {
>>>         uintNN_t phys_device_id;
>>>         uintNN_t virt_device_id;
>>>         unsigned int *events;
>>>         unsigned int nr_events;
>>>         struct page_info *pitt;
>>>         unsigned int nr_pitt_pages
>>
>> You need to have a pointer to the pITS associated.
> 
> In general I suppose that is likely, so far as this particular vits spec
> goes is there a specific use here which requires it?

Well, you were speaking about the structure associated with a physical
device, and providing information about the physical ITS is not
strictly speaking vITS specific.

> I'll add the field, but I'll also add:
> 
> 	/* Other fields relating to pITS maintenance but unrelated to vITS */
> 
>>> ## Device Discovery/Registration and Configuration
>>>
>>> Per device information will be discovered based on firmware tables (DT
>>> or ACPI) and information provided by dom0 (e.g. registration via
>>> PHYSDEVOP_pci_device_add or new custom hypercalls).
>>>
>>> This information shall include at least:
>>>
>>> - The Device ID of the device.
>>> - The maximum number of Events which the device is capable of
>>>   generating.
>>
>> Well, the maximum number of Events doesn't need to come through a new
>> hypercall. We can directly retrieve it via the PCI CFG space.
> 
> I don't think it matters, the point of this paragraph is that we know
> the value from somewhere of all these things before the device can be
> used for its.
> 
> Anyway, I added PCI cfg space to the e.g. in the first paragraph. From
> the point of view of this spec I don't really care about the details of
> where the nr events comes from.
> 
>>> (IPA, size, cache attributes to use when mapping) will be recorded in
>>> `struct domain`.
>>
>> map_domain_page assume that the memory is WB/WT. What happen if the
>> guest decides to choose a different attribute?
> 
> This is an interesting point. Although GITS_BASERn (and similar
> registers) allow the cacheability of the tables to be specified by the
> OS its not clear to me why when the tables are essentially opaque to the
> OS, so why would it be accessing them at all?

I have no idea why they ask the OS to specify the cacheability in
GITS_BASER.

> 
> IOW so long as Xen always uses the same attributes for its mappings I'm
> not sure we need to worry about this. Anything odd which occurs because
> the guest touches the memory with the wrong cache attribute would be
> equally odd if the guest touched it with the matching one.

> 
>>
>>>
>>> All other `GITS_BASERn.Valid == 0`.
>>>
>>> ## vITS to pITS mapping
>>>
>>> A physical system may have multiple physical ITSs.
>>>
>>> With the simplified vits command model presented here only a single
>>> `vits` is required.
>>
>> What about DOM0? We would have to browse the firmware table (ACPI or DT)
>> to replace the ITS parent.
> 
> That's ok though, isn't it?
> 
>>> On write `bit[0]` of the written byte is the enable/disable state for
>>> the irq and is handled thus:
>>>
>>>     lpi = (addr - table_base);
>>>     if ( byte & 1 )
>>>         enable_irq(lpi);
>>>     else
>>>         disable_irq(lpi);
>>>
>>> Note that in the context of this emulation we do not have access to a
>>> Device ID, and therefore cannot make decisions based on whether the
>>> LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
>>> hand and not an `event`, IOW we would need to do a _reverse_ lookup in
>>> the ITT.
>>
>> I'm struggling to see how this would work. After enabling/disabling an
>> IRQ you need to send an INV which requires the devID and the eventID.
> 
> This is a detail for the pits driver, which is deliberately out of scope
> here. The interface to core Xen system is enable_irq/disable_irq and so
> that is what we must use here.
> 
> I imagine the pits driver will need to store some details about this
> somewhere for its own internal purposes, perhaps in struct its_device.

It's rather strange to put this out of scope because it would have
solved your issue (i.e. getting the vDevID, ID).

If you have a mapping vLPI -> pLPI, you can therefore get the its_device
which stores the vDevID, and from it you can derive the eventID.

With that you can decide what to do.
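For illustration, the reverse path being suggested here could be sketched as follows. This is a hedged sketch only: the struct layouts and names (`irq_guest.event_id`, `vlpi_to_vdevid_event`) are my assumptions, not the draft's actual interface.

```c
#include <stdint.h>

struct domain;

/* Assumed layout: vDevID recorded at assignment time. */
struct its_device {
    uint32_t virt_device_id;
};

/* Assumed layout: one irq_guest hangs off each routed pLPI's irq_desc. */
struct irq_guest {
    struct domain *domain;
    unsigned int virq;               /* the vLPI */
    uint32_t event_id;               /* recorded when the event was mapped */
    struct its_device *its_device;
};

/* Given the irq_guest found via a pLPI's irq_desc, recover the
 * (vDevID, EventID) pair needed to, e.g., emit an INV on the guest's
 * behalf, without walking any guest-provisioned memory. */
static void vlpi_to_vdevid_event(const struct irq_guest *ig,
                                 uint32_t *vdevid, uint32_t *eventid)
{
    *vdevid = ig->its_device->virt_device_id;
    *eventid = ig->event_id;
}
```

The point of the sketch is that the lookup is O(1) from the irq_desc, with no reverse search of the ITT.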

>> Also, until now you haven't explained how the vLPI will be mapped to the
>> pLPI.
> 
> I didn't think anything here so far needed that mapping, so of course I
> didn't explain it.
> 
> If some alternative scheme which you are proposing (or solution to an
> outstanding issue) requires this mapping then we would have to add it.
> 
>>> LPI priority (the remaining bits in the written byte) is currently
>>> ignored.
>>>
>>> ## LPI Pending Table Virtualisation
>>>
>>> XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
>>> sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.
>>>
>>> ## Device Table Virtualisation
>>>
>>> The IPA, size and cacheability attributes of the guest device table
>>> will be recorded in `struct domain` upon write to `GITS_BASER0`.
>>>
>>> In order to lookup an entry for `device`:
>>>
>>>     define {get,set}_vdevice_entry(domain, device, struct device_table *entry):
>>>         offset = device*sizeof(struct vdevice_table)
>>>         if offset > <DT size>: error
>>>
>>>         dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
>>>         page = p2m_lookup(domain, dt_entry, p2m_ram)
>>>         if !page: error
>>>         /* nb: non-RAM pages, e.g. grant mappings,
>>>          * are rejected by this lookup */
>>
>> It is allowed to pass device memory here. See Device-nGnRnE.
> 
> The guest is allowed to tell us that we should use RAM with strongly
> ordered etc device-like semantics, but telling us to use an actual MMIO
> region is nuts, do you really think we need to support that case?

I agree that it may be strange but I wasn't able to find a place which
rules out this possibility...

Though, it may be possible to have an MMIO region used specifically for
the DOM0 ITS (?).

> 
>>>         d = irq_guest->domain
>>>         virq = irq_guest->virq
>>>         its_device = irq_guest->its_device
>>>
>>>         get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
>>>         vcpu = d->its_collections[vitt.collection]
>>>         vgic_vcpu_inject_irq(d, &d->vcpus[vcpu])
>>
>> Shouldn't you pass at least the vLPIs?
> 
> Yes, I meant
> 
>         vgic_vcpu_inject_irq(&d->vcpus[vcpu], vitt.vpli)
> 
>> We would also need to ensure that the vitt.vpli is effectively an LPI.
>> Otherwise the guest may send an invalid value (such as 1023) which will
>> potentially crash the platform.
> 
> Yes. This sort of thing is largely implicit as far as the error handling
> goes, i.e. error handling should be done at all stages.
> 
> I added a specific line about this though.
> 
>>> - Not `MAPD`d -- on `MAPD` enable all associated LPIs which are enabled
>>>   in LPI CFG Table.
>>> - Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable LPI if enabled in
>>>   CFG Table.
>>> - Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
>>>   virtual collection. A `list_head` in `struct irq_guest` implies a
>>>   fair bit of overhead, number of LPIs per collection is the total
>>>   number of LPIs (guest could assign all to the same
>>>   collection). Could walk entire LPI CFG and reenable all set LPIs,
>>>   might involve walking over several KB of memory though. Could inject
>>>   unmapped collections to vcpu0, forcing the guest to deal with the
>>>   spurious interrupts?
>>>
>>> XXX Only the `MAPC` issue seems problematic. Is this a critical issue or
>>> can we get away with it?
>>
>> This is not the only issue. With your solution, you leave open the
>> possibility of the guest having one vLPI for multiple (devID, ID)/LPI pairs.
> 
> I'm not sure, but isn't this UNPREDICTABLE or something? The "Where:"
> under Figure 19 (and most of the other related tables) suggests that
> the identifier must be unique. I'd like to find a stronger statement if
> we were to rely on this though, true.

I think so. But you are assuming that the guest is nice and will never
touch the memory provisioned to the ITS (ITT, Device Table...).

The vLPI is stored in the guest memory. A malicious guest could decide
to modify different ITT entries to point to the same vLPI. It would be
impossible to know which pLPI we have to EOI.

We can't validate this case when the interrupt is mapped via MAPVI as
the guest can modify the memory afterwards. And checking at injection
time would require browsing all the ITTs.

So, shall we trust a guest using the vITS? If not I'm afraid that we
can't make use of memory provisioned by the guest.

FWIW, this is not the first place in this proposal where you assume the
table won't be modified by the guest.

> 
>>  Even if we
>> validate it when the command MAPI/MAPVI is sent, the guest could
>> itself modify the ITT.
>>
>> Furthermore, each virtual IRQ of a domain is associated with a struct
>> pending_irq. This structure contains an irq_desc which is a pointer to
>> the physical IRQ. With a crafted ITT the guest could mess up those
>> data structures. Although, I think we can get back on our feet after the
>> domain is destroyed.
>>
>> After reading this draft, I think we can avoid browsing the Device
>> Table and the ITT. As said in the previous paragraph, the pending_irq
>> structure is linked to an irq_desc. In your proposal, you suggested to
>> store the its_device in the irq_guest (part of irq_desc). If we make
>> use of pending_irq->desc to store the physical descriptor, we can have
>> a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and AFAICT
>> the memory usage in Xen would be the same as the ITT/Device-Table-based solution.
> 
> If you have a scheme along those lines that would be good. From your
> next message:
> 
>> BTW, I will suggest a pseudo-code based on your draft tomorrow. It may 
>> be easier to understand.
> 
> Yes, please, but please update everything which is affected, add new
> sections as necessary etc so the document remains a self consistent
> plausible design.

I'm afraid to say that I won't be able to come up with a full proposal
today and I will be away from tonight until the 15th of June.

I can try to suggest a few leads, but I still need to think about the INT
problem.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05  9:49   ` Ian Campbell
@ 2015-06-05 12:41     ` Vijay Kilari
  2015-06-05 13:28       ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Vijay Kilari @ 2015-06-05 12:41 UTC (permalink / raw
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, Jun 5, 2015 at 3:19 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2015-06-05 at 11:37 +0530, Vijay Kilari wrote:
>> > ## Virtual LPI interrupt injection
>> >
>> > A physical interrupt which is routed to a guest vCPU has the
>> > `_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
>> > have an associated instance of `struct irq_guest` which contains the
>> > target `struct domain` pointer and virtual interrupt number.
>> >
>> > In Xen a virtual interrupt (either arising from a physical interrupt
>> > or completely virtual) is ultimately injected to a VCPU using the
>> > `vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_spi`.
>> >
>> > This mechanism will likely need updating to handle the injection of
>> > virtual LPIs. In particular rather than `GICD_ITARGERRn` or
>> > `GICD_IROUTERn` routing of LPIs is performed via the ITS collections
>> > mechanism. This is discussed below (In _vITS_:_Virtual LPI injection_).
>> >
>>
>> For the 4.6 scope, can we always inject LPIs to VCPU0?
>
> If that isn't going to be too upsetting for guest OSes we might be able
> to get away with it. I can't remember if we did this for SPIs in the
> early days also.
>
>
>> > During assignment all LPIs associated with the device will be routed
>> > to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
>> > the `struct its_device.events` array).
>>
>>  Is there any hypercall where in we can assign & route LPI's to the guest?
>
> There's:
>         /*
>          * Dom0 should use these two to announce MMIO resources assigned to
>          * MSI-X capable devices won't (prepare) or may (release) change.
>          */
>         #define PHYSDEVOP_prepare_msix          30
>         #define PHYSDEVOP_release_msix          31
>         struct physdev_pci_device {
>             /* IN */
>             uint16_t seg;
>             uint8_t bus;
>             uint8_t devfn;
>         };
> which sort of is?
>
> But, the paragraph above is proposing to do the routing on _device_
> assignment, e.g. on XEN_DOMCTL_assign_device route all LPIs which we
> have preallocated to the device to the designated domain.
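The "route on device assignment" flow described in that paragraph might look roughly like this. It is a sketch under assumed types; `route_irq_to_guest` is the real Xen interface named in the draft, but it is stubbed here so the fragment is self-contained.

```c
#include <stdint.h>

#define MAX_EVENTS 32

struct domain;

/* Assumed layout of the draft's `struct its_device.events` array of
 * preallocated pLPIs. */
struct its_device {
    unsigned int nr_events;
    unsigned int events[MAX_EVENTS];
};

static unsigned int nr_routed;   /* stub bookkeeping for illustration */

/* Stub standing in for Xen's real route_irq_to_guest(). */
static int route_irq_to_guest(struct domain *d, unsigned int irq)
{
    (void)d; (void)irq;
    nr_routed++;
    return 0;
}

/* On XEN_DOMCTL_assign_device: route every preallocated LPI for the
 * device to the designated domain. */
static int route_device_lpis(struct domain *d, struct its_device *dev)
{
    for (unsigned int i = 0; i < dev->nr_events; i++) {
        int rc = route_irq_to_guest(d, dev->events[i]);
        if (rc)
            return rc;
    }
    return 0;
}
```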
>
>> > ## vITS to pITS mapping
>> >
>> > A physical system may have multiple physical ITSs.
>> >
>> > With the simplified vits command model presented here only a single
>> > `vits` is required.
>> >
>> > In the future a more complex arrangement may be desired. Since the
>> > choice of model is internal to the hypervisor/tools and is
>> > communicated to the guest via firmware tables we are not tied to this
>> > model as an ABI if we decide to change.
>>
>> Do you mean that for 4.6 a 1:N model is followed? Why not a
>> 1:1 vITS to pITS mapping?
>
> In the model here there isn't really a mapping from vITS to pITS at all,
> that was one of the simplifications I tried to make, since it avoids all
> the stuff about scheduling etc.
>
>> > Since everything is based upon IPA (guest addresses) a malicious guest
>> > can only reference its own RAM here.
>>
>>    Here the device table memory allocated by the guest is used to look up the device.
>> Why can't we avoid using the guest memory altogether and only emulate
>> reads and writes of GITS_BASER0?
>
> Emulate how? All as GITS_BASERn.Valid == 0 for all?

Yes, this avoids the guest allocating a device table for an ITS which
is not used.
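A sketch of that alternative (reporting every `GITS_BASERn` with `Valid == 0`) could be as simple as masking bit 63 on read; the emulation entry point here is invented for illustration, only the bit position comes from the GICv3 register layout.

```c
#include <stdint.h>

#define GITS_BASER_VALID (1ULL << 63)

/* Hypothetical read handler: whatever the guest last wrote to the
 * register, never report a valid table, so the guest never provisions
 * memory for the ITS. */
static uint64_t vits_read_baser(uint64_t stored)
{
    return stored & ~GITS_BASER_VALID;
}
```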

>
>> Let Xen manage device information using its_device structure based on
>> MAPD command.
>
> Between C and D I had a version which was like this, and used the
> per-domain device map to lookup the its_device.
>
> The problem with it is that the INT command can be given a DeviceID
> which doesn't really exist (in order to generate completion interrupts
> etc).
>
> But a non-existent DeviceID won't have a struct its device to go with
> it, hence there was no way to translate the Event into an LPI.
>
>> Am I missing any spec which specifies the format of the Device Table
>> managed by the ITS hardware, which software can read/update directly?
>
> AIUI the software is never supposed to read/update the table directly,
> irrespective of whether it is technically able to. Accessing those bits
> directly would be a bug and have UNPREDICTABLE results. The whole
> GITS_BASERn stuff is just a mechanism to avoid the h/w having to
> provision RAM specifically for the ITS by allowing it to take over
> portions of the real RAM.
>
> I think you are correct that in principle a hardware designer would be
> allowed to throw some RAM into the ITS h/w block itself and not request
> anything via GITS_BASERn, as I say I had an interim version which did
> this (using the per domain device map rather than RAM in the ITS h/w
> block), but this didn't handle the INT on a non-existing/invented device
> case.
>
> I decided to handle that case by moving the data structure into guest
> RAM (using the mechanisms provided to do so), which I considered to have
> the benefit of being reasonably simple, but it's not the only possible
> answer.
>
> As an alternative I suppose we could e.g. insert dummy struct its_device
> entries into the per-domain virtual deviceid mapping table in response
> to `MAPD` requests for guest invented devices. Then we would need a
> mechanism to stop a guest from populating all 2^32 entries in the tree
> (which would use a lot of Xen guest RAM) and ideally we would want to
> avoid allocating dummy struct its_device during emulation.

If dom0 is inventing all the devices in the platform and Xen is preallocating
memory for the its_device structs that are added using the PHYSDEV_add_device op,
there will be no chance of a domain allocating 2^32 devices.

So the deviceid of any MAPD command from any guest should have been
within the pre-identified list.

>
> Perhaps each guest could have a small pool of "invented its_devices"
> allocated at start of day (small == say, 4) which could be inserted into
> the device table on MAPD, once they are all used further MAPD commands
> for invented devices would be ignored (this assumes that most guests
> will only want one such device for completion purposes), which also
> solves the problem of filling the tree completely (or does it: a
> sequence of MAPD commands setting and clearing all 2^32 devices might
> cause us to populate the whole tree).

If the guest is creating this new device for its completion
INT purposes by sending a MAPD command, then Xen can create a dummy device.

I think we can limit the number of dummy devices that a guest can create
to a small number (2 or 4), and the number of vectors to say 32. Xen can
mark these devices as dummy and allow all MAPVI/MAPI commands on them,
but not translate them to physical ITS commands. Also, multiple guests
can create dummy devices with the same deviceID, which Xen cannot
translate.

On receiving INT command, Xen will just inject LPI mapped for the (device, ID)
(INT device, ID) to domain. So effectively no physical commands are _not_ sent
to ITS hardware that are related to dummy devices.
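The limited dummy-device pool discussed in this exchange could look something like the following sketch. The pool size, names, and placement in a per-domain `vits` structure are all illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_DUMMY_DEVICES 4   /* small per-domain pool, as suggested */

struct its_device {
    uint32_t virt_device_id;
    int in_use;
};

/* Assumed per-domain vITS state holding the preallocated pool. */
struct vits {
    struct its_device dummy[NR_DUMMY_DEVICES];
};

/* On MAPD for a DeviceID with no real backing device: hand out a
 * preallocated dummy entry.  Once the pool is exhausted, further MAPDs
 * for invented devices get NULL and are ignored, which also bounds how
 * much of the device table a guest can populate this way. */
static struct its_device *mapd_dummy_device(struct vits *v, uint32_t vdevid)
{
    for (size_t i = 0; i < NR_DUMMY_DEVICES; i++) {
        if (!v->dummy[i].in_use) {
            v->dummy[i].in_use = 1;
            v->dummy[i].virt_device_id = vdevid;
            return &v->dummy[i];
        }
    }
    return NULL;  /* pool exhausted */
}
```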

- Vijay


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 12:15     ` Julien Grall
@ 2015-06-05 13:20       ` Ian Campbell
  2015-06-05 14:12         ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-05 13:20 UTC (permalink / raw
  To: Julien Grall
  Cc: Vijay Kilari, manish.jaggi, Julien Grall, xen-devel@lists.xen.org,
	Julien Grall, Stefano Stabellini

On Fri, 2015-06-05 at 13:15 +0100, Julien Grall wrote:
> >> I'm struggling to see how this would work. After enabling/disabling an
> >> IRQ you need to send an INV which requires the devID and the eventID.
> > 
> > This is a detail for the pits driver, which is deliberately out of scope
> > here. The interface to core Xen system is enable_irq/disable_irq and so
> > that is what we must use here.
> > 
> > I imagine the pits driver will need to store some details about this
> > somewhere for its own internal purposes, perhaps in struct its_device.
> 
> It's rather strange to put this out of scope because it would have
> solved your issue (i.e. getting the vDevID, ID).

Details of the pits are out of scope so the fact that enabling or
disabling an interrupt needs a specific set of operations and info is
out of scope.

The fact that a vLPI->pLPI mapping might be helpful for some other issue
within the vits is a separate consideration. You might be correct that
this would help the LPI CFG emulation case, but:

> If you have a mapping vLPI -> pLPI, you can therefore get the its_device
> which stores the vDevID, and from it you can derive the eventID.

Previously you mentioned multiple (device,event) pairs mapping to the same
(v)LPI. I hoped we could rule this out but I wasn't sure, and frankly
I'm no longer hopeful that it is the case.

That possibility makes vLPI->pLPI mapping trickier (if not impossible)
to maintain.

In fact, given that LPIs are a GICR thing while (device,event)=>LPI
mapping is an ITS thing, it's possible that this is simply allowed and
expected, even if strange.

> With that you can decide what to do.
> 
> >> Also, until now you haven't explained how the vLPI will be mapped to the
> >> pLPI.
> > 
> > I didn't think anything here so far needed that mapping, so of course I
> > didn't explain it.
> > 
> > If some alternative scheme which you are proposing (or solution to an
> > outstanding issue) requires this mapping then we would have to add it.
> > 
> >>> LPI priority (the remaining bits in the written byte) is currently
> >>> ignored.
> >>>
> >>> ## LPI Pending Table Virtualisation
> >>>
> >>> XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
> >>> sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.
> >>>
> >>> ## Device Table Virtualisation
> >>>
> >>> The IPA, size and cacheability attributes of the guest device table
> >>> will be recorded in `struct domain` upon write to `GITS_BASER0`.
> >>>
> >>> In order to lookup an entry for `device`:
> >>>
> >>>     define {get,set}_vdevice_entry(domain, device, struct device_table *entry):
> >>>         offset = device*sizeof(struct vdevice_table)
> >>>         if offset > <DT size>: error
> >>>
> >>>         dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
> >>>         page = p2m_lookup(domain, dt_entry, p2m_ram)
> >>>         if !page: error
> >>>         /* nb: non-RAM pages, e.g. grant mappings,
> >>>          * are rejected by this lookup */
> >>
> >> It is allowed to pass device memory here. See Device-nGnRnE.
> > 
> > The guest is allowed to tell us that we should use RAM with strongly
> > ordered etc device-like semantics, but telling us to use an actual MMIO
> > region is nuts, do you really think we need to support that case?
> 
> I agree that it may be strange but I wasn't able to find a place which
> rules out this possibility...

I think we should rule it out for our use case, it's a legitimate
simplification.

> >> This is not the only issue. With your solution, you leave open the
> >> possibility of the guest having one vLPI for multiple (devID, ID)/LPI pairs.
> > 
> > I'm not sure, but isn't this UNPREDICTABLE or something? The "Where:"
> > under Figure 19 (and most of the other related tables) suggests that
> > the identifier must be unique. I'd like to find a stronger statement if
> > we were to rely on this though, true.
> 
> I think so. But you are assuming that the guest is nice and will never
> touch the memory provisioned to the ITS (ITT, Device Table...).

No, I'm not. The document was very clear that Xen should never trust
anything written in that memory and should always carefully bounds check
it.

But if a guest wants to scribble over that memory and mess up its own
interrupt handling it is welcome to do so, just like on native. We don't
need to protect the guest against itself, just the host against the
guest.

> The vLPI is stored in the guest memory. A malicious guest could decide
> to modify different ITT entries to point to the same vLPI. It would be
> impossible to know which pLPI we have to EOI.

There may be an issue here. Thinking out loud:

At the point we come to inject a vLPI we do (or could) know the pLPI,
because we have the irq_desc/guest_irq (or something which would let us
find them) in our hands.

On native it's unclear if mapping multiple (Device,Event) pairs to a
single LPI is valid (or a single (Collection,LPI) pair, I'm not sure it
matters). I can't find where it is ruled out, so let's assume it is
allowed.

In that case what would happen if a (DeviceB,EventB) triggers a given
LPI which has already been triggered and is pending due to
(DeviceA,EventA). LPIs are edge triggered so I think the second
interrupt would simply be ignored.

Can we do the same? Would this be harmful to anyone other than the guest
itself (who would no longer receive some interrupts, but it's their own
fault)?

It's not impossible to imagine (DeviceA,EventA) and (DeviceB,EventB)
mapping to the same LPI but the OS arranging that they are used at
mutually exclusive times and so things would work. Mad but not
impossible to imagine (especially if DeviceA==DeviceB with two events on
it).
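The "second edge is simply ignored" behaviour described above could be modelled like this; the pending-bitmap interface is an assumption for illustration, not Xen's actual vgic structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VLPIS 1024

/* Assumed per-vcpu pending bitmap for vLPIs. */
struct vlpi_state {
    uint8_t pending[NR_VLPIS / 8];
};

/* Latch an edge for a vLPI.  Returns true if a new interrupt was
 * latched, false if it collapsed onto an already-pending edge, i.e.
 * the case where two (Device,Event) pairs share one vLPI and the
 * second trigger is simply dropped. */
static bool vlpi_set_pending(struct vlpi_state *s, unsigned int vlpi)
{
    uint8_t mask = (uint8_t)(1u << (vlpi % 8));
    if (s->pending[vlpi / 8] & mask)
        return false;
    s->pending[vlpi / 8] |= mask;
    return true;
}
```

Only the guest that set up the shared mapping loses interrupts here; the host state stays consistent.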

> We can't validate this case when the interrupt is mapped via MAPVI as
> the guest can modify the memory afterwards. And checking at injection
> time would require browsing all the ITTs.
> 
> So, shall we trust a guest using the vITS? If not I'm afraid that we
> can't make use of memory provisioned by the guest.

This is a false dichotomy, we don't need to trust the guest if we are
careful to validate things as we use them, and if we do this we can use
guest owned memory for things. This was discussed in the spec. Is the
section "Xen interaction with guest OS provisioned vITS memory" lacking
in some way?

> 
> FWIW, this is not the first place in this proposal where you assume the
> table won't be modified by the guest.

It is very much assumed throughout that the guest might maliciously
modify memory it has given to the vITS, it is a part of the design.

If there are cases where Xen is _unable_ to validate something at the
time of use, then that is a problem with the design. The fact that
everywhere else is required to check things before use is part of the
design, not a problem with it.
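The validate-at-time-of-use pattern referred to here can be illustrated as follows. The limits and names are assumptions; the point is that nothing read back from guest-provisioned memory is acted on before being range-checked, so a guest scribbling on its ITT cannot, for example, inject vLPI 1023.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_EVENTS 32      /* events provisioned for this device (assumed) */
#define LPI_BASE  8192    /* first valid LPI INTID on GICv3               */
#define NR_VLPIS  1024    /* vLPIs this domain may use (assumed)          */

/* Assumed vITT entry layout, resident in guest memory. */
struct vitt {
    uint16_t valid;
    uint16_t collection;
    uint32_t vlpi;
};

/* Read an ITT entry the guest may have scribbled on since MAPVI time;
 * reject anything out of range rather than trusting it. */
static bool get_vitt_entry_checked(const struct vitt *itt, uint32_t event,
                                   struct vitt *out)
{
    if (event >= NR_EVENTS)
        return false;                         /* event out of bounds    */
    *out = itt[event];
    if (!out->valid)
        return false;
    if (out->vlpi < LPI_BASE || out->vlpi >= LPI_BASE + NR_VLPIS)
        return false;                         /* e.g. guest wrote 1023  */
    return true;
}
```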

> >>  Even if we
> >> validate it when the command MAPI/MAPVI is sent, the guest could
> >> itself modify the ITT.
> >>
> >> Furthermore, each virtual IRQ of a domain is associated with a struct
> >> pending_irq. This structure contains an irq_desc which is a pointer to
> >> the physical IRQ. With a crafted ITT the guest could mess up those
> >> data structures. Although, I think we can get back on our feet after the
> >> domain is destroyed.
> >>
> >> After reading this draft, I think we can avoid browsing the Device
> >> Table and the ITT. As said in the previous paragraph, the pending_irq
> >> structure is linked to an irq_desc. In your proposal, you suggested to
> >> store the its_device in the irq_guest (part of irq_desc). If we make
> >> use of pending_irq->desc to store the physical descriptor, we can have
> >> a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and AFAICT
> >> the memory usage in Xen would be the same as the ITT/Device-Table-based solution.
> > 
> > If you have a scheme along those lines that would be good. From your
> > next message:
> > 
> >> BTW, I will suggest a pseudo-code based on your draft tomorrow. It may 
> >> be easier to understand.
> > 
> > Yes, please, but please update everything which is affected, add new
> > sections as necessary etc so the document remains a self consistent
> > plausible design.
> 
> I'm afraid to say that I won't be able to come up with a full proposal
> today and I will be away from tonight until the 15th of June.
> 
> I can try to suggest a few leads, but I still need to think about the INT
> problem.

Be aware that I _have_ thought about the INT problem, which resulted in
the current design and specifically in the decision to keep the device
table (and by extension the ITTs) in guest RAM. This wasn't a whim on my
part, I thought quite hard about it.

Ian.


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 12:41     ` Vijay Kilari
@ 2015-06-05 13:28       ` Ian Campbell
  2015-06-05 15:55         ` Vijay Kilari
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-05 13:28 UTC (permalink / raw
  To: Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, 2015-06-05 at 18:11 +0530, Vijay Kilari wrote:

> >>    Here the device table memory allocated by the guest is used to look up the device.
> >> Why can't we avoid using the guest memory altogether and only emulate
> >> reads and writes of GITS_BASER0?
> >
> > Emulate how? All as GITS_BASERn.Valid == 0 for all?
> 
> Yes, this avoids the guest allocating a device table for an ITS which
> is not used.

In this design the device table from the guest _is_ used though.

If you have another design in mind which doesn't require this then
please propose it.

> > As an alternative I suppose we could e.g. insert dummy struct its_device
> > entries into the per-domain virtual deviceid mapping table in response
> > to `MAPD` requests for guest invented devices. Then we would need a
> > mechanism to stop a guest from populating all 2^32 entries in the tree
> > (which would use a lot of Xen guest RAM) and ideally we would want to
> > avoid allocating dummy struct its_device during emulation.
> 
> If dom0 is inventing all the devices in the platform and Xen is preallocating
> memory for the its_device structs that are added using the PHYSDEV_add_device op,
> there will be no chance of a domain allocating 2^32 devices.

(ITYM PHYSDEVOP_pci_device_add)

The invented device ids here are the sort discussed in "ITS
Configuration" as being used for ITS completion notification, and which
therefore do not correspond to any real device at all. A domU might make
up an arbitrary device ID, unused by any device which it can see, in
order to use with an INT command on its vits in order to get a
completion interrupt.

I hadn't imagined requiring the guest to call PHYSDEVOP_pci_device_add
for such phantom devices just in order to be able to use them with the
INT command, and I don't think you were suggesting this anyway. I'm not
sure what the implications would be of extending that to domUs being
allowed to register invented "phantom" devices, but on the face of it I
don't think it is a good idea.

> So the deviceid of any MAPD command from any guest should have been
> within the pre-identified list.

I don't think that is true for a domU, nor necessarily for a dom0, given
that any device id can be used with INT.

> > Perhaps each guest could have a small pool of "invented its_devices"
> > allocated at start of day (small == say, 4) which could be inserted into
> > the device table on MAPD, once they are all used further MAPD commands
> > for invented devices would be ignored (this assumes that most guests
> > will only want one such device for completion purposes), which also
> > solves the problem of filling the tree completely (or does it: a
> > sequence of MAPD commands setting and clearing all 2^32 devices might
> > cause us to populate the whole tree).
> 
> If the guest is creating this new device for its completion
> INT purposes by sending a MAPD command, then Xen can create a dummy device.
> 
> I think we can limit the number of dummy devices that a guest can create
> to a small number (2 or 4), and the number of vectors to say 32. Xen can
> mark these devices as dummy and allow all MAPVI/MAPI commands on them,
> but not translate them to physical ITS commands.

Nothing in this iteration of the design translates any virtual command
into any physical one.

>  Also,
> multiple guests can create dummy devices with the same deviceID, which
> Xen cannot translate.


> 
> On receiving INT command, Xen will just inject LPI mapped for the (device, ID)
> (INT device, ID) to domain. So effectively no physical commands are _not_ sent
> to ITS hardware that are related to dummy devices.

I can't parse this last sentence, it seems to have an unintentional
double negative?


> 
> - Vijay


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 13:20       ` Ian Campbell
@ 2015-06-05 14:12         ` Julien Grall
  0 siblings, 0 replies; 18+ messages in thread
From: Julien Grall @ 2015-06-05 14:12 UTC (permalink / raw
  To: Ian Campbell, Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari,
	xen-devel@lists.xen.org

On 05/06/15 14:20, Ian Campbell wrote:
> On Fri, 2015-06-05 at 13:15 +0100, Julien Grall wrote:
>>>> I'm struggling to see how this would work. After enabling/disabling an
>>>> IRQ you need to send an INV which requires the devID and the eventID.
>>>
>>> This is a detail for the pits driver, which is deliberately out of scope
>>> here. The interface to core Xen system is enable_irq/disable_irq and so
>>> that is what we must use here.
>>>
>>> I imagine the pits driver will need to store some details about this
>>> somewhere for its own internal purposes, perhaps in struct its_device.
>>
>> It's rather strange to put this out of scope because it would have
>> solved your issue (i.e. getting the vDevID, ID).
> 
> Details of the pits are out of scope so the fact that enabling or
> disabling an interrupt needs a specific set of operations and info is
> out of scope.

I wasn't speaking about enabling/disabling IRQs but about your paragraph
"Note that in the context of this emulation we do not have access to a
Device ID, and therefore cannot make decisions based on whether the
LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
hand and not an `event`, IOW we would need to do a _reverse_ lookup in
the ITT."

> The fact that a vLPI->pLPI mapping might be helpful for some other issue
> within the vits is a separate consideration. You might be correct that
> this would help the LPI CFG emulation case, but:
> 
>> If you have a mapping vLPI -> pLPI, you can therefore get the its_device
>> which stores the vDevID, and from it you can derive the eventID.
> 
> Previously you mentioned multiple (device,event) pairs mapping to the same
> (v)LPI. I hoped we could rule this out but I wasn't sure, and frankly
> I'm no longer hopeful that it is the case.
> 
> That possibility makes vLPI->pLPI mapping trickier (if not impossible)
> to maintain.
>
>>>> This is not the only issue. With your solution, you leave open the
>>>> possibility of the guest having one vLPI for multiple (devID, ID)/LPI pairs.
>>>
>>> I'm not sure, but isn't this UNPREDICTABLE or something? The "Where:"
>>> under Figure 19 (and most of the other related tables) suggests that
>>> the identifier must be unique. I'd like to find a stronger statement if
>>> we were to rely on this though, true.
>>
>> I think so. But you are assuming that the guest is nice and will never
>> touch the memory provisioned to the ITS (ITT, Device Table...).
> 
> No, I'm not. The document was very clear that Xen should never trust
> anything written in that memory and should always carefully bounds check
> it.
> 
> But if a guest wants to scribble over that memory and mess up its own
> interrupt handling it is welcome to do so, just like on native. We don't
> need to protect the guest against itself, just the host against the
> guest.

I should have made clear that I only care about Xen, not about the guest
shooting itself.

>> The vLPI is stored in the guest memory. A malicious guest could decide
>> to modify different ITT entries to point to the same vLPI. It would be
>> impossible to know which pLPI we have to EOI.
> 
> There may be an issue here. Thinking out loud:
> 
> At the point we come to inject a vLPI we do (or could) know the pLPI,
> because we have the irq_desc/guest_irq (or something which would let us
> find them) in our hands.

Agreed.

> 
> On native it's unclear if mapping multiple (Device,Event) pairs to a
> single LPI is valid (or a single (Collection,LPI) pair, I'm not sure it
> matters). I can't find where it is ruled out, so let's assume it is
> allowed.
> 
> In that case what would happen if a (DeviceB,EventB) triggers a given
> LPI which has already been triggered and is pending due to
> (DeviceA,EventA). LPIs are edge triggered so I think the second
> interrupt would simply be ignored.
> 
> Can we do the same? Would this be harmful to anyone other than the guest
> itself (who would no longer receive some interrupts, but it's their own
> fault)?

As discussed IRL, I think it's fine.

irq_desc->status may be inconsistent with the state of the hardware at
some point. It's required during domain destruction to know if we need
to EOI an IRQ assigned to a guest. But we could go through the IRQs and
EOI them regardless of the status for a first approach.

>>>>  Even if we
>>>> validate it when the MAPI/MAPVI command is sent, the guest could
>>>> itself modify the ITT.
>>>>
>>>> Furthermore, each virtual IRQ of a domain is associated to a struct
>>>> pending_irq. This structure contains an irq_desc which is a pointer to
>>>> the physical IRQ. With a crafted ITT the guest could mess up those
>>>> data structures, although I think we can get back on our feet after the
>>>> domain is destroyed.
>>>>
>>>> After reading this draft, I think we can avoid browsing the Device
>>>> Table and the ITT. As said on the previous paragraph, the pending_irq
>>>> structure is linked to an irq_desc. In your proposal, you suggested to
>>>> store the its_device in the irq_guest (part of irq_desc). If we make
>>>> use of pending_irq->desc to store the physical descriptor, we can have
>>>> a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and, AFAICT,
>>>> the memory usage in Xen would be the same as for the ITT/Device-based solution.
>>>
>>> If you have a scheme along those lines that would be good. From your
>>> next message:
>>>
>>>> BTW, I will suggest a pseudo-code based on your draft tomorrow. It may 
>>>> be easier to understand.
>>>
>>> Yes, please, but please update everything which is affected, add new
>>> sections as necessary etc so the document remains a self consistent
>>> plausible design.
>>
>> I'm afraid to say that I won't be able to come up with a full proposal
>> today and I will be away from tonight until the 15th of June.
>>
>> I can try to suggest a few leads, but I still need to think about the INT
>> problem.
> 
> Be aware that I _have_ thought about the INT problem, which resulted in
> the current design and specifically in the decision to keep the device
> table (and by extension the ITTs) in guest RAM. This wasn't a whim on my
> part, I thought quite hard about it.

I know :/. I read your answer on Vijay's mail. I don't much like the
suggestion made for making things work.

After your last answer and the talk IRL, I'm fine with this design.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 13:28       ` Ian Campbell
@ 2015-06-05 15:55         ` Vijay Kilari
  2015-06-05 16:38           ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Vijay Kilari @ 2015-06-05 15:55 UTC
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, Jun 5, 2015 at 6:58 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2015-06-05 at 18:11 +0530, Vijay Kilari wrote:
>
>> >>    Here device table memory allocated by the guest is used to look up the device.
>> >> Why can't we avoid using the guest memory altogether and just emulate
>> >> reads and writes of GITS_BASER0?
>> >
>> > Emulate how? As GITS_BASERn.Valid == 0 for all of them?
>>
>> Yes, this avoids guest allocating device table for ITS which
>> is not used.
>
> In this design the device table from the guest _is_ used though.
>
> If you have another design in mind which doesn't require this then
> please propose it.

The discussion below is about avoiding the use of the guest device table.

>
>> > As an alternative I suppose we could e.g. insert dummy struct its_device
>> > entries into the per-domain virtual deviceid mapping table in response
>> > to `MAPD` requests for guest invented devices. Then we would need a
>> > mechanism to stop a guest from populating all 2^32 entries in the tree
>> > (which would use a lot of Xen guest RAM) and ideally we would want to
>> > avoid allocating dummy struct its_device during emulation.
>>
>> If dom0 is inventing all the devices in the platform and Xen is preallocating
>> memory for its_device struct that are added using PHYSDEV_add_device op,
>>  there will be no chance of a domain allocating 2^32 devices.
>
> (ITYM PHYSDEVOP_pci_device_add)
>
> The invented device ids here are the sort discussed in "ITS
> Configuration" as being used for ITS completion notification, and which
> therefore do not correspond to any real device at all. A domU might make
> up an arbitrary device ID, unused by any device which it can see, in
> order to use with an INT command on its vits in order to get a
> completion interrupt.
>
> I hadn't imagined requiring the guest to call PHYSDEVOP_pci_device_add
> for such phantom devices just in order to be able to use them with the
> INT command, and I don't think you were suggesting this anyway. I'm not
> sure what the implications of extending that to domUs being allowed to
> register invented "phantom" devices, but on the face of it I don't think
> it is a good idea.

   What I mean is we don't need to allow domUs to register phantom/dummy
devices used for the INT command with PHYSDEVOP_pci_device_add.
Let Xen mark those phantom devices added using MAPD as dummy and
just emulate them, without translating ITS commands for these devices.

>
>> So the deviceid of any MAPD command, for all the guests, should have been
>> within the pre-identified list.
>
> I don't think that is true for a domU, not necessarily for a dom0 given
> that any device id can be used with INT.

   If the INT command is for a valid device (not phantom) then Xen can
translate it and send it to the ITS hardware.

>
>> > Perhaps each guest could have a small pool of "invented its_devices"
>> > allocated at start of day (small == say, 4) which could be inserted into
>> > the device table on MAPD, once they are all used further MAPD commands
>> > for invented devices would be ignored (this assumes that most guests
>> > will only want one such device for completion purposes), which also
>> > solves the problem of filling the tree completely (or does it: a
>> > sequence of MAPD commands setting and clearing all 2^32 devices might
>> > cause us to populate the whole tree).
>>
>> If the guest is creating this new device for its completion
>> INT purposes by sending a MAPD command, then Xen can create a dummy device.
>>
>> I think we can limit the number of dummy devices that a guest can create to
>> a small number (2 or 4)
>> and the number of vectors to, say, 32. Xen can mark these devices as dummy and
>> allow all MAPVI/MAPI commands
>> on them, but not translate them into physical ITS commands.
>
> Nothing in this iteration of the design translates any virtual command
> into any physical one.
>
>>  Also,
>> multiple guests can create a dummy device with the same deviceID, which
>> Xen cannot translate.
>
>
>>
>> On receiving INT command, Xen will just inject LPI mapped for the (device, ID)
>> (INT device, ID) to domain. So effectively no physical commands are _not_ sent
>> to ITS hardware that are related to dummy devices.
>
> I can't parse this last sentence, it seems to have an unintentional
> double negative?
Sorry. I correct it as "So effectively no physical commands are sent
to ITS hardware that are related to dummy/phantom devices"
>
>
>>
>> - Vijay
>
>


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 15:55         ` Vijay Kilari
@ 2015-06-05 16:38           ` Ian Campbell
  2015-06-05 17:11             ` Vijay Kilari
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-05 16:38 UTC
  To: Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, 2015-06-05 at 21:25 +0530, Vijay Kilari wrote:
> Let Xen mark those phantom devices added using MAPD as dummy and
> just emulate them, without translating ITS commands for these devices.

But we think guests might use this mechanism to drive completion
(instead of polling), so we have to translate INT commands for such
devices, don't we? Otherwise such guests won't work.

> >> So the deviceid of any MAPD command, for all the guests, should have been
> >> within the pre-identified list.
> >
> > I don't think that is true for a domU, not necessarily for a dom0 given
> > that any device id can be used with INT.
> 
>    If the INT command is for a valid device (not phantom) then Xen can
> translate it and send it to the ITS hardware.

That is not what this design says, Xen will translate and call
vgic_interrupt_inject, which doesn't go via the hardware ITS at all,
since it doesn't need to.

[...]
> Sorry. I correct it as "So effectively no physical commands are sent
> to ITS hardware that are related to dummy/phantom devices"

Remember that in this design the vits doesn't generate any commands to
the physical its _at all_ even for real devices. It just calls
core/generic APIs.

Ian.


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 16:38           ` Ian Campbell
@ 2015-06-05 17:11             ` Vijay Kilari
  2015-06-08  9:59               ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Vijay Kilari @ 2015-06-05 17:11 UTC
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, Jun 5, 2015 at 10:08 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2015-06-05 at 21:25 +0530, Vijay Kilari wrote:
>> Let Xen mark those phantom devices added using MAPD as dummy and
>> just emulate them, without translating ITS commands for these devices.
>
> But we think guests might use this mechanism to drive completion
> (instead of polling), so we have to translate INT commands for such
> devices, don't we? Otherwise such guests won't work.

  I propose that, for guests that use a completion INT, Xen can inject the
LPI by calling vgic_interrupt_inject instead of the ITS HW raising the interrupt.

>
>> >> So the deviceid of any MAPD command, for all the guests, should have been
>> >> within the pre-identified list.
>> >
>> > I don't think that is true for a domU, not necessarily for a dom0 given
>> > that any device id can be used with INT.
>>
>>    If the INT command is for a valid device (not phantom) then Xen can
>> translate it and send it to the ITS hardware.
>
> That is not what this design says, Xen will translate and call
> vgic_interrupt_inject, which doesn't go via the hardware ITS at all,
> since it doesn't need to.

I mean, for valid devices, the LPI interrupt will be raised when the ITS hw
processes the INT command;
from there the interrupt handler will inject it to the domain.

For phantom devices, Xen will inject directly to the domain on seeing the
INT command.
(We have to ensure that all ITS commands are processed before
injecting the INT to the domain.)

>
> [...]
>> Sorry. I correct it as "So effectively no physical commands are sent
>> to ITS hardware that are related to dummy/phantom devices"
>
> Remember that in this design the vits doesn't generate any commands to
> the physical its _at all_ even for real devices. It just calls
> core/generic APIs.

OK, you mean the pITS driver owns translating and sending ITS commands to the HW
instead of the vITS?


* Re: [Draft D] Xen on ARM vITS Handling
  2015-06-05 17:11             ` Vijay Kilari
@ 2015-06-08  9:59               ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-08  9:59 UTC
  To: Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xen.org

On Fri, 2015-06-05 at 22:41 +0530, Vijay Kilari wrote:
> On Fri, Jun 5, 2015 at 10:08 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-06-05 at 21:25 +0530, Vijay Kilari wrote:
> >> Let Xen mark those phantom devices added using MAPD as dummy and
> >> just emulate them, without translating ITS commands for these devices.
> >
> > But we think guests might use this mechanism to drive completion
> > (instead of polling), so we have to translate INT commands for such
> > devices, don't we? Otherwise such guests won't work.
> 
>   I propose that, for guests that use a completion INT, Xen can inject the
> LPI by calling vgic_interrupt_inject instead of the ITS HW raising the interrupt.

Have you read this draft? It says exactly that.

This draft is significantly different from the one which came before, it
is worth rereading the whole thing since most assumptions made based on
draft C will now be invalid.

> >> >> So the deviceid of any MAPD command, for all the guests, should have been
> >> >> within the pre-identified list.
> >> >
> >> > I don't think that is true for a domU, not necessarily for a dom0 given
> >> > that any device id can be used with INT.
> >>
> >>    If the INT command is for a valid device (not phantom) then Xen can
> >> translate it and send it to the ITS hardware.
> >
> > That is not what this design says, Xen will translate and call
> > vgic_interrupt_inject, which doesn't go via the hardware ITS at all,
> > since it doesn't need to.
> 
> I mean, for valid devices, the LPI interrupt will be raised when the ITS hw
> processes the INT command;
> from there the interrupt handler will inject it to the domain.
> 
> For phantom devices, Xen will inject directly to the domain on seeing the
> INT command.
> (We have to ensure that all ITS commands are processed before
> injecting the INT to the domain.)

The design says to use vgic_interrupt_inject for INT commands on any
device, whether phantom or real. It makes no distinction, because it
doesn't need to.

> > [...]
> >> Sorry. I correct it as "So effectively no physical commands are sent
> >> to ITS hardware that are related to dummy/phantom devices"
> >
> > Remember that in this design the vits doesn't generate any commands to
> > the physical its _at all_ even for real devices. It just calls
> > core/generic APIs.
> 
> OK, you mean the pITS driver owns translating and sending ITS commands to the HW
> instead of the vITS?

No, there is no "translation" in this design.

e.g. the vits sees a command which should enable an interrupt. After
figuring out which plpi corresponds it just calls "enable_irq(plpi)". It
doesn't care what that turns into; the core interrupt handling code and the
backend pgic driver will take care of doing what is requested. The same
is true for all commands in this design.

In draft D of this design there is _no_ linkage between vits and pits at
all. It is completely abstracted.

Ian.


end of thread, other threads:[~2015-06-08  9:59 UTC | newest]

Thread overview: 18+ messages
-- links below jump to the message on this page --
2015-06-04 13:54 [Draft D] Xen on ARM vITS Handling Ian Campbell
2015-06-04 17:55 ` Julien Grall
2015-06-04 19:08   ` Julien Grall
2015-06-05 10:24   ` Ian Campbell
2015-06-05 12:15     ` Julien Grall
2015-06-05 13:20       ` Ian Campbell
2015-06-05 14:12         ` Julien Grall
2015-06-05  6:07 ` Vijay Kilari
2015-06-05  9:16   ` Ian Campbell
2015-06-05  9:28   ` Julien Grall
2015-06-05  9:51     ` Ian Campbell
2015-06-05  9:49   ` Ian Campbell
2015-06-05 12:41     ` Vijay Kilari
2015-06-05 13:28       ` Ian Campbell
2015-06-05 15:55         ` Vijay Kilari
2015-06-05 16:38           ` Ian Campbell
2015-06-05 17:11             ` Vijay Kilari
2015-06-08  9:59               ` Ian Campbell
