From: Maxim Levitsky <mlevitsk@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>, linux-nvme@lists.infradead.org
Cc: Fam Zheng <fam@euphon.net>, Jens Axboe <axboe@fb.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	kvm@vger.kernel.org, Wolfram Sang <wsa@the-dreams.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Liang Cunming <cunming.liang@intel.com>,
	Nicolas Ferre <nicolas.ferre@microchip.com>,
	linux-kernel@vger.kernel.org,
	Liu Changpeng <changpeng.liu@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Christoph Hellwig <hch@lst.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	John Ferlan <jferlan@redhat.com>,
	"Paul E . McKenney" <paulmck@linux.ibm.com>,
	Amnon Ilan <ailan@redhat.com>,
	"David S . Miller" <davem@davemloft.net>
Subject: Re: [PATCH 0/9] RFC: NVME VFIO mediated device
Date: Wed, 20 Mar 2019 18:42:02 +0200	[thread overview]
Message-ID: <8994f43d26ebf6040b9d5d5e3866ee81abcf1a1c.camel@redhat.com> (raw)
In-Reply-To: <1553095686.65329.36.camel@acm.org>

On Wed, 2019-03-20 at 08:28 -0700, Bart Van Assche wrote:
> On Tue, 2019-03-19 at 16:41 +0200, Maxim Levitsky wrote:
> > *  All guest memory is mapped into the physical nvme device 
> >    but not 1:1 as vfio-pci would do this.
> >    This allows very efficient DMA.
> >    To support this, patch 2 adds ability for a mdev device to listen on 
> >    guest's memory map events. 
> >    Any such memory is immediately pinned and then DMA mapped.
> >    (Support for fabric drivers where this is not possible exists too,
> >     in which case the fabric driver will do its own DMA mapping)
> 
> Does this mean that all guest memory is pinned all the time? If so, are you
> sure that's acceptable?
I think so. VFIO PCI passthrough also pins all of the guest memory, and SPDK
does the same (it pins and DMA-maps all of the guest memory).

I agree that this is not an ideal solution, but it is the fastest and simplest
solution possible.
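
For reference, here is a rough sketch (not the actual patch code) of how an
mdev driver can pin and DMA-map one chunk of guest memory reported by the
memory listener. The nvme_mdev_map_chunk() helper name is made up, but
vfio_pin_pages() and dma_map_page() are the real kernel APIs:

#include <linux/vfio.h>
#include <linux/iommu.h>
#include <linux/dma-mapping.h>
#include <linux/slab.h>
#include <linux/mm.h>

/*
 * Illustrative helper (name made up): pin one chunk of guest pages and
 * DMA-map it for the parent (physical) NVMe device.  vfio_pin_pages()
 * caps each call at 512 pages, so larger ranges are handled one chunk
 * at a time.
 */
static int nvme_mdev_map_chunk(struct device *mdev_dev,
                               struct device *parent,
                               unsigned long *guest_pfns,
                               dma_addr_t *iovas, int npages)
{
        unsigned long *host_pfns;
        int i, ret;

        host_pfns = kcalloc(npages, sizeof(*host_pfns), GFP_KERNEL);
        if (!host_pfns)
                return -ENOMEM;

        /* Pin the guest pages so they can't move while the device DMAs. */
        ret = vfio_pin_pages(mdev_dev, guest_pfns, npages,
                             IOMMU_READ | IOMMU_WRITE, host_pfns);
        if (ret != npages) {
                if (ret > 0)
                        vfio_unpin_pages(mdev_dev, guest_pfns, ret);
                ret = ret < 0 ? ret : -EFAULT;
                goto out;
        }

        /* DMA-map every pinned page for the physical controller. */
        for (i = 0; i < npages; i++)
                iovas[i] = dma_map_page(parent, pfn_to_page(host_pfns[i]),
                                        0, PAGE_SIZE, DMA_BIDIRECTIONAL);
        ret = 0;
out:
        kfree(host_pfns);
        return ret;
}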

> 
> Additionally, what is the performance overhead of the IOMMU notifier added
> by patch 8/9? How often was that notifier called per second in your tests
> and how much time was spent per call in the notifier callbacks?

To be honest, I haven't optimized my IOMMU notifier at all: when it is called,
it stops the IO thread, does its work, and then restarts the thread, which is
very slow.
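
For context, the notifier is registered roughly like this. This is a simplified
sketch, not the exact code from patch 8/9; the nvme_mdev_* names are
illustrative, while vfio_register_notifier() and VFIO_IOMMU_NOTIFY_DMA_UNMAP
are the real VFIO interfaces:

#include <linux/vfio.h>
#include <linux/notifier.h>
#include <linux/printk.h>

/* Sketch of the unmap notifier; only the DMA unmap event is of interest. */
static int nvme_mdev_iommu_notifier(struct notifier_block *nb,
                                    unsigned long action, void *data)
{
        if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
                struct vfio_iommu_type1_dma_unmap *unmap = data;

                /*
                 * Stop the IO thread, unpin and DMA-unmap everything in
                 * [unmap->iova, unmap->iova + unmap->size), then restart it.
                 */
                pr_debug("dma unmap %llx+%llx\n",
                         (unsigned long long)unmap->iova,
                         (unsigned long long)unmap->size);
        }
        return NOTIFY_OK;
}

static int nvme_mdev_register_iommu_notifier(struct device *mdev_dev,
                                             struct notifier_block *nb)
{
        unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;

        nb->notifier_call = nvme_mdev_iommu_notifier;
        return vfio_register_notifier(mdev_dev, VFIO_IOMMU_NOTIFY,
                                      &events, nb);
}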

Fortunately, it is not called at all during normal operation, since VFIO DMA
map/unmap events are really rare and happen only at guest boot.

The same is true even for nested guests: nested guest startup causes a wave of
map/unmap events while the shadow IOMMU is updated, but after that the guest
just uses these mappings without changing them.

The only case where performance is really bad is when you boot a guest with
iommu=on intel_iommu=on and then use the nvme driver there. In that case, the
driver in the guest does its own IOMMU maps/unmaps (on the virtual IOMMU), and
for each such event my VFIO map/unmap callback is called.

This could still be optimized to be much better by using some kind of queued
invalidation in my driver. In the meantime, booting the guest with iommu=pt
avoids that issue.
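
Purely as an illustration of what I mean by queued invalidation (nothing like
this is implemented yet, and all names below are made up): the unmap notifier
would only record the affected range, and the IO thread would drain the queue
at a safe point in its polling loop instead of being stopped each time:

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct nvme_mdev_inval {
        struct list_head link;
        dma_addr_t iova;
        size_t size;
};

static LIST_HEAD(inval_queue);
static DEFINE_SPINLOCK(inval_lock);

/* Called from the VFIO unmap notifier instead of stopping the IO thread. */
static void nvme_mdev_queue_invalidation(dma_addr_t iova, size_t size)
{
        struct nvme_mdev_inval *inv = kmalloc(sizeof(*inv), GFP_ATOMIC);

        if (!inv)
                return; /* real code would fall back to a synchronous flush */

        inv->iova = iova;
        inv->size = size;

        spin_lock(&inval_lock);
        list_add_tail(&inv->link, &inval_queue);
        spin_unlock(&inval_lock);
        /* The IO thread drains inval_queue from its polling loop. */
}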

Best regards,
	Maxim Levitsky

> 
> Thanks,
> 
> Bart.
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

