From: Maxim Levitsky <mlevitsk@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>, linux-nvme@lists.infradead.org
Cc: Fam Zheng <fam@euphon.net>, Jens Axboe <axboe@fb.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Sagi Grimberg <sagi@grimberg.me>, kvm@vger.kernel.org,
	Wolfram Sang <wsa@the-dreams.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Liang Cunming <cunming.liang@intel.com>,
	Nicolas Ferre <nicolas.ferre@microchip.com>,
	linux-kernel@vger.kernel.org,
	Liu Changpeng <changpeng.liu@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Christoph Hellwig <hch@lst.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	John Ferlan <jferlan@redhat.com>,
	"Paul E . McKenney" <paulmck@linux.ibm.com>,
	Amnon Ilan <ailan@redhat.com>,
	"David S . Miller" <davem@davemloft.net>
Subject: Re: [PATCH 0/9] RFC: NVME VFIO mediated device
Date: Wed, 20 Mar 2019 18:42:02 +0200
Message-ID: <8994f43d26ebf6040b9d5d5e3866ee81abcf1a1c.camel@redhat.com>
In-Reply-To: <1553095686.65329.36.camel@acm.org>

On Wed, 2019-03-20 at 08:28 -0700, Bart Van Assche wrote:
> On Tue, 2019-03-19 at 16:41 +0200, Maxim Levitsky wrote:
> > * All guest memory is mapped into the physical nvme device,
> >   but not 1:1 as vfio-pci would do it. This allows very efficient
> >   DMA. To support this, patch 2 adds the ability for an mdev
> >   device to listen for guest memory map events. Any such memory
> >   is immediately pinned and then DMA mapped. (Support for fabric
> >   drivers where this is not possible exists too; in that case the
> >   fabric driver does its own DMA mapping.)
>
> Does this mean that all guest memory is pinned all the time? If so,
> are you sure that's acceptable?

I think so. VFIO PCI passthrough also pins all of the guest memory,
and SPDK does the same (pins and DMA maps all guest memory). I agree
that this is not an ideal solution, but it is the fastest and
simplest solution possible.
> Additionally, what is the performance overhead of the IOMMU notifier
> added by patch 8/9? How often was that notifier called per second in
> your tests, and how much time was spent per call in the notifier
> callbacks?

To be honest, I haven't optimized my IOMMU notifier at all: when it
is called, it stops the IO thread, does its work, and then restarts
the thread, which is very slow. Fortunately, it is not called at all
during normal operation, because VFIO DMA map/unmap events are really
rare and happen only at guest boot. The same is even true for nested
guests: nested guest startup causes a wave of map/unmap events while
the shadow IOMMU is updated, but afterwards the guest just uses these
mappings without changing them.

The only case where performance is really bad is booting a guest with
iommu=on intel_iommu=on and then using the nvme driver there. In that
case the driver in the guest does its own IOMMU maps/unmaps (on the
virtual IOMMU), and for each such event my VFIO map/unmap callback is
called. This could be optimized to be much better, for example with
some kind of queued invalidation in my driver. Meanwhile, iommu=pt in
the guest avoids the issue.

Best regards,
	Maxim Levitsky

> Thanks,
>
> Bart.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme