dri-devel Archive mirror
 help / color / mirror / Atom feed
From: Felix Kuehling <felix.kuehling@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Daniel Vetter" <daniel@ffwll.ch>
Cc: "Brost, Matthew" <matthew.brost@intel.com>,
	"Thomas.Hellstrom@linux.intel.com"
	<Thomas.Hellstrom@linux.intel.com>,
	"Welty, Brian" <brian.welty@intel.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Gupta, saurabhg" <saurabhg.gupta@intel.com>,
	Danilo Krummrich <dakr@redhat.com>,
	"Zeng, Oak" <oak.zeng@intel.com>,
	"Bommu, Krishnaiah" <krishnaiah.bommu@intel.com>,
	Dave Airlie <airlied@redhat.com>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: Making drm_gpuvm work across gpu devices
Date: Mon, 29 Jan 2024 12:52:27 -0500	[thread overview]
Message-ID: <39b5adbc-0d3f-4e34-9ede-12d6542ff892@amd.com> (raw)
In-Reply-To: <2444da7e-be62-4538-b42e-b234c763f3bd@amd.com>


On 2024-01-29 11:28, Christian König wrote:
> Am 29.01.24 um 17:24 schrieb Felix Kuehling:
>> On 2024-01-29 10:33, Christian König wrote:
>>> Am 29.01.24 um 16:03 schrieb Felix Kuehling:
>>>> On 2024-01-25 13:32, Daniel Vetter wrote:
>>>>> On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
>>>>>> Am 23.01.24 um 20:37 schrieb Zeng, Oak:
>>>>>>> [SNIP]
>>>>>>> Yes most API are per device based.
>>>>>>>
>>>>>>> One exception I know is actually the kfd SVM API. If you look at 
>>>>>>> the svm_ioctl function, it is per-process based. Each 
>>>>>>> kfd_process represent a process across N gpu devices.
>>>>>> Yeah and that was a big mistake in my opinion. We should really 
>>>>>> not do that
>>>>>> ever again.
>>>>>>
>>>>>>> Need to say, kfd SVM represent a shared virtual address space 
>>>>>>> across CPU and all GPU devices on the system. This is by the 
>>>>>>> definition of SVM (shared virtual memory). This is very 
>>>>>>> different from our legacy gpu *device* driver which works for 
>>>>>>> only one device (i.e., if you want one device to access another 
>>>>>>> device's memory, you will have to use dma-buf export/import etc).
>>>>>> Exactly that thinking is what we have currently found as blocker 
>>>>>> for a
>>>>>> virtualization projects. Having SVM as device independent feature 
>>>>>> which
>>>>>> somehow ties to the process address space turned out to be an 
>>>>>> extremely bad
>>>>>> idea.
>>>>>>
>>>>>> The background is that this only works for some use cases but not 
>>>>>> all of
>>>>>> them.
>>>>>>
>>>>>> What's working much better is to just have a mirror functionality 
>>>>>> which says
>>>>>> that a range A..B of the process address space is mapped into a 
>>>>>> range C..D
>>>>>> of the GPU address space.
>>>>>>
>>>>>> Those ranges can then be used to implement the SVM feature 
>>>>>> required for
>>>>>> higher level APIs and not something you need at the UAPI or even 
>>>>>> inside the
>>>>>> low level kernel memory management.
>>>>>>
>>>>>> When you talk about migrating memory to a device you also do this 
>>>>>> on a per
>>>>>> device basis and *not* tied to the process address space. If you 
>>>>>> then get
>>>>>> crappy performance because userspace gave contradicting 
>>>>>> information where to
>>>>>> migrate memory then that's a bug in userspace and not something 
>>>>>> the kernel
>>>>>> should try to prevent somehow.
>>>>>>
>>>>>> [SNIP]
>>>>>>>> I think if you start using the same drm_gpuvm for multiple 
>>>>>>>> devices you
>>>>>>>> will sooner or later start to run into the same mess we have 
>>>>>>>> seen with
>>>>>>>> KFD, where we moved more and more functionality from the KFD to 
>>>>>>>> the DRM
>>>>>>>> render node because we found that a lot of the stuff simply 
>>>>>>>> doesn't work
>>>>>>>> correctly with a single object to maintain the state.
>>>>>>> As I understand it, KFD is designed to work across devices. A 
>>>>>>> single pseudo /dev/kfd device represent all hardware gpu 
>>>>>>> devices. That is why during kfd open, many pdd (process device 
>>>>>>> data) is created, each for one hardware device for this process.
>>>>>> Yes, I'm perfectly aware of that. And I can only repeat myself 
>>>>>> that I see
>>>>>> this design as a rather extreme failure. And I think it's one of 
>>>>>> the reasons
>>>>>> why NVidia is so dominant with Cuda.
>>>>>>
>>>>>> This whole approach KFD takes was designed with the idea of 
>>>>>> extending the
>>>>>> CPU process into the GPUs, but this idea only works for a few use 
>>>>>> cases and
>>>>>> is not something we should apply to drivers in general.
>>>>>>
>>>>>> A very good example are virtualization use cases where you end up 
>>>>>> with CPU
>>>>>> address != GPU address because the VAs are actually coming from 
>>>>>> the guest VM
>>>>>> and not the host process.
>>>>>>
>>>>>> SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This 
>>>>>> should not have
>>>>>> any influence on the design of the kernel UAPI.
>>>>>>
>>>>>> If you want to do something similar as KFD for Xe I think you 
>>>>>> need to get
>>>>>> explicit permission to do this from Dave and Daniel and maybe 
>>>>>> even Linus.
>>>>> I think the one and only one exception where an SVM uapi like in 
>>>>> kfd makes
>>>>> sense, is if the _hardware_ itself, not the software stack defined
>>>>> semantics that you've happened to build on top of that hw, 
>>>>> enforces a 1:1
>>>>> mapping with the cpu process address space.
>>>>>
>>>>> Which means your hardware is using PASID, IOMMU based translation, 
>>>>> PCI-ATS
>>>>> (address translation services) or whatever your hw calls it and 
>>>>> has _no_
>>>>> device-side pagetables on top. Which from what I've seen all 
>>>>> devices with
>>>>> device-memory have, simply because they need some place to store 
>>>>> whether
>>>>> that memory is currently in device memory or should be translated 
>>>>> using
>>>>> PASID. Currently there's no gpu that works with PASID only, but 
>>>>> there are
>>>>> some on-cpu-die accelerator things that do work like that.
>>>>>
>>>>> Maybe in the future there will be some accelerators that are fully 
>>>>> cpu
>>>>> cache coherent (including atomics) with something like CXL, and the
>>>>> on-device memory is managed as normal system memory with struct 
>>>>> page as
>>>>> ZONE_DEVICE and accelerator va -> physical address translation is 
>>>>> only
>>>>> done with PASID ... but for now I haven't seen that, definitely 
>>>>> not in
>>>>> upstream drivers.
>>>>>
>>>>> And the moment you have some per-device pagetables or per-device 
>>>>> memory
>>>>> management of some sort (like using gpuva mgr) then I'm 100% 
>>>>> agreeing with
>>>>> Christian that the kfd SVM model is too strict and not a great idea.
>>>>
>>>> That basically means, without ATS/PRI+PASID you cannot implement a 
>>>> unified memory programming model, where GPUs or accelerators access 
>>>> virtual addresses without pre-registering them with an SVM API call.
>>>>
>>>> Unified memory is a feature implemented by the KFD SVM API and used 
>>>> by ROCm. This is used e.g. to implement OpenMP USM (unified shared 
>>>> memory). It's implemented with recoverable GPU page faults. If the 
>>>> page fault interrupt handler cannot assume a shared virtual address 
>>>> space, then implementing this feature isn't possible.
>>>
>>> Why not? As far as I can see the OpenMP USM is just another funky 
>>> way of userptr handling.
>>>
>>> The difference is that in an userptr we assume that we always need 
>>> to request the whole block A..B from a mapping while for page fault 
>>> based handling it can be just any page in between A and B which is 
>>> requested and made available to the GPU address space.
>>>
>>> As far as I can see there is absolutely no need for any special SVM 
>>> handling.
>>
>> It does assume a shared virtual address space between CPU and GPUs. 
>> There are no API calls to tell the driver that address A on the CPU 
>> maps to address B on the GPU1 and address C on GPU2. The KFD SVM API 
>> was designed to work with this programming model, by augmenting the 
>> shared virtual address mappings with virtual address range attributes 
>> that can modify the migration policy and indicate prefetching, 
>> prefaulting, etc. You could think of it as madvise on steroids.
>
> Yeah, so what? In this case you just say through an IOCTL that CPU 
> range A..B should map to GPU range C..D and for A/B and C/D you use 
> the maximum of the address space.

What I want is that address range A..B on the CPU matches A..B on the 
GPU, because I'm sharing pointers between CPU and GPU. I can't think of 
any sane user mode using a unified memory programming model, that would 
ever ask KFD to map unified memory mappints to a different address range 
on the GPU. Adding such an ioclt is a complete waste of time, and can 
only serve to add unnecessary complexity.

Regards,
   Felix


>
> There is no restriction that this needs to be accurate in way. It's 
> just the it can be accurate to be more efficient and eventually use 
> only a fraction of the address space instead of all of it for some use 
> cases.
>
> So this isn't a blocker, it's just one special use case.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>>
>>>>> Cheers, Sima
>>>
>

  reply	other threads:[~2024-01-29 17:52 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 22:12 [PATCH 00/23] XeKmd basic SVM support Oak Zeng
2024-01-17 22:12 ` [PATCH 01/23] drm/xe/svm: Add SVM document Oak Zeng
2024-01-17 22:12 ` [PATCH 02/23] drm/xe/svm: Add svm key data structures Oak Zeng
2024-01-17 22:12 ` [PATCH 03/23] drm/xe/svm: create xe svm during vm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 04/23] drm/xe/svm: Trace svm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 05/23] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
2024-01-17 22:12 ` [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
2024-04-05  0:39   ` Jason Gunthorpe
2024-04-05  3:33     ` Zeng, Oak
2024-04-05 12:37       ` Jason Gunthorpe
2024-04-05 16:42         ` Zeng, Oak
2024-04-05 18:02           ` Jason Gunthorpe
2024-04-09 16:45             ` Zeng, Oak
2024-04-09 17:24               ` Jason Gunthorpe
2024-04-23 21:17                 ` Zeng, Oak
2024-04-24  2:31                   ` Matthew Brost
2024-04-24 13:57                     ` Jason Gunthorpe
2024-04-24 16:35                       ` Matthew Brost
2024-04-24 16:44                         ` Jason Gunthorpe
2024-04-24 16:56                           ` Matthew Brost
2024-04-24 17:48                             ` Jason Gunthorpe
2024-04-24 13:48                   ` Jason Gunthorpe
2024-04-24 23:59                     ` Zeng, Oak
2024-04-25  1:05                       ` Jason Gunthorpe
2024-04-26  9:55                         ` Thomas Hellström
2024-04-26 12:00                           ` Jason Gunthorpe
2024-04-26 14:49                             ` Thomas Hellström
2024-04-26 16:35                               ` Jason Gunthorpe
2024-04-29  8:25                                 ` Thomas Hellström
2024-04-30 17:30                                   ` Jason Gunthorpe
2024-04-30 18:57                                     ` Daniel Vetter
2024-05-01  0:09                                       ` Jason Gunthorpe
2024-05-02  8:04                                         ` Daniel Vetter
2024-05-02  9:11                                           ` Thomas Hellström
2024-05-02 12:46                                             ` Jason Gunthorpe
2024-05-02 15:01                                               ` Thomas Hellström
2024-05-02 19:25                                                 ` Zeng, Oak
2024-05-03 13:37                                                   ` Jason Gunthorpe
2024-05-03 14:43                                                     ` Zeng, Oak
2024-05-03 16:28                                                       ` Jason Gunthorpe
2024-05-03 20:29                                                         ` Zeng, Oak
2024-05-04  1:03                                                           ` Dave Airlie
2024-05-06 13:04                                                             ` Daniel Vetter
2024-05-06 23:50                                                               ` Matthew Brost
2024-05-07 11:56                                                                 ` Jason Gunthorpe
2024-05-06 13:33                                                           ` Jason Gunthorpe
2024-04-09 17:33               ` Matthew Brost
2024-01-17 22:12 ` [PATCH 07/23] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
2024-01-17 22:12 ` [PATCH 08/23] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
2024-01-17 22:12 ` [PATCH 09/23] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-01-17 22:12 ` [PATCH 10/23] drm/xe/svm: Introduce svm migration function Oak Zeng
2024-01-17 22:12 ` [PATCH 11/23] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
2024-01-17 22:12 ` [PATCH 12/23] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
2024-01-17 22:12 ` [PATCH 13/23] drm/xe/svm: Handle CPU page fault Oak Zeng
2024-01-17 22:12 ` [PATCH 14/23] drm/xe/svm: trace svm range migration Oak Zeng
2024-01-17 22:12 ` [PATCH 15/23] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
2024-01-17 22:12 ` [PATCH 16/23] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
2024-01-17 22:12 ` [PATCH 17/23] drm/xe/svm: clean up svm range during process exit Oak Zeng
2024-01-17 22:12 ` [PATCH 18/23] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
2024-01-17 22:12 ` [PATCH 19/23] drm/xe/svm: migrate svm range to vram Oak Zeng
2024-01-17 22:12 ` [PATCH 20/23] drm/xe/svm: Populate svm range Oak Zeng
2024-01-17 22:12 ` [PATCH 21/23] drm/xe/svm: GPU page fault support Oak Zeng
2024-01-23  2:06   ` Welty, Brian
2024-01-23  3:09     ` Zeng, Oak
2024-01-23  3:21       ` Making drm_gpuvm work across gpu devices Zeng, Oak
2024-01-23 11:13         ` Christian König
2024-01-23 19:37           ` Zeng, Oak
2024-01-23 20:17             ` Felix Kuehling
2024-01-25  1:39               ` Zeng, Oak
2024-01-23 23:56             ` Danilo Krummrich
2024-01-24  3:57               ` Zeng, Oak
2024-01-24  4:14                 ` Zeng, Oak
2024-01-24  6:48                   ` Christian König
2024-01-25 22:13                 ` Danilo Krummrich
2024-01-24  8:33             ` Christian König
2024-01-25  1:17               ` Zeng, Oak
2024-01-25  1:25                 ` David Airlie
2024-01-25  5:25                   ` Zeng, Oak
2024-01-26 10:09                     ` Christian König
2024-01-26 20:13                       ` Zeng, Oak
2024-01-29 10:10                         ` Christian König
2024-01-29 20:09                           ` Zeng, Oak
2024-01-25 11:00                 ` 回复:Making " 周春明(日月)
2024-01-25 17:00                   ` Zeng, Oak
2024-01-25 17:15                 ` Making " Felix Kuehling
2024-01-25 18:37                   ` Zeng, Oak
2024-01-26 13:23                     ` Christian König
2024-01-25 16:42               ` Zeng, Oak
2024-01-25 18:32               ` Daniel Vetter
2024-01-25 21:02                 ` Zeng, Oak
2024-01-26  8:21                 ` Thomas Hellström
2024-01-26 12:52                   ` Christian König
2024-01-27  2:21                     ` Zeng, Oak
2024-01-29 10:19                       ` Christian König
2024-01-30  0:21                         ` Zeng, Oak
2024-01-30  8:39                           ` Christian König
2024-01-30 22:29                             ` Zeng, Oak
2024-01-30 23:12                               ` David Airlie
2024-01-31  9:15                                 ` Daniel Vetter
2024-01-31 20:17                                   ` Zeng, Oak
2024-01-31 20:59                                     ` Zeng, Oak
2024-02-01  8:52                                     ` Christian König
2024-02-29 18:22                                       ` Zeng, Oak
2024-03-08  4:43                                         ` Zeng, Oak
2024-03-08 10:07                                           ` Christian König
2024-01-30  8:43                           ` Thomas Hellström
2024-01-29 15:03                 ` Felix Kuehling
2024-01-29 15:33                   ` Christian König
2024-01-29 16:24                     ` Felix Kuehling
2024-01-29 16:28                       ` Christian König
2024-01-29 17:52                         ` Felix Kuehling [this message]
2024-01-29 19:03                           ` Christian König
2024-01-29 20:24                             ` Felix Kuehling
2024-02-23 20:12               ` Zeng, Oak
2024-02-27  6:54                 ` Christian König
2024-02-27 15:58                   ` Zeng, Oak
2024-02-28 19:51                     ` Zeng, Oak
2024-02-29  9:41                       ` Christian König
2024-02-29 16:05                         ` Zeng, Oak
2024-02-29 17:12                         ` Thomas Hellström
2024-03-01  7:01                           ` Christian König
2024-01-17 22:12 ` [PATCH 22/23] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
2024-01-17 22:12 ` [PATCH 23/23] drm/xe/svm: Add svm memory hints interface Oak Zeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39b5adbc-0d3f-4e34-9ede-12d6542ff892@amd.com \
    --to=felix.kuehling@amd.com \
    --cc=Thomas.Hellstrom@linux.intel.com \
    --cc=airlied@redhat.com \
    --cc=brian.welty@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@redhat.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=krishnaiah.bommu@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=niranjana.vishwanathapura@intel.com \
    --cc=oak.zeng@intel.com \
    --cc=saurabhg.gupta@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).