Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers


From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Airlie <airlied@gmail.com>,
	Linus Walleij <linus.walleij@linaro.org>,
	 Greg KH <greg@kroah.com>, Leon Romanovsky <leon@kernel.org>,
	 Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Josh Triplett <josh@joshtriplett.org>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	 Jonathan Corbet <corbet@lwn.net>,
	ksummit@lists.linux.dev, dev@tvm.apache.org
Subject: Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
Date: Tue, 14 Sep 2021 21:45:42 +0200
Message-ID: <CAKMK7uHxbN8DQkgQn6QyjEWijRKnK9p0eqcHOTr1D9D0F=3M6g@mail.gmail.com>
In-Reply-To: <CAK8P3a2REUBb9yr4c2W2txwX4Ki3aOb2x1SiWhMkWb+5Gk7Qfw@mail.gmail.com>

On Tue, Sep 14, 2021 at 2:58 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Sep 14, 2021 at 11:23 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> > > I can see two reasons why one would want to support a type (a)
> > > interface even with the more versatile devices:
> > >
> > > - It can be done in a generic way so that simply adding a kernel
> > >   driver and loading some firmware into it makes existing user space
> > >   software work out of the box.
> > >
> > > - It gives the manufacturer a way to get an upstream kernel driver
> > >   without open sourcing their firmware (a.k.a. compiler and user
> > >   space driver). Whether you consider this a good or bad thing is
> > >   of course a matter of perspective.
> >
> > I think for some embedded use-case this makes sense, especially around
> > media stuff.
> >
> > I don't think it's BLAS, because on the compute side you really want a
> > compiler that sees through the entire thing and can optimize it. Afaik
> > BLAS is for some quick prototype of matrix algorithms and most
> > importantly, for the top500 list :-)
>
> It's probably not the only thing you need, but I would assume something
> like sgemm and its variants are one of the building blocks you'd need
> in this kind of interface. Note that oneDNN also comes with a
> simplified interface similar to gemm[1] as well as a straight wrapper
> around gemm itself.
>
> There are definitely frameworks that are successfully built just on top
> of NumPy and blas (with NumPy itself being built on top of blas).
> I used to make fun of linpack as the supercomputer benchmark that
> has no practical use, but in the end it does spend most of its time in
> the SGEMM function that is the most optimized algorithm in the world
> and that is also where you end up spending your cycles in many AI
> applications. I found a link to this blog post[2] explaining why this is still
> used everywhere, and this matches what I've seen elsewhere, but
> unlike me, the author seems to know what they are talking about ;-)
>
> To get back to my own question from earlier about which part of oneAPI
> is actually being used, I see that pytorch (to pick a common framework)
> can use either mkl (oneMKL, BLAS) or mkldnn (dnnl, oneDNN) as a backend,
> next to cuda, cudnn, openmp and certainly a number of third-party
> backends.
>
> The mkl backend seems to mostly be a wrapper around cblas_*gemm(),
> though I may be reading that wrong.
> The oneDNN backend operates on a higher level, calling into a
> subset of the oneDNN interfaces. The other frameworks I looked at
> (mxnet, tensorflow) look similar, probably each using other subsets of
> oneDNN.

Hm I didn't know that in practice it's all just matrix multiplies in
AI land too. I thought there's more fun going on here, but I guess as
long as you have dense (enough) networks it's fully limited by the
matrix multiply step and nothing else matters. Thanks for the
references.

I still don't think BLAS is what you want, except maybe for a very
specific NPU thing in a soc that can't do anything other than actual
matrix multiplies in hw. The reason is that vendors are most likely
not going to give you the optimized kernels, and the dumb kernels are
very boring (just multiply-add in a loop). So for anything somewhat
programmable you want one level below that, or it's just not very
interesting as a userspace demonstration vehicle for your kernel
interface. Also there are generally quite a few features in the command
streamer (inter-engine sync as just one example), so a gemm ioctl call
(or whatever you pick from blas) is definitely not what you want for
anything that has a command streamer in hw.
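
For reference, the "dumb kernel" mentioned above looks roughly like the
following. This is only a minimal sketch in plain C, assuming row-major
storage and no transposition; naive_sgemm is a made-up name for
illustration and not a BLAS, oneDNN or kernel interface. Optimized
kernels add blocking, vectorization and hardware-specific scheduling on
top of the same arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Naive single-precision GEMM: C = alpha * A * B + beta * C.
 * Row-major layout: A is m x k, B is k x n, C is m x n.
 * lda/ldb/ldc are the row strides (in elements) of A, B and C.
 * naive_sgemm is a hypothetical helper, not part of any real library.
 */
void naive_sgemm(size_t m, size_t n, size_t k,
                 float alpha, const float *a, size_t lda,
                 const float *b, size_t ldb,
                 float beta, float *c, size_t ldc)
{
    /* Each row stride must cover at least one full row. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;

            /* The whole kernel is this multiply-add loop. */
            for (size_t p = 0; p < k; p++)
                acc += a[i * lda + p] * b[p * ldb + j];

            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

Nothing in there exercises queues, inter-engine sync or anything else
a command streamer offers, which is the point being made about why it
is a weak demonstration vehicle for a programmable device.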

But I guess for the various NPUs that pop up in socs all over a
limited blas interface with documentation might be good enough.

> > > > We have lots of fixed function on GPUs, video codecs are on most x86
> > > > GPUs. It's how you program them that matters, most of them are behind
> > > > queues similar to the 3D engine, so you program them the same way.
> > >
> > > So these would go through /dev/dri instead of /dev/media0? I can definitely
> > > see a lot of codec drivers in the kernel that use a /dev/media interfaces,
> > > and the tradeoffs between those two seem very similar to the tradeoffs
> > > you get for machine learning accelerators.
> >
> > Yeah we have plenty of codecs running on top of /dev/dri0, with all the
> > magic in userspace.
> >
> > They are all very far away from anything that is a machine learning accelerator.
>
> Sure, I only meant the relation between dri codecs and media codecs
> is similar to the relation between the ways one can implement the AI
> accelerator APIs.
>
> > Yeah for those I think a more fixed uapi like drivers/media makes a
> > lot of sense. What I don't like is when vendors then use that excuse
> > of "oh you only upload a fixed model at boot" to shovel in an accel
> > driver with a full generic interface, but not all the userspace
> > bits&pieces. There's unfortunately another accel driver in
> > drivers/misc for a Qualcomm soc, which really should be either a media
> > driver (for the fixed function use-case) or a drm driver (for the
> > fully programmable use-case).
>
> I would argue that for the fixed-function use case, the media subsystem
> isn't a great fit either. It would probably work just as well (as would the
> crypto subsystem), but having a distinct interface that does just
> one thing makes more sense conceptually, if only to make it clear
> where to look for such drivers and to have a consistent interface
> documentation.

Yeah for a tiny soc NPU a fixed interface might work out. It would need
some benchmarking to check that the ioctl overhead isn't too bad; I guess
worst case the new io_uring ioctl stuff could be used for really fast
dispatch. I've seen an nvidia npu (not sure whether that shipped anywhere)
and the arm npu that Linus mentioned somewhere else with open enough
drivers to make this possible.
-Daniel
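
As a rough illustration of the benchmarking mentioned above, the
per-call cost of an ioctl round trip could be estimated along these
lines. This is a sketch under stated assumptions: it times FIONREAD on
a pipe as a stand-in for a cheap ioctl, since no accelerator uapi is
defined in this thread, and ioctl_overhead_ns is a hypothetical helper
name. A real measurement would issue the driver's own ioctl and
compare the result against the runtime of the submitted job.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/*
 * Average wall-clock cost, in nanoseconds, of one cheap ioctl call.
 * FIONREAD on a pipe is only a placeholder for a real driver ioctl.
 * Returns a negative value if setup or any ioctl call fails.
 */
double ioctl_overhead_ns(unsigned int iterations)
{
    struct timespec start, end;
    int fds[2];
    int pending = 0;
    int failed = 0;

    assert(iterations > 0);

    if (pipe(fds) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned int i = 0; i < iterations; i++) {
        /* One user/kernel round trip per iteration. */
        if (ioctl(fds[0], FIONREAD, &pending) != 0) {
            failed = 1;
            break;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(fds[0]);
    close(fds[1]);

    if (failed)
        return -1.0;

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                        (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / iterations;
}
```

If that per-call figure is small next to the time a single job spends
on the NPU, a plain ioctl per job is probably fine; if not, batching or
an io_uring style submission path becomes more interesting.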

> > I think for the fixed-function interface case you can also make a
> > reasonable argument that just documenting that fixed interface and all
> > the parameters is good enough. But as soon as the interface becomes a
> > generic "submit workload" style thing because you want to make it work
> > for an entire set of "firmware" compiled by your closed stack, that's
> > out of the window.
>
> Right, agreed. If we add a fixed-function interface, that should ideally
> not allow any vendor specific extensions at all, just a set of well-defined
> operations, and certainly not a bypass mode that gets used to
> send compiled binaries.
>
>        Arnd
>
> [1] https://oneapi-src.github.io/oneDNN/dev_guide_matmul.html
> [2] https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
