All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	"Will Deacon" <will@kernel.org>,
	"Hector Martin" <marcan@marcan.st>,
	"Marc Zyngier" <maz@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Zayd Qumsieh" <zayd_qumsieh@apple.com>,
	"Justin Lu" <ih_justin@apple.com>,
	"Ryan Houdek" <Houdek.Ryan@fex-emu.org>,
	"Mark Brown" <broonie@kernel.org>,
	"Mateusz Guzik" <mjguzik@gmail.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Oliver Upton" <oliver.upton@linux.dev>,
	"Miguel Luis" <miguel.luis@oracle.com>,
	"Joey Gouly" <joey.gouly@arm.com>,
	"Christoph Paasch" <cpaasch@apple.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Baoquan He" <bhe@redhat.com>,
	"Joel Granados" <j.granados@samsung.com>,
	"Dawei Li" <dawei.li@shingroup.cn>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Florent Revest" <revest@chromium.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Stefan Roesch" <shr@devkernel.io>,
	"Andy Chiu" <andy.chiu@sifive.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Oleg Nesterov" <oleg@redhat.com>, "Helge Deller" <deller@gmx.de>,
	"Zev Weiss" <zev@bewilderbeest.net>,
	"Ondrej Mosnacek" <omosnace@redhat.com>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	"Asahi Linux" <asahi@lists.linux.dev>
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model
Date: Thu, 9 May 2024 12:13:31 +0100	[thread overview]
Message-ID: <Zjyv23IuJFrk9Zh0@arm.com> (raw)
In-Reply-To: <CAMj1kXFqG7D2Q_T_NXZ-y3NYOjK6d8bP8ihJTeFz8TUJ77W7tw@mail.gmail.com>

On Tue, May 07, 2024 at 04:52:30PM +0200, Ard Biesheuvel wrote:
> On Tue, 7 May 2024 at 12:24, Alex Bennée <alex.bennee@linaro.org> wrote:
> > I think the main use case here is for emulation. When we run x86-on-arm
> > in QEMU we do currently insert lots of extra barrier instructions on
> > every load and store. If we can probe and set a TSO mode I can assure
> > you we'll do the right thing ;-)
> 
> Without a public specification of what TSO mode actually entails,
> deciding which of those barriers can be dropped is not going to be as
> straight-forward as you make it out to be.
> 
> Apple's TSO mode is vertically integrated with Rosetta, which means
> that TSO mode provides whatever Rosetta needs to run x86 code
> correctly, and that it could mean different things on different
> generations of the micro-architecture. And whether Apple's TSO is the
> same as Fujitsu's is anyone's guess afaik.

Indeed. Apart from using impdef registers, that's what I think is the
second biggest problem with this feature (and the corresponding
patches). We don't know the precise memory model, we can't tell whether
this TSO bit is stored in the TLB. If it is, is it per ASID/VMID? The
other problem Marc raised is what memory model is between two CPUs where
only one has the TSO bit set? Does it only break the TSO model or is
there a chance that it also breaks the default relaxed model? What other
TSO flavours are out there, how do they compare with the Apple one?

> Running a game and seeing it perform better is great, but it is not
> the kind of rigor we usually attempt to apply when adding support for
> architectural features. Hopefully, there will be some architectural
> support for this in the future, but without any spec that defines the
> memory model it implements, I am not convinced we should merge this.

There is FEAT_LRCPC (available on Apple Silicon from M2 onwards). Rather
than having a big knob to turn TSO on or off, this feature introduces
instructions that permit a code generator to get the TSO semantics in a
more efficient way (e.g. using LDAPR+STLR instead of the stricter
LDAR+STLR; not sure how well these are implemented on the Apple
Silicon). There are further improvements in FEAT_LRCPC{2,3} (with the
latter adding support for SIMD but not available in hardware yet). So
the direction from Arm is pretty clear, acknowledging that there is a
need for such TSO emulation but not in the way of undocumented impdef
registers. Whether more is needed here, I guess people working on
emulators could reach out to Arm or CPU vendors with suggestions (the
path to the architects is not straightforward, usually legal has a say,
but it's doable, there are formal channels already).

I see the impdef hardware TSO options as temporary until CPU
implementations catch up to architected FEAT_LRCPC*. Given the problems
already stated in this thread, I think such hacks should be carried
downstream and (hopefully) will eventually vanish. Maybe those TSO knobs
currently make an emulation faster than FEAT_LRCPC* but that's feedback
to go to the microarchitects on the implementation (or architects on
what other instructions should be covered).

-- 
Catalin

WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	"Will Deacon" <will@kernel.org>,
	"Hector Martin" <marcan@marcan.st>,
	"Marc Zyngier" <maz@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Zayd Qumsieh" <zayd_qumsieh@apple.com>,
	"Justin Lu" <ih_justin@apple.com>,
	"Ryan Houdek" <Houdek.Ryan@fex-emu.org>,
	"Mark Brown" <broonie@kernel.org>,
	"Mateusz Guzik" <mjguzik@gmail.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Oliver Upton" <oliver.upton@linux.dev>,
	"Miguel Luis" <miguel.luis@oracle.com>,
	"Joey Gouly" <joey.gouly@arm.com>,
	"Christoph Paasch" <cpaasch@apple.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Baoquan He" <bhe@redhat.com>,
	"Joel Granados" <j.granados@samsung.com>,
	"Dawei Li" <dawei.li@shingroup.cn>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Florent Revest" <revest@chromium.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Stefan Roesch" <shr@devkernel.io>,
	"Andy Chiu" <andy.chiu@sifive.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Oleg Nesterov" <oleg@redhat.com>, "Helge Deller" <deller@gmx.de>,
	"Zev Weiss" <zev@bewilderbeest.net>,
	"Ondrej Mosnacek" <omosnace@redhat.com>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	"Asahi Linux" <asahi@lists.linux.dev>
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model
Date: Thu, 9 May 2024 12:13:31 +0100	[thread overview]
Message-ID: <Zjyv23IuJFrk9Zh0@arm.com> (raw)
In-Reply-To: <CAMj1kXFqG7D2Q_T_NXZ-y3NYOjK6d8bP8ihJTeFz8TUJ77W7tw@mail.gmail.com>

On Tue, May 07, 2024 at 04:52:30PM +0200, Ard Biesheuvel wrote:
> On Tue, 7 May 2024 at 12:24, Alex Bennée <alex.bennee@linaro.org> wrote:
> > I think the main use case here is for emulation. When we run x86-on-arm
> > in QEMU we do currently insert lots of extra barrier instructions on
> > every load and store. If we can probe and set a TSO mode I can assure
> > you we'll do the right thing ;-)
> 
> Without a public specification of what TSO mode actually entails,
> deciding which of those barriers can be dropped is not going to be as
> straight-forward as you make it out to be.
> 
> Apple's TSO mode is vertically integrated with Rosetta, which means
> that TSO mode provides whatever Rosetta needs to run x86 code
> correctly, and that it could mean different things on different
> generations of the micro-architecture. And whether Apple's TSO is the
> same as Fujitsu's is anyone's guess afaik.

Indeed. Apart from using impdef registers, that's what I think is the
second biggest problem with this feature (and the corresponding
patches). We don't know the precise memory model, we can't tell whether
this TSO bit is stored in the TLB. If it is, is it per ASID/VMID? The
other problem Marc raised is what memory model is between two CPUs where
only one has the TSO bit set? Does it only break the TSO model or is
there a chance that it also breaks the default relaxed model? What other
TSO flavours are out there, how do they compare with the Apple one?

> Running a game and seeing it perform better is great, but it is not
> the kind of rigor we usually attempt to apply when adding support for
> architectural features. Hopefully, there will be some architectural
> support for this in the future, but without any spec that defines the
> memory model it implements, I am not convinced we should merge this.

There is FEAT_LRCPC (available on Apple Silicon from M2 onwards). Rather
than having a big knob to turn TSO on or off, this feature introduces
instructions that permit a code generator to get the TSO semantics in a
more efficient way (e.g. using LDAPR+STLR instead of the stricter
LDAR+STLR; not sure how well these are implemented on the Apple
Silicon). There are further improvements in FEAT_LRCPC{2,3} (with the
latter adding support for SIMD but not available in hardware yet). So
the direction from Arm is pretty clear, acknowledging that there is a
need for such TSO emulation but not in the way of undocumented impdef
registers. Whether more is needed here, I guess people working on
emulators could reach out to Arm or CPU vendors with suggestions (the
path to the architects is not straightforward, usually legal has a say,
but it's doable, there are formal channels already).

I see the impdef hardware TSO options as temporary until CPU
implementations catch up to architected FEAT_LRCPC*. Given the problems
already stated in this thread, I think such hacks should be carried
downstream and (hopefully) will eventually vanish. Maybe those TSO knobs
currently make an emulation faster than FEAT_LRCPC* but that's feedback
to go to the microarchitects on the implementation (or architects on
what other instructions should be covered).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-05-09 11:13 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-11  0:51 [PATCH 0/4] arm64: Support the TSO memory model Hector Martin
2024-04-11  0:51 ` Hector Martin
2024-04-11  0:51 ` [PATCH 1/4] prctl: Introduce PR_{SET,GET}_MEM_MODEL Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  0:51 ` [PATCH 2/4] arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  0:51 ` [PATCH 3/4] arm64: Introduce scaffolding to add ACTLR_EL1 to thread state Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  0:51 ` [PATCH 4/4] arm64: Implement Apple IMPDEF TSO memory model control Hector Martin
2024-04-11  0:51   ` Hector Martin
2024-04-11  1:37 ` [PATCH 0/4] arm64: Support the TSO memory model Neal Gompa
2024-04-11  1:37   ` Neal Gompa
2024-04-11 13:28 ` Will Deacon
2024-04-11 13:28   ` Will Deacon
2024-04-11 14:19   ` Hector Martin
2024-04-11 14:19     ` Hector Martin
2024-04-11 18:43     ` Hector Martin
2024-04-11 18:43       ` Hector Martin
2024-04-16  2:22       ` Zayd Qumsieh
2024-04-16  2:22         ` Zayd Qumsieh
2024-04-19 16:58         ` Will Deacon
2024-04-19 16:58           ` Will Deacon
2024-04-19 18:05           ` Catalin Marinas
2024-04-19 18:05             ` Catalin Marinas
2024-04-19 16:58     ` Will Deacon
2024-04-19 16:58       ` Will Deacon
2024-04-20 11:37       ` Marc Zyngier
2024-04-20 11:37         ` Marc Zyngier
2024-05-02  0:10         ` Zayd Qumsieh
2024-05-02  0:10           ` Zayd Qumsieh
2024-05-02 13:25           ` Marc Zyngier
2024-05-02 13:25             ` Marc Zyngier
2024-05-06  8:20             ` Jonas Oberhauser
2024-05-06  8:20               ` Jonas Oberhauser
2024-04-20 12:13       ` Eric Curtin
2024-04-20 12:13         ` Eric Curtin
2024-04-20 12:15         ` Eric Curtin
2024-04-20 12:15           ` Eric Curtin
2024-05-06 11:21         ` Sergio Lopez Pascual
2024-05-06 11:21           ` Sergio Lopez Pascual
2024-05-06 16:12           ` Marc Zyngier
2024-05-06 16:12             ` Marc Zyngier
2024-05-06 16:20             ` Eric Curtin
2024-05-06 16:20               ` Eric Curtin
2024-05-06 22:04             ` Sergio Lopez Pascual
2024-05-06 22:04               ` Sergio Lopez Pascual
2024-05-02  0:16   ` Zayd Qumsieh
2024-05-02  0:16     ` Zayd Qumsieh
2024-05-07 10:24   ` Alex Bennée
2024-05-07 10:24     ` Alex Bennée
2024-05-07 14:52     ` Ard Biesheuvel
2024-05-07 14:52       ` Ard Biesheuvel
2024-05-09 11:13       ` Catalin Marinas [this message]
2024-05-09 11:13         ` Catalin Marinas
2024-05-09 12:31         ` Neal Gompa
2024-05-09 12:31           ` Neal Gompa
2024-05-09 12:56           ` Catalin Marinas
2024-05-09 12:56             ` Catalin Marinas
2024-04-16  2:11 ` Zayd Qumsieh
2024-04-16  2:11   ` Zayd Qumsieh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zjyv23IuJFrk9Zh0@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=Houdek.Ryan@fex-emu.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.bennee@linaro.org \
    --cc=andy.chiu@sifive.com \
    --cc=anshuman.khandual@arm.com \
    --cc=ardb@kernel.org \
    --cc=asahi@lists.linux.dev \
    --cc=bhe@redhat.com \
    --cc=broonie@kernel.org \
    --cc=cpaasch@apple.com \
    --cc=david@redhat.com \
    --cc=dawei.li@shingroup.cn \
    --cc=deller@gmx.de \
    --cc=ih_justin@apple.com \
    --cc=j.granados@samsung.com \
    --cc=joey.gouly@arm.com \
    --cc=josh@joshtriplett.org \
    --cc=keescook@chromium.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcan@marcan.st \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=miguel.luis@oracle.com \
    --cc=mjguzik@gmail.com \
    --cc=ojeda@kernel.org \
    --cc=oleg@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=omosnace@redhat.com \
    --cc=revest@chromium.org \
    --cc=samitolvanen@google.com \
    --cc=shr@devkernel.io \
    --cc=will@kernel.org \
    --cc=zayd_qumsieh@apple.com \
    --cc=zev@bewilderbeest.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.