From: Linus Torvalds <torvalds@linux-foundation.org>
To: Arnd Bergmann <arnd@arndb.de>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>,
"Paul E. McKenney" <paulmck@kernel.org>,
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
Arnd Bergmann <arnd@kernel.org>,
linux-alpha@vger.kernel.org,
Richard Henderson <richard.henderson@linaro.org>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Matt Turner <mattst88@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Marc Zyngier <maz@kernel.org>,
linux-kernel@vger.kernel.org, Michael Cree <mcree@orcon.net.nz>,
Frank Scheiner <frank.scheiner@web.de>
Subject: Re: [PATCH 00/14] alpha: cleanups for 6.10
Date: Fri, 31 May 2024 09:32:23 -0700 [thread overview]
Message-ID: <CAHk-=wimJ2hQhKSq7+4O1EHtkg7eFBwY+fygxD+6sjWqgyDMTQ@mail.gmail.com> (raw)
In-Reply-To: <4bb50dc0-244a-4781-85ad-9ebc5e59c99a@app.fastmail.com>
On Fri, 31 May 2024 at 08:48, Arnd Bergmann <arnd@arndb.de> wrote:
>
> /* Is this type a native word size -- useful for atomic operations */
> #define __native_word(t) \
> - (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
> - sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
> + (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
>
> #ifdef __OPTIMIZE__
> # define __compiletime_assert(condition, msg, prefix, suffix) \
>
> The WRITE_ONCE() calls tend to be there in order to avoid
> expensive atomic or locking when something can be expressed
> with a store that known to be visible atomically (on all other
> architectures).
No, if you go down this road, then you would want to do the same thing
we do for READ_ONCE() - but for a different reason - hook into it for
alpha, and add a memory barrier to get rid of the crazy alpha memory
ordering:
/*
* Alpha is apparently daft enough to reorder address-dependent loads
* on some CPU implementations. Knock some common sense into it with
* a memory barrier in READ_ONCE().
*
* For the curious, more information about this unusual reordering is
* available in chapter 15 of the "perfbook":
*
* https://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
*
*/
#define __READ_ONCE(x) \
({ \
__unqual_scalar_typeof(x) __x = \
(*(volatile typeof(__x) *)(&(x))); \
mb(); \
(typeof(x))__x; \
})
and the solution would be to make a __WRITE_ONCE() that then uses
"sizeof()" to decide at compile-time whether it can just do it as a
regular write, or whether it needs to do it as a LL/SC loop.
Because we're definitely not changing hundreds - probably thousands -
of random generic data structures.
That said, the above fixes WRITE_ONCE() without changing the
definition of what a native word size is, but doesn't actually *fix*
the problem.
Let's take a really simple example:
struct net_device {
...
u8 reg_state;
bool dismantle;
enum {
RTNL_LINK_INITIALIZED,
RTNL_LINK_INITIALIZING,
} rtnl_link_state:16;
...
are all in the same 32-bit word, and we intentionally have code
without locking like this:
WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
...
return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
because the code knows the state machine ordering requirements (ie
once it has moved past NETREG_REGISTERED, it won't move back).
So now - assuming we fix WRITE_ONCE() to use LL/SC, these READ_ONCE()
and WRITE_ONCE() games work fine on alpha
BUT.
Code that then does something like this:
dev->dismantle = true;
which is all nice and good (accesses are done under the RTNL lock) now
will possibly race with the unlocked reg_state accesses.
So it's still fundamentally buggy.
And before you say "that's why I wanted to fix the __native_word()
definition", please realize that the above happens EVEN WITH the
READ_ONCE/WRITE_ONCE being done on an "int".
Yes, really. The READ_ONCE and WRITE_ONCE will be individual
instructions. But lookie here, if we have
u32 reg_state;
bool dismantle;
and they happen to share the same 8-byte word, and somebody passes
'&dismantle' off to something that does byte writes to it, guess what
the canonical byte write sequence is?
That's right, it looks something like this (excuse any bugs, this is
from memory and looking up the ops in the architecture manual):
LDQ_U tmp,(addr)
INSBL byte,addr,tmp2
MSKBL tmp,addr,tmp
OR tmp,tmp2,tmp
STQ_U tmp,(addr)
and notice how in the process it read and then wrote that supposedly
atomic 'req_state" that was otherwise accessed purely with 32-bit
atomic instructions?
There are no LDL_U/STL_U instructions. The unaligned memory ops are
always 8 bytes wide (you can obviously always do address masking
manually and "emulate" a LDL_U/STL_U model, but then you make already
bad code generation even *worse*).
So no. Even 32-bit values aren't "atomic" in alpha, because of the
complete sh*t-show that is lack of byte ops.
NOTE NOTE NOTE! Note how I said "pass off the address of
'dev->dismantle' to something that does byte ops"? If you *know* the
alignment of the byte in a structure, so you don't just get a random
pointer to a byte, you can - and should - generate better code on
alpha, which may in fact involve just doing a 32-bit load, masking off
the low bits, and doing the 32-bit store.
So that LDQ_U/STQ_U sequence is for the generic case, with various
simpler sub-cases that don't necessarily require it.
The fact is, the original alpha is the worst architecture ever made.
The lack of byte instructions and the absolutely horrendous memory
ordering are fatal flaws. And while the memory ordering arguably had
excuses for it ("they didn't know better"), the lack of byte ops was
wilful misdesign that the designers were proud of, and made a central
tenet of their mess.
And I say that as somebody who *loved* it originally. Yes, the lack of
byte operations always was a pain, because it really caused the IO
subsystem to be a nightmare, but I was young, I was stupid, it was
interesting, and I had bought into the kool aid.
But alpha without BWX really is shit. People think x86 is bad. Those
people have NO CLUE.
Linus
next prev parent reply other threads:[~2024-05-31 16:32 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-03 8:11 [PATCH 00/14] alpha: cleanups for 6.10 Arnd Bergmann
2024-05-03 8:11 ` [PATCH 01/14] alpha: sort scr_mem{cpy,move}w() out Arnd Bergmann
2024-05-03 8:11 ` [PATCH 02/14] alpha: fix modversions for strcpy() et.al Arnd Bergmann
2024-05-03 8:11 ` [PATCH 03/14] alpha: add clone3() support Arnd Bergmann
2024-05-03 8:11 ` [PATCH 04/14] alpha: don't make functions public without a reason Arnd Bergmann
2024-05-03 8:11 ` [PATCH 05/14] alpha: sys_sio: fix misspelled ifdefs Arnd Bergmann
2024-05-03 8:11 ` [PATCH 06/14] alpha: missing includes Arnd Bergmann
2024-05-03 8:11 ` [PATCH 07/14] alpha: core_lca: take the unused functions out Arnd Bergmann
2024-05-03 8:11 ` [PATCH 08/14] alpha: jensen, t2 - make __EXTERN_INLINE same as for the rest Arnd Bergmann
2024-05-03 8:11 ` [PATCH 09/14] alpha: trim the unused stuff from asm-offsets.c Arnd Bergmann
2024-05-03 8:11 ` [PATCH 10/14] alpha: remove DECpc AXP150 (Jensen) support Arnd Bergmann
2024-05-03 16:07 ` Linus Torvalds
2024-05-03 17:00 ` Al Viro
2024-05-03 20:07 ` Arnd Bergmann
2024-05-03 8:11 ` [PATCH 11/14] alpha: sable: remove early machine support Arnd Bergmann
2024-05-03 8:11 ` [PATCH 12/14] alpha: remove LCA and APECS based machines Arnd Bergmann
2024-05-03 8:11 ` [PATCH 13/14] alpha: cabriolet: remove EV5 CPU support Arnd Bergmann
2024-05-03 8:11 ` [PATCH 14/14] alpha: drop pre-EV56 support Arnd Bergmann
2024-05-04 15:00 ` Richard Henderson
2024-05-06 10:06 ` Arnd Bergmann
2024-06-03 6:02 ` Jiri Slaby
2024-06-04 13:58 ` Greg KH
2024-05-03 16:06 ` [PATCH 00/14] alpha: cleanups for 6.10 Matt Turner
2024-05-03 20:15 ` Arnd Bergmann
2024-05-06 9:16 ` Michael Cree
2024-05-06 10:11 ` Arnd Bergmann
2024-05-03 16:53 ` John Paul Adrian Glaubitz
2024-05-03 17:19 ` Paul E. McKenney
2024-05-27 23:49 ` Maciej W. Rozycki
2024-05-28 14:43 ` Paul E. McKenney
2024-05-29 18:50 ` Maciej W. Rozycki
2024-05-29 22:09 ` Paul E. McKenney
2024-05-30 22:59 ` Maciej W. Rozycki
2024-05-31 3:56 ` Maciej W. Rozycki
2024-05-31 19:33 ` Paul E. McKenney
2024-06-03 16:22 ` Maciej W. Rozycki
2024-06-03 17:08 ` Paul E. McKenney
2024-07-01 23:50 ` Maciej W. Rozycki
2024-05-30 1:08 ` Linus Torvalds
2024-05-30 22:57 ` Maciej W. Rozycki
2024-05-31 0:10 ` Linus Torvalds
2024-06-03 11:09 ` Maciej W. Rozycki
2024-06-03 11:36 ` John Paul Adrian Glaubitz
2024-06-03 16:57 ` Linus Torvalds
2024-07-01 23:48 ` Maciej W. Rozycki
2024-05-31 15:48 ` Arnd Bergmann
2024-05-31 16:32 ` Linus Torvalds [this message]
2024-05-31 16:54 ` Arnd Bergmann
2024-06-01 13:51 ` David Laight
2024-07-01 23:48 ` Maciej W. Rozycki
2024-07-02 1:13 ` Linus Torvalds
2024-07-03 0:12 ` Maciej W. Rozycki
2024-07-03 0:50 ` Linus Torvalds
2024-07-04 22:21 ` Maciej W. Rozycki
2024-06-03 11:33 ` Maciej W. Rozycki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wimJ2hQhKSq7+4O1EHtkg7eFBwY+fygxD+6sjWqgyDMTQ@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=arnd@kernel.org \
--cc=frank.scheiner@web.de \
--cc=glaubitz@physik.fu-berlin.de \
--cc=ink@jurassic.park.msu.ru \
--cc=linux-alpha@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=macro@orcam.me.uk \
--cc=mattst88@gmail.com \
--cc=maz@kernel.org \
--cc=mcree@orcon.net.nz \
--cc=paulmck@kernel.org \
--cc=richard.henderson@linaro.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).