From: "Paul E. McKenney" <paulmck@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>, Marco Elver <elver@google.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Dmitry Vyukov <dvyukov@google.com>,
syzbot <syzbot+b7c3ba8cdc2f6cf83c21@syzkaller.appspotmail.com>,
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com,
Nathan Chancellor <nathan@kernel.org>,
Arnd Bergmann <arnd@kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Jiri Slaby <jirislaby@kernel.org>
Subject: Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
Date: Sat, 4 May 2024 15:04:49 -0700 [thread overview]
Message-ID: <37195203-9a13-46aa-9cc0-5effea3c4b0e@paulmck-laptop> (raw)
In-Reply-To: <CAHk-=wi8mArAxxkO78CTSVRCyjim4hpGbzf2NFxNMAdXWR3oJA@mail.gmail.com>
On Sat, May 04, 2024 at 12:11:10PM -0700, Linus Torvalds wrote:
> On Sat, 4 May 2024 at 11:18, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Here are my current thoughts for possible optimizations of non-volatile
> > memory_order_relaxed atomics:
> >
> > o Loads from the same variable that can legitimately be
> > reordered to be adjacent to one another can be fused
> > into a single load.
>
> Let's call this "Rule 1"
>
> I think you can extend this to also "can be forwarded from a previous store".
Agreed, with constraints based on intervening synchronization.
> I also think you're too strict in saying "fused into a single load".
> Let me show an example below.
I certainly did intend to make any errors in the direction of being
too strict.
> > o Stores to the same variable that can legitimately be
> > reordered to be adjacent to one another can be replaced
> > by the last store in the series.
>
> Rule 2.
>
> Ack, although again, I think you're being a bit too strict in your
> language, and the rule can be relaxed.
>
> > o Loads and stores may not be invented.
>
> Rule 3.
>
> I think you can and should relax this. You can invent loads all you want.
I might be misunderstanding you, but given my understanding, I disagree.
Consider this example:
	x = __atomic_load(&a, RELAXED);
	r0 = x * x + 2 * x + 1;
It would not be good for a load to be invented as follows:
	x = __atomic_load(&a, RELAXED);
	invented = __atomic_load(&a, RELAXED);
	r0 = x * x + 2 * invented + 1;
In the first example, we know that r0 is a perfect square, at least
assuming that x is small enough to avoid wrapping. In the second
example, x might not be equal to the value from the invented load,
and r0 might not be a perfect square.
I believe that we really need the compiler to keep basic arithmetic
working.
That said, I agree that this disallows directly applying current
CSE optimizations, which might make some people sad. But we do need
code to work regardless.
Again, it is quite possible that I am misunderstanding you here.
> > o The only way that a computation based on the value from
> > a given load can instead use some other load is if the
> > two loads are fused into a single load.
>
> Rule 4.
>
> I'm not convinced that makes sense, and I don't think it's true as written.
>
> I think I understand what you are trying to say, but I think you're
> saying it in a way that only confuses a compiler person.
>
> In particular, the case I do not think is true is very much the
> "spill" case: if you have code like this:
>
> a = expression involving '__atomic_load_n(xyz, RELAXED)'
>
> then it's perfectly fine to spill the result of that load and reload
> the value. So the "computation based on the value" *is* actually based
> on "some other load" (the reload).
As in the result is stored to a compiler temporary and then reloaded
from that temporary? Agreed, that would be just fine. In contrast,
spilling and reloading from xyz would not be good at all.
> I really *REALLY* think you need to explain the semantics in concrete
> terms that a compiler writer would understand and agree with.
Experience would indicate that I should not dispute that sentence. ;-)
> So to explain your rules to an actual compiler person (and relax the
> semantics a bit) I would rewrite your rules as:
>
> Rule 1: a strictly dominating load can be replaced by the value of a
> preceding load or store
>
> Rule 2: a strictly dominating store can remove preceding stores
>
> Rule 3: stores cannot be done speculatively (put another way: a
> subsequent dominating store can only *remove* a store entirely, it
> can't turn the store into one with speculative data)
>
> Rule 4: loads cannot be rematerialized (ie a load can be *combined*
> as per Rule 1, but a load cannot be *split* into two loads)
I still believe that synchronization operations need a look-in, and
I am not sure what is being dominated in your Rules 1 and 2 (all
subsequent execution?), but let's proceed.
> Anyway, let's get to the examples of *why* I think your language was
> bad and your rules were too strict.
>
> Let's start with your Rule 3, where you said:
>
> - Loads and stores may not be invented
>
> and while I think this should be very very true for stores, I think
> inventing loads is not only valid, but a good idea. Example:
>
> if (a)
>         b = __atomic_load_n(ptr) + 1;
>
> can perfectly validly just be written as
>
> tmp = __atomic_load_n(ptr);
> if (a)
>         b = tmp+1;
>
> which in turn may allow other optimizations (ie depending on how 'b'
> is used, the conditional may go away entirely, and you just end up
> with 'b = tmp+!!a').
>
> There's nothing wrong with extra loads that aren't used.
From a functional viewpoint, if the value isn't used, then agreed,
inventing the load is harmless. But there are some code sequences where
I really wouldn't want the extra cache miss.
> And to make that more explicit, let's look at Rule 1:
>
> Example of Rule 1 (together with the above case):
>
> if (a)
>         b = __atomic_load_n(ptr) + 1;
> else
>         b = __atomic_load_n(ptr) + 2;
> c = __atomic_load_n(ptr) + 3;
>
> then that can perfectly validly re-write this all as
>
> tmp = __atomic_load_n(ptr);
> b = a ? tmp+1 : tmp+2;
> c = tmp + 3;
>
> because my version of Rule 1 allows the dominating load used for 'c'
> to be replaced by the value of a preceding load that was used for 'a'
> and 'b'.
OK, I thought that nodes early in the control-flow graph dominated
nodes that are later in that graph, but I am not a compiler expert.
In any case, I agree with this transformation. This is making three
loads into one load, and there is no intervening synchronization to gum
up the works.
> And to give an example of Rule 2, where you said "reordered to be
> adjacent", I'm saying that all that matters is being strictly
> dominant, so
>
> if (a)
>         __atomic_store_n(ptr,1);
> else
>         __atomic_store_n(ptr,2);
> __atomic_store_n(ptr,3);
>
> can be perfectly validly be combined into just
>
> __atomic_store_n(ptr,3);
>
> because the third store completely dominates the two others, even if
> in the intermediate form they are not necessarily ever "adjacent".
I agree with this transformation as well. But suppose that the code
also contained an smp_mb() right after that "if" statement. Given that,
it is not hard to construct a larger example in which dropping the first
two stores would be problematic.
> (Your "adjacency" model might still be valid in how you could turn
> first of the first stores to be a fall-through, then remove it, and
> then turn the other to be a fall-through and then remove it, so maybe
> your language isn't _technically_ wrong, but I think the whole
> "dominating store" is how a compiler writer would think about it).
I was thinking in terms of first transforming the code as follows:
	if (a) {
		__atomic_store_n(ptr,1);
		__atomic_store_n(ptr,3);
	} else {
		__atomic_store_n(ptr,2);
		__atomic_store_n(ptr,3);
	}
(And no, I would not expect a real compiler to do this!)
Then it is clearly OK to further transform into the following:
	if (a) {
		__atomic_store_n(ptr,3);
	} else {
		__atomic_store_n(ptr,3);
	}
At which point both branches of the "if" statement are doing the
same thing, so:
	__atomic_store_n(ptr,3);
On to your next email!
Thanx, Paul