From: "Paul E. McKenney" <paulmck@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>, Marco Elver <elver@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	syzbot <syzbot+b7c3ba8cdc2f6cf83c21@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	Nathan Chancellor <nathan@kernel.org>,
	Arnd Bergmann <arnd@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jiri Slaby <jirislaby@kernel.org>
Subject: Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
Date: Sat, 4 May 2024 15:04:49 -0700
Message-ID: <37195203-9a13-46aa-9cc0-5effea3c4b0e@paulmck-laptop>
In-Reply-To: <CAHk-=wi8mArAxxkO78CTSVRCyjim4hpGbzf2NFxNMAdXWR3oJA@mail.gmail.com>

On Sat, May 04, 2024 at 12:11:10PM -0700, Linus Torvalds wrote:
> On Sat, 4 May 2024 at 11:18, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Here are my current thoughts for possible optimizations of non-volatile
> > memory_order_relaxed atomics:
> >
> > o       Loads from the same variable that can legitimately be
> >         reordered to be adjacent to one another can be fused
> >         into a single load.
> 
> Let's call this "Rule 1"
> 
> I think you can extend this to also "can be forwarded from a previous store".

Agreed, with constraints based on intervening synchronization.
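
For concreteness, here is a minimal sketch of the forwarding in
question (my own example, spelling out the __ATOMIC_RELAXED argument
that the shorthand in this thread omits):

	/* Before: a relaxed store followed by a relaxed load of the
	 * same variable, with no intervening synchronization. */
	__atomic_store_n(&a, 42, __ATOMIC_RELAXED);
	r0 = __atomic_load_n(&a, __ATOMIC_RELAXED);

	/* After: the compiler forwards the stored value to the load. */
	__atomic_store_n(&a, 42, __ATOMIC_RELAXED);
	r0 = 42;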

> I also think you're too strict in saying "fused into a single load".
> Let me show an example below.

I certainly did intend for any errors to be in the direction of being
too strict.

> > o       Stores to the same variable that can legitimately be
> >         reordered to be adjacent to one another can be replaced
> >         by the last store in the series.
> 
> Rule 2.
> 
> Ack, although again, I think you're being a bit too strict in your
> language, and the rule can be relaxed.
> 
> > o       Loads and stores may not be invented.
> 
> Rule 3.
> 
> I think you can and should relax this. You can invent loads all you want.

I might be misunderstanding you, but given my understanding, I disagree.
Consider this example:

	x = __atomic_load_n(&a, RELAXED);
	r0 = x * x + 2 * x + 1;

It would not be good for a load to be invented as follows:

	x = __atomic_load_n(&a, RELAXED);
	invented = __atomic_load_n(&a, RELAXED);
	r0 = x * x + 2 * invented + 1;

In the first example, we know that r0 is a perfect square, at least
assuming that x is small enough to avoid wrapping.  In the second
example, x might not be equal to the value from the invented load,
and r0 might not be a perfect square.

I believe that we really need the compiler to keep basic arithmetic
working.

That said, I agree that this disallows directly applying current
CSE optimizations, which might make some people sad.  But we do need
code to work regardless.

Again, it is quite possible that I am misunderstanding you here.
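
To make the hazard concrete, here is a sketch of my own (the writer
thread and the initial value of zero are my additions, not part of the
original example):

	int a;	/* shared; initially zero */

	void writer(void)
	{
		__atomic_store_n(&a, 5, __ATOMIC_RELAXED);
		__atomic_store_n(&a, 7, __ATOMIC_RELAXED);
	}

	int reader(void)
	{
		int x = __atomic_load_n(&a, __ATOMIC_RELAXED);

		return x * x + 2 * x + 1;	/* always (x + 1) squared */
	}

If the compiler invents a second load in reader():

	int x = __atomic_load_n(&a, __ATOMIC_RELAXED);
	int invented = __atomic_load_n(&a, __ATOMIC_RELAXED);
	return x * x + 2 * invented + 1;

then a concurrent writer() can produce x == 5 and invented == 7,
giving 25 + 14 + 1 = 40, which is not a perfect square.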

> > o       The only way that a computation based on the value from
> >         a given load can instead use some other load is if the
> >         two loads are fused into a single load.
> 
> Rule 4.
> 
> I'm not convinced that makes sense, and I don't think it's true as written.
> 
> I think I understand what you are trying to say, but I think you're
> saying it in a way that only confuses a compiler person.
> 
> In particular, the case I do not think is true is very much the
> "spill" case: if you have code like this:
> 
>     a = expression involving '__atomic_load_n(xyz, RELAXED)'
> 
> then it's perfectly fine to spill the result of that load and reload
> the value. So the "computation based on the value" *is* actually based
> on "some other load" (the reload).

As in the result is stored to a compiler temporary and then reloaded
from that temporary?  Agreed, that would be just fine.  In contrast,
spilling and reloading from xyz would not be good at all.
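
To spell out the distinction (my own sketch, with the compiler's spill
slot shown as an explicit local variable):

	/* Fine: spill the loaded value, then reload the spill slot. */
	int tmp = __atomic_load_n(xyz, __ATOMIC_RELAXED);
	int spill = tmp;	/* compiler-private stack slot */
	/* ... heavy register pressure ... */
	a = spill + 1;		/* guaranteed to equal tmp + 1 */

	/* Not fine: "spill" by simply re-reading the shared variable. */
	int tmp2 = __atomic_load_n(xyz, __ATOMIC_RELAXED);
	/* ... heavy register pressure ... */
	a = __atomic_load_n(xyz, __ATOMIC_RELAXED) + 1;
	/* Some other thread might have stored to *xyz in the meantime,
	 * so this need not equal tmp2 + 1. */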

> I really *REALLY* think you need to explain the semantics in concrete
> terms that a compiler writer would understand and agree with.

Experience would indicate that I should not dispute that sentence.  ;-)

> So to explain your rules to an actual compiler person (and relax the
> semantics a bit) I would rewrite your rules as:
> 
>  Rule 1: a strictly dominating load can be replaced by the value of a
> preceding load or store
> 
>  Rule 2: a strictly dominating store can remove preceding stores
> 
>  Rule 3: stores cannot be done speculatively (put another way: a
> subsequent dominating store can only *remove* a store entirely, it
> can't turn the store into one with speculative data)
> 
>  Rule 4: loads cannot be rematerialized (ie a load can be *combined*
> as per Rule 1, but a load cannot be *split* into two loads)

I still believe that synchronization operations need a look-in, and
I am not sure what is being dominated in your Rules 1 and 2 (all
subsequent execution?), but let's proceed.

> Anyway, let's get to the examples of *why* I think your language was
> bad and your rules were too strict.
> 
> Let's start with your Rule 3, where you said:
> 
>  - Loads and stores may not be invented
> 
> and while I think this should be very very true for stores, I think
> inventing loads is not only valid, but a good idea. Example:
> 
>     if (a)
>         b = __atomic_load_n(ptr) + 1;
> 
> can perfectly validly just be written as
> 
>     tmp = __atomic_load_n(ptr);
>     if (a)
>         b = tmp+1;
> 
> which in turn may allow other optimizations (ie depending on how 'b'
> is used, the conditional may go away entirely, and you just end up
> with 'b = tmp+!!a').
> 
> There's nothing wrong with extra loads that aren't used.

From a functional viewpoint, if the value isn't used, then agreed,
inventing the load is harmless.  But there are some code sequences where
I really wouldn't want the extra cache miss.
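
For example (a sketch of mine, using the kernel's unlikely() hint):

	/* Original: the load, and any cache miss, happens only on the
	 * rare path. */
	if (unlikely(a))
		b = __atomic_load_n(ptr, __ATOMIC_RELAXED) + 1;

	/* Hoisted: the load, and any cache miss, happens on every pass
	 * through this code, whether or not 'a' is set. */
	tmp = __atomic_load_n(ptr, __ATOMIC_RELAXED);
	if (unlikely(a))
		b = tmp + 1;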

> And to make that more explicit, let's look at Rule 1:
> 
> Example of Rule 1 (together with the above case):
> 
>     if (a)
>         b = __atomic_load_n(ptr) + 1;
>     else
>         b =  __atomic_load_n(ptr) + 2;
>     c = __atomic_load_n(ptr) + 3;
> 
> then that can perfectly validly re-write this all as
> 
>     tmp = __atomic_load_n(ptr);
>     b = a ? tmp+1 : tmp+2;
>     c = tmp + 3;
> 
> because my version of Rule 1 allows the dominating load used for 'c'
> to be replaced by the value of a preceding load that was used for 'a'
> and 'b'.

OK, I thought that nodes early in the control-flow graph dominated
nodes that are later in that graph, but I am not a compiler expert.

In any case, I agree with this transformation.  This is making three
loads into one load, and there is no intervening synchronization to gum
up the works.
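
For contrast, here is a sketch (mine) of intervening synchronization
that would gum up the works:

	b = __atomic_load_n(ptr, __ATOMIC_RELAXED) + 1;
	smp_mb();	/* or an acquire load, a release store, ... */
	c = __atomic_load_n(ptr, __ATOMIC_RELAXED) + 3;

Here the second load might need to observe a store from some other
thread that is ordered by that barrier, so the compiler must not simply
reuse the value obtained by the first load.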

> And to give an example of Rule 2, where you said "reordered to be
> adjacent", I'm saying that all that matters is being strictly
> dominant, so
> 
>     if (a)
>         __atomic_store_n(ptr,1);
>     else
>         __atomic_store_n(ptr,2);
>     __atomic_store_n(ptr,3);
> 
> can be perfectly validly be combined into just
> 
>     __atomic_store_n(ptr,3);
> 
> because the third store completely dominates the two others, even if
> in the intermediate form they are not necessarily ever "adjacent".

I agree with this transformation as well.  But suppose that the code
also contained an smp_mb() right after that "if" statement.  Given that,
it is not hard to construct a larger example in which dropping the first
two stores would be problematic.
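
For example (a sketch of mine; the 'ready' flag and the initial value
of zero are my additions):

	/* Thread 1 */
	if (a)
		__atomic_store_n(ptr, 1, __ATOMIC_RELAXED);
	else
		__atomic_store_n(ptr, 2, __ATOMIC_RELAXED);
	smp_mb();
	__atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
	/* ... much later ... */
	__atomic_store_n(ptr, 3, __ATOMIC_RELAXED);

	/* Thread 2 */
	if (__atomic_load_n(&ready, __ATOMIC_RELAXED)) {
		smp_mb();
		r1 = __atomic_load_n(ptr, __ATOMIC_RELAXED);
		/* With the stores intact, r1 is 1, 2, or 3, never the
		 * initial value of zero.  If the compiler drops the
		 * first two stores, thread 2 can see zero. */
	}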

> (Your "adjacency" model might still be valid in how you could turn
> first of the first stores to be a fall-through, then remove it, and
> then turn the other to be a fall-through and then remove it, so maybe
> your language isn't _tecnically_ wrong, But I think the whole
> "dominating store" is how a compiler writer would think about it).

I was thinking in terms of first transforming the code as follows:

	if (a) {
		__atomic_store_n(ptr,1);
		__atomic_store_n(ptr,3);
	} else {
		__atomic_store_n(ptr,2);
		__atomic_store_n(ptr,3);
	}

(And no, I would not expect a real compiler to do this!)

Then it is clearly OK to further transform into the following:

	if (a) {
		__atomic_store_n(ptr,3);
	} else {
		__atomic_store_n(ptr,3);
	}

At which point both branches of the "if" statement are doing the
same thing, so:

	__atomic_store_n(ptr,3);

On to your next email!

							Thanx, Paul
