From: "Paul E. McKenney" <paulmck@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>, Marco Elver <elver@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	syzbot <syzbot+b7c3ba8cdc2f6cf83c21@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	Nathan Chancellor <nathan@kernel.org>,
	Arnd Bergmann <arnd@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jiri Slaby <jirislaby@kernel.org>
Subject: Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
Date: Sat, 4 May 2024 15:04:49 -0700
Message-ID: <37195203-9a13-46aa-9cc0-5effea3c4b0e@paulmck-laptop>
In-Reply-To: <CAHk-=wi8mArAxxkO78CTSVRCyjim4hpGbzf2NFxNMAdXWR3oJA@mail.gmail.com>

On Sat, May 04, 2024 at 12:11:10PM -0700, Linus Torvalds wrote:
> On Sat, 4 May 2024 at 11:18, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Here are my current thoughts for possible optimizations of non-volatile
> > memory_order_relaxed atomics:
> >
> > o       Loads from the same variable that can legitimately be
> >         reordered to be adjacent to one another can be fused
> >         into a single load.
> 
> Let's call this "Rule 1"
> 
> I think you can extend this to also "can be forwarded from a previous store".

Agreed, with constraints based on intervening synchronization.

> I also think you're too strict in saying "fused into a single load".
> Let me show an example below.

I certainly did intend for any errors to be in the direction of being
too strict.

> > o       Stores to the same variable that can legitimately be
> >         reordered to be adjacent to one another can be replaced
> >         by the last store in the series.
> 
> Rule 2.
> 
> Ack, although again, I think you're being a bit too strict in your
> language, and the rule can be relaxed.
> 
> > o       Loads and stores may not be invented.
> 
> Rule 3.
> 
> I think you can and should relax this. You can invent loads all you want.

I might be misunderstanding you, but given my understanding, I disagree.
Consider this example:

	x = __atomic_load(&a, RELAXED);
	r0 = x * x + 2 * x + 1;

It would not be good for a load to be invented as follows:

	x = __atomic_load(&a, RELAXED);
	invented = __atomic_load(&a, RELAXED);
	r0 = x * x + 2 * invented + 1;

In the first example, we know that r0 is a perfect square, at least
assuming that x is small enough to avoid wrapping.  In the second
example, x might not be equal to the value from the invented load,
and r0 might not be a perfect square.

I believe that we really need the compiler to keep basic arithmetic
working.

That said, I agree that this disallows directly applying current
CSE optimizations, which might make some people sad.  But we do need
code to work regardless.

Again, it is quite possible that I am misunderstanding you here.

> > o       The only way that a computation based on the value from
> >         a given load can instead use some other load is if the
> >         two loads are fused into a single load.
> 
> Rule 4.
> 
> I'm not convinced that makes sense, and I don't think it's true as written.
> 
> I think I understand what you are trying to say, but I think you're
> saying it in a way that only confuses a compiler person.
> 
> In particular, the case I do not think is true is very much the
> "spill" case: if you have code like this:
> 
>     a = expression involving '__atomic_load_n(xyz, RELAXED)'
> 
> then it's perfectly fine to spill the result of that load and reload
> the value. So the "computation based on the value" *is* actually based
> on "some other load" (the reload).

As in the result is stored to a compiler temporary and then reloaded
from that temporary?  Agreed, that would be just fine.  In contrast,
spilling and reloading from xyz would not be good at all.

> I really *REALLY* think you need to explain the semantics in concrete
> terms that a compiler writer would understand and agree with.

Experience would indicate that I should not dispute that sentence.  ;-)

> So to explain your rules to an actual compiler person (and relax the
> semantics a bit) I would rewrite your rules as:
> 
>  Rule 1: a strictly dominating load can be replaced by the value of a
> preceding load or store
> 
>  Rule 2: a strictly dominating store can remove preceding stores
> 
>  Rule 3: stores cannot be done speculatively (put another way: a
> subsequent dominating store can only *remove* a store entirely, it
> can't turn the store into one with speculative data)
> 
>  Rule 4: loads cannot be rematerialized (ie a load can be *combined*
> as per Rule 1, but a load cannot be *split* into two loads)

I still believe that synchronization operations need a look-in, and
I am not sure what is being dominated in your Rules 1 and 2 (all
subsequent execution?), but let's proceed.

> Anyway, let's get to the examples of *why* I think your language was
> bad and your rules were too strict.
> 
> Let's start with your Rule 3, where you said:
> 
>  - Loads and stores may not be invented
> 
> and while I think this should be very very true for stores, I think
> inventing loads is not only valid, but a good idea. Example:
> 
>     if (a)
>         b = __atomic_load_n(ptr) + 1;
> 
> can perfectly validly just be written as
> 
>     tmp = __atomic_load_n(ptr);
>     if (a)
>         b = tmp+1;
> 
> which in turn may allow other optimizations (ie depending on how 'b'
> is used, the conditional may go away entirely, and you just end up
> with 'b = tmp+!!a').
> 
> There's nothing wrong with extra loads that aren't used.

From a functional viewpoint, if the value isn't used, then agreed,
inventing the load is harmless.  But there are some code sequences where
I really wouldn't want the extra cache miss.

> And to make that more explicit, let's look at Rule 1:
> 
> Example of Rule 1 (together with the above case):
> 
>     if (a)
>         b = __atomic_load_n(ptr) + 1;
>     else
>         b =  __atomic_load_n(ptr) + 2;
>     c = __atomic_load_n(ptr) + 3;
> 
> then that can perfectly validly re-write this all as
> 
>     tmp = __atomic_load_n(ptr);
>     b = a ? tmp+1 : tmp+2;
>     c = tmp + 3;
> 
> because my version of Rule 1 allows the dominating load used for 'c'
> to be replaced by the value of a preceding load that was used for 'a'
> and 'b'.

OK, I thought that nodes early in the control-flow graph dominated
nodes that are later in that graph, but I am not a compiler expert.

In any case, I agree with this transformation.  This is making three
loads into one load, and there is no intervening synchronization to gum
up the works.

> And to give an example of Rule 2, where you said "reordered to be
> adjacent", I'm saying that all that matters is being strictly
> dominant, so
> 
>     if (a)
>         __atomic_store_n(ptr,1);
>     else
>         __atomic_store_n(ptr,2);
>     __atomic_store_n(ptr,3);
> 
> can be perfectly validly be combined into just
> 
>     __atomic_store_n(ptr,3);
> 
> because the third store completely dominates the two others, even if
> in the intermediate form they are not necessarily ever "adjacent".

I agree with this transformation as well.  But suppose that the code
also contained an smp_mb() right after that "if" statement.  Given that,
it is not hard to construct a larger example in which dropping the first
two stores would be problematic.

> (Your "adjacency" model might still be valid in how you could turn
> first of the first stores to be a fall-through, then remove it, and
> then turn the other to be a fall-through and then remove it, so maybe
> your language isn't _technically_ wrong, but I think the whole
> "dominating store" is how a compiler writer would think about it).

I was thinking in terms of first transforming the code as follows:

	if (a) {
		__atomic_store_n(ptr,1);
		__atomic_store_n(ptr,3);
	} else {
		__atomic_store_n(ptr,2);
		__atomic_store_n(ptr,3);
	}

(And no, I would not expect a real compiler to do this!)

Then it is clearly OK to further transform into the following:

	if (a) {
		__atomic_store_n(ptr,3);
	} else {
		__atomic_store_n(ptr,3);
	}

At which point both branches of the "if" statement are doing the
same thing, so:

	__atomic_store_n(ptr,3);

On to your next email!

							Thanx, Paul
