From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757937AbbFQQhu (ORCPT <rfc822;w@1wt.eu>);
	Wed, 17 Jun 2015 12:37:50 -0400
Received: from e35.co.us.ibm.com ([32.97.110.153]:47027 "EHLO
	e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753997AbbFQQhl (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 17 Jun 2015 12:37:41 -0400
X-Helo: d03dlp03.boulder.ibm.com
X-MailFrom: paulmck@linux.vnet.ibm.com
X-RcptTo: linux-kernel@vger.kernel.org
Date: Wed, 17 Jun 2015 09:37:31 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com, mingo@elte.hu, ktkhai@parallels.com,
        rostedt@goodmis.org, tglx@linutronix.de, juri.lelli@gmail.com,
        pang.xunlei@linaro.org, oleg@redhat.com, wanpeng.li@linux.intel.com,
        linux-kernel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 11/18] seqcount: Introduce raw_write_seqcount_barrier()
Message-ID: <20150617163731.GD3913@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150611124636.448700267@infradead.org>
 <20150611124743.374180021@infradead.org>
 <20150611153341.GK3913@linux.vnet.ibm.com>
 <20150611214557.GA4249@linux.vnet.ibm.com>
 <20150617122924.GP3644@twins.programming.kicks-ass.net>
 <20150617145712.GZ3913@linux.vnet.ibm.com>
 <20150617154926.GE19282@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150617154926.GE19282@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15061716-0013-0000-0000-00000E7BCA33
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 17, 2015 at 05:49:27PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 17, 2015 at 07:57:12AM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 17, 2015 at 02:29:24PM +0200, Peter Zijlstra wrote:
> > > I did leave off the READ/WRITE ONCE stuff, because I could not come up
> > > with a scenario where it makes a difference -- I appreciate paranoia,
> > > but I also think we should not overdo the thing.
> > 
> > I can only conclude that you have not read this document:
> > 
> > 	http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
> > 
> > Specifically, please keep in mind that unless you mark either the variable
> > or the memory access, the compiler is within its rights to assume that
> > there are no concurrent accesses to that variable.  For but one example,
> > if you do a normal store to a given variable, then the compiler is
> > within its rights to use that variable as temporary storage prior to
> > that store.  And yes, you can reasonably argue that no sane compiler
> > would store something else to s->sequence given that it could free up
> > a register by storing the incremented value, but the fact remains that
> > you have given it permission to do so if it wants.
> 
> This is the "Optimizations without Atomics" section you're referring to.
> 
> It has words such as: "if the compiler can prove they are not accessed
> in other threads concurrently" and "This requires escape analysis: the
> compiler must see the full scope of the memory location 'p', or must
> know that leaf functions don't capture 'p' and aren't used concurrently,
> for this optimization to be valid."

Yes, but...  These qualifiers apply only in cases where the code at hand
is not writing to the variable.

In this case, it is legal for other threads to be concurrently reading
the variable.  After all, if the value of the variable is not changing,
then the various read-side optimizations have no effect -- even the silly
ones, like reading the variable one bit at a time.  If the compiler
introduces a write into code that was only reading the variable, then
it is the compiler's job to ensure that no other thread is reading it.
If the compiler were to introduce a write to such a variable that other
threads might be reading, then the compiler would have introduced a data
race, which would mean that the compiler introduced undefined behavior
to correct code.  The compiler therefore must be extremely careful when
introducing a write to a variable that is only read by the code at hand.
(Though there are still some "interesting" escape clauses for the compiler
if the variable in question is allocated on the stack and the function
can return without transferring control to some function in some other
translation unit.)

Recall that a data race occurs when there are multiple concurrent normal
accesses to a given normal variable, at least one of which is a write.
Here normal accesses and normal variables are those that are not marked
as atomic (in the C11 sense).  Accesses and variables marked as volatile
also disable most (perhaps all) of the dangerous optimizations that lead
to the undefined behavior.  That said, many compiler people hate volatile,
and will therefore automatically argue that it is useless in a misguided
attempt to convince people not to use it.  :-/

On the other hand, if the code is already writing to the variable (as it
is in the s->sequence++ case), then if there are any concurrent accesses,
the code -already- contains a data race.  This data race invokes undefined
behavior in the code as written, so the compiler is within its rights to
do anything at all, even spawn the proverbial game of rogue.  A somewhat
more reasonable compiler is within its rights to assume that no one is
concurrently reading any normal variable to which the code does a normal
write.  The compiler is therefore within its rights to use that variable
as scratch storage at any time between the most recent read and the write
in question.

> But then it starts weasel wording and saying that the lack of
> std::atomic<> usage implies a lack of concurrency, to which I strongly
> object.

Heh!

The point of std::atomic<> (and of the equivalent C11 syntax) is to
force the compiler to suppress optimizations that are unsafe for shared
variables.  We get more or less the same effect with volatile, protests
from compiler people notwithstanding.

I often tell the compiler guys that they have to expect make -some-
concessions for being 30 years late to the concurrency party, but
it nevertheless makes sense to future-proof our code where it is
reasonable to do so.

All that aside, I agree that "s->sequence++" is relatively low
priority, given that the compiler can easily free up a register by
storing the actual value.  But that might well be a failure of
imagination on my part.

> Esp. seeing how -ffreestanding does not have access to any of the atomic
> stuff since its library bits and not language bits (something which I've
> often said was a failure in the spec).

Agreed, given that atomics can almost always be inlined, it would be
nice if -ffreestanding didn't cut off the compiler's concurrency nose
to spite its concurrency face.  The reasoning behind it is that it is
legal (but, in my opinion, stupid) to create large atomic data structures.
The compilers normally introduce locks to implement these, which is
inappropriate in free-standing environments.  I believe that a better
strategy would be for -ffreestanding to implement only those atomics
that are machine-word sized, as in ATOMIC_..._LOCK_FREE==2.

							Thanx, Paul