All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: rcu@vger.kernel.org
Subject: Re: The rdp->gpwrap behavior
Date: Fri, 14 Feb 2025 00:20:49 -0800	[thread overview]
Message-ID: <b04b997e-fee6-4c5f-a0f9-12e671d27f8c@paulmck-laptop> (raw)
In-Reply-To: <20250213183033.GA2635051@joelnvbox>

On Thu, Feb 13, 2025 at 01:30:33PM -0500, Joel Fernandes wrote:
> +rcu for archives
> 
> Hi,
> 
> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
> with just boot testing.
> 
> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
> very close.
> 
> Example trace:
> 
> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
> 
> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
> 
> The wrap "before" and "after" are the value of gpwrap before and after
> calling rcu_gpnum_ovf().
> 
> Closer look at the _ovf() shows it should be only set if rdp->gp_seq and
> rnp->gp_seq are quite a distance apart (say if the CPU was idle for too long
> and many GPs have passed. This can happen both with/without wrap. So imminent
> wrapping seems not necessary for ->gpwrap to be even set AFAICS.
> 
> I think the problem is ULONG_CMP_LT is not really "<" so even after wrap, it
> can false. i.e if rdp->gpseq + ULONG/4 wraps, it could still return false.

Exactly, and all of that is in fact the intended behavior.

> >From a testing POV, the example shows it is not set even when around when a
> wrap actually happened.  So may be, we are not testing gpwrap much, or at
> all, with rcutorture?

True enough, and that would be well worth fixing.

One simple (and maybe too simple) fix is to add a Kconfig option to
kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
(32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
within a few minutes.

And I would not be surprised if doing this located bugs.

> But before that, I am feeling it is not behaving correctly. I'd expect it to
> be set even if rnp and rdp values are close but we are actually about to
> wrap.  So most likely I am missing something.

If I understand correctly, you would like ->gpwrap to be set
unconditionally at various points in the cycle, for example, at zero,
ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
have the effect of discarding each CPU's quiescent states that showed
up before the first call to rcu_watching_snap_recheck(), even if that
CPU was active.  Our users might not consider this to be a friendly act.

In contrast, the current setup only discards quiescent states for CPUs
that just came back from idle prior to call to rcu_watching_snap_save().

In addition, use of the aforementioned Kconfig option has the benefit of
making any ->gpwrap-related bugs appear during testing, not in production.
On the same grounds, one could argue that absence of this Kconfig option
should also cause gp_seq to zero rather than the current;

	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,

One argument against is that production 32-bit systems can wrap ->gp_seq
in a relatively short time.  Maybe moreso for ->expedited_sequence, so
perhaps leave ->gp_seq and modify ->expedited_sequence.  Maybe similar
decisions for others as well.  ;-)

							Thanx, Paul

  reply	other threads:[~2025-02-14  8:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-13 18:30 The rdp->gpwrap behavior Joel Fernandes
2025-02-14  8:20 ` Paul E. McKenney [this message]
2025-02-14 18:03   ` Joel Fernandes
2025-02-15 10:28     ` Paul E. McKenney
2025-02-16 12:23       ` Joel Fernandes
2025-02-18 13:17         ` Paul E. McKenney
2025-02-18 22:50           ` Joel Fernandes
2025-02-19 13:54             ` Paul E. McKenney
2025-02-19 15:57               ` Joel Fernandes
2025-02-19 17:02                 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b04b997e-fee6-4c5f-a0f9-12e671d27f8c@paulmck-laptop \
    --to=paulmck@kernel.org \
    --cc=joelagnelf@nvidia.com \
    --cc=rcu@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.