* The rdp->gpwrap behavior
@ 2025-02-13 18:30 Joel Fernandes
  2025-02-14  8:20 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-02-13 18:30 UTC (permalink / raw)
  To: rcu, paulmck

+rcu for archives

Hi,

Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
with just boot testing.

I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
very close.

Example trace:

# rnp->gp_seq starts with -1200 on boot and then right around the wrap:

178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0

The wrap "before" and "after" are the value of gpwrap before and after
calling rcu_gpnum_ovf().

A closer look at _ovf() shows it should be set only if rdp->gp_seq and
rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
long and many GPs have passed). This can happen both with and without a
wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
set at all, AFAICS.

I think the problem is that ULONG_CMP_LT() is not really "<", so even
after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
wraps, it could still return false.
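
For reference, the comparison in question, lightly paraphrased from
include/linux/rcupdate.h and kernel/rcu/tree.c (the rcu_iw handling in
rcu_gpnum_ovf() is omitted here):

	/* include/linux/rcupdate.h */
	#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
	#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

	/* kernel/rcu/tree.c */
	static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
	{
		raw_lockdep_assert_held_rcu_node(rnp);
		if (ULONG_CMP_LT(rcu_seq_current(&rdp->gp_seq) + ULONG_MAX / 4,
				 rnp->gp_seq))
			WRITE_ONCE(rdp->gpwrap, true);
	}

So ->gpwrap is set only when rdp->gp_seq trails rnp->gp_seq by more than a
quarter of the number wheel, not when the counters cross any particular
absolute value.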

From a testing POV, the example shows it is not set even right around when
a wrap actually happened.  So maybe we are not testing gpwrap much, or at
all, with rcutorture?

But before that, I have a feeling it is not behaving correctly. I'd expect
it to be set even if the rnp and rdp values are close but we are actually
about to wrap.  So most likely I am missing something.

thanks,

 - Joel



* Re: The rdp->gpwrap behavior
  2025-02-13 18:30 The rdp->gpwrap behavior Joel Fernandes
@ 2025-02-14  8:20 ` Paul E. McKenney
  2025-02-14 18:03   ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2025-02-14  8:20 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Thu, Feb 13, 2025 at 01:30:33PM -0500, Joel Fernandes wrote:
> +rcu for archives
> 
> Hi,
> 
> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
> with just boot testing.
> 
> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
> very close.
> 
> Example trace:
> 
> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
> 
> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
> 
> The wrap "before" and "after" are the value of gpwrap before and after
> calling rcu_gpnum_ovf().
> 
> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
> long and many GPs have passed). This can happen both with and without a
> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
> set at all, AFAICS.
>
> I think the problem is that ULONG_CMP_LT() is not really "<", so even
> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
> wraps, it could still return false.

Exactly, and all of that is in fact the intended behavior.

> From a testing POV, the example shows it is not set even right around when
> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
> all, with rcutorture?

True enough, and that would be well worth fixing.

One simple (and maybe too simple) fix is to add a Kconfig option to
kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
(32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
within a few minutes.
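
Roughly this, with a made-up Kconfig name, purely as a sketch:

	# kernel/rcu/Kconfig.debug (hypothetical option)
	config RCU_GPWRAP_LAG_TEST
		bool "Shrink the ->gpwrap threshold to force frequent wraps"
		depends on RCU_EXPERT

	/* kernel/rcu/tree.c */
	#ifdef CONFIG_RCU_GPWRAP_LAG_TEST
	#define RCU_GPWRAP_LAG	(32 << RCU_SEQ_CTR_SHIFT)
	#else
	#define RCU_GPWRAP_LAG	(ULONG_MAX / 4)
	#endif

with rcu_gpnum_ovf() then comparing against rdp->gp_seq + RCU_GPWRAP_LAG
rather than the hard-coded rdp->gp_seq + ULONG_MAX / 4.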

And I would not be surprised if doing this located bugs.

> But before that, I have a feeling it is not behaving correctly. I'd expect
> it to be set even if the rnp and rdp values are close but we are actually
> about to wrap.  So most likely I am missing something.

If I understand correctly, you would like ->gpwrap to be set
unconditionally at various points in the cycle, for example, at zero,
ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
have the effect of discarding each CPU's quiescent states that showed
up before the first call to rcu_watching_snap_recheck(), even if that
CPU was active.  Our users might not consider this to be a friendly act.

In contrast, the current setup only discards quiescent states for CPUs
that just came back from idle prior to call to rcu_watching_snap_save().

In addition, use of the aforementioned Kconfig option has the benefit of
making any ->gpwrap-related bugs appear during testing, not in production.
On the same grounds, one could argue that absence of this Kconfig option
should also cause gp_seq to be initialized to zero rather than the current:

	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
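
(For reference, with RCU_SEQ_CTR_SHIFT == 2 this initializer works out to
-300 * 4 == -1200, which is exactly the boot-time rnp->gp_seq in the trace
above, i.e., about 300 grace periods of headroom before ->gp_seq wraps
through zero.)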

One argument against is that production 32-bit systems can wrap ->gp_seq
in a relatively short time.  Maybe more so for ->expedited_sequence, so
perhaps leave ->gp_seq and modify ->expedited_sequence.  Maybe similar
decisions for others as well.  ;-)

							Thanx, Paul


* Re: The rdp->gpwrap behavior
  2025-02-14  8:20 ` Paul E. McKenney
@ 2025-02-14 18:03   ` Joel Fernandes
  2025-02-15 10:28     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-02-14 18:03 UTC (permalink / raw)
  To: paulmck; +Cc: rcu

Hi Paul,

On 2/14/2025 3:20 AM, Paul E. McKenney wrote:
> On Thu, Feb 13, 2025 at 01:30:33PM -0500, Joel Fernandes wrote:
>> +rcu for archives
>>
>> Hi,
>>
>> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
>> with just boot testing.
>>
>> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
>> very close.
>>
>> Example trace:
>>
>> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
>>
>> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
>> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
>> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
>> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
>>
>> The wrap "before" and "after" are the value of gpwrap before and after
>> calling rcu_gpnum_ovf().
>>
>> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
>> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
>> long and many GPs have passed). This can happen both with and without a
>> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
>> set at all, AFAICS.
>>
>> I think the problem is that ULONG_CMP_LT() is not really "<", so even
>> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
>> wraps, it could still return false.
> 
> Exactly, and all of that is in fact the intended behavior.

Ah I think I get it now, so the wrap isn't really about the counter wrapping
causing an issue, but about the design of the ULONG_CMP APIs which are designed
for comparisons that wrap around. Due to this design, there is a side effect
that the numbers being compared have to be at a distance that is less than half
the number wheel. Otherwise the comparisons of numbers yield incorrect results.
Is that accurate to say?  IOW, ->gpwrap == true means, don't compare the numbers
anymore till we sort these huge deltas out.

> 
>> From a testing POV, the example shows it is not set even right around when
>> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
>> all, with rcutorture?
> 
> True enough, and that would be well worth fixing.
> 
> One simple (and maybe too simple) fix is to add a Kconfig option to
> kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
> (32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
> rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
> within a few minutes.
> 
> And I would not be surprised if doing this located bugs.
> 

Ok cool! So I will work on this next then. I am curious to see how often it
triggers both with/without such a change.

>> But before that, I have a feeling it is not behaving correctly. I'd expect
>> it to be set even if the rnp and rdp values are close but we are actually
>> about to wrap.  So most likely I am missing something.
> 
> If I understand correctly, you would like ->gpwrap to be set
> unconditionally at various points in the cycle, for example, at zero,
> ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
> have the effect of discarding each CPU's quiescent states that showed
> up before the first call to rcu_watching_snap_recheck(), even if that
> CPU was active.  Our users might not consider this to be a friendly act.

Got it.

> In contrast, the current setup only discards quiescent states for CPUs
> that just came back from idle prior to call to rcu_watching_snap_save().

Do you mean the check of ->gpwrap in rcu_report_qs_rdp() which returns early?

> In addition, use of the aforementioned Kconfig option has the benefit of
> making any ->gpwrap-related bugs appear during testing, not in production.
> On the same grounds, one could argue that absence of this Kconfig option
> should also cause gp_seq to be initialized to zero rather than the current:
> 
> 	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,

Hm, I think it still makes sense to do the -300 even without debug options, just
so we have a true wrap early on, regardless of which APIs we use for comparison.

thanks,

 - Joel



* Re: The rdp->gpwrap behavior
  2025-02-14 18:03   ` Joel Fernandes
@ 2025-02-15 10:28     ` Paul E. McKenney
  2025-02-16 12:23       ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2025-02-15 10:28 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Fri, Feb 14, 2025 at 01:03:40PM -0500, Joel Fernandes wrote:
> Hi Paul,
> 
> On 2/14/2025 3:20 AM, Paul E. McKenney wrote:
> > On Thu, Feb 13, 2025 at 01:30:33PM -0500, Joel Fernandes wrote:
> >> +rcu for archives
> >>
> >> Hi,
> >>
> >> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
> >> with just boot testing.
> >>
> >> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
> >> very close.
> >>
> >> Example trace:
> >>
> >> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
> >>
> >> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
> >> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
> >> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
> >> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
> >>
> >> The wrap "before" and "after" are the value of gpwrap before and after
> >> calling rcu_gpnum_ovf().
> >>
> >> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
> >> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
> >> long and many GPs have passed). This can happen both with and without a
> >> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
> >> set at all, AFAICS.
> >>
> >> I think the problem is that ULONG_CMP_LT() is not really "<", so even
> >> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
> >> wraps, it could still return false.
> > 
> > Exactly, and all of that is in fact the intended behavior.
> 
> Ah I think I get it now, so the wrap isn't really about the counter wrapping
> causing an issue, but about the design of the ULONG_CMP APIs which are designed
> for comparisons that wrap around. Due to this design, there is a side effect
> that the numbers being compared have to be at a distance that is less than half
> the number wheel. Otherwise the comparisons of numbers yield incorrect results.
> Is that accurate to say?  IOW, ->gpwrap == true means, don't compare the numbers
> anymore till we sort these huge deltas out.

But "incorrect" is such a harsh and judgmental word.  ;-)

Somewhat more seriously, ULONG_CMP_{GE,LT}() are defined to return the
sign (or its complement) of differences between a pair of 32-bit or 64-bit
numbers, as sizeof(unsigned long) dictates.  This is, mathematically,
how you do signed differences.  The result coming out negative isn't
incorrect, but instead the expected result given the numbers in question.

Current RCU use cases might interpret a negative result as "no one did
it for us yet", as in rcu_barrier(), "this CPU has not completed the
current grace period", as in __note_gp_changes(), and so on.

Now, these can get a false-positive/-negative result.  In the case of
rcu_barrier(), we get a needless rcu_barrier() computation.  In the case
of __note_gp_changes(), the false negative is OK, but a false positive
from wrapping all the way around could be fatal for a CPU that was idle
for a long time (hence ->gpwrap).  And so on.
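
If it helps, a standalone userspace toy (assuming 64-bit unsigned long)
showing both faces of the comparison:

	#include <stdio.h>
	#include <limits.h>

	#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

	int main(void)
	{
		unsigned long rdp = ULONG_MAX - 2;	/* just before the wrap... */
		unsigned long rnp = 1UL;		/* ...and just past it */

		/* Correctly says rdp is behind rnp despite the wrap: */
		printf("%d\n", ULONG_CMP_LT(rdp, rnp));			/* 1 */

		/* More than half the wheel apart, and the sign flips: */
		printf("%d\n", ULONG_CMP_LT(0UL, ULONG_MAX / 2 + 10));	/* 0 */
		return 0;
	}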

> >> From a testing POV, the example shows it is not set even right around when
> >> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
> >> all, with rcutorture?
> > 
> > True enough, and that would be well worth fixing.
> > 
> > One simple (and maybe too simple) fix is to add a Kconfig option to
> > kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
> > (32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
> > rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
> > within a few minutes.
> > 
> > And I would not be surprised if doing this located bugs.
> 
> Ok cool! So I will work on this next then. I am curious to see how often it
> triggers both with/without such a change.

If you can make it trigger without this change, you are far more patient
than I am.  ;-)

> >> But before that, I have a feeling it is not behaving correctly. I'd expect
> >> it to be set even if the rnp and rdp values are close but we are actually
> >> about to wrap.  So most likely I am missing something.
> > 
> > If I understand correctly, you would like ->gpwrap to be set
> > unconditionally at various points in the cycle, for example, at zero,
> > ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
> > have the effect of discarding each CPU's quiescent states that showed
> > up before the first call to rcu_watching_snap_recheck(), even if that
> > CPU was active.  Our users might not consider this to be a friendly act.
> 
> Got it.
> 
> > In contrast, the current setup only discards quiescent states for CPUs
> > that just came back from idle prior to call to rcu_watching_snap_save().
> 
> Do you mean the check of ->gpwrap in rcu_report_qs_rdp() which returns early?

Well, it also resets the CPU to collect quiescent states for the current
grace period, but yes, it does return early.

> > In addition, use of the aforementioned Kconfig option has the benefit of
> > making any ->gpwrap-related bugs appear during testing, not in production.
> > On the same grounds, one could argue that absence of this Kconfig option
> > should also cause gp_seq to be initialized to zero rather than the current:
> > 
> > 	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
> 
> Hm, I think it still makes sense to do the -300 even without debug options, just
> so we have a true wrap early on, regardless of which APIs we use for comparison.

It has certainly been as it is for a long time, so agreed.

							Thanx, Paul


* Re: The rdp->gpwrap behavior
  2025-02-15 10:28     ` Paul E. McKenney
@ 2025-02-16 12:23       ` Joel Fernandes
  2025-02-18 13:17         ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-02-16 12:23 UTC (permalink / raw)
  To: paulmck; +Cc: rcu



On 2/15/2025 5:28 AM, Paul E. McKenney wrote:
> On Fri, Feb 14, 2025 at 01:03:40PM -0500, Joel Fernandes wrote:
>>>> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
>>>> with just boot testing.
>>>>
>>>> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
>>>> very close.
>>>>
>>>> Example trace:
>>>>
>>>> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
>>>>
>>>> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
>>>> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
>>>> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
>>>> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
>>>>
>>>> The wrap "before" and "after" are the value of gpwrap before and after
>>>> calling rcu_gpnum_ovf().
>>>>
>>>> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
>>>> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
>>>> long and many GPs have passed). This can happen both with and without a
>>>> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
>>>> set at all, AFAICS.
>>>>
>>>> I think the problem is that ULONG_CMP_LT() is not really "<", so even
>>>> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
>>>> wraps, it could still return false.
>>>
>>> Exactly, and all of that is in fact the intended behavior.
>>
>> Ah I think I get it now, so the wrap isn't really about the counter wrapping
>> causing an issue, but about the design of the ULONG_CMP APIs which are designed
>> for comparisons that wrap around. Due to this design, there is a side effect
>> that the numbers being compared have to be at a distance that is less than half
>> the number wheel. Otherwise the comparisons of numbers yield incorrect results.
>> Is that accurate to say?  IOW, ->gpwrap == true means, don't compare the numbers
>> anymore till we sort these huge deltas out.
> 
> But "incorrect" is such a harsh and judgmental word.  ;-)
> 
> Somewhat more seriously, ULONG_CMP_{GE,LT}() are defined to return the
> sign (or its complement) of differences between a pair of 32-bit or 64-bit
> numbers, as sizeof(unsigned long) dictates.  This is, mathematically,
> how you do signed differences.  The result coming out negative isn't
> incorrect, but instead the expected result given the numbers in question.
> 
> Current RCU use cases might interpret a negative result as "no one did
> it for us yet", as in rcu_barrier(), "this CPU has not completed the
> current grace period", as in __note_gp_changes(), and so on.
> 
> Now, these can get a false-positive/-negative result.  In the case of
> rcu_barrier(), we get a needless rcu_barrier() computation.  In the case
> of __note_gp_changes(), the false negative is OK, but a false positive
> from wrapping all the way around could be fatal for a CPU that was idle
> for a long time (hence ->gpwrap).  And so on.

Understood and agreed! I guess what I was getting at was that ->gpwrap can be
true and false-negatives start getting reported well before a full wrap /
false-positive situation arises, I believe. And a false-negative may not be
desirable in all scenarios. In my other patch, I posted a scenario where I was
concerned about missed wakeups on a false-negative situation.

In srcutree.c
		swait_event_interruptible_exclusive(
			rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
			rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
			!READ_ONCE(my_rdp->nocb_gp_sleep));

But I shamelessly admit I don't understand this code well enough to say if this
is an issue :/.
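
For reference, rcu_seq_done() is the same wheel comparison underneath; it
is essentially this (kernel/rcu/rcu.h):

	/* True once *sp has advanced to or past the snapshot s. */
	static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
	{
		return ULONG_CMP_GE(READ_ONCE(*sp), s);
	}

So once rnp->gp_seq gets more than half the wheel ahead of wait_gp_seq, this
condition goes false again, which is where the missed-wakeup concern comes
from.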

I agree that, depending on the use case, a false negative is a non-issue. But
I am not sure that false negatives are non-fatal in all use cases.

> 
>>>> From a testing POV, the example shows it is not set even right around when
>>>> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
>>>> all, with rcutorture?
>>>
>>> True enough, and that would be well worth fixing.
>>>
>>> One simple (and maybe too simple) fix is to add a Kconfig option to
>>> kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
>>> (32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
>>> rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
>>> within a few minutes.

Since minutes was too long for me :-D, I am trying just a count of 4, and I do
see 15 total wraps in just 2 minutes, on a 36-CPU system.

I am trying this same patch now for a 10-minute run on all scenarios:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/next&id=984c47eeb7728829901b33462baae2771af80ea0
Will try longer runs after.

Are you interested in an rcutorture patch where we force this? Would this be a
default setting where we always run it, or do we want to set it only under
torture command line options? Do we default to say a high value like 32, but
still provide an option to lower it to say 4, like I am doing?

>>>
>>> And I would not be surprised if doing this located bugs.
>>
>> Ok cool! So I will work on this next then. I am curious to see how often it
>> triggers both with/without such a change.
> 
> If you can make it trigger without this change, you are far more patient
> than I am.  ;-)
> 
>>>> But before that, I have a feeling it is not behaving correctly. I'd expect
>>>> it to be set even if the rnp and rdp values are close but we are actually
>>>> about to wrap.  So most likely I am missing something.
>>>
>>> If I understand correctly, you would like ->gpwrap to be set
>>> unconditionally at various points in the cycle, for example, at zero,
>>> ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
>>> have the effect of discarding each CPU's quiescent states that showed
>>> up before the first call to rcu_watching_snap_recheck(), even if that
>>> CPU was active.  Our users might not consider this to be a friendly act.

Got it, thanks.

>>
>> Got it.
>>
>>> In contrast, the current setup only discards quiescent states for CPUs
>>> that just came back from idle prior to call to rcu_watching_snap_save().
>>
>> Do you mean the check of ->gpwrap in rcu_report_qs_rdp() which returns early?
> 
> Well, it also resets the CPU to collect quiescent states for the current
> grace period, but yes, it does return early.

Got it, makes sense!

thanks,

 - Joel



* Re: The rdp->gpwrap behavior
  2025-02-16 12:23       ` Joel Fernandes
@ 2025-02-18 13:17         ` Paul E. McKenney
  2025-02-18 22:50           ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2025-02-18 13:17 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Sun, Feb 16, 2025 at 07:23:48AM -0500, Joel Fernandes wrote:
> On 2/15/2025 5:28 AM, Paul E. McKenney wrote:
> > On Fri, Feb 14, 2025 at 01:03:40PM -0500, Joel Fernandes wrote:
> >>>> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
> >>>> with just boot testing.
> >>>>
> >>>> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
> >>>> very close.
> >>>>
> >>>> Example trace:
> >>>>
> >>>> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
> >>>>
> >>>> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
> >>>> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
> >>>> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
> >>>> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
> >>>>
> >>>> The wrap "before" and "after" are the value of gpwrap before and after
> >>>> calling rcu_gpnum_ovf().
> >>>>
> >>>> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
> >>>> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
> >>>> long and many GPs have passed). This can happen both with and without a
> >>>> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
> >>>> set at all, AFAICS.
> >>>>
> >>>> I think the problem is that ULONG_CMP_LT() is not really "<", so even
> >>>> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
> >>>> wraps, it could still return false.
> >>>
> >>> Exactly, and all of that is in fact the intended behavior.
> >>
> >> Ah I think I get it now, so the wrap isn't really about the counter wrapping
> >> causing an issue, but about the design of the ULONG_CMP APIs which are designed
> >> for comparisons that wrap around. Due to this design, there is a side effect
> >> that the numbers being compared have to be at a distance that is less than half
> >> the number wheel. Otherwise the comparisons of numbers yield incorrect results.
> >> Is that accurate to say?  IOW, ->gpwrap == true means, don't compare the numbers
> >> anymore till we sort these huge deltas out.
> > 
> > But "incorrect" is such a harsh and judgmental word.  ;-)
> > 
> > Somewhat more seriously, ULONG_CMP_{GE,LT}() are defined to return the
> > sign (or its complement) of differences between a pair of 32-bit or 64-bit
> > numbers, as sizeof(unsigned long) dictates.  This is, mathematically,
> > how you do signed differences.  The result coming out negative isn't
> > incorrect, but instead the expected result given the numbers in question.
> > 
> > Current RCU use cases might interpret a negative result as "no one did
> > it for us yet", as in rcu_barrier(), "this CPU has not completed the
> > current grace period", as in __note_gp_changes(), and so on.
> > 
> > Now, these can get a false-positive/-negative result.  In the case of
> > rcu_barrier(), we get a needless rcu_barrier() computation.  In the case
> > of __note_gp_changes(), the false negative is OK, but a false positive
> > from wrapping all the way around could be fatal for a CPU that was idle
> > for a long time (hence ->gpwrap).  And so on.
> 
> Understood and agreed! I guess what I was getting at was that ->gpwrap can be
> true and false-negatives start getting reported well before a full wrap /
> false-positive situation arises, I believe. And a false-negative may not be
> desirable in all scenarios. In my other patch, I posted a scenario where I was
> concerned about missed wakeups on a false-negative situation.
> 
> In srcutree.c
> 		swait_event_interruptible_exclusive(
> 			rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
> 			rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
> 			!READ_ONCE(my_rdp->nocb_gp_sleep));
> 
> But I shamelessly admit I don't understand this code well enough to say if this
> is an issue :/.

This looks like RCU NOCB code rather than SRCU code?  Could you please
tell me what function this code lives in?  The closest match I see is
in nocb_gp_wait()?

> I agree that, depending on the use case, a false negative is a non-issue. But
> I am not sure that false negatives are non-fatal in all use cases.

Agreed, which is exactly why this fatality (or lack thereof) must be
evaluated on a use-case-by-use-case basis.

> >>>> From a testing POV, the example shows it is not set even right around when
> >>>> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
> >>>> all, with rcutorture?
> >>>
> >>> True enough, and that would be well worth fixing.
> >>>
> >>> One simple (and maybe too simple) fix is to add a Kconfig option to
> >>> kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
> >>> (32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
> >>> rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
> >>> within a few minutes.
> 
> Since minutes was too long for me :-D, I am trying just a count of 4, and I do
> see 15 total wraps in just 2 minutes, on a 36-CPU system.

Very good, and thank you for taking this on!

> I am trying this same patch now for a 10-minute run on all scenarios:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/next&id=984c47eeb7728829901b33462baae2771af80ea0
> Will try longer runs after.

A good prototype!

> Are you interested in an rcutorture patch where we force this? Would this be a
> default setting where we always run it, or do we want to set it only under
> torture command line options? Do we default to say a high value like 32, but
> still provide an option to lower it to say 4, like I am doing?

Something like this would be good, but only under options.  (In order to
more easily debug something that only happens with hyperactive ->gpwrap
setting.)  For example, defined with yet another "rcutree." module_param()
that is also recorded in kernel-parameters.txt.  And printed out at boot,
perhaps in the rcu_bootup_announce_oddness() function.

We need a default setting that causes it to happen (very roughly) every
ten minutes on a given CPU, or perhaps better, every minute or so across
the system, by scaling the default by the number of CPUs.  After all,
we want to test both the ->gpwrap path *and* the non-->gpwrap path.

Then that parameter's value would of course take the place of the
hard-coded "4" in your new ULONG_CMP_LT() call in rcu_gpnum_ovf().

For example, the "pr_alert("%s%s total_gpwrap_count...);" works well
for prototyping, but for production needs to take its place elsewhere,
most likely later in that same series of pr_cont() invocations.

The get_gp_wrap_count() should sum up the per-CPU ->gpwrap_count fields,
right?  Ah, but you are calling it from a per-CPU loop anyway, so either
way should be fine.  But either way, get_gp_wrap_count() does need a
header comment saying why.  Trust me, it won't necessarily be obvious
five years from now.  ;-)

And that "atomic_inc(&rdp->gpwrap_count)" could be this_cpu_inc(), or
even "WRITE_ONCE(..., ... + 1)", given that we are not worried about
losing counts once in a while.
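
In other words, roughly the following, with "gpwrap_lag" standing in for
whatever the eventual parameter ends up being called:

	/* In rcu_gpnum_ovf(), with ->gpwrap_count now a plain unsigned long: */
	if (ULONG_CMP_LT(rcu_seq_current(&rdp->gp_seq) + READ_ONCE(gpwrap_lag),
			 rnp->gp_seq)) {
		WRITE_ONCE(rdp->gpwrap, true);
		/* Racy increment, but losing an occasional count is OK: */
		WRITE_ONCE(rdp->gpwrap_count, rdp->gpwrap_count + 1);
	}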

I *think* that it is OK to have the tree.[ch] and rcutorture.c portions
in the same patch, but whoever is pushing this to mainline might well ask
for it to be split if we happen to end up with merge conflicts.  Or maybe
just get more practice handling merge conflicts, but it is their call.

							Thanx, Paul

> >>> And I would not be surprised if doing this located bugs.
> >>
> >> Ok cool! So I will work on this next then. I am curious to see how often it
> >> triggers both with/without such a change.
> > 
> > If you can make it trigger without this change, you are far more patient
> > than I am.  ;-)
> > 
> >>>> But before that, I have a feeling it is not behaving correctly. I'd expect
> >>>> it to be set even if the rnp and rdp values are close but we are actually
> >>>> about to wrap.  So most likely I am missing something.
> >>>
> >>> If I understand correctly, you would like ->gpwrap to be set
> >>> unconditionally at various points in the cycle, for example, at zero,
> >>> ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
> >>> have the effect of discarding each CPU's quiescent states that showed
> >>> up before the first call to rcu_watching_snap_recheck(), even if that
> >>> CPU was active.  Our users might not consider this to be a friendly act.
> 
> Got it, thanks.
> 
> >>
> >> Got it.
> >>
> >>> In contrast, the current setup only discards quiescent states for CPUs
> >>> that just came back from idle prior to call to rcu_watching_snap_save().
> >>
> >> Do you mean the check of ->gpwrap in rcu_report_qs_rdp() which returns early?
> > 
> > Well, it also resets the CPU to collect quiescent states for the current
> > grace period, but yes, it does return early.
> 
> Got it, makes sense!
> 
> thanks,
> 
>  - Joel
> 


* Re: The rdp->gpwrap behavior
  2025-02-18 13:17         ` Paul E. McKenney
@ 2025-02-18 22:50           ` Joel Fernandes
  2025-02-19 13:54             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-02-18 22:50 UTC (permalink / raw)
  To: paulmck; +Cc: rcu

Hi Paul,

On 2/18/2025 8:17 AM, Paul E. McKenney wrote:
> On Sun, Feb 16, 2025 at 07:23:48AM -0500, Joel Fernandes wrote:
>> On 2/15/2025 5:28 AM, Paul E. McKenney wrote:
>>> On Fri, Feb 14, 2025 at 01:03:40PM -0500, Joel Fernandes wrote:
>>>>>> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
>>>>>> with just boot testing.
>>>>>>
>>>>>> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
>>>>>> very close.
>>>>>>
>>>>>> Example trace:
>>>>>>
>>>>>> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
>>>>>>
>>>>>> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
>>>>>> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
>>>>>> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
>>>>>> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
>>>>>>
>>>>>> The wrap "before" and "after" are the value of gpwrap before and after
>>>>>> calling rcu_gpnum_ovf().
>>>>>>
>>>>>> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
>>>>>> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
>>>>>> long and many GPs have passed). This can happen both with and without a
>>>>>> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
>>>>>> set at all, AFAICS.
>>>>>>
>>>>>> I think the problem is that ULONG_CMP_LT() is not really "<", so even
>>>>>> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
>>>>>> wraps, it could still return false.
>>>>>
>>>>> Exactly, and all of that is in fact the intended behavior.
>>>>
>>>> Ah I think I get it now, so the wrap isn't really about the counter wrapping
>>>> causing an issue, but about the design of the ULONG_CMP APIs which are designed
>>>> for comparisons that wrap around. Due to this design, there is a side effect
>>>> that the numbers being compared have to be at a distance that is less than half
>>>> the number wheel. Otherwise the comparisons of numbers yield incorrect results.
>>>> Is that accurate to say?  IOW, ->gpwrap == true means, don't compare the numbers
>>>> anymore till we sort these huge deltas out.
>>>
>>> But "incorrect" is such a harsh and judgmental word.  ;-)
>>>
>>> Somewhat more seriously, ULONG_CMP_{GE,LT}() are defined to return the
>>> sign (or its complement) of differences between a pair of 32-bit or 64-bit
>>> numbers, as sizeof(unsigned long) dictates.  This is, mathematically,
>>> how you do signed differences.  The result coming out negative isn't
>>> incorrect, but instead the expected result given the numbers in question.
>>>
>>> Current RCU use cases might interpret a negative result as "no one did
>>> it for us yet", as in rcu_barrier(), "this CPU has not completed the
>>> current grace period", as in __note_gp_changes(), and so on.
>>>
>>> Now, these can get a false-positive/-negative result.  In the case of
>>> rcu_barrier(), we get a needless rcu_barrier() computation.  In the case
>>> of __note_gp_changes(), the false negative is OK, but a false positive
>>> from wrapping all the way around could be fatal for a CPU that was idle
>>> for a long time (hence ->gpwrap).  And so on.
>>
>> Understood and agreed! I guess what I was getting at was that ->gpwrap can be
>> true and false-negatives start getting reported well before a full wrap /
>> false-positive situation arises, I believe. And a false-negative may not be
>> desirable in all scenarios. In my other patch, I posted a scenario where I was
>> concerned about missed wakeups on a false-negative situation.
>>
>> In srcutree.c
>> 		swait_event_interruptible_exclusive(
>> 			rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
>> 			rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
>> 			!READ_ONCE(my_rdp->nocb_gp_sleep));
>>
>> But I shamelessly admit I don't understand this code well enough to say if this
>> is an issue :/.
> 
> This looks like RCU NOCB code rather than SRCU code?  Could you please
> tell me what function this code lives in?  The closest match I see is
> in nocb_gp_wait()?

Oops, sorry I was looking at too many things.

Yes, it is in nocb_gp_wait(). I meant the rcu_seq_done() in that snippet in
'tree_nocb.h'.

>>>>>> From a testing POV, the example shows it is not set even right around when
>>>>>> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
>>>>>> all, with rcutorture?
>>>>>
>>>>> True enough, and that would be well worth fixing.
>>>>>
>>>>> One simple (and maybe too simple) fix is to add a Kconfig option to
>>>>> kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
>>>>> (32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
>>>>> rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
>>>>> within a few minutes.
>>
>> Since minutes was too long for me :-D, I am trying just a count of 4, and I do
>> see 15 total wraps in just 2 minutes, on a 36-CPU system.
> 
> Very good, and thank you for taking this on!
> 
>> I am trying this same patch now for a 10-minute run on all scenarios:
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/next&id=984c47eeb7728829901b33462baae2771af80ea0
>> Will try longer runs after.
> 
> A good prototype!

Thanks! So far I could not produce any issues with doing this, which could
mean that the code is robust to any wrap issues, but I guess there is no harm
in testing it for future-proofing?

> 
>> Are you interested in an rcutorture patch where we force this? Would this be a
>> default setting where we always run it, or do we want to set it only under
>> torture command line options? Do we default to say a high value like 32, but
>> still provide an option to lower it to say 4, like I am doing?
> 
> Something like this would be good, but only under options.  (In order to
> more easily debug something that only happens with hyperactive ->gpwrap
> setting.)  For example, defined with yet another "rcutree." module_param()
> that is also recorded in kernel-parameters.txt.  And printed out at boot,
> perhaps in the rcu_bootup_announce_oddness() function.
> 
> We need a default setting that causes it to happen (very roughly) every
> ten minutes on a given CPU, or perhaps better, every minute or so across

And that default is certainly not ULONG_MAX/2 but rather something that I
empirically calculate to cause a half-way wrap in 10 minutes?

> the system, by scaling the default by the number of CPUs.  After all,
> we want to test both the ->gpwrap path *and* the non-->gpwrap path.

Agreed. But do we want to test both these paths only under rcutorture
testing, or were you considering we do it for regular operation as well,
i.e., wrap in 10 minutes in regular operation? I am guessing you want it
only for rcutorture _and_ have a 10-minute default.

If that is the case, we would then need rcu_gpnum_ovf() to consult
rcutorture, if it is enabled, about what the threshold is.


> Then that parameter's value would of course take the place of the
> hard-coded "4" in your new ULONG_CMP_LT() call in rcu_gpnum_ovf().
> 
> For example, the "pr_alert("%s%s total_gpwrap_count...);" works well
> for prototyping, but for production needs to take its place elsewhere,
> most likely later in that same series of pr_cont() invocations.
> 
> The get_gp_wrap_count() should sum up the per-CPU ->gpwrap_count fields,
> right?  Ah, but you are calling it from a per-CPU loop anyway, so either
> way should be fine.  But either way, get_gp_wrap_count() does need a
> header comment saying why.  Trust me, it won't necessarily be obvious
> five years from now.  ;-)
> 
> And that "atomic_inc(&rdp->gpwrap_count)" could be this_cpu_inc(), or
> even "WRITE_ONCE(..., ... + 1)", given that we are not worried about
> losing counts once in a while.

Agreed to all these, I will make these changes.

> I *think* that it is OK to have the tree.[ch] and rcutorture.c portions
> in the same patch, but whoever is pushing this to mainline might well ask
> for it to be split if we happen to end up with merge conflicts.  Or maybe
> just get more practice handling merge conflicts, but it is their call.

Ok, will see how it looks and consider splitting.

Thanks!

 - Joel



* Re: The rdp->gpwrap behavior
  2025-02-18 22:50           ` Joel Fernandes
@ 2025-02-19 13:54             ` Paul E. McKenney
  2025-02-19 15:57               ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2025-02-19 13:54 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Tue, Feb 18, 2025 at 05:50:27PM -0500, Joel Fernandes wrote:
> Hi Paul,
> 
> On 2/18/2025 8:17 AM, Paul E. McKenney wrote:
> > On Sun, Feb 16, 2025 at 07:23:48AM -0500, Joel Fernandes wrote:
> >> On 2/15/2025 5:28 AM, Paul E. McKenney wrote:
> >>> On Fri, Feb 14, 2025 at 01:03:40PM -0500, Joel Fernandes wrote:
> >>>>>> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
> >>>>>> with just boot testing.
> >>>>>>
> >>>>>> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
> >>>>>> very close.
> >>>>>>
> >>>>>> Example trace:
> >>>>>>
> >>>>>> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
> >>>>>>
> >>>>>> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
> >>>>>> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
> >>>>>> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
> >>>>>> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
> >>>>>>
> >>>>>> The wrap "before" and "after" are the value of gpwrap before and after
> >>>>>> calling rcu_gpnum_ovf().
> >>>>>>
> >>>>>> A closer look at _ovf() shows it should be set only if rdp->gp_seq and
> >>>>>> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
> >>>>>> long and many GPs have passed). This can happen both with and without a
> >>>>>> wrap. So an imminent wrap seems not to be necessary for ->gpwrap to be
> >>>>>> set at all, AFAICS.
> >>>>>>
> >>>>>> I think the problem is that ULONG_CMP_LT() is not really "<", so even
> >>>>>> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
> >>>>>> wraps, it could still return false.
> >>>>>
> >>>>> Exactly, and all of that is in fact the intended behavior.
> >>>>
> >>>> Ah I think I get it now, so the wrap isn't really about the counter wrapping
> >>>> causing an issue, but about the design of the ULONG_CMP APIs which are designed
> >>>> for comparisons that wrap around. Due to this design, there is a side effect
> >>>> that the numbers being compared have to be at a distance that is less than half
> >>>> the number wheel. Otherwise the comparisons of numbers yield incorrect results.
> >>>> Is that accurate to say?  IOW, ->gpwrap == true means, don't compare the numbers
> >>>> anymore till we sort these huge deltas out.
> >>>
> >>> But "incorrect" is such a harsh and judgmental word.  ;-)
> >>>
> >>> Somewhat more seriously, ULONG_CMP_{GE,LT}() are defined to return the
> >>> sign (or its complement) of differences between a pair of 32-bit or 64-bit
> >>> numbers, as sizeof(unsigned long) dictates.  This is, mathematically,
> >>> how you do signed differences.  The result coming out negative isn't
> >>> incorrect, but instead the expected result given the numbers in question.
> >>>
> >>> Current RCU use cases might interpret a negative result as "no one did
> >>> it for us yet", as in rcu_barrier(), "this CPU has not completed the
> >>> current grace period", as in __note_gp_changes(), and so on.
> >>>
> >>> Now, these can get a false-positive/-negative result.  In the case of
> >>> rcu_barrier(), we get a needless rcu_barrier() computation.  In the case
> >>> of __note_gp_changes(), the false negative is OK, but a false positive
> >>> from wrapping all the way around could be fatal for a CPU that was idle
> >>> for a long time (hence ->gpwrap).  And so on.
> >>
> >> Understood and agreed! I guess what I was getting at was that ->gpwrap can be
> >> true and false-negatives start getting reported well before a full wrap /
> >> false-positive situation arises, I believe. And a false-negative may not be
> >> desirable in all scenarios. In my other patch, I posted a scenario where I was
> >> concerned about missed wakeups on a false-negative situation.
> >>
> >> In srcutree.c
> >> 		swait_event_interruptible_exclusive(
> >> 			rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
> >> 			rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
> >> 			!READ_ONCE(my_rdp->nocb_gp_sleep));
> >>
> >> But I shamelessly admit I don't understand this code well enough to say if this
> >> is an issue :/.
> > 
> > This looks like RCU NOCB code rather than SRCU code?  Could you please
> > tell me what function this code lives in?  The closest match I see is
> > in nocb_gp_wait()?
> 
> Oops, sorry I was looking at too many things.
> 
> Yes, it is in nocb_gp_wait(). I meant the rcu_seq_done() in that snippet in
> 'tree_nocb.h'.

OK, same as any other use case.  Can wrap happen, either across ULONG_MIN
or zero, and if so, what are the consequences?  Are things better or worse
with rcu_seq_done_exact()?  If a change is in order, how do we test it?

Plus, given that this is nocb code, we should get Frederic's opinion.
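
For comparison, rcu_seq_done_exact() accepts only a narrow window at or just
past the target rather than the entire half wheel.  Paraphrasing rcu.h from
memory, so the exact window constant may differ:

	static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
	{
		unsigned long cur_s = READ_ONCE(*sp);

		return ULONG_CMP_GE(cur_s, s) &&
		       ULONG_CMP_LT(cur_s, s + (2 * RCU_SEQ_STATE_MASK + 1));
	}

That is, it reports "done" only within a couple of grace periods past s,
instead of across half the wheel.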

> >>>>>> From a testing POV, the example shows it is not set even right around when
> >>>>>> a wrap actually happened.  So maybe we are not testing gpwrap much, or at
> >>>>>> all, with rcutorture?
> >>>>>
> >>>>> True enough, and that would be well worth fixing.
> >>>>>
> >>>>> One simple (and maybe too simple) fix is to add a Kconfig option to
> >>>>> kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
> >>>>> (32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
> >>>>> rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
> >>>>> within a few minutes.
> >>
> >> Since minutes was too long for me :-D, I am trying just a count of 4, and I do
> >> see 15 total wraps in just 2 minutes, on a 36-CPU system.
> > 
> > Very good, and thank you for taking this on!
> > 
> >> I am trying this same patch now for a 10-minute run on all scenarios:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/next&id=984c47eeb7728829901b33462baae2771af80ea0
> >> Will try longer runs after.
> > 
> > A good prototype!
> 
> Thanks! So far I could not produce any issues with doing this, which could
> mean that the code is robust to any wrap issues, but I guess there is no harm
> in testing it for future-proofing?

As long as we have proper control and proper default values, as outlined
in my earlier message.  ;-)

> >> Are you interested in an rcutorture patch where we force this? Would this be a
> >> default setting where we always run it, or do we want to set it only under
> >> torture command line options? Do we default to say a high value like 32, but
> >> still provide an option to lower it to say 4, like I am doing?
> > 
> > Something like this would be good, but only under options.  (In order to
> > more easily debug something that only happens with hyperactive ->gpwrap
> > setting.)  For example, defined with yet another "rcutree." module_param()
> > that is also recorded in kernel-parameters.txt.  And printed out at boot,
> > perhaps in the rcu_bootup_announce_oddness() function.
> > 
> > We need a default setting that causes it to happen (very roughly) every
> > ten minutes on a given CPU, or perhaps better, every minute or so across
> 
> And that default is certainly not ULONG_MAX/2 but rather something that I
> empirically calculate to cause a half-way wrap in 10 minutes?

Exactly!

Maybe also use negative numbers to allow scaling by the number of CPUs,
but that can be separate, when/if needed.  (Mentioning this now to avoid
the negative numbers being used unwittingly.)

> > the system, by scaling the default by the number of CPUs.  After all,
> > we want to test both the ->gpwrap path *and* the non-->gpwrap path.
> 
> Agreed. But do we want to test both these paths only under rcutorture
> testing, or were you considering we do it for regular operation as well,
> i.e., wrap in 10 minutes in regular operation? I am guessing you want it
> only for rcutorture _and_ have a 10-minute default.
>
> If that is the case, we would then need rcu_gpnum_ovf() to consult
> rcutorture, if it is enabled, about what the threshold is.

The rcu_gp_slow() function does something similar for the grace-period
kthread's injected delays, so that can work.  Probably better than putting
the module parameter in tree.c, because putting it in rcutorture makes
it more clear that it is for debugging.
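
For instance, a hypothetical hook in that same spirit (names made up):

	/* kernel/rcu/tree.c */
	static unsigned long gpwrap_lag = ULONG_MAX / 4;

	/* Hypothetical: rcutorture hands its module parameter down to tree.c. */
	void rcu_set_gpwrap_lag(unsigned long lag)
	{
		WRITE_ONCE(gpwrap_lag, lag ?: ULONG_MAX / 4);
	}
	EXPORT_SYMBOL_GPL(rcu_set_gpwrap_lag);

with rcu_gpnum_ovf() then reading READ_ONCE(gpwrap_lag) in place of the
hard-coded ULONG_MAX / 4.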

> > Then that parameter's value would of course take the place of the
> > hard-coded "4" in your new ULONG_CMP_LT() call in rcu_gpnum_ovf().
> > 
> > For example, the "pr_alert("%s%s total_gpwrap_count...);" works well
> > for prototyping, but for production needs to take its place elsewhere,
> > most likely later in that same series of pr_cont() invocations.
> > 
> > The get_gp_wrap_count() should sum up the per-CPU ->gpwrap_count fields,
> > right?  Ah, but you are calling it from a per-CPU loop anyway, so either
> > way should be fine.  But either way, get_gp_wrap_count() does need a
> > header comment saying why.  Trust me, it won't necessarily be obvious
> > five years from now.  ;-)
> > 
> > And that "atomic_inc(&rdp->gpwrap_count)" could be this_cpu_inc(), or
> > even "WRITE_ONCE(..., ... + 1)", given that we are not worried about
> > losing counts once in a while.
> 
> Agreed to all these, I will make these changes.
> 
> > I *think* that it is OK to have the tree.[ch] and rcutorture.c portions
> > in the same patch, but whoever is pushing this to mainline might well ask
> > for it to be split if we happen to end up with merge conflicts.  Or maybe
> > just get more practice handling merge conflicts, but it is their call.
> 
> Ok, will see how it looks and consider splitting.

Sounds good, looking forward to seeing what you come up with.

							Thanx, Paul


* Re: The rdp->gpwrap behavior
  2025-02-19 13:54             ` Paul E. McKenney
@ 2025-02-19 15:57               ` Joel Fernandes
  2025-02-19 17:02                 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-02-19 15:57 UTC (permalink / raw)
  To: paulmck; +Cc: rcu



On 2/19/2025 8:54 AM, Paul E. McKenney wrote:
>>> the system, by scaling the default by the number of CPUs.  After all,
>>> we want to test both the ->gpwrap path *and* the non-->gpwrap path.
>> Agreed. But do we want to test both these paths only under rcutorture
>> testing, or were you considering we do it for regular operation as well,
>> i.e., wrap in 10 minutes in regular operation? I am guessing you want it
>> only for rcutorture _and_ have a 10-minute default.
>>
>> If that is the case, we would then need rcu_gpnum_ovf() to consult
>> rcutorture, if it is enabled, about what the threshold is.
> The rcu_gp_slow() function does something similar for the grace-period
> kthread's injected delays, so that can work.  Probably better than putting
> the module parameter in tree.c, because putting it in rcutorture makes
> it more clear that it is for debugging.

Ok thanks, just to clarify:

Normal operation still uses the ULONG_MAX/2 threshold.
rcutorture operation will use a 10-minute wrap (overridable by an rcutorture
module option, not a module option in the non-test kernel code itself).

If we are in disagreement about this, please let me know. Otherwise I will work
on a patch as a next step on this.

 - Joel



* Re: The rdp->gpwrap behavior
  2025-02-19 15:57               ` Joel Fernandes
@ 2025-02-19 17:02                 ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2025-02-19 17:02 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Wed, Feb 19, 2025 at 10:57:29AM -0500, Joel Fernandes wrote:
> On 2/19/2025 8:54 AM, Paul E. McKenney wrote:
> >>> the system, by scaling the default by the number of CPUs.  After all,
> >>> we want to test both the ->gpwrap path *and* the non-->gpwrap path.
> >> Agreed. But do we want to test both these paths only under rcutorture testing or
> >> were you considering we do it for regular operation as well? i.e. wrap in 10
> >> minutes in regular operation. I am guessing you want it only for rcutorture
> >> _and_ have a 10 minute default.
> >>
> >> If that is the case, we would then need rcu_gpnum_ovf() to consult rcutorture,
> >> if it is enabled, about what what the threshold is.
> > The rcu_gp_slow() function does something similar for the grace-period
> > kthread's injected delays, so that can work.  Probably better than putting
> > the module parameter in tree.c, because putting it in rcutorture makes
> > it more clear that it is for debugging.
> 
> Ok thanks, just to clarify:
> 
> Normal operation still uses the ULONG_MAX/2 threshold.

Yes, please keep this as the default.

> rcutorture operation will use a 10-minute wrap (overridable by an rcutorture
> module option, not a module option in the non-test kernel code itself).

That sounds good also.

> If we are in disagreement about this, please let me know. Otherwise I will work
> on a patch as a next step on this.

Very good, looking forward to seeing what you come up with!

							Thanx, Paul
