All the mail mirrored from lore.kernel.org
* Backport d583d360a6 into 5.12 stable
@ 2021-06-22 16:30 Oleksandr Natalenko
  2021-06-22 16:47 ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Oleksandr Natalenko @ 2021-06-22 16:30 UTC
  To: stable; +Cc: Johannes Weiner, Peter Zijlstra

Hello.

I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when 
schedule() races with cgroup move") for 5.12 stable tree.

Recently, I've hit this:

```
kernel: psi: inconsistent task state! task=2667:clementine cpu=21 psi_flags=0 
clear=1 set=0
```

and after that PSI IO went crazy high. That seems to match the symptoms 
described in the commit message.
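
For context, the "crazy high" figures come from the cgroup2 PSI interface (/proc/pressure/io and per-cgroup io.pressure files); a minimal sketch of pulling out one of the stall averages, using a made-up sample line rather than values from the affected machine:

```shell
# A line in the format of /proc/pressure/io; the numbers here are
# invented for illustration only.
line='some avg10=0.00 avg60=12.50 avg300=3.25 total=123456'
# Extract the 60-second average stall percentage; after the corruption
# this is the kind of value that gets stuck abnormally high.
avg60=$(printf '%s\n' "$line" | sed -n 's/.*avg60=\([0-9.]*\).*/\1/p')
echo "avg60=$avg60"
# On a live system: cat /proc/pressure/io
```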

Thanks.

-- 
Oleksandr Natalenko (post-factum)




* Re: Backport d583d360a6 into 5.12 stable
  2021-06-22 16:30 Backport d583d360a6 into 5.12 stable Oleksandr Natalenko
@ 2021-06-22 16:47 ` Greg KH
  2021-06-22 17:24   ` Oleksandr Natalenko
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2021-06-22 16:47 UTC
  To: Oleksandr Natalenko; +Cc: stable, Johannes Weiner, Peter Zijlstra

On Tue, Jun 22, 2021 at 06:30:46PM +0200, Oleksandr Natalenko wrote:
> Hello.
> 
> I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when 
> schedule() races with cgroup move") for 5.12 stable tree.
> 
> Recently, I've hit this:
> 
> ```
> kernel: psi: inconsistent task state! task=2667:clementine cpu=21 psi_flags=0 
> clear=1 set=0
> ```
> 
> and after that PSI IO went crazy high. That seems to match the symptoms 
> described in the commit message.

But this says it fixes 4117cebf1a9f ("psi: Optimize task switch inside
shared cgroups") which did not show up until 5.13-rc1, so how are you
hitting this issue?

Did you try this patch on 5.12.y and see that it solved your problem?

thanks,

greg k-h


* Re: Backport d583d360a6 into 5.12 stable
  2021-06-22 16:47 ` Greg KH
@ 2021-06-22 17:24   ` Oleksandr Natalenko
  2021-06-22 18:27     ` Johannes Weiner
  0 siblings, 1 reply; 5+ messages in thread
From: Oleksandr Natalenko @ 2021-06-22 17:24 UTC
  To: Greg KH; +Cc: stable, Johannes Weiner, Peter Zijlstra

Hello.

On Tuesday 22 June 2021 18:47:59 CEST Greg KH wrote:
> On Tue, Jun 22, 2021 at 06:30:46PM +0200, Oleksandr Natalenko wrote:
> > I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when
> > schedule() races with cgroup move") for 5.12 stable tree.
> > 
> > Recently, I've hit this:
> > 
> > ```
> > kernel: psi: inconsistent task state! task=2667:clementine cpu=21
> > psi_flags=0 clear=1 set=0
> > ```
> > 
> > and after that PSI IO went crazy high. That seems to match the symptoms
> > described in the commit message.
> 
> But this says it fixes 4117cebf1a9f ("psi: Optimize task switch inside
> shared cgroups") which did not show up until 5.13-rc1, so how are you
> hitting this issue?

I'm not positive 4117cebf1a9f was the root cause of the race. To me it looks 
like 4117cebf1a9f just made an older, pre-existing issue more likely to be hit.

Peter, Johannes, am I correct in saying that it is still possible to hit the 
corruption described in d583d360a6 on 5.12?

> Did you try this patch on 5.12.y and see that it solved your problem?

Yes, I've built the kernel with this patch, and so far it runs fine. It can 
take a while until the condition is hit, though, since it seems to be very 
unlikely on 5.12.

-- 
Oleksandr Natalenko (post-factum)




* Re: Backport d583d360a6 into 5.12 stable
  2021-06-22 17:24   ` Oleksandr Natalenko
@ 2021-06-22 18:27     ` Johannes Weiner
  2021-06-23 14:00       ` Oleksandr Natalenko
  0 siblings, 1 reply; 5+ messages in thread
From: Johannes Weiner @ 2021-06-22 18:27 UTC
  To: Oleksandr Natalenko; +Cc: Greg KH, stable, Peter Zijlstra

On Tue, Jun 22, 2021 at 07:24:56PM +0200, Oleksandr Natalenko wrote:
> Hello.
> 
> On Tuesday 22 June 2021 18:47:59 CEST Greg KH wrote:
> > On Tue, Jun 22, 2021 at 06:30:46PM +0200, Oleksandr Natalenko wrote:
> > > I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when
> > > schedule() races with cgroup move") for 5.12 stable tree.
> > > 
> > > Recently, I've hit this:
> > > 
> > > ```
> > > kernel: psi: inconsistent task state! task=2667:clementine cpu=21
> > > psi_flags=0 clear=1 set=0
> > > ```
> > > 
> > > and after that PSI IO went crazy high. That seems to match the symptoms
> > > described in the commit message.
> > 
> > But this says it fixes 4117cebf1a9f ("psi: Optimize task switch inside
> > shared cgroups") which did not show up until 5.13-rc1, so how are you
> > hitting this issue?
> 
> I'm not positive 4117cebf1a9f was a root cause of the race. To me it looks 
> like 4117cebf1a9f just made an older issue more likely to be hit.
> 
> Peter, Johannes, am I correct in saying that it is still possible to hit the
> corruption described in d583d360a6 on 5.12?

I'm not aware of a previous issue, but it's possible you discovered
one that was incidentally fixed by this change.

That said, there haven't been many changes in this area prior to 5.12,
and I stared at the old code quite a bit to see if there are other
possible scenarios, so this gives me pause.

> > Did you try this patch on 5.12.y and see that it solved your problem?
> 
> Yes, I've built the kernel with this patch, and so far it runs fine. It can 
> take a while until the condition is hit though since it seems to be very 
> unlikely on 5.12.

Is your task moving / being moved between cgroups while it's doing
work?

How long does it usually take to trigger it?

Would it be possible to share a simpler reproducer, or is this part of
a more complex application?


* Re: Backport d583d360a6 into 5.12 stable
  2021-06-22 18:27     ` Johannes Weiner
@ 2021-06-23 14:00       ` Oleksandr Natalenko
  0 siblings, 0 replies; 5+ messages in thread
From: Oleksandr Natalenko @ 2021-06-23 14:00 UTC
  To: Johannes Weiner; +Cc: Greg KH, stable, Peter Zijlstra

Hello.

On Tuesday 22 June 2021 20:27:51 CEST Johannes Weiner wrote:
> On Tue, Jun 22, 2021 at 07:24:56PM +0200, Oleksandr Natalenko wrote:
> > On Tuesday 22 June 2021 18:47:59 CEST Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 06:30:46PM +0200, Oleksandr Natalenko wrote:
> > > > I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when
> > > > schedule() races with cgroup move") for 5.12 stable tree.
> > > > 
> > > > Recently, I've hit this:
> > > > 
> > > > ```
> > > > kernel: psi: inconsistent task state! task=2667:clementine cpu=21
> > > > psi_flags=0 clear=1 set=0
> > > > ```
> > > > 
> > > > and after that PSI IO went crazy high. That seems to match the
> > > > symptoms
> > > > described in the commit message.
> > > 
> > > But this says it fixes 4117cebf1a9f ("psi: Optimize task switch inside
> > > shared cgroups") which did not show up until 5.13-rc1, so how are you
> > > hitting this issue?
> > 
> > I'm not positive 4117cebf1a9f was a root cause of the race. To me it looks
> > like 4117cebf1a9f just made an older issue more likely to be hit.
> > 
> > Peter, Johannes, am I correct in saying that it is still possible to hit the
> > corruption described in d583d360a6 on 5.12?
> 
> I'm not aware of a previous issue, but it's possible you discovered
> one that was incidentally fixed by this change.
> 
> That said, there haven't been many changes in this area prior to 5.12,
> and I stared at the old code quite a bit to see if there are other
> possible scenarios, so this gives me pause.

Ack.

> > > Did you try this patch on 5.12.y and see that it solved your problem?
> > 
> > Yes, I've built the kernel with this patch, and so far it runs fine. It
> > can
> > take a while until the condition is hit though since it seems to be very
> > unlikely on 5.12.
> 
> Is your task moving / being moved between cgroups while it's doing
> work?

Likely, yes. IIUC, KDE spawns apps in separate cgroups, so in this very 
case Clementine should get its own one (?):

```
$ systemd-cgls
…
│   │ │ ├─app-clementine-df516e4181f446ab869e723ea2ed6094.scope 
│   │ │ │ ├─2926 /bin/clementine -session 
10de706f63000162437544200000015700012_1624379013_575845
│   │ │ │ ├─3059 /usr/bin/clementine-tagreader /tmp/clementine_735427711
│   │ │ │ ├─3060 /usr/bin/clementine-tagreader /tmp/clementine_557274898
│   │ │ │ ├─3062 /usr/bin/clementine-tagreader /tmp/clementine_1730944950
│   │ │ │ ├─3063 /usr/bin/clementine-tagreader /tmp/clementine_1509249421
│   │ │ │ ├─3065 /usr/bin/clementine-tagreader /tmp/clementine_1345386497
│   │ │ │ ├─3068 /usr/bin/clementine-tagreader /tmp/clementine_865255891
│   │ │ │ ├─3070 /usr/bin/clementine-tagreader /tmp/clementine_1782561441
│   │ │ │ ├─3072 /usr/bin/clementine-tagreader /tmp/clementine_421851305
│   │ │ │ ├─3073 /usr/bin/clementine-tagreader /tmp/clementine_175368243
│   │ │ │ ├─3075 /usr/bin/clementine-tagreader /tmp/clementine_1962830479
│   │ │ │ ├─3076 /usr/bin/clementine-tagreader /tmp/clementine_547573203
│   │ │ │ ├─3078 /usr/bin/clementine-tagreader /tmp/clementine_1819270047
│   │ │ │ ├─3079 /usr/bin/clementine-tagreader /tmp/clementine_1632862299
│   │ │ │ ├─3085 /usr/bin/clementine-tagreader /tmp/clementine_1279975869
│   │ │ │ ├─3095 /usr/bin/clementine-tagreader /tmp/clementine_1612119641
│   │ │ │ ├─3102 /usr/bin/clementine-tagreader /tmp/clementine_1789578483
│   │ │ │ ├─3103 /usr/bin/clementine-tagreader /tmp/clementine_1541442265
│   │ │ │ ├─3105 /usr/bin/clementine-tagreader /tmp/clementine_1418456770
│   │ │ │ ├─3106 /usr/bin/clementine-tagreader /tmp/clementine_1998684543
│   │ │ │ ├─3107 /usr/bin/clementine-tagreader /tmp/clementine_1349315391
│   │ │ │ ├─3108 /usr/bin/clementine-tagreader /tmp/clementine_231895572
│   │ │ │ ├─3110 /usr/bin/clementine-tagreader /tmp/clementine_492688785
│   │ │ │ ├─3111 /usr/bin/clementine-tagreader /tmp/clementine_1492630900
│   │ │ │ └─3112 /usr/bin/clementine-tagreader /tmp/clementine_2017490599
…
```
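
As a cross-check without systemd-cgls: on the unified (cgroup2) hierarchy, the `0::` entry of /proc/<pid>/cgroup names a task's group. A minimal sketch, shown for the current shell rather than Clementine's PID:

```shell
# On cgroup2, /proc/<pid>/cgroup carries a single "0::<path>" entry;
# for Clementine it would show the app-clementine-*.scope path from
# the systemd-cgls output above.
cgpath=$(sed -n 's/^0:://p' /proc/self/cgroup)
echo "current cgroup: ${cgpath:-<not on the unified hierarchy>}"
```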

> How long does it usually take to trigger it?

I don't know :(. I don't usually peer into dmesg, and I noticed this by pure 
chance. Grepping the journal shows only this one occurrence, and since the 
journal rotates, some information might already be lost.
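
A quick way to count past occurrences, sketched here against the one captured line (on a live box one would scan `journalctl -k` or `dmesg` instead):

```shell
# Simulated log excerpt: the single warning captured in this report.
log='kernel: psi: inconsistent task state! task=2667:clementine cpu=21 psi_flags=0 clear=1 set=0'
hits=$(printf '%s\n' "$log" | grep -c 'psi: inconsistent task state')
echo "occurrences: $hits"
# Live-system equivalent:
#   journalctl -k --no-pager | grep -c 'psi: inconsistent task state'
```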

> Would it be possible to share a simpler reproducer, or is this part of
> a more complex application?

This was triggered by KDE's autostart of the Clementine player, and I don't 
have any specific reproducer. If I find one, I'll share it, of course.
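
For what it's worth, a naive stress sketch of the suspected window (a running task being migrated between cgroups while it schedules) might look like the following. It is not a confirmed reproducer: the group names and iteration count are made up, and it needs root plus a mounted cgroup2 hierarchy.

```shell
root=/sys/fs/cgroup
if [ ! -f "$root/cgroup.controllers" ] || [ ! -w "$root" ]; then
    echo "cgroup2 root not writable; skipping"
else
    mkdir -p "$root/psi-a" "$root/psi-b"
    # A busy task in the background so it is scheduling while we move it.
    sh -c 'while :; do true; done' &
    pid=$!
    # Migrate it back and forth to widen the schedule()-vs-move window.
    i=0
    while [ "$i" -lt 10000 ]; do
        echo "$pid" > "$root/psi-a/cgroup.procs"
        echo "$pid" > "$root/psi-b/cgroup.procs"
        i=$((i + 1))
    done
    kill "$pid"
    dmesg | grep -F 'psi: inconsistent task state' || echo 'not reproduced'
fi
```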

Thanks.

-- 
Oleksandr Natalenko (post-factum)



