Linux-arch Archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexey Gladkov <legion@kernel.org>, Kyle Huey <me@kylehuey.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Kees Cook <keescook@chromium.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH 06/10] exit: Implement kthread_exit
Date: Fri, 7 Jan 2022 02:27:54 +0000	[thread overview]
Message-ID: <YdelKq9U86/dHPcm@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <20211208202532.16409-6-ebiederm@xmission.com>

On Wed, Dec 08, 2021 at 02:25:28PM -0600, Eric W. Biederman wrote:

> +/**
> + * kthread_exit - Cause the current kthread return @result to kthread_stop().
> + * @result: The integer value to return to kthread_stop().
> + *
> + * While kthread_exit can be called directly, it exists so that
> + * functions which do some additional work in non-modular code such as
> + * module_put_and_kthread_exit can be implemented.
> + *
> + * Does not return.
> + */
> +void __noreturn kthread_exit(long result)
> +{
> +	do_exit(result);
> +}

>  static int kthread(void *_create)
>  {
>  	static const struct sched_param param = { .sched_priority = 0 };
> @@ -286,13 +301,13 @@ static int kthread(void *_create)
>  	done = xchg(&create->done, NULL);
>  	if (!done) {
>  		kfree(create);
> -		do_exit(-EINTR);
> +		kthread_exit(-EINTR);

This do_exit(-EINTR) is pure cargo-culting; nobody will see that return
value, since by this point nobody could have looked at the task_struct
or pid.  Look: we must have had
	* __kthread_create_on_node() called
	  it has allocated a request (create), filled it, put it on
	  kthreadd's request list and woke kthreadd up.
	* it went to wait (killably) for completion of create->done.
	* it got a SIGKILL and had succefully replaced create->done
	  with NULL.
	* since it has gotten non-NULL, it has buggered off (with -EINTR,
	  incidentally).
In the meanwhile, kthreadd had picked the request from its list and
successfully forked the child.  Child had run kthread() up to the point
you'd quoted, i.e. it had observed create->done already NULL, freed create
and is now terminating itself.

Caller of kernel_thread() doesn't pass the pid to anyone, since the pid
must've been positive.	kthread() does not store its pid or task_struct
reference anywhere on that path.  __kthread_create_on_node() doesn't even
bother looking at anything other than create->done and it returns -EINTR.
Child couldn't be traced by anyone.

So how the hell could anyone look at the value we pass to do_exit() here?
Might as well use do_exit(0)...


>  	if (!self) {
>  		create->result = ERR_PTR(-ENOMEM);
>  		complete(done);
> -		do_exit(-ENOMEM);
> +		kthread_exit(-ENOMEM);

Ditto.  We must have had
        * __kthread_create_on_node() called
	  it has allocated a request (create), filled it, put it on
	  kthreadd's request list and woke kthreadd.
	* it went to wait (killably) for completion of create->done.
	* it did *NOT* get SIGKILL until after kthread() the child
          successfully forked by kthreadd got through that xchg()
	  in kthread().
Either __kthread_create_on_node() hadn't gotten SIGKILL before our call
of complete(), or it has gotten one, observed NULL create->done and simply
proceeded to do wait_for_completion() on the same thing (it's its local
variable, so it knows what create->done used to point to).  Either way,
__kthread_create_on_node() hits
        task = create->result;
sees that it's ERR_PTR(-ENOMEM) and proceeds to fail with -ENOMEM.
Again, nothing is looking at the pid or task_struct of the child and
nothing could possibly observe its exit code.

>  	}
>  
>  	self->threadfn = threadfn;
> @@ -326,7 +341,7 @@ static int kthread(void *_create)
>  		__kthread_parkme(self);
>  		ret = threadfn(data);
>  	}
> -	do_exit(ret);
> +	kthread_exit(ret);

This one, OTOH, is a different story.  Here we have already hit the
completion and stopped the child.  With __kthread_create_on_node()
having found and returned the task_struct of the child.  Since that
point, somebody had already woken the child (using the pointer returned
by __kthread_create_on_node()).

What's more, the child's payload has already run to completion.
*NORMALLY* that means kthread_stop() called on it.  And there we do the
following bit of nastiness:
        get_task_struct(k);
	...
	mark it "should stop" and wake it up
	...
        wait_for_completion(&kthread->exited);
	ret = k->exit_code;
	put_task_struct(k);

kthread->exited is what k->vfork_done is left pointing to, so this
wait_for_completion() waits for that do_exit() in the kthread() and
we proceed to pick k->exit_code (using the fact that k has just
been pinned by us).  Then kthread_stop() proceeds to return that
value.

Pardon me while I puke.  The value being returned has nothing to do
with the things one could normally find in ->exit_code, for starters.
What's more, kthread->exited is a part of per-thread data structure,
and that same structure could bloody well be used to pass the damn
return value of threadfn(), instead of doing unnatural things with
->exit_code.  And that data structure is not freed until free_task(k),
i.e. we can fetch from it whenever we can fetch k->exit_code.

Oh, and all other ways to stop the thread do not bother looking at
the exit code at all.

IMO the right way to handle that would be
	1) turn these two do_exit() into do_exit(0), to reduce
confusion
	2) deal with all do_exit() in kthread payloads.  Your
name for the primitive is fine, IMO.
	3) make that primitive pass the return value by way of
a field in struct kthread, adjusting kthread_stop() accordingly
and passing 0 to do_exit() in kthread_exit() itself.

(2) is not as trivial as you seem to hope, though.  Your patches
in drivers/staging/rt*/ had papered over the problem in there,
but hadn't really solved it.

thread_exit() should've been shot, all right, but it really ought
to have been complete_and_exit() there.  The thing is, complete()
+ return does *not* guarantee that driver won't get unloaded before
the thread terminates.  Possibly freeing its .code and leaving
a thread to resume running in there as soon as it regains CPU.

The point of complete_and_exit() is that it's noreturn *and* in
core kernel.  So it can be safely used in a modular kthread,
if paired with wait_for_completion() in or before module_exit.
complete() + do_exit() (or complete + return as you've gotten
there) doesn't give such guarantees at all.

I'm (re)crawling through that zoo right now, will post when
I get more details.

  reply	other threads:[~2022-01-07  2:28 UTC|newest]

Thread overview: 137+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 20:17 [PATCH 00/10] Removal of most do_exit calls Eric W. Biederman
2021-12-08 20:25 ` [PATCH 01/10] exit/s390: Remove dead reference to do_exit from copy_thread Eric W. Biederman
2021-12-12 17:48   ` Heiko Carstens
2021-12-13 14:50     ` Eric W. Biederman
2022-01-05  4:25     ` Al Viro
2021-12-08 20:25 ` [PATCH 02/10] exit: Add and use make_task_dead Eric W. Biederman
2022-01-05  5:01   ` Al Viro
2022-01-05 20:46     ` Eric W. Biederman
2022-01-05 21:53       ` Al Viro
2022-01-05 22:51         ` Linus Torvalds
2022-01-05 23:34           ` Al Viro
2021-12-08 20:25 ` [PATCH 03/10] exit: Move oops specific logic from do_exit into make_task_dead Eric W. Biederman
2022-01-05  5:48   ` Al Viro
2022-01-06  7:08     ` Al Viro
2022-01-07  3:42     ` Al Viro
2022-01-07 19:02       ` Eric W. Biederman
2022-01-07 18:59     ` Eric W. Biederman
2022-01-17  8:05       ` Christoph Hellwig
2022-01-17 12:15         ` Heiko Carstens
2022-01-17 13:17           ` Christoph Hellwig
2022-01-17 13:24         ` Arnd Bergmann
2022-01-17 13:27           ` [PATCH] microblaze: remove CONFIG_SET_FS Arnd Bergmann
2022-02-09 13:50             ` Michal Simek
2022-02-09 13:52               ` Christoph Hellwig
2022-02-09 14:03                 ` Michal Simek
2022-02-09 14:40               ` Arnd Bergmann
2022-02-09 14:44                 ` Michal Simek
2022-02-09 14:54                   ` Arnd Bergmann
2022-02-09 23:31                     ` Stafford Horne
2022-02-11  0:17                       ` Stafford Horne
2022-02-11 16:59                         ` Arnd Bergmann
2022-02-11 17:46                           ` Linus Torvalds
2022-02-11 20:57                             ` Arnd Bergmann
2022-02-11 21:10                               ` Eric W. Biederman
2022-02-11 22:21                                 ` Stafford Horne
2022-02-14  7:41                             ` Christoph Hellwig
2022-02-14  7:50                           ` Christoph Hellwig
2022-02-14 16:20                             ` Arnd Bergmann
2021-12-08 20:25 ` [PATCH 04/10] exit: Stop poorly open coding do_task_dead in make_task_dead Eric W. Biederman
2022-01-05  5:58   ` Al Viro
2022-01-05 22:33     ` Eric W. Biederman
2021-12-08 20:25 ` [PATCH 05/10] exit: Stop exporting do_exit Eric W. Biederman
2022-01-05  6:02   ` Al Viro
2022-01-05 22:36     ` Eric W. Biederman
2021-12-08 20:25 ` [PATCH 06/10] exit: Implement kthread_exit Eric W. Biederman
2022-01-07  2:27   ` Al Viro [this message]
2022-01-08 18:35     ` Eric W. Biederman
2022-01-08 22:44       ` David Laight
2022-01-10 15:00         ` Eric W. Biederman
2022-01-09  3:27       ` Al Viro
2022-01-10 15:05         ` Eric W. Biederman
2021-12-08 20:25 ` [PATCH 07/10] exit: Rename module_put_and_exit to module_put_and_kthread_exit Eric W. Biederman
2021-12-08 20:25 ` [PATCH 08/10] exit: Rename complete_and_exit to kthread_complete_and_exit Eric W. Biederman
2021-12-08 20:25 ` [PATCH 09/10] kthread: Ensure struct kthread is present for all kthreads Eric W. Biederman
2021-12-22 18:19   ` Nathan Chancellor
2021-12-22 18:30     ` Eric W. Biederman
2021-12-22 18:46       ` Nathan Chancellor
2021-12-22 23:22         ` Eric W. Biederman
2021-12-23  0:37           ` Nathan Chancellor
2021-12-23  1:44           ` Linus Torvalds
2021-12-23  3:34             ` Eric W. Biederman
2021-12-23  5:19               ` [PATCH] kthread: Generalize pf_io_worker so it can point to struct kthread Eric W. Biederman
2021-12-23 17:20                 ` Linus Torvalds
2022-01-07  3:59   ` [PATCH 09/10] kthread: Ensure struct kthread is present for all kthreads Al Viro
2022-01-08 18:20     ` Eric W. Biederman
2021-12-08 20:25 ` [PATCH 10/10] exit/kthread: Move the exit code for kernel threads into struct kthread Eric W. Biederman
2022-01-07  3:22   ` Al Viro
2021-12-13 22:50 ` [PATCH 0/8] signal: Cleanup of the signal->flags Eric W. Biederman
2022-01-03 21:30   ` [PATCH 00/17] exit: Making task exiting a first class concept Eric W. Biederman
2022-01-03 21:32     ` [PATCH 01/17] exit: Remove profile_task_exit & profile_munmap Eric W. Biederman
2022-01-04  7:38       ` Christoph Hellwig
2022-01-07  3:48       ` Al Viro
2022-01-08 16:10         ` Eric W. Biederman
2022-01-03 21:32     ` [PATCH 02/17] exit: Coredumps reach do_group_exit Eric W. Biederman
2022-01-03 21:32     ` [PATCH 03/17] exit: Fix the exit_code for wait_task_zombie Eric W. Biederman
2022-01-03 21:32     ` [PATCH 04/17] exit: Use the correct exit_code in /proc/<pid>/stat Eric W. Biederman
2022-01-03 21:33     ` [PATCH 05/17] taskstats: Cleanup the use of task->exit_code Eric W. Biederman
2022-01-03 21:33     ` [PATCH 06/17] ptrace: Remove second setting of PT_SEIZED in ptrace_attach Eric W. Biederman
2022-01-03 21:33     ` [PATCH 07/17] ptrace: Remove unused regs argument from ptrace_report_syscall Eric W. Biederman
2022-01-03 21:33     ` [PATCH 08/17] ptrace/m68k: Stop open coding ptrace_report_syscall Eric W. Biederman
2022-01-10 15:26       ` Geert Uytterhoeven
2022-01-10 16:20         ` Al Viro
2022-01-10 16:25           ` Al Viro
2022-01-10 17:54           ` Geert Uytterhoeven
2022-01-10 20:37             ` Al Viro
2022-01-10 21:18               ` Eric W. Biederman
2022-01-11  1:33             ` Michael Schmitz
2022-01-11 22:42               ` Finn Thain
2022-01-12  0:20                 ` Michael Schmitz
2022-01-12  3:32                   ` Finn Thain
2022-01-12  7:54                     ` Michael Schmitz
2022-01-12  7:55                   ` Geert Uytterhoeven
2022-01-12  8:05                     ` Michael Schmitz
2022-01-03 21:33     ` [PATCH 09/17] ptrace: Move setting/clearing ptrace_message into ptrace_stop Eric W. Biederman
2022-01-03 21:33     ` [PATCH 10/17] ptrace: Return the signal to continue with from ptrace_stop Eric W. Biederman
2022-01-03 21:33     ` [PATCH 11/17] ptrace: Separate task->ptrace_code out from task->exit_code Eric W. Biederman
2022-01-03 21:33     ` [PATCH 12/17] signal: Compute the process exit_code in get_signal Eric W. Biederman
2022-01-03 21:33     ` [PATCH 13/17] signal: Make individual tasks exiting a first class concept Eric W. Biederman
2022-01-03 21:33     ` [PATCH 14/17] signal: Remove zap_other_threads Eric W. Biederman
2022-01-03 21:33     ` [PATCH 15/17] signal: Add JOBCTL_WILL_EXIT to mark exiting tasks Eric W. Biederman
2022-01-03 21:33     ` [PATCH 16/17] signal: Record the exit_code when an exit is scheduled Eric W. Biederman
2022-01-03 21:33     ` [PATCH 17/17] signal: Always set SIGNAL_GROUP_EXIT on process exit Eric W. Biederman
2022-03-09  0:15     ` [PATCH 00/13] Removing tracehook.h Eric W. Biederman
2022-03-09 20:58       ` Linus Torvalds
2021-12-13 22:53 ` [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case Eric W. Biederman
2022-01-04  6:30   ` Dmitry Osipenko
2022-01-04 16:18     ` Eric W. Biederman
2022-01-05 19:58     ` Eric W. Biederman
2022-01-05 21:39       ` Dmitry Osipenko
2022-01-08 18:13         ` Eric W. Biederman
2022-01-08 18:15           ` [PATCH 1/2] signal: Have prepare_signal detect coredumps using signal->core_state Eric W. Biederman
2022-01-08 18:15           ` [PATCH 2/2] signal: Make coredump handling explicit in complete_signal Eric W. Biederman
2022-01-11  8:59           ` [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case Dmitry Osipenko
2022-01-11 17:20             ` Eric W. Biederman
2022-01-18 17:30               ` Dmitry Osipenko
2022-01-18 17:52                 ` Eric W. Biederman
2022-01-18 18:01                   ` Dmitry Osipenko
2022-01-04 18:44   ` Linus Torvalds
2022-01-04 19:47     ` Eric W. Biederman
2022-01-08 19:13       ` Heiko Carstens
     [not found]         ` <87ilurwjju.fsf@email.froward.int.ebiederm.org>
     [not found]           ` <87o84juwhg.fsf@email.froward.int.ebiederm.org>
2022-01-10 23:00             ` Olivier Langlois
2022-01-11 17:28               ` Eric W. Biederman
2022-01-11 18:51                 ` Eric W. Biederman
2022-01-11 19:19                   ` Linus Torvalds
2022-01-15  0:12                     ` Eric W. Biederman
2022-01-15 19:23                       ` Olivier Langlois
2022-01-17 16:09                         ` Eric W. Biederman
2022-01-17 18:46                           ` io_uring truncating coredumps Eric W. Biederman
2022-01-18  4:23                             ` Linus Torvalds
2022-01-26 15:06                           ` [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case Olivier Langlois
2021-12-13 22:53 ` [PATCH 2/8] signal: Drop signals received after a fatal signal has been processed Eric W. Biederman
2021-12-13 22:53 ` [PATCH 3/8] signal: Have the oom killer detect coredumps using signal->core_state Eric W. Biederman
2021-12-13 22:53 ` [PATCH 4/8] signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process Eric W. Biederman
2021-12-13 22:53 ` [PATCH 5/8] signal: Remove SIGNAL_GROUP_COREDUMP Eric W. Biederman
2021-12-13 22:53 ` [PATCH 6/8] coredump: Stop setting signal->group_exit_task Eric W. Biederman
2021-12-13 22:53 ` [PATCH 7/8] signal: Rename group_exit_task group_exec_task Eric W. Biederman
2021-12-13 22:53 ` [PATCH 8/8] signal: Remove the helper signal_group_exit Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YdelKq9U86/dHPcm@zeniv-ca.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=ebiederm@xmission.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=keescook@chromium.org \
    --cc=legion@kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=me@kylehuey.com \
    --cc=oleg@redhat.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).