linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] tracing: Add sched_prepare_exec tracepoint
@ 2024-04-11 10:20 Marco Elver
  2024-04-11 15:15 ` Kees Cook
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Marco Elver @ 2024-04-11 10:20 UTC (permalink / raw
  To: elver, Steven Rostedt
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Azeem Shaikh,
	linux-fsdevel, linux-mm, linux-kernel, linux-trace-kernel,
	Dmitry Vyukov

Add "sched_prepare_exec" tracepoint, which is run right after the point
of no return but before the current task assumes its new exec identity.

Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
tracepoint runs before flushing the old exec, i.e. while the task still
has the original state (such as original MM), but when the new exec
either succeeds or crashes (but never returns to the original exec).

Being able to trace this event can be helpful in a number of use cases:

  * allowing tracing eBPF programs access to the original MM on exec,
    before current->mm is replaced;
  * counting exec in the original task (via perf event);
  * profiling flush time ("sched_prepare_exec" to "sched_process_exec").

Example of tracing output:

 $ cat /sys/kernel/debug/tracing/trace_pipe
    <...>-379  [003] .....  179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
    <...>-381  [002] .....  180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
    <...>-385  [001] .....  180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
    <...>-389  [006] .....  192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Add more documentation.
* Also show bprm->interp in trace.
* Rename to sched_prepare_exec.
---
 fs/exec.c                    |  8 ++++++++
 include/trace/events/sched.h | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 38bf71cbdf5e..57fee729dd92 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1268,6 +1268,14 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		return retval;
 
+	/*
+	 * This tracepoint marks the point before flushing the old exec where
+	 * the current task is still unchanged, but errors are fatal (point of
+	 * no return). The later "sched_process_exec" tracepoint is called after
+	 * the current task has successfully switched to the new exec.
+	 */
+	trace_sched_prepare_exec(current, bprm);
+
 	/*
 	 * Ensure all future errors are fatal.
 	 */
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index dbb01b4b7451..226f47c6939c 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -420,6 +420,41 @@ TRACE_EVENT(sched_process_exec,
 		  __entry->pid, __entry->old_pid)
 );
 
+/**
+ * sched_prepare_exec - called before setting up new exec
+ * @task:	pointer to the current task
+ * @bprm:	pointer to linux_binprm used for new exec
+ *
+ * Called before flushing the old exec, where @task is still unchanged, but at
+ * the point of no return during switching to the new exec. At the point it is
+ * called the exec will either succeed, or on failure terminate the task. Also
+ * see the "sched_process_exec" tracepoint, which is called right after @task
+ * has successfully switched to the new exec.
+ */
+TRACE_EVENT(sched_prepare_exec,
+
+	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
+
+	TP_ARGS(task, bprm),
+
+	TP_STRUCT__entry(
+		__string(	interp,		bprm->interp	)
+		__string(	filename,	bprm->filename	)
+		__field(	pid_t,		pid		)
+		__string(	comm,		task->comm	)
+	),
+
+	TP_fast_assign(
+		__assign_str(interp, bprm->interp);
+		__assign_str(filename, bprm->filename);
+		__entry->pid = task->pid;
+		__assign_str(comm, task->comm);
+	),
+
+	TP_printk("interp=%s filename=%s pid=%d comm=%s",
+		  __get_str(interp), __get_str(filename),
+		  __entry->pid, __get_str(comm))
+);
 
 #ifdef CONFIG_SCHEDSTATS
 #define DEFINE_EVENT_SCHEDSTAT DEFINE_EVENT
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] tracing: Add sched_prepare_exec tracepoint
  2024-04-11 10:20 [PATCH v2] tracing: Add sched_prepare_exec tracepoint Marco Elver
@ 2024-04-11 15:15 ` Kees Cook
  2024-04-11 15:38   ` Steven Rostedt
  2024-04-11 15:40 ` Kees Cook
  2024-04-11 15:56 ` Masami Hiramatsu
  2 siblings, 1 reply; 5+ messages in thread
From: Kees Cook @ 2024-04-11 15:15 UTC (permalink / raw
  To: Marco Elver
  Cc: Steven Rostedt, Alexander Viro, Christian Brauner, Jan Kara,
	Eric Biederman, Masami Hiramatsu, Mathieu Desnoyers, Azeem Shaikh,
	linux-fsdevel, linux-mm, linux-kernel, linux-trace-kernel,
	Dmitry Vyukov

On Thu, Apr 11, 2024 at 12:20:57PM +0200, Marco Elver wrote:
> Add "sched_prepare_exec" tracepoint, which is run right after the point
> of no return but before the current task assumes its new exec identity.
> 
> Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
> tracepoint runs before flushing the old exec, i.e. while the task still
> has the original state (such as original MM), but when the new exec
> either succeeds or crashes (but never returns to the original exec).
> 
> Being able to trace this event can be helpful in a number of use cases:
> 
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("sched_prepare_exec" to "sched_process_exec").
> 
> Example of tracing output:
> 
>  $ cat /sys/kernel/debug/tracing/trace_pipe
>     <...>-379  [003] .....  179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
>     <...>-381  [002] .....  180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
>     <...>-385  [001] .....  180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
>     <...>-389  [006] .....  192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash
> 
> Signed-off-by: Marco Elver <elver@google.com>

This looks good to me. If tracing wants to take it:

Acked-by: Kees Cook <keescook@chromium.org>

If not, I can take it in my tree if I get a tracing Ack. :)

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] tracing: Add sched_prepare_exec tracepoint
  2024-04-11 15:15 ` Kees Cook
@ 2024-04-11 15:38   ` Steven Rostedt
  0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2024-04-11 15:38 UTC (permalink / raw
  To: Kees Cook
  Cc: Marco Elver, Alexander Viro, Christian Brauner, Jan Kara,
	Eric Biederman, Masami Hiramatsu, Mathieu Desnoyers, Azeem Shaikh,
	linux-fsdevel, linux-mm, linux-kernel, linux-trace-kernel,
	Dmitry Vyukov

On Thu, 11 Apr 2024 08:15:05 -0700
Kees Cook <keescook@chromium.org> wrote:

> This looks good to me. If tracing wants to take it:
> 
> Acked-by: Kees Cook <keescook@chromium.org>
> 
> If not, I can take it in my tree if I get a tracing Ack. :)

You can take it.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] tracing: Add sched_prepare_exec tracepoint
  2024-04-11 10:20 [PATCH v2] tracing: Add sched_prepare_exec tracepoint Marco Elver
  2024-04-11 15:15 ` Kees Cook
@ 2024-04-11 15:40 ` Kees Cook
  2024-04-11 15:56 ` Masami Hiramatsu
  2 siblings, 0 replies; 5+ messages in thread
From: Kees Cook @ 2024-04-11 15:40 UTC (permalink / raw
  To: Steven Rostedt, Marco Elver
  Cc: Kees Cook, Alexander Viro, Christian Brauner, Jan Kara,
	Eric Biederman, Masami Hiramatsu, Mathieu Desnoyers, Azeem Shaikh,
	linux-fsdevel, linux-mm, linux-kernel, linux-trace-kernel,
	Dmitry Vyukov

On Thu, 11 Apr 2024 12:20:57 +0200, Marco Elver wrote:
> Add "sched_prepare_exec" tracepoint, which is run right after the point
> of no return but before the current task assumes its new exec identity.
> 
> Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
> tracepoint runs before flushing the old exec, i.e. while the task still
> has the original state (such as original MM), but when the new exec
> either succeeds or crashes (but never returns to the original exec).
> 
> [...]

Applied to for-next/execve, thanks!

[1/1] tracing: Add sched_prepare_exec tracepoint
      https://git.kernel.org/kees/c/5c5fad46e48c

Take care,

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] tracing: Add sched_prepare_exec tracepoint
  2024-04-11 10:20 [PATCH v2] tracing: Add sched_prepare_exec tracepoint Marco Elver
  2024-04-11 15:15 ` Kees Cook
  2024-04-11 15:40 ` Kees Cook
@ 2024-04-11 15:56 ` Masami Hiramatsu
  2 siblings, 0 replies; 5+ messages in thread
From: Masami Hiramatsu @ 2024-04-11 15:56 UTC (permalink / raw
  To: Marco Elver
  Cc: Steven Rostedt, Alexander Viro, Christian Brauner, Jan Kara,
	Eric Biederman, Kees Cook, Masami Hiramatsu, Mathieu Desnoyers,
	Azeem Shaikh, linux-fsdevel, linux-mm, linux-kernel,
	linux-trace-kernel, Dmitry Vyukov

On Thu, 11 Apr 2024 12:20:57 +0200
Marco Elver <elver@google.com> wrote:

> Add "sched_prepare_exec" tracepoint, which is run right after the point
> of no return but before the current task assumes its new exec identity.
> 
> Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
> tracepoint runs before flushing the old exec, i.e. while the task still
> has the original state (such as original MM), but when the new exec
> either succeeds or crashes (but never returns to the original exec).
> 
> Being able to trace this event can be helpful in a number of use cases:
> 
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("sched_prepare_exec" to "sched_process_exec").
> 
> Example of tracing output:
> 
>  $ cat /sys/kernel/debug/tracing/trace_pipe
>     <...>-379  [003] .....  179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
>     <...>-381  [002] .....  180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
>     <...>-385  [001] .....  180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
>     <...>-389  [006] .....  192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash
> 
> Signed-off-by: Marco Elver <elver@google.com>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> ---
> v2:
> * Add more documentation.
> * Also show bprm->interp in trace.
> * Rename to sched_prepare_exec.
> ---
>  fs/exec.c                    |  8 ++++++++
>  include/trace/events/sched.h | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 38bf71cbdf5e..57fee729dd92 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,14 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>  
> +	/*
> +	 * This tracepoint marks the point before flushing the old exec where
> +	 * the current task is still unchanged, but errors are fatal (point of
> +	 * no return). The later "sched_process_exec" tracepoint is called after
> +	 * the current task has successfully switched to the new exec.
> +	 */
> +	trace_sched_prepare_exec(current, bprm);
> +
>  	/*
>  	 * Ensure all future errors are fatal.
>  	 */
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index dbb01b4b7451..226f47c6939c 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -420,6 +420,41 @@ TRACE_EVENT(sched_process_exec,
>  		  __entry->pid, __entry->old_pid)
>  );
>  
> +/**
> + * sched_prepare_exec - called before setting up new exec
> + * @task:	pointer to the current task
> + * @bprm:	pointer to linux_binprm used for new exec
> + *
> + * Called before flushing the old exec, where @task is still unchanged, but at
> + * the point of no return during switching to the new exec. At the point it is
> + * called the exec will either succeed, or on failure terminate the task. Also
> + * see the "sched_process_exec" tracepoint, which is called right after @task
> + * has successfully switched to the new exec.
> + */
> +TRACE_EVENT(sched_prepare_exec,
> +
> +	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
> +
> +	TP_ARGS(task, bprm),
> +
> +	TP_STRUCT__entry(
> +		__string(	interp,		bprm->interp	)
> +		__string(	filename,	bprm->filename	)
> +		__field(	pid_t,		pid		)
> +		__string(	comm,		task->comm	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(interp, bprm->interp);
> +		__assign_str(filename, bprm->filename);
> +		__entry->pid = task->pid;
> +		__assign_str(comm, task->comm);
> +	),
> +
> +	TP_printk("interp=%s filename=%s pid=%d comm=%s",
> +		  __get_str(interp), __get_str(filename),
> +		  __entry->pid, __get_str(comm))
> +);
>  
>  #ifdef CONFIG_SCHEDSTATS
>  #define DEFINE_EVENT_SCHEDSTAT DEFINE_EVENT
> -- 
> 2.44.0.478.gd926399ef9-goog
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-04-11 15:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-11 10:20 [PATCH v2] tracing: Add sched_prepare_exec tracepoint Marco Elver
2024-04-11 15:15 ` Kees Cook
2024-04-11 15:38   ` Steven Rostedt
2024-04-11 15:40 ` Kees Cook
2024-04-11 15:56 ` Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).