BPF Archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Raman Shukhau <ramasha@fb.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net
Subject: Re: [PATCH bpf-next 1/1] Fix for bpf_sysctl_set_new_value
Date: Tue, 7 May 2024 16:20:36 -0700	[thread overview]
Message-ID: <ca8136e0-5d2a-402b-ad03-cc8a218affd4@linux.dev> (raw)
In-Reply-To: <20240504102312.3137741-2-ramasha@fb.com>

On 5/4/24 3:23 AM, Raman Shukhau wrote:
> Noticed that call to bpf_sysctl_set_new_value doesn't change final value
> of the parameter, when called from cgroup/syscall bpf handler. No error
> thrown in this case, new value is simply ignored and original value, sent
> to sysctl, is set. Example (see test added to this change for BPF handler
> logic):
> 
> sysctl -w net.ipv4.ip_local_reserved_ports = 11111
> ... cgroup/syscal handler call bpf_sysctl_set_new_value	and set 22222
> sysctl net.ipv4.ip_local_reserved_ports
> ... returns 11111
> 
> On investigation I found 2 things that needs to be changed:
> * return value check
> * new_len provided by bpf back to sysctl. proc_sys_call_handler	expects
>    this value NOT to include \0 symbol, e.g. if user do

Thanks for the report and the patch.

This patch is changing a few things (1 fix, 1 improvement, 1 test).

Separate these individual changes into its own patch. Patch 1 fixes the return 
value. Patch 2 improves the '\0' and *pcount situation. Patch 3 adds the test.

btw, I am curious what is missed in the test_sysctl.c that didn't catch the 
return value case?

> 
> 	```
>    open("/proc/sys/net/ipv4/ip_local_reserved_ports", ...)
>    write(fd, "11111", sizeof("22222"))
>    ```
> 
>    or `echo -n "11111" > /proc/sys/net/ipv4/ip_local_reserved_ports`
> 
>    or `sysctl -w	net.ipv4.ip_local_reserved_ports=11111
> 
>    proc_sys_call_handler receives count equal to `5`. To make it consistent
>    with bpf_sysctl_set_new_value, this change also adjust `new_len` with
>    `-1`, if `\0` passed as last character. Alternatively, using
>    `sizeof("11111") - 1` in BPF handler should work, but it might not be
>    obvious and spark confusion. Note: if incorrect count is used, sysctl
>    returns EINVAL to the user.
> 
> Signed-off-by: Raman Shukhau <ramasha@fb.com>
> ---
>   kernel/bpf/cgroup.c                           |  7 ++-
>   .../bpf/progs/test_sysctl_overwrite.c         | 47 +++++++++++++++++++
>   tools/testing/selftests/bpf/test_sysctl.c     | 35 +++++++++++++-
>   3 files changed, 85 insertions(+), 4 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 8ba73042a239..23736aed1b53 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1739,10 +1739,13 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
>   
>   	kfree(ctx.cur_val);
>   
> -	if (ret == 1 && ctx.new_updated) {
> +	if (ret == 0 && ctx.new_updated) {
>   		kfree(*buf);
>   		*buf = ctx.new_val;
> -		*pcount = ctx.new_len;
> +		if (!(*buf)[ctx.new_len])
> +			*pcount = ctx.new_len - 1;

 From looking at how new_updated is set, my understanding is new_len cannot be 0 
here. just want to double check.


> +		else
> +			*pcount = ctx.new_len;
>   	} else {
>   		kfree(ctx.new_val);
>   	}
> diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c b/tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c
> new file mode 100644
> index 000000000000..e44b429fcfc1
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c
> @@ -0,0 +1,47 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 Facebook
> +
> +#include <string.h>
> +
> +#include <linux/bpf.h>
> +
> +#include <bpf/bpf_helpers.h>
> +
> +#include "bpf_compiler.h"
> +
> +static const char sysctl_value[] = "31337";
> +static const char sysctl_name[] = "net/ipv4/ip_local_reserved_ports";
> +static __always_inline int is_expected_name(struct bpf_sysctl *ctx)
> +{
> +	unsigned char i;
> +	char name[sizeof(sysctl_name)];
> +	int ret;
> +
> +	memset(name, 0, sizeof(name));
> +	ret = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
> +	if (ret < 0 || ret != sizeof(sysctl_name) - 1)
> +		return 0;
> +
> +	__pragma_loop_unroll_full
> +	for (i = 0; i < sizeof(sysctl_name); ++i)
> +		if (name[i] != sysctl_name[i])

bpf_strncmp() should be useful here.

> +			return 0;
> +
> +	return 1;
> +}
> +
> +SEC("cgroup/sysctl")
> +int test_value_overwrite(struct bpf_sysctl *ctx)
> +{
> +	if (!ctx->write)
> +		return 1;
> +
> +	if (!is_expected_name(ctx))
> +		return 0;
> +
> +	if (bpf_sysctl_set_new_value(ctx, sysctl_value, sizeof(sysctl_value)) == 0)
> +		return 1;
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
> index bcdbd27f22f0..dfa479861d3a 100644
> --- a/tools/testing/selftests/bpf/test_sysctl.c
> +++ b/tools/testing/selftests/bpf/test_sysctl.c
> @@ -35,6 +35,7 @@ struct sysctl_test {
>   	int seek;
>   	const char *newval;
>   	const char *oldval;
> +	const char *updval;
>   	enum {
>   		LOAD_REJECT,
>   		ATTACH_REJECT,
> @@ -1395,6 +1396,16 @@ static struct sysctl_test tests[] = {
>   		.open_flags = O_RDONLY,
>   		.result = SUCCESS,
>   	},
> +	{
> +		"C prog: override write to ip_local_reserved_ports",
> +		.prog_file = "./test_sysctl_overwrite.bpf.o",

test_sysctl.c is not run in bpf CI. It is not very useful to extend this test 
further. Lets take this chance to create a new progs/cgrp_sysctl.c test that 
will be exercised by ./test_progs in bpf CI. Then it can use the newer skel 
open_and_load also.

Not asking to to migrate the existing tests in test_sysctl.c to the new 
progs/cgrp_sysctl.c in this patch set. The new cgrp_sysctl.c can only have the 
tests that exercise the changes in this patch set. However, it will be useful if 
progs/cgrp_sysctl.c can be bootstrapped in a way that the future test_sysctl.c 
migration will be easier. I also wouldn't worry too much on the existing raw 
insns tests in test_sysctl.c for now. They will need to be moved to either C or 
bpf asm in the future.

pw-bot: cr

> +		.attach_type = BPF_CGROUP_SYSCTL,
> +		.sysctl = "net/ipv4/ip_local_reserved_ports",
> +		.open_flags = O_RDWR,
> +		.newval = "11111",
> +		.updval = "31337",
> +		.result = SUCCESS,
> +	},
>   };
>   
>   static size_t probe_prog_length(const struct bpf_insn *fp)
> @@ -1520,13 +1531,33 @@ static int access_sysctl(const char *sysctl_path,
>   			log_err("Read value %s != %s", buf, test->oldval);
>   			goto err;
>   		}
> -	} else if (test->open_flags == O_WRONLY) {
> +	} else if (test->open_flags == O_WRONLY || test->open_flags == O_RDWR) {
>   		if (!test->newval) {
>   			log_err("New value for sysctl is not set");
>   			goto err;
>   		}
> -		if (write(fd, test->newval, strlen(test->newval)) == -1)
> +		if (write(fd, test->newval, strlen(test->newval)) == -1) {
> +			log_err("Unable to write sysctl value");
>   			goto err;
> +		}
> +		if (test->open_flags == O_RDWR) {
> +			char buf[128];
> +
> +			if (!test->updval) {
> +				log_err("Expected value for sysctl is not set");
> +				goto err;
> +			}
> +
> +			lseek(fd, 0, SEEK_SET);
> +			if (read(fd, buf, sizeof(buf)) == -1) {
> +				log_err("Unable to read updated value");
> +				goto err;
> +			}
> +			if (strncmp(buf, test->updval, strlen(test->updval))) {
> +				log_err("Overwritten value %s != %s", buf, test->updval);
> +				goto err;
> +			}
> +		}
>   	} else {
>   		log_err("Unexpected sysctl access: neither read nor write");
>   		goto err;


  reply	other threads:[~2024-05-07 23:20 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-04 10:23 [PATCH bpf-next 0/1] Fix for bpf_sysctl_set_new_value Raman Shukhau
2024-05-04 10:23 ` [PATCH bpf-next 1/1] " Raman Shukhau
2024-05-07 23:20   ` Martin KaFai Lau [this message]
2024-05-16 21:16     ` Raman Shukhau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca8136e0-5d2a-402b-ad03-cc8a218affd4@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=ramasha@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).