BPF Archive mirror
 help / color / mirror / Atom feed
From: Leon Hwang <hffilwlqm@gmail.com>
To: Zvi Effron <zeffron@riotgames.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, maciej.fijalkowski@intel.com,
	jakub@cloudflare.com, pulehui@huawei.com,
	kernel-patches-bot@fb.com
Subject: Re: [PATCH bpf-next v4 3/5] bpf, x64: Fix tailcall hierarchy
Date: Fri, 17 May 2024 23:05:34 +0800	[thread overview]
Message-ID: <c1483467-b7e5-4511-8c8e-3ab9dfd0ea49@gmail.com> (raw)
In-Reply-To: <CAC1LvL27bXu5zbPj+wO1hQCGvdHooUbQchiwwawyd+iokKc42Q@mail.gmail.com>



On 2024/5/17 02:56, Zvi Effron wrote:
> On Thu, May 16, 2024 at 8:28 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>>
>>
>> On 2024/5/9 23:05, Leon Hwang wrote:
>>> This patch fixes a tailcall issue caused by abusing the tailcall in
>>> bpf2bpf feature.
>>>

[SNIP]

>>>
>>
>> Oh, I missed a case here.
>>
>> This patch set is unable to provide tcc_ptr for freplace programs that
>> use tail calls in bpf2bpf.
>>
>> How can this approach provide tcc_ptr for freplace programs?
>>
>> Achieving this is not straightforward. However, it is simpler to disable
>> the use of tail calls in bpf2bpf for freplace programs, even though this
>> is a desired feature for my project.
>>
>> Therefore, I will disable it in the v5 patch set.
>>
> 
> Isn't this a breaking change such that it would effectively be a regression for
> any users already using tail_calls in bpf2bpf for freplace programs? And,
> correct me if I'm wrong, but aren't those pieces of eBPF essentially considered
> UAPI stable (unlike kfuncs)?

Yeah, this is a breaking change. However, I think it's acceptable, as
tail_calls in subprogs was considered to be disabled[0].

[0]
https://lore.kernel.org/bpf/CAADnVQLOswL3BY1s0B28wRZH1PU675S6_2=XknjZKNgyJ=yDxw@mail.gmail.com/

> 
> I appreciate that this is an esoteric use of eBPF, but as you said, you have a
> use case for it, as does my team (although we haven't had a chance to implement
> it yet), and if the two of us have use cases for it, I imagine other may have
> as well, and some of them might already have done their implementation.
> 


It seems it is an useful feature for us. I haven't use it either because
of old kernel version.

So, I figure out another approach to resolve this issue.

Here's the diff just for idea discussion:

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 5159c7a229229..b0b6c84874e54 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -273,7 +273,7 @@ struct jit_context {
 /* Number of bytes emit_patch() needs to generate instructions */
 #define X86_PATCH_SIZE		5
 /* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET	(11 + ENDBR_INSN_SIZE)
+#define X86_TAIL_CALL_OFFSET	(22 + ENDBR_INSN_SIZE)

 static void push_r12(u8 **pprog)
 {
@@ -403,6 +403,22 @@ static void emit_cfi(u8 **pprog, u32 hash)
 	*pprog = prog;
 }

+static notrace void bpf_prepare_tail_call_cnt_ptr()
+{
+	/* %rax stores the position to call the original prog. */
+
+	asm (
+	    "pushq %r9\n\t"       /* Push %r9. */
+	    "movq %rax, %r9\n\t"  /* Cache calling position. */
+	    "xor %eax, %eax\n\t"  /* Initialise tail_call_cnt. */
+	    "pushq %rax\n\t"      /* Push tail_call_cnt. */
+	    "movq %rsp, %rax\n\t" /* Make %rax as tcc_ptr. */
+	    "callq *%r9\n\t"      /* Call the original prog. */
+	    "popq %r9\n\t"        /* Pop tail_call_cnt. */
+	    "popq %r9\n\t"        /* Pop %r9. */
+	);
+}
+
 /*
  * Emit x86-64 prologue code for BPF program.
  * bpf_tail_call helper will skip the first X86_TAIL_CALL_OFFSET bytes
@@ -410,9 +426,9 @@ static void emit_cfi(u8 **pprog, u32 hash)
  */
 static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 			  bool tail_call_reachable, bool is_subprog,
-			  bool is_exception_cb)
+			  bool is_exception_cb, u8 *image)
 {
-	u8 *prog = *pprog;
+	u8 *prog = *pprog, *start = *pprog;

 	emit_cfi(&prog, is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash);
 	/* BPF trampoline can be made to work without these nops,
@@ -420,14 +436,16 @@ static void emit_prologue(u8 **pprog, u32
stack_depth, bool ebpf_from_cbpf,
 	 */
 	emit_nops(&prog, X86_PATCH_SIZE);
 	if (!ebpf_from_cbpf) {
-		if (tail_call_reachable && !is_subprog)
-			/* When it's the entry of the whole tailcall context,
-			 * zeroing rax means initialising tail_call_cnt.
-			 */
-			EMIT2(0x31, 0xC0); /* xor eax, eax */
-		else
+		if (tail_call_reachable && !is_subprog) {
+			/* mov rax, offset */
+			u32 offset = image + (prog - start) + 13;
+			EMIT4_off32(0x48, 0x8B, 0x04, 0x25, offset);
+			/* call bpf_prepare_tail_call_cnt_ptr */
+			emit_call(&prog, bpf_prepare_tail_call_cnt_ptr, offset-5);
+		 } else {
 			/* Keep the same instruction layout. */
-			EMIT2(0x66, 0x90); /* nop2 */
+			emit_nops(&prog, 13);
+		 }
 	}
 	/* Exception callback receives FP as third parameter */
 	if (is_exception_cb) {
@@ -1344,7 +1362,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int
*addrs, u8 *image, u8 *rw_image

 	emit_prologue(&prog, bpf_prog->aux->stack_depth,
 		      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
-		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
+		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb,
+		      image);
 	/* Exception callback will clobber callee regs for its own use, and
 	 * restore the original callee regs from main prog's stack frame.
 	 */


Unlink the way to prepare tcc_ptr of current patch set by bpf prog's
caller, it prepares tcc_ptr by calling a function at prologue to reserve
tail_call_cnt memory on stack. And then, call the remain part of the bpf
prog. At the end of prologue, rax is tcc_ptr, too.

This is inspired by the original RFC PATCH[0]. And then, it avoids
unwind-breaking issue by a real function call.

[0] https://lore.kernel.org/bpf/20240104142226.87869-3-hffilwlqm@gmail.com/

However, it introduces an indirect call in
bpf_prepare_tail_call_cnt_ptr(), which costs performance by retpoline.
If to improve performance here, bpf dispatcher should be considered,
like XDP.

Thanks,
Leon

  reply	other threads:[~2024-05-17 15:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-09 15:05 [PATCH bpf-next v4 0/5] bpf: Fix tailcall hierarchy Leon Hwang
2024-05-09 15:05 ` [PATCH bpf-next v4 1/5] bpf, verifier: Correct tail_call_reachable when no tailcall in subprog Leon Hwang
2024-05-09 15:05 ` [PATCH bpf-next v4 2/5] bpf: Introduce bpf_jit_supports_tail_call_cnt_ptr() Leon Hwang
2024-05-09 15:05 ` [PATCH bpf-next v4 3/5] bpf, x64: Fix tailcall hierarchy Leon Hwang
2024-05-16 15:28   ` Leon Hwang
2024-05-16 18:56     ` Zvi Effron
2024-05-17 15:05       ` Leon Hwang [this message]
2024-05-09 15:05 ` [PATCH bpf-next v4 4/5] bpf, arm64: " Leon Hwang
2024-05-09 15:05 ` [PATCH bpf-next v4 5/5] selftests/bpf: Add testcases for tailcall hierarchy fixing Leon Hwang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c1483467-b7e5-4511-8c8e-3ab9dfd0ea49@gmail.com \
    --to=hffilwlqm@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jakub@cloudflare.com \
    --cc=kernel-patches-bot@fb.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=pulehui@huawei.com \
    --cc=zeffron@riotgames.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).