From: Leon Hwang <hffilwlqm@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	"Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Pu Lehui <pulehui@huawei.com>,
	Hengqi Chen <hengqi.chen@gmail.com>,
	kernel-patches-bot@fb.com
Subject: Re: [PATCH bpf-next v3 2/3] bpf, x64: Fix tailcall hierarchy
Date: Sun, 7 Apr 2024 19:34:12 +0800
Message-ID: <55442238-33d6-4e7d-9dd1-e36da20f7c90@gmail.com>
In-Reply-To: <CAADnVQ+vJyi6JFsck8KbyxvOuRvmAO5gVTJPwNiyNeBwzsHu9Q@mail.gmail.com>



On 2024/4/5 09:03, Alexei Starovoitov wrote:
>>   * Solution changes from percpu tail_call_cnt to tail_call_cnt at task_struct.
> 
> Please remind us what was wrong with per-cpu approach?

There are cases that the per-cpu approach cannot handle properly.
Especially on a non-SMP machine, if many bpf progs with tail calls run
interleaved on the single CPU, the MAX_TAIL_CALL_CNT limit cannot bound
the tail calls as expected. A sketch of the failure mode follows.
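
A hedged sketch (my reading of the issue, not code from the series) of
why one counter per CPU breaks down. Assume the per-cpu variant kept a
variable like:

static DEFINE_PER_CPU(u32, bpf_tail_call_cnt);

/* On a non-SMP machine everything runs on CPU 0:
 *
 *   prog A enters                  -> counter reset to 0
 *   prog A tail-calls twice        -> counter = 2
 *   prog B enters on the same CPU
 *   (e.g. nested from an irq)      -> counter reset to 0 again
 *   prog A resumes tail-calling    -> its earlier count is lost, so A
 *                                     can exceed MAX_TAIL_CALL_CNT
 */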

> 
> Also notice we have pseudo per-cpu bpf insns now,
> so things might be easier today.

Great to hear that. With the pseudo per-cpu bpf insns, it is possible to
get tcc_ptr from task_struct without a function call.
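
For example, a rough sketch (untested; it assumes the
BPF_MOV64_PERCPU_REG macro from the pseudo per-cpu insn series and the
bpf_tail_call_cnt field added in patch 1/3):

	/* r1 = &pcpu_hot.current_task (per-cpu address) */
	BPF_LD_IMM64(BPF_REG_1, (long)&pcpu_hot.current_task),
	/* r1 += this CPU's per-cpu offset, via the pseudo insn */
	BPF_MOV64_PERCPU_REG(BPF_REG_1, BPF_REG_1),
	/* r1 = current */
	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0),
	/* r1 = &current->bpf_tail_call_cnt */
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
		      offsetof(struct task_struct, bpf_tail_call_cnt)),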

> 
> On Tue, Apr 2, 2024 at 8:27 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
>> handling in JIT"), the tailcall on x64 works better than before.
> ...
>>
>> As a result, the previous tailcall way can be removed totally, including
>>
>> 1. "push rax" at prologue.
>> 2. load tail_call_cnt to rax before calling function.
>> 3. "pop rax" before jumping to tailcallee when tailcall.
>> 4. "push rax" and load tail_call_cnt to rax at trampoline.
> 
> Please trim it.
> It looks like you've been copy pasting it and it's no longer
> accurate.
> Short description of the problem will do.

Got it.

> 
>> Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
>> Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
>> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 137 +++++++++++++++++++++---------------
>>  1 file changed, 81 insertions(+), 56 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 3b639d6f2f54d..cd06e02e83b64 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -11,6 +11,7 @@
>>  #include <linux/bpf.h>
>>  #include <linux/memory.h>
>>  #include <linux/sort.h>
>> +#include <linux/sched.h>
>>  #include <asm/extable.h>
>>  #include <asm/ftrace.h>
>>  #include <asm/set_memory.h>
>> @@ -18,6 +19,8 @@
>>  #include <asm/text-patching.h>
>>  #include <asm/unwind.h>
>>  #include <asm/cfi.h>
>> +#include <asm/current.h>
>> +#include <asm/percpu.h>
>>
>>  static bool all_callee_regs_used[4] = {true, true, true, true};
>>
>> @@ -273,7 +276,7 @@ struct jit_context {
>>  /* Number of bytes emit_patch() needs to generate instructions */
>>  #define X86_PATCH_SIZE         5
>>  /* Number of bytes that will be skipped on tailcall */
>> -#define X86_TAIL_CALL_OFFSET   (11 + ENDBR_INSN_SIZE)
>> +#define X86_TAIL_CALL_OFFSET   (14 + ENDBR_INSN_SIZE)
>>
>>  static void push_r12(u8 **pprog)
>>  {
>> @@ -403,6 +406,9 @@ static void emit_cfi(u8 **pprog, u32 hash)
>>         *pprog = prog;
>>  }
>>
>> +static int emit_call(u8 **pprog, void *func, void *ip);
>> +static __used void bpf_tail_call_cnt_init(void);
>> +
>>  /*
>>   * Emit x86-64 prologue code for BPF program.
>>   * bpf_tail_call helper will skip the first X86_TAIL_CALL_OFFSET bytes
>> @@ -410,9 +416,9 @@ static void emit_cfi(u8 **pprog, u32 hash)
>>   */
>>  static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
>>                           bool tail_call_reachable, bool is_subprog,
>> -                         bool is_exception_cb)
>> +                         bool is_exception_cb, u8 *ip)
>>  {
>> -       u8 *prog = *pprog;
>> +       u8 *prog = *pprog, *start = *pprog;
>>
>>         emit_cfi(&prog, is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash);
>>         /* BPF trampoline can be made to work without these nops,
>> @@ -421,13 +427,14 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
>>         emit_nops(&prog, X86_PATCH_SIZE);
>>         if (!ebpf_from_cbpf) {
>>                 if (tail_call_reachable && !is_subprog)
>> -                       /* When it's the entry of the whole tailcall context,
>> -                        * zeroing rax means initialising tail_call_cnt.
>> +                       /* Call bpf_tail_call_cnt_init to initilise
>> +                        * tail_call_cnt.
>>                          */
>> -                       EMIT2(0x31, 0xC0); /* xor eax, eax */
>> +                       emit_call(&prog, bpf_tail_call_cnt_init,
>> +                                 ip + (prog - start));
> 
> You're repeating the same bug we discussed before.
> There is nothing in bpf_tail_call_cnt_init() that
> prevents the compiler from scratching rdi,rsi,...
> bpf_tail_call_cnt_init() is a normal function from compiler pov
> and it's allowed to use those regs.
> Must have been lucky that CI is not showing crashes.

Oh, I get it. To prevent the compiler from scratching rdi, rsi, etc.,
the asm clobber list in bpf_tail_call_cnt_init() must include "rdi",
"rsi", "rdx", "rcx", "r8". I learned this from the GCC docs[0].

static __used void bpf_tail_call_cnt_init(void)
{
	/* In short:
	 * current->bpf_tail_call_cnt = 0;
	 */
	void *task = &pcpu_hot.current_task;

	asm volatile (
	    "addq " __percpu_arg(1) ", %0\n\t"	/* task += this_cpu_off */
	    "movq (%0), %0\n\t"			/* task = current */
	    "movl $0x0, %c2(%0)\n\t"		/* current->bpf_tail_call_cnt = 0 */
	    : "+r" (task)	/* in/out, since the asm modifies the register */
	    : "m" (__my_cpu_var(this_cpu_off)),
	      "i" (offsetof(struct task_struct, bpf_tail_call_cnt))
	    : "rdi", "rsi", "rdx", "rcx", "r8",	/* to avoid scratching these regs */
	      "memory"	/* the store is not covered by an output operand */
	);
}

[0]
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1

> 
>>                 else
>>                         /* Keep the same instruction layout. */
>> -                       EMIT2(0x66, 0x90); /* nop2 */
>> +                       emit_nops(&prog, X86_PATCH_SIZE);
>>         }
>>         /* Exception callback receives FP as third parameter */
>>         if (is_exception_cb) {
>> @@ -452,8 +459,6 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
>>         /* sub rsp, rounded_stack_depth */
>>         if (stack_depth)
>>                 EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
>> -       if (tail_call_reachable)
>> -               EMIT1(0x50);         /* push rax */
>>         *pprog = prog;
>>  }
>>
>> @@ -589,13 +594,61 @@ static void emit_return(u8 **pprog, u8 *ip)
>>         *pprog = prog;
>>  }
>>
>> +static __used void bpf_tail_call_cnt_init(void)
>> +{
>> +       /* The following asm equals to
>> +        *
>> +        * u32 *tcc_ptr = &current->bpf_tail_call_cnt;
>> +        *
>> +        * *tcc_ptr = 0;
>> +        */
>> +
>> +       asm volatile (
>> +           "addq " __percpu_arg(0) ", %1\n\t"
>> +           "addq %2, %1\n\t"
>> +           "movq (%1), %1\n\t"
>> +           "addq %3, %1\n\t"
>> +           "movl $0, (%1)\n\t"
>> +           :
>> +           : "m" (this_cpu_off), "r" (&pcpu_hot),
>> +             "i" (offsetof(struct pcpu_hot, current_task)),
>> +             "i" (offsetof(struct task_struct, bpf_tail_call_cnt))
>> +       );
>> +}
>> +
>> +static __used u32 *bpf_tail_call_cnt_ptr(void)
>> +{
>> +       u32 *tcc_ptr;
>> +
>> +       /* The following asm equals to
>> +        *
>> +        * u32 *tcc_ptr = &current->bpf_tail_call_cnt;
>> +        *
>> +        * return tcc_ptr;
>> +        */
>> +
>> +       asm volatile (
>> +           "addq " __percpu_arg(1) ", %2\n\t"
>> +           "addq %3, %2\n\t"
>> +           "movq (%2), %2\n\t"
>> +           "addq %4, %2\n\t"
>> +           "movq %2, %0\n\t"
>> +           : "=r" (tcc_ptr)
>> +           : "m" (this_cpu_off), "r" (&pcpu_hot),
>> +             "i" (offsetof(struct pcpu_hot, current_task)),
>> +             "i" (offsetof(struct task_struct, bpf_tail_call_cnt))
>> +       );
>> +
>> +       return tcc_ptr;
>> +}
>> +
>>  /*
>>   * Generate the following code:
>>   *
>>   * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
>>   *   if (index >= array->map.max_entries)
>>   *     goto out;
>> - *   if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
>> + *   if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
>>   *     goto out;
>>   *   prog = array->ptrs[index];
>>   *   if (prog == NULL)
>> @@ -608,7 +661,6 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
>>                                         u32 stack_depth, u8 *ip,
>>                                         struct jit_context *ctx)
>>  {
>> -       int tcc_off = -4 - round_up(stack_depth, 8);
>>         u8 *prog = *pprog, *start = *pprog;
>>         int offset;
>>
>> @@ -630,16 +682,16 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
>>         EMIT2(X86_JBE, offset);                   /* jbe out */
>>
>>         /*
>> -        * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
>> +        * if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
>>          *      goto out;
>>          */
>> -       EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
>> -       EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
>> +       /* call bpf_tail_call_cnt_ptr */
>> +       emit_call(&prog, bpf_tail_call_cnt_ptr, ip + (prog - start));
> 
> same issue.

I will rewrite it as emit_bpf_tail_call_cnt_ptr(), which emits the
per-cpu address computation inline (what the pseudo per-cpu bpf insns
JIT to) in order to get tcc_ptr from task_struct without a call.

static void emit_bpf_tail_call_cnt_ptr(u8 **pprog)
{
	u8 *prog = *pprog;

	/* In short:
	 * return &current->bpf_tail_call_cnt; (in rax)
	 */

	/* mov rax, &pcpu_hot.current_task */
	EMIT3_off32(0x48, 0xC7, 0xC0, (u32)(unsigned long)&pcpu_hot.current_task);

#ifdef CONFIG_SMP
	/* add rax, gs:[&this_cpu_off] */
	EMIT1(0x65);
	EMIT4_off32(0x48, 0x03, 0x04, 0x25, (u32)(unsigned long)&this_cpu_off);
#endif

	/* mov rax, qword ptr [rax] */
	EMIT3(0x48, 0x8B, 0x00);
	/* add rax, offsetof(struct task_struct, bpf_tail_call_cnt) */
	EMIT2_off32(0x48, 0x05, (u32)offsetof(struct task_struct, bpf_tail_call_cnt));

	*pprog = prog;
}
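
For reference, the emitted sequence should disassemble roughly as
follows (my decoding of the bytes above, worth double-checking):

	mov    rax, &pcpu_hot.current_task    /* 48 C7 C0 imm32, sign-extended */
	add    rax, gs:[&this_cpu_off]        /* 65 48 03 04 25 imm32, SMP only */
	mov    rax, qword ptr [rax]           /* 48 8B 00, rax = current */
	add    rax, offsetof(struct task_struct, bpf_tail_call_cnt)
	                                      /* 48 05 imm32 */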

> 
>> +       EMIT3(0x83, 0x38, MAX_TAIL_CALL_CNT);     /* cmp dword ptr [rax], MAX_TAIL_CALL_CNT */
>>
>>         offset = ctx->tail_call_indirect_label - (prog + 2 - start);
>>         EMIT2(X86_JAE, offset);                   /* jae out */
>> -       EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
>> -       EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
>> +       EMIT2(0xFF, 0x00);                        /* inc dword ptr [rax] */
>>
>>         /* prog = array->ptrs[index]; */
>>         EMIT4_off32(0x48, 0x8B, 0x8C, 0xD6,       /* mov rcx, [rsi + rdx * 8 + offsetof(...)] */
>> @@ -663,7 +715,6 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
>>                         pop_r12(&prog);
>>         }
>>
>> -       EMIT1(0x58);                              /* pop rax */
>>         if (stack_depth)
>>                 EMIT3_off32(0x48, 0x81, 0xC4,     /* add rsp, sd */
>>                             round_up(stack_depth, 8));
>> @@ -691,21 +742,20 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
>>                                       bool *callee_regs_used, u32 stack_depth,
>>                                       struct jit_context *ctx)
>>  {
>> -       int tcc_off = -4 - round_up(stack_depth, 8);
>>         u8 *prog = *pprog, *start = *pprog;
>>         int offset;
>>
>>         /*
>> -        * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
>> +        * if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
>>          *      goto out;
>>          */
>> -       EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
>> -       EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
>> +       /* call bpf_tail_call_cnt_ptr */
>> +       emit_call(&prog, bpf_tail_call_cnt_ptr, ip);
> 
> and here as well.

I will replace this with emit_bpf_tail_call_cnt_ptr() as well.
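
The revised call site would look roughly like this (a sketch under the
same assumptions as above, reusing the cmp/inc encodings from this
patch; the tail_call_direct_label handling is unchanged):

	/* rax = &current->bpf_tail_call_cnt, without a function call */
	emit_bpf_tail_call_cnt_ptr(&prog);
	/* cmp dword ptr [rax], MAX_TAIL_CALL_CNT */
	EMIT3(0x83, 0x38, MAX_TAIL_CALL_CNT);
	offset = ctx->tail_call_direct_label - (prog + 2 - start);
	EMIT2(X86_JAE, offset);              /* jae out */
	/* inc dword ptr [rax] */
	EMIT2(0xFF, 0x00);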

> 
> pw-bot: cr

I would like to send the next version with these updates.

Thanks,
Leon
