BPF Archive mirror
* [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs
@ 2024-05-02 15:18 Puranjay Mohan
  2024-05-02 15:18 ` [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Puranjay Mohan @ 2024-05-02 15:18 UTC
  To: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel
  Cc: puranjay12

Changes in v5 -> v6:
arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/
riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
- Combine riscv and arm64 changes in single series
- Some coding style fixes

Changes in v4 -> v5:
v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/
- Implement the inlining of the bpf_get_smp_processor_id() in the JIT.

NOTE: This needs to be applied on top of the following series in order
to build:
https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/

Manual run of bpf-ci with this series rebased on above:
https://github.com/kernel-patches/bpf/pull/6929

Changes in v3 -> v4:
v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/
- Fix coding style issue related to C89 standards.

Changes in v2 -> v3:
v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/
- Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)"
- Made ARM64 and x86-64 use the same code for inlining. The only difference
  that remains is the per-cpu address of the cpu_number.

Changes in v1 -> v2:
v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/
- Add a patch to inline bpf_get_smp_processor_id()
- Fix an issue in MRS instruction encoding as pointed out by Will
- Remove CONFIG_SMP check because the arm64 kernel is always built with CONFIG_SMP

This series adds support for the internal-only per-CPU MOV instruction and
inlines the bpf_get_smp_processor_id() helper call in the ARM64 and RISC-V
BPF JITs.

Here is an example of calls to bpf_get_smp_processor_id() and
percpu_array_map_lookup_elem() before and after this series on ARM64.

                                         BPF
                                        =====
              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
(85) call bpf_get_smp_processor_id#229032       (85) call bpf_get_smp_processor_id#8


p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
(18) r1 = map[id:78]                            (18) r1 = map[id:153]
(18) r2 = map[id:82][0]+65536                   (18) r2 = map[id:157][0]+65536
(85) call percpu_array_map_lookup_elem#313512   (07) r1 += 496
                                                (61) r0 = *(u32 *)(r2 +0)
                                                (35) if r0 >= 0x1 goto pc+5
                                                (67) r0 <<= 3
                                                (0f) r0 += r1
                                                (79) r0 = *(u64 *)(r0 +0)
                                                (bf) r0 = &(void __percpu *)(r0)
                                                (05) goto pc+1
                                                (b7) r0 = 0
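
As a rough user-space model of the AFTER column above, the C sketch below
walks through the same steps: bounds-check the key, fetch the element's
per-CPU pointer, then resolve it for the current CPU (the
`r0 = &(void __percpu *)(r0)` step). The names mock_percpu_array,
per_cpu_offset, current_cpu and NR_CPUS_MODEL are hypothetical stand-ins,
not kernel definitions, and the model assumes a single-element array as in
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4

/* Hypothetical stand-in for a per-CPU array map with one element. */
struct mock_percpu_array {
	uint32_t max_entries;
	uintptr_t pptrs[1];	/* per-CPU pointer of each element */
};

/* Hypothetical per-CPU offsets and "current" CPU for the model. */
static uintptr_t per_cpu_offset[NR_CPUS_MODEL];
static unsigned int current_cpu;

/* Models: r0 = &(void __percpu *)(r0) */
static uintptr_t mov_percpu_addr(uintptr_t pcpu_ptr)
{
	assert(current_cpu < NR_CPUS_MODEL);
	return pcpu_ptr + per_cpu_offset[current_cpu];
}

/* Models the inlined lookup sequence from the AFTER column. */
static void *inlined_lookup(const struct mock_percpu_array *map,
			    const uint32_t *key)
{
	uint32_t index = *key;			/* r0 = *(u32 *)(r2 +0) */

	if (index >= map->max_entries)		/* if r0 >= 0x1 goto pc+5 */
		return NULL;			/* r0 = 0 */

	/* r0 <<= 3; r0 += r1; r0 = *(u64 *)(r0 +0) */
	uintptr_t pcpu_ptr = map->pptrs[index];

	return (void *)mov_percpu_addr(pcpu_ptr);
}
```

In this model, resolving a per-CPU pointer is a single addition of the
current CPU's offset, which is the operation the internal-only instruction
lets the JIT emit inline.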


                                      ARM64 JIT
                                     ===========

              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
mov     x10, #0xfffffffffffff4d0                mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                   ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0


p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
mov     x0, #0xffff0003ffffffff                 mov     x0, #0xffff0003ffffffff
movk    x0, #0xce5c, lsl #16                    movk    x0, #0xe0f3, lsl #16
movk    x0, #0xca00                             movk    x0, #0x7c00
mov     x1, #0xffff8000ffffffff                 mov     x1, #0xffff8000ffffffff
movk    x1, #0x8bdb, lsl #16                    movk    x1, #0xb0c7, lsl #16
movk    x1, #0x6000                             movk    x1, #0xe000
mov     x10, #0xffffffffffff3ed0                add     x0, x0, #0x1f0
movk    x10, #0x802d, lsl #16                   ldr     w7, [x1]
movk    x10, #0x8000, lsl #32                   cmp     x7, #0x1
blr     x10                                     b.cs    0x0000000000000090
add     x7, x0, #0x0                            lsl     x7, x7, #3
                                                add     x7, x7, x0
                                                ldr     x7, [x7]
                                                mrs     x10, tpidr_el1
                                                add     x7, x7, x10
                                                b       0x0000000000000094
                                                mov     x7, #0x0

              Performance improvement found using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

  +---------------+-------------------+-------------------+--------------+
  |      Name     |      Before       |        After      |   % change   |
  |---------------+-------------------+-------------------+--------------|
  | glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
  | arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
  | hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
  +---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

             RISCV64 JIT output for `call bpf_get_smp_processor_id`
            =======================================================

                  Before                           After
                 --------                         -------

           auipc   t1,0x848c                  ld    a5,32(tp)
           jalr    604(t1)
           mv      a5,a0

  Benchmark results using [1] on QEMU.

  ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

  +---------------+------------------+------------------+--------------+
  |      Name     |     Before       |       After      |   % change   |
  |---------------+------------------+------------------+--------------|
  | glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
  | arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
  | hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
  +---------------+------------------+------------------+--------------+

Puranjay Mohan (4):
  riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  riscv, bpf: inline bpf_get_smp_processor_id()
  arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  bpf, arm64: inline bpf_get_smp_processor_id() helper

 arch/arm64/include/asm/insn.h   |  8 ++++++
 arch/arm64/lib/insn.c           | 11 ++++++++
 arch/arm64/net/bpf_jit.h        |  8 ++++++
 arch/arm64/net/bpf_jit_comp.c   | 39 +++++++++++++++++++++++++
 arch/riscv/net/bpf_jit_comp64.c | 50 +++++++++++++++++++++++++++++++++
 include/linux/filter.h          |  1 +
 kernel/bpf/core.c               | 11 ++++++++
 kernel/bpf/verifier.c           |  4 +++
 8 files changed, 132 insertions(+)

-- 
2.40.1



* [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  2024-05-02 15:18 [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs Puranjay Mohan
@ 2024-05-02 15:18 ` Puranjay Mohan
  2024-05-07 21:07   ` Andrii Nakryiko
  2024-05-02 15:18 ` [PATCH bpf-next v6 2/4] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Puranjay Mohan @ 2024-05-02 15:18 UTC
  To: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel
  Cc: puranjay12

Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use it directly. For now, it will only be used
for internal inlining optimizations between the BPF verifier and BPF
JITs.

RISC-V uses the generic per-CPU implementation, where the offsets for
CPUs are kept in an array called __per_cpu_offset[cpu_number]. RISC-V
stores the address of the current task's task_struct in the TP register.
The first member of task_struct is struct thread_info, so we can get the
CPU number by reading from the TP register + offsetof(struct thread_info, cpu).

Once we have the CPU number in a register, we read the offset for that
CPU from the address &__per_cpu_offset + (cpu_number << 3). Then we add
this offset to the destination register.
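
The address arithmetic above can be modelled in user-space C as follows.
This is a sketch, not the JIT code: per_cpu_offset_model and NR_CPUS_MODEL
are hypothetical stand-ins for __per_cpu_offset[] and the CPU count, and
the model assumes an LP64 host so that each offset is 8 bytes wide, which
is what the "<< 3" relies on.

```c
#include <assert.h>
#include <stdint.h>

/* The "<< 3" in the JIT relies on 8-byte offsets; make that explicit. */
_Static_assert(sizeof(unsigned long) == 8, "model assumes LP64");

#define NR_CPUS_MODEL 4

/* Hypothetical stand-in for the kernel's __per_cpu_offset[] array. */
static unsigned long per_cpu_offset_model[NR_CPUS_MODEL];

/*
 * Mirrors the emitted RISC-V sequence step by step, after T1 already
 * holds the CPU number loaded from TP:
 *   slli t1, t1, 3; t2 = &__per_cpu_offset; add t1, t2, t1;
 *   ld t1, 0(t1); add rd, rd, t1
 */
static uintptr_t resolve_percpu_addr(uintptr_t rd, unsigned int cpu)
{
	assert(cpu < NR_CPUS_MODEL);

	uintptr_t t1 = (uintptr_t)cpu << 3;		/* byte offset of entry */
	uintptr_t t2 = (uintptr_t)per_cpu_offset_model;	/* array base */

	t1 += t2;					/* &array[cpu] */
	t1 = *(const unsigned long *)t1;		/* array[cpu] */
	return rd + t1;					/* rd += offset */
}
```

Byte arithmetic on the array base with a shift of 3 gives the same entry
as ordinary array indexing, which is why the JIT can compute the address
with one shift and one add.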

To measure the improvement from this change, the benchmark in [1] was
used on QEMU:

Before:
glob-arr-inc   :    1.127 ± 0.013M/s
arr-inc        :    1.121 ± 0.004M/s
hash-inc       :    0.681 ± 0.052M/s

After:
glob-arr-inc   :    1.138 ± 0.011M/s
arr-inc        :    1.366 ± 0.006M/s
hash-inc       :    0.676 ± 0.001M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/riscv/net/bpf_jit_comp64.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 15e482f2c657..1f0159963b3e 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -12,6 +12,7 @@
 #include <linux/stop_machine.h>
 #include <asm/patch.h>
 #include <asm/cfi.h>
+#include <asm/percpu.h>
 #include "bpf_jit.h"
 
 #define RV_FENTRY_NINSNS 2
@@ -1089,6 +1090,24 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_or(RV_REG_T1, rd, RV_REG_T1, ctx);
 			emit_mv(rd, RV_REG_T1, ctx);
 			break;
+		} else if (insn_is_mov_percpu_addr(insn)) {
+			if (rd != rs)
+				emit_mv(rd, rs, ctx);
+#ifdef CONFIG_SMP
+			/* Load current CPU number in T1 */
+			emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu),
+				RV_REG_TP, ctx);
+			/* << 3 because offsets are 8 bytes */
+			emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx);
+			/* Load address of __per_cpu_offset array in T2 */
+			emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx);
+			/* Add offset of current CPU to  __per_cpu_offset */
+			emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx);
+			/* Load __per_cpu_offset[cpu] in T1 */
+			emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx);
+			/* Add the offset to Rd */
+			emit_add(rd, rd, RV_REG_T1, ctx);
+#endif
 		}
 		if (imm == 1) {
 			/* Special mov32 for zext */
@@ -2038,3 +2057,8 @@ bool bpf_jit_supports_arena(void)
 {
 	return true;
 }
+
+bool bpf_jit_supports_percpu_insn(void)
+{
+	return true;
+}
-- 
2.40.1



* [PATCH bpf-next v6 2/4] riscv, bpf: inline bpf_get_smp_processor_id()
  2024-05-02 15:18 [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs Puranjay Mohan
  2024-05-02 15:18 ` [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
@ 2024-05-02 15:18 ` Puranjay Mohan
  2024-05-07 21:11   ` Andrii Nakryiko
  2024-05-02 15:18 ` [PATCH bpf-next v6 3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Puranjay Mohan @ 2024-05-02 15:18 UTC
  To: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel
  Cc: puranjay12

Inline the calls to bpf_get_smp_processor_id() in the RISC-V BPF JIT.

RISC-V keeps the pointer to the current task's task_struct in the TP
(thread pointer) register. This makes it trivial to get the current
processor ID: as thread_info is the first member of task_struct, we can
read the processor ID from TP + offsetof(struct thread_info, cpu).
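
The single-load trick can be illustrated with a user-space C model. The
structs below are a simplified, hypothetical mirror of the leading
thread_info fields, not the real kernel definitions, which have more
members and depend on configuration. On an LP64 host this mirror happens
to place cpu at offset 32, the same offset seen in the JIT output below.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified, hypothetical mirror of the leading fields of the RISC-V
 * struct thread_info; the real struct has more members and varies with
 * kernel configuration.
 */
struct thread_info_model {
	unsigned long flags;
	int preempt_count;
	long kernel_sp;
	long user_sp;
	int cpu;
};

/* thread_info is the first member, so a task pointer is also a
 * thread_info pointer (the property the inlined load depends on). */
struct task_struct_model {
	struct thread_info_model thread_info;
	int other_fields;	/* placeholder for the rest of task_struct */
};

/* Models the single inlined load: read the int at tp + offsetof(..., cpu). */
static int read_cpu_from_tp(const void *tp)
{
	const char *base = tp;

	return *(const int *)(base + offsetof(struct thread_info_model, cpu));
}
```

Because no call is made, the helper's prologue, argument setup and return
are all replaced by this one memory read.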

          RISCV64 JIT output for `call bpf_get_smp_processor_id`
	  ======================================================

                Before                           After
               --------                         -------

         auipc   t1,0x848c                  ld    a5,32(tp)
         jalr    604(t1)
         mv      a5,a0

Benchmark results using [1] on QEMU.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
|      Name     |     Before       |       After      |   % change   |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+

NOTE: This benchmark includes changes from this patch and the previous
      patch that implemented the per-cpu insn.

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 arch/riscv/net/bpf_jit_comp64.c | 26 ++++++++++++++++++++++++++
 include/linux/filter.h          |  1 +
 kernel/bpf/core.c               | 11 +++++++++++
 kernel/bpf/verifier.c           |  4 ++++
 4 files changed, 42 insertions(+)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 1f0159963b3e..a46ec7fb4489 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1493,6 +1493,22 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		bool fixed_addr;
 		u64 addr;
 
+		/* Inline calls to bpf_get_smp_processor_id()
+		 *
+		 * RV_REG_TP holds the address of the current CPU's task_struct and thread_info is
+		 * at offset 0 in task_struct.
+		 * Load cpu from thread_info:
+		 *     Set R0 to ((struct thread_info *)(RV_REG_TP))->cpu
+		 *
+		 * This replicates the implementation of raw_smp_processor_id() on RISCV
+		 */
+		if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) {
+			/* Load current CPU number in R0 */
+			emit_ld(bpf_to_rv_reg(BPF_REG_0, ctx), offsetof(struct thread_info, cpu),
+				RV_REG_TP, ctx);
+			break;
+		}
+
 		mark_call(ctx);
 		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
 					    &addr, &fixed_addr);
@@ -2062,3 +2078,13 @@ bool bpf_jit_supports_percpu_insn(void)
 {
 	return true;
 }
+
+bool bpf_jit_inlines_helper_call(s32 imm)
+{
+	switch (imm) {
+	case BPF_FUNC_get_smp_processor_id:
+		return true;
+	default:
+		return false;
+	}
+}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7a27f19bf44d..3e19bb62ed1a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -993,6 +993,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
+bool bpf_jit_inlines_helper_call(s32 imm);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_percpu_insn(void);
 bool bpf_jit_supports_kfunc_call(void);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 99b8b1c9a248..aa59af9f9bd9 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2941,6 +2941,17 @@ bool __weak bpf_jit_needs_zext(void)
 	return false;
 }
 
+/* Return true if the JIT inlines the call to the helper corresponding to
+ * the imm.
+ *
+ * The verifier will not patch the insn->imm for the call to the helper if
+ * this returns true.
+ */
+bool __weak bpf_jit_inlines_helper_call(s32 imm)
+{
+	return false;
+}
+
 /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool __weak bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7360f04f9ec7..17a10d0686f3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20020,6 +20020,10 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto next_insn;
 		}
 
+		/* Skip inlining the helper call if the JIT does it. */
+		if (bpf_jit_inlines_helper_call(insn->imm))
+			goto next_insn;
+
 		if (insn->imm == BPF_FUNC_get_route_realm)
 			prog->dst_needed = 1;
 		if (insn->imm == BPF_FUNC_get_prandom_u32)
-- 
2.40.1



* [PATCH bpf-next v6 3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  2024-05-02 15:18 [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs Puranjay Mohan
  2024-05-02 15:18 ` [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
  2024-05-02 15:18 ` [PATCH bpf-next v6 2/4] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
@ 2024-05-02 15:18 ` Puranjay Mohan
  2024-05-07 21:13   ` Andrii Nakryiko
  2024-05-02 15:18 ` [PATCH bpf-next v6 4/4] bpf, arm64: inline bpf_get_smp_processor_id() helper Puranjay Mohan
  2024-05-13  0:00 ` [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs patchwork-bot+netdevbpf
  4 siblings, 1 reply; 9+ messages in thread
From: Puranjay Mohan @ 2024-05-02 15:18 UTC
  To: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel
  Cc: puranjay12

From: Puranjay Mohan <puranjay12@gmail.com>

Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use it directly. For now, it will only be used
for internal inlining optimizations between the BPF verifier and BPF
JITs.

Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-CPU offset of each CPU is stored in
that CPU's TPIDR_EL1 register, or in TPIDR_EL2 when the kernel runs at
EL2 with VHE.

To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:

mov dst, src		// Move src to dst, if src != dst.
mrs tmp, tpidr_el1/2	// Move the per-CPU offset of the current CPU into tmp.
add dst, dst, tmp	// Add the per-CPU offset to dst.
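
To make the MRS encoding concrete, the C sketch below repeats the steps of
aarch64_insn_gen_mrs() from this patch in user space. The sysreg values are
copied from enum aarch64_insn_system_register in this series; MRS_BASE
(0xd5300000) is an assumption standing in for what
aarch64_insn_get_mrs_value() returns, and gen_percpu_offset_read() is a
hypothetical helper modelling the VHE choice made in build_insn().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MRS opcode pattern (stand-in for aarch64_insn_get_mrs_value()). */
#define MRS_BASE		0xd5300000u
/* Values copied from enum aarch64_insn_system_register in this series. */
#define SYSREG_TPIDR_EL1	0x4684u
#define SYSREG_TPIDR_EL2	0x6682u
#define SYSREG_SP_EL0		0x4208u

/* Same steps as aarch64_insn_gen_mrs(): clear bits [19:0], place the
 * 15-bit o0:op1:CRn:CRm:op2 field at bit 5, and put Rt in bits [4:0]. */
static uint32_t gen_mrs(unsigned int rt, uint32_t sysreg)
{
	assert(rt < 31);	/* x0..x30; 31 would encode xzr */

	uint32_t insn = MRS_BASE;

	insn &= ~((1u << 20) - 1);	/* GENMASK(19, 0) */
	insn |= sysreg << 5;
	insn |= rt;
	return insn;
}

/* Hypothetical helper: TPIDR_EL2 under VHE, otherwise TPIDR_EL1. */
static uint32_t gen_percpu_offset_read(unsigned int tmp, int has_vhe)
{
	return gen_mrs(tmp, has_vhe ? SYSREG_TPIDR_EL2 : SYSREG_TPIDR_EL1);
}
```

For example, gen_mrs(10, SYSREG_TPIDR_EL1) yields 0xd538d08a, the encoding
of `mrs x10, tpidr_el1`.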

To measure the performance improvement provided by this change, the
benchmark in [1] was used:

Before:
glob-arr-inc   :   23.597 ± 0.012M/s
arr-inc        :   23.173 ± 0.019M/s
hash-inc       :   12.186 ± 0.028M/s

After:
glob-arr-inc   :   23.819 ± 0.034M/s
arr-inc        :   23.285 ± 0.017M/s
hash-inc       :   12.419 ± 0.011M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 arch/arm64/include/asm/insn.h |  7 +++++++
 arch/arm64/lib/insn.c         | 11 +++++++++++
 arch/arm64/net/bpf_jit.h      |  6 ++++++
 arch/arm64/net/bpf_jit_comp.c | 14 ++++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index db1aeacd4cd9..8de0e39b29f3 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -135,6 +135,11 @@ enum aarch64_insn_special_register {
 	AARCH64_INSN_SPCLREG_SP_EL2	= 0xF210
 };
 
+enum aarch64_insn_system_register {
+	AARCH64_INSN_SYSREG_TPIDR_EL1	= 0x4684,
+	AARCH64_INSN_SYSREG_TPIDR_EL2	= 0x6682,
+};
+
 enum aarch64_insn_variant {
 	AARCH64_INSN_VARIANT_32BIT,
 	AARCH64_INSN_VARIANT_64BIT
@@ -686,6 +691,8 @@ u32 aarch64_insn_gen_cas(enum aarch64_insn_register result,
 }
 #endif
 u32 aarch64_insn_gen_dmb(enum aarch64_insn_mb_type type);
+u32 aarch64_insn_gen_mrs(enum aarch64_insn_register result,
+			 enum aarch64_insn_system_register sysreg);
 
 s32 aarch64_get_branch_offset(u32 insn);
 u32 aarch64_set_branch_offset(u32 insn, s32 offset);
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index a635ab83fee3..b008a9b46a7f 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1515,3 +1515,14 @@ u32 aarch64_insn_gen_dmb(enum aarch64_insn_mb_type type)
 
 	return insn;
 }
+
+u32 aarch64_insn_gen_mrs(enum aarch64_insn_register result,
+			 enum aarch64_insn_system_register sysreg)
+{
+	u32 insn = aarch64_insn_get_mrs_value();
+
+	insn &= ~GENMASK(19, 0);
+	insn |= sysreg << 5;
+	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT,
+					    insn, result);
+}
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index 23b1b34db088..b627ef7188c7 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -297,4 +297,10 @@
 #define A64_ADR(Rd, offset) \
 	aarch64_insn_gen_adr(0, offset, Rd, AARCH64_INSN_ADR_TYPE_ADR)
 
+/* MRS */
+#define A64_MRS_TPIDR_EL1(Rt) \
+	aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_TPIDR_EL1)
+#define A64_MRS_TPIDR_EL2(Rt) \
+	aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_TPIDR_EL2)
+
 #endif /* _BPF_JIT_H */
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 76b91f36c729..ed8f9716d9d5 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -877,6 +877,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			emit(A64_ORR(1, tmp, dst, tmp), ctx);
 			emit(A64_MOV(1, dst, tmp), ctx);
 			break;
+		} else if (insn_is_mov_percpu_addr(insn)) {
+			if (dst != src)
+				emit(A64_MOV(1, dst, src), ctx);
+			if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+				emit(A64_MRS_TPIDR_EL2(tmp), ctx);
+			else
+				emit(A64_MRS_TPIDR_EL1(tmp), ctx);
+			emit(A64_ADD(1, dst, dst, tmp), ctx);
+			break;
 		}
 		switch (insn->off) {
 		case 0:
@@ -2527,6 +2536,11 @@ bool bpf_jit_supports_arena(void)
 	return true;
 }
 
+bool bpf_jit_supports_percpu_insn(void)
+{
+	return true;
+}
+
 void bpf_jit_free(struct bpf_prog *prog)
 {
 	if (prog->jited) {
-- 
2.40.1



* [PATCH bpf-next v6 4/4] bpf, arm64: inline bpf_get_smp_processor_id() helper
  2024-05-02 15:18 [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs Puranjay Mohan
                   ` (2 preceding siblings ...)
  2024-05-02 15:18 ` [PATCH bpf-next v6 3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
@ 2024-05-02 15:18 ` Puranjay Mohan
  2024-05-13  0:00 ` [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs patchwork-bot+netdevbpf
  4 siblings, 0 replies; 9+ messages in thread
From: Puranjay Mohan @ 2024-05-02 15:18 UTC
  To: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel
  Cc: puranjay12

Inline calls to the bpf_get_smp_processor_id() helper in the JIT by
emitting a read from struct thread_info. The SP_EL0 system register
holds the pointer to the current task's task_struct, and thread_info is
the first member of that struct, so we can read the CPU number directly
from thread_info.
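
The patch emits `ldr w<r0>, [tmp, #cpu_offset]` only when
is_lsi_offset(cpu_offset, 2) holds; otherwise it moves the offset into
tmp2 and uses a register-offset load. The C sketch below restates that
kind of check and encodes the 32-bit LDR (immediate, unsigned offset)
form. fits_scaled_imm12() and ldr32_imm() are hypothetical names, not the
kernel's helpers, and the opcode constant 0xb9400000 is an assumption
based on the A64 encoding of that instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: the unsigned-offset LDR form takes a 12-bit
 * immediate scaled by the access size (scale = 2 for a 32-bit load). */
static bool fits_scaled_imm12(int offset, int scale)
{
	if (offset < 0)
		return false;
	if (offset & ((1 << scale) - 1))	/* must be size-aligned */
		return false;
	return (offset >> scale) <= 0xfff;	/* imm12 range */
}

/* Encode LDR Wt, [Xn, #offset] (immediate, unsigned offset). */
static uint32_t ldr32_imm(unsigned int rt, unsigned int rn, int offset)
{
	assert(fits_scaled_imm12(offset, 2));
	assert(rt < 32 && rn < 32);

	return 0xb9400000u | ((uint32_t)(offset >> 2) << 10) |
	       ((uint32_t)rn << 5) | (uint32_t)rt;
}
```

With the offset of 24 visible in the JIT output below, ldr32_imm(7, 10, 24)
produces 0xb9401947, i.e. `ldr w7, [x10, #24]`.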

Here is how the ARM64 JITed assembly changes after this commit:

                                      ARM64 JIT
                                     ===========

              BEFORE                                    AFTER
             --------                                  -------

int cpu = bpf_get_smp_processor_id();        int cpu = bpf_get_smp_processor_id();

mov     x10, #0xfffffffffffff4d0             mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0

               Performance improvement using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
|      Name     |      Before       |        After      |   % change   |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 arch/arm64/include/asm/insn.h |  1 +
 arch/arm64/net/bpf_jit.h      |  2 ++
 arch/arm64/net/bpf_jit_comp.c | 25 +++++++++++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 8de0e39b29f3..8c0a36f72d6f 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -138,6 +138,7 @@ enum aarch64_insn_special_register {
 enum aarch64_insn_system_register {
 	AARCH64_INSN_SYSREG_TPIDR_EL1	= 0x4684,
 	AARCH64_INSN_SYSREG_TPIDR_EL2	= 0x6682,
+	AARCH64_INSN_SYSREG_SP_EL0	= 0x4208,
 };
 
 enum aarch64_insn_variant {
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index b627ef7188c7..b22ab2f97a30 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -302,5 +302,7 @@
 	aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_TPIDR_EL1)
 #define A64_MRS_TPIDR_EL2(Rt) \
 	aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_TPIDR_EL2)
+#define A64_MRS_SP_EL0(Rt) \
+	aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_SP_EL0)
 
 #endif /* _BPF_JIT_H */
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index ed8f9716d9d5..1cebe9c92f51 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1215,6 +1215,21 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		const u8 r0 = bpf2a64[BPF_REG_0];
 		bool func_addr_fixed;
 		u64 func_addr;
+		u32 cpu_offset;
+
+		/* Implement helper call to bpf_get_smp_processor_id() inline */
+		if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) {
+			cpu_offset = offsetof(struct thread_info, cpu);
+
+			emit(A64_MRS_SP_EL0(tmp), ctx);
+			if (is_lsi_offset(cpu_offset, 2)) {
+				emit(A64_LDR32I(r0, tmp, cpu_offset), ctx);
+			} else {
+				emit_a64_mov_i(1, tmp2, cpu_offset, ctx);
+				emit(A64_LDR32(r0, tmp, tmp2), ctx);
+			}
+			break;
+		}
 
 		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
 					    &func_addr, &func_addr_fixed);
@@ -2541,6 +2556,16 @@ bool bpf_jit_supports_percpu_insn(void)
 	return true;
 }
 
+bool bpf_jit_inlines_helper_call(s32 imm)
+{
+	switch (imm) {
+	case BPF_FUNC_get_smp_processor_id:
+		return true;
+	default:
+		return false;
+	}
+}
+
 void bpf_jit_free(struct bpf_prog *prog)
 {
 	if (prog->jited) {
-- 
2.40.1



* Re: [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  2024-05-02 15:18 ` [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
@ 2024-05-07 21:07   ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-05-07 21:07 UTC
  To: Puranjay Mohan
  Cc: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel, puranjay12

On Thu, May 2, 2024 at 8:19 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Support an instruction for resolving absolute addresses of per-CPU
> data from their per-CPU offsets. This instruction is internal-only and
> users are not allowed to use them directly. They will only be used for
> internal inlining optimizations for now between BPF verifier and BPF
> JITs.
>
> RISC-V uses generic per-cpu implementation where the offsets for CPUs
> are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores
> the address of the task_struct in TP register. The first element in
> task_struct is struct thread_info, and we can get the cpu number by
> reading from the TP register + offsetof(struct thread_info, cpu).
>
> Once we have the cpu number in a register we read the offset for that
> cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this
> offset to the destination register.
>
> To measure the improvement from this change, the benchmark in [1] was
> used on Qemu:
>
> Before:
> glob-arr-inc   :    1.127 ± 0.013M/s
> arr-inc        :    1.121 ± 0.004M/s
> hash-inc       :    0.681 ± 0.052M/s
>
> After:
> glob-arr-inc   :    1.138 ± 0.011M/s
> arr-inc        :    1.366 ± 0.006M/s
> hash-inc       :    0.676 ± 0.001M/s
>
> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  arch/riscv/net/bpf_jit_comp64.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>

Please carry over acks you got on previous revisions, unless you
significantly change something about the patch, invalidating previous
acks. You had Bjorn's ack on this one, I believe:

Acked-by: Björn Töpel <bjorn@kernel.org>


> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 15e482f2c657..1f0159963b3e 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -12,6 +12,7 @@
>  #include <linux/stop_machine.h>
>  #include <asm/patch.h>
>  #include <asm/cfi.h>
> +#include <asm/percpu.h>
>  #include "bpf_jit.h"
>
>  #define RV_FENTRY_NINSNS 2
> @@ -1089,6 +1090,24 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                         emit_or(RV_REG_T1, rd, RV_REG_T1, ctx);
>                         emit_mv(rd, RV_REG_T1, ctx);
>                         break;
> +               } else if (insn_is_mov_percpu_addr(insn)) {
> +                       if (rd != rs)
> +                               emit_mv(rd, rs, ctx);
> +#ifdef CONFIG_SMP
> +                       /* Load current CPU number in T1 */
> +                       emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu),
> +                               RV_REG_TP, ctx);
> +                       /* << 3 because offsets are 8 bytes */
> +                       emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx);
> +                       /* Load address of __per_cpu_offset array in T2 */
> +                       emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx);
> +                       /* Add offset of current CPU to  __per_cpu_offset */
> +                       emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx);
> +                       /* Load __per_cpu_offset[cpu] in T1 */
> +                       emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx);
> +                       /* Add the offset to Rd */
> +                       emit_add(rd, rd, RV_REG_T1, ctx);
> +#endif
>                 }
>                 if (imm == 1) {
>                         /* Special mov32 for zext */
> @@ -2038,3 +2057,8 @@ bool bpf_jit_supports_arena(void)
>  {
>         return true;
>  }
> +
> +bool bpf_jit_supports_percpu_insn(void)
> +{
> +       return true;
> +}
> --
> 2.40.1
>
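
The riscv sequence in the diff above can also be read as ordinary C. What follows is a user-space model, not kernel code. `struct sketch_thread_info`, `sketch_per_cpu_offset`, `SKETCH_NR_CPUS` and `sketch_resolve_percpu()` are hypothetical stand-ins for `struct thread_info`, `__per_cpu_offset[]` and the emitted instructions, and the TP register is modeled as a pointer argument. The model covers only the `CONFIG_SMP` path; on !SMP builds the patch emits just the `mv`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct thread_info; only 'cpu' matters here. */
struct sketch_thread_info {
	unsigned long flags;
	int cpu;
};

/* Hypothetical stand-in for __per_cpu_offset[]: one 8-byte base per CPU. */
static uint64_t sketch_per_cpu_offset[SKETCH_NR_CPUS];

/*
 * Models the CONFIG_SMP path emitted for the internal per-CPU MOV:
 *   mv   rd, rs          rd = rs (skipped when rd == rs)
 *   ld   t1, cpu(tp)     t1 = current CPU number
 *   slli t1, t1, 3       t1 = cpu * 8, byte index into the offset array
 *   emit_addr t2         t2 = &__per_cpu_offset[0]
 *   add  t1, t2, t1      t1 = &__per_cpu_offset[cpu]
 *   ld   t1, 0(t1)       t1 = __per_cpu_offset[cpu]
 *   add  rd, rd, t1      rd = per-CPU offset + this CPU's base
 */
static uint64_t sketch_resolve_percpu(uint64_t rs,
				      const struct sketch_thread_info *tp)
{
	uint64_t rd = rs;
	uint64_t t1 = (uint64_t)tp->cpu;
	uintptr_t t2;

	assert(t1 < SKETCH_NR_CPUS);
	t1 <<= 3;                                      /* slli */
	t2 = (uintptr_t)sketch_per_cpu_offset;         /* address of the array */
	t1 = t2 + t1;                                  /* add */
	t1 = *(const uint64_t *)(uintptr_t)t1;         /* ld */
	rd += t1;                                      /* add */
	return rd;
}
```

For example, with `sketch_per_cpu_offset[2] = 0x3000` and `cpu = 2`, a per-CPU offset of `0x10` resolves to the absolute address `0x3010`.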

* Re: [PATCH bpf-next v6 2/4] riscv, bpf: inline bpf_get_smp_processor_id()
  2024-05-02 15:18 ` [PATCH bpf-next v6 2/4] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
@ 2024-05-07 21:11   ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-05-07 21:11 UTC
  To: Puranjay Mohan
  Cc: Catalin Marinas, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Zi Shen Lim, Xu Kuohai, Florent Revest,
	linux-arm-kernel, linux-kernel, bpf, Kumar Kartikeya Dwivedi,
	Björn Töpel, puranjay12

On Thu, May 2, 2024 at 8:19 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
>
> RISCV saves the pointer to the current task's task_struct in the TP
> (thread pointer) register. This makes it trivial to get the CPU's processor id.
> As thread_info is the first member of task_struct, we can read the
> processor id from TP + offsetof(struct thread_info, cpu).
>
>           RISCV64 JIT output for `call bpf_get_smp_processor_id`
>           ======================================================
>
>                 Before                           After
>                --------                         -------
>
>          auipc   t1,0x848c                  ld    a5,32(tp)
>          jalr    604(t1)
>          mv      a5,a0
>
> Benchmarked using [1] on QEMU.
>
> ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
>
> +---------------+------------------+------------------+--------------+
> |      Name     |     Before       |       After      |   % change   |
> |---------------+------------------+------------------+--------------|
> | glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
> | arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
> | hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
> +---------------+------------------+------------------+--------------+
>
> NOTE: This benchmark includes changes from this patch and the previous
>       patch that implemented the per-cpu insn.
>
> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> ---

same about carrying over acks:

Acked-by: Björn Töpel <bjorn@kernel.org>

>  arch/riscv/net/bpf_jit_comp64.c | 26 ++++++++++++++++++++++++++
>  include/linux/filter.h          |  1 +
>  kernel/bpf/core.c               | 11 +++++++++++
>  kernel/bpf/verifier.c           |  4 ++++
>  4 files changed, 42 insertions(+)
>

[...]
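
The single load in the "After" column works because `thread_info` is the first member of `task_struct`, so the task pointer held in TP can be used directly as a `thread_info` pointer. A user-space sketch of that reasoning with hypothetical, simplified structs (`sketch_thread_info`, `sketch_task_struct` and both functions are illustrative names, not kernel code): the `_Static_assert` captures the "first member" requirement, and `sketch_inlined_get_cpu()` performs the same base-plus-offset read as the inlined load. The leading fields are only padding, chosen so that on an LP64 host `cpu` lands at offset 32, matching the `32(tp)` in the listing; the real riscv layout is not asserted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified thread_info; the leading fields only provide padding. */
struct sketch_thread_info {
	unsigned long flags;
	int preempt_count;
	long kernel_sp;
	long user_sp;
	int cpu;
};

/* Hypothetical task_struct: thread_info must come first for the trick to work. */
struct sketch_task_struct {
	struct sketch_thread_info thread_info;
	int state;
};

_Static_assert(offsetof(struct sketch_task_struct, thread_info) == 0,
	       "thread_info must be the first member of task_struct");

/* Inlined form: one load from tp + offsetof(struct thread_info, cpu). */
static int sketch_inlined_get_cpu(const void *tp)
{
	const char *base = tp;

	return *(const int *)(base + offsetof(struct sketch_thread_info, cpu));
}

/* Equivalent field access through the task pointer, for comparison. */
static int sketch_task_cpu(const struct sketch_task_struct *task)
{
	return task->thread_info.cpu;
}
```

Both functions return the same value for any task, which is why the out-of-line helper call can be replaced by a single load.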

* Re: [PATCH bpf-next v6 3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  2024-05-02 15:18 ` [PATCH bpf-next v6 3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
@ 2024-05-07 21:13   ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-05-07 21:13 UTC
  To: Puranjay Mohan, Will Deacon, Catalin Marinas, Zi Shen Lim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Xu Kuohai, Florent Revest, linux-arm-kernel, linux-kernel, bpf,
	Kumar Kartikeya Dwivedi, Björn Töpel, puranjay12

On Thu, May 2, 2024 at 8:19 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> From: Puranjay Mohan <puranjay12@gmail.com>
>
> Support an instruction for resolving absolute addresses of per-CPU
> data from their per-CPU offsets. This instruction is internal-only and
> users are not allowed to use it directly. It will only be used, for now,
> for internal inlining optimizations between the BPF verifier and BPF
> JITs.
>
> Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
> access using tpidr_el1"), the per-cpu offset for the CPU is stored in
> the tpidr_el1/2 register of that CPU.
>
> To support this BPF instruction in the ARM64 JIT, the following ARM64
> instructions are emitted:
>
> mov dst, src            // Move src to dst, if src != dst
> mrs tmp, tpidr_el1/2    // Move per-cpu offset of the current cpu into tmp.
> add dst, dst, tmp       // Add the per cpu offset to the dst.
>
> To measure the performance improvement provided by this change, the
> benchmark in [1] was used:
>
> Before:
> glob-arr-inc   :   23.597 ± 0.012M/s
> arr-inc        :   23.173 ± 0.019M/s
> hash-inc       :   12.186 ± 0.028M/s
>
> After:
> glob-arr-inc   :   23.819 ± 0.034M/s
> arr-inc        :   23.285 ± 0.017M/s
> hash-inc       :   12.419 ± 0.011M/s
>
> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  arch/arm64/include/asm/insn.h |  7 +++++++
>  arch/arm64/lib/insn.c         | 11 +++++++++++
>  arch/arm64/net/bpf_jit.h      |  6 ++++++
>  arch/arm64/net/bpf_jit_comp.c | 14 ++++++++++++++
>  4 files changed, 38 insertions(+)

Catalin, Will, Zi,

Any objections to landing these patches into the bpf-next tree? Can we
get some acks from ARM64 folks? Thanks!

>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index db1aeacd4cd9..8de0e39b29f3 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -135,6 +135,11 @@ enum aarch64_insn_special_register {
>         AARCH64_INSN_SPCLREG_SP_EL2     = 0xF210
>  };
>

[...]
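
Unlike riscv, arm64 needs no memory load here, because the current CPU's per-CPU offset already sits in a system register. The sketch below models the three emitted instructions in C. Since `tpidr_el1`/`tpidr_el2` cannot be read from user space, the register is represented by a field of a hypothetical `struct sketch_arm64_cpu`, and `sketch_percpu_mov()` is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one CPU: the field stands in for tpidr_el1
 * (or tpidr_el2, per the commit message), which the kernel sets to
 * that CPU's per-CPU offset.
 */
struct sketch_arm64_cpu {
	uint64_t tpidr_el1;
};

/* Models: mov dst, src; mrs tmp, tpidr_el1; add dst, dst, tmp */
static uint64_t sketch_percpu_mov(uint64_t src, const struct sketch_arm64_cpu *cpu)
{
	uint64_t dst, tmp;

	dst = src;            /* mov dst, src (skipped by the JIT when dst == src) */
	tmp = cpu->tpidr_el1; /* mrs tmp, tpidr_el1 */
	dst = dst + tmp;      /* add dst, dst, tmp */
	return dst;
}
```

The same per-CPU offset (`0x40`, say) resolves to a different absolute address on each CPU, depending only on the value held in that CPU's register.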

* Re: [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs
  2024-05-02 15:18 [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs Puranjay Mohan
                   ` (3 preceding siblings ...)
  2024-05-02 15:18 ` [PATCH bpf-next v6 4/4] bpf, arm64: inline bpf_get_smp_processor_id() helper Puranjay Mohan
@ 2024-05-13  0:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-05-13  0:00 UTC
  To: Puranjay Mohan
  Cc: catalin.marinas, will, ast, daniel, andrii, martin.lau, eddyz87,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	zlim.lnx, xukuohai, revest, linux-arm-kernel, linux-kernel, bpf,
	memxor, bjorn, puranjay12

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu,  2 May 2024 15:18:50 +0000 you wrote:
> Changes in v5 -> v6:
> arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/
> riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
> - Combine riscv and arm64 changes in single series
> - Some coding style fixes
> 
> Changes in v4 -> v5:
> v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/
> - Implement the inlining of the bpf_get_smp_processor_id() in the JIT.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v6,1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
    https://git.kernel.org/bpf/bpf-next/c/19c56d4e5be1
  - [bpf-next,v6,2/4] riscv, bpf: inline bpf_get_smp_processor_id()
    https://git.kernel.org/bpf/bpf-next/c/2ddec2c80b44
  - [bpf-next,v6,3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs
    https://git.kernel.org/bpf/bpf-next/c/7a4c32222b0e
  - [bpf-next,v6,4/4] bpf, arm64: inline bpf_get_smp_processor_id() helper
    https://git.kernel.org/bpf/bpf-next/c/75fe4c0b3e18

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

end of thread

Thread overview: 9+ messages
2024-05-02 15:18 [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs Puranjay Mohan
2024-05-02 15:18 ` [PATCH bpf-next v6 1/4] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
2024-05-07 21:07   ` Andrii Nakryiko
2024-05-02 15:18 ` [PATCH bpf-next v6 2/4] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
2024-05-07 21:11   ` Andrii Nakryiko
2024-05-02 15:18 ` [PATCH bpf-next v6 3/4] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
2024-05-07 21:13   ` Andrii Nakryiko
2024-05-02 15:18 ` [PATCH bpf-next v6 4/4] bpf, arm64: inline bpf_get_smp_processor_id() helper Puranjay Mohan
2024-05-13  0:00 ` [PATCH bpf-next v6 0/4] bpf: Inline helpers in arm64 and riscv JITs patchwork-bot+netdevbpf