BPF Archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: andrii@kernel.org, ast@kernel.org, jolsa@kernel.org,
	acme@redhat.com,  quentin@isovalent.com, eddyz87@gmail.com,
	mykolal@fb.com,  daniel@iogearbox.net, martin.lau@linux.dev,
	song@kernel.org,  yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org,  sdf@google.com,
	haoluo@google.com, houtao1@huawei.com, bpf@vger.kernel.org,
	 masahiroy@kernel.org, mcgrof@kernel.org, nathan@kernel.org
Subject: Re: [PATCH v2 bpf-next 00/13] bpf: support resilient split BTF
Date: Mon, 29 Apr 2024 11:02:28 -0700	[thread overview]
Message-ID: <CAEf4BzY__EKpsdxqw+CiUQ=oe8CLtsVwP3nQgqTZ7eCBzxTr4w@mail.gmail.com> (raw)
In-Reply-To: <f05afb12-5ec8-47d7-bcd6-a0d6b913b38c@oracle.com>

On Mon, Apr 29, 2024 at 10:31 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 29/04/2024 18:05, Andrii Nakryiko wrote:
> > On Mon, Apr 29, 2024 at 8:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> On 27/04/2024 01:24, Andrii Nakryiko wrote:
> >>> On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko
> >>> <andrii.nakryiko@gmail.com> wrote:
> >>>>
> >>>> On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>>>
> >>>>> Split BPF Type Format (BTF) provides huge advantages in that kernel
> >>>>> modules only have to provide type information for types that they do not
> >>>>> share with the core kernel; for core kernel types, split BTF refers to
> >>>>> core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
> >>>>> uses that structure (or a pointer to it) simply needs to refer to the
> >>>>> core kernel type id, saving the need to define the structure and its many
> >>>>> dependents.  This cuts down on duplication and makes BTF as compact
> >>>>> as possible.
> >>>>>
> >>>>> However, there is a downside.  This scheme requires the references from
> >>>>> split BTF to base BTF to be valid not just at encoding time, but at use
> >>>>> time (when the module is loaded).  Even a small change in kernel types
> >>>>> can perturb the type ids in core kernel BTF, and due to pahole's
> >>>>> parallel processing of compilation units, even an unchanged kernel can
> >>>>> have different type ids if BTF is re-generated.  So we have a robustness
> >>>>> problem for split BTF for cases where a module is not always compiled at
> >>>>> the same time as the kernel.  This problem is particularly acute for
> >>>>> distros which generally want module builders to be able to compile a
> >>>>> module for the lifetime of a Linux stable-based release, and have it
> >>>>> continue to be valid over the lifetime of that release, even as changes
> >>>>> in data structures (and hence BTF types) accrue.  Today it's not
> >>>>> possible to generate BTF for modules that works beyond the initial
> >>>>> kernel it is compiled against - kernel bugfixes etc invalidate the split
> >>>>> BTF references to vmlinux BTF, and BTF is no longer usable for the
> >>>>> module.
> >>>>>
> >>>>> The goal of this series is to provide options to provide additional
> >>>>> context for cases like this.  That context comes in the form of
> >>>>> distilled base BTF; it stands in for the base BTF, and contains
> >>>>> information about the types referenced from split BTF, but not their
> >>>>> full descriptions.  The modified split BTF will refer to type ids in
> >>>>> this .BTF.base section, and when the kernel loads such modules it
> >>>>> will use that base BTF to map references from split BTF to the
> >>>>> current vmlinux BTF - a process of relocating split BTF with the
> >>>>> currently-running kernel's vmlinux base BTF.
> >>>>>
> >>>>> A module builder - using this series along with the pahole changes -
> >>>>> can then build a module with distilled base BTF via an out-of-tree
> >>>>> module build, i.e.
> >>>>>
> >>>>> make -C . M=path/2/module
> >>>>>
> >>>>> The module will have a .BTF section (the split BTF) and a
> >>>>> .BTF.base section.  The latter is small in size - distilled base
> >>>>> BTF does not need full struct/union/enum information for named
> >>>>> types for example.  For 2667 modules built with distilled base BTF,
> >>>>> the average size observed was 1556 bytes (stddev 1563).
> >>>>>
> >>>>> Note that for the in-tree modules, this approach is not needed as
> >>>>> split and base BTF in the case of in-tree modules are always built
> >>>>> and re-built together.
> >>>>>
> >>>>> The series first focuses on generating split BTF with distilled base
> >>>>> BTF, and provides btf__parse_opts() which allows specification
> >>>>> of the section name from which to read BTF data, since we now have
> >>>>> both .BTF and .BTF.base sections that can contain such data.
> >>>>>
> >>>>> Then we add support to resolve_btfids for generating the .BTF.ids
> >>>>> section with reference to the .BTF.base section - this ensures the
> >>>>> .BTF.ids match those used in the split/base BTF.
> >>>>>
> >>>>> Finally the series provides the mechanism for relocating split BTF with
> >>>>> a new base; the distilled base BTF is used to map the references to base
> >>>>> BTF in the split BTF to the new base.  For the kernel, this relocation
> >>>>> process happens at module load time, and we relocate split BTF
> >>>>> references to point at types in the current vmlinux BTF.  As part of
> >>>>> this, .BTF.ids references need to be mapped also.
> >>>>>
> >>>>> So concretely, what happens is
> >>>>>
> >>>>> - we generate split BTF in the .BTF section of a module that refers to
> >>>>>   types in the .BTF.base section as base types; these are not full
> >>>>>   type descriptions but provide information about the base type.  So
> >>>>>   a STRUCT sk_buff would be represented as a FWD struct sk_buff in
> >>>>>   distilled base BTF for example.
> >>>>> - when the module is loaded, the split BTF is relocated with vmlinux
> >>>>>   BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
> >>>>>   in vmlinux BTF and map all split BTF references to the distilled base
> >>>>>   FWD sk_buff, replacing them with references to the vmlinux BTF
> >>>>>   STRUCT sk_buff.
> >>>>>
> >>>>> Support is also added to bpftool to be able to display split BTF
> >>>>> relative to its .BTF.base section, and also to display the relocated
> >>>>> form via the "-R path_to_base_btf".
> >>>>>
> >>>>> A previous approach to this problem [1] utilized standalone BTF for such
> >>>>> cases - where the BTF is not defined relative to base BTF so there is no
> >>>>> relocation required.  The problem with that approach is that from
> >>>>> the verifier perspective, some types are special, and having a custom
> >>>>> representation of a core kernel type that did not necessarily match the
> >>>>> current representation is not tenable.  So the approach taken here was
> >>>>> to preserve the split BTF model while minimizing the representation of
> >>>>> the context needed to relocate split and current vmlinux BTF.
> >>>>>
> >>>>> To generate distilled .BTF.base sections the associated dwarves
> >>>>> patch (to be applied on the "next" branch there) is needed.
> >>>>> Without it, things will still work but bpf_testmod will not be built
> >>>>> with a .BTF.base section.
> >>>>>
> >>>>> Changes since RFC [2]:
> >>>>>
> >>>>> - updated terminology; we replace clunky "base reference" BTF with
> >>>>>   distilling base BTF into a .BTF.base section. Similarly BTF
> >>>>>   reconcilation becomes BTF relocation (Andrii, most patches)
> >>>>> - add distilled base BTF by default for out-of-tree modules
> >>>>>   (Alexei, patch 8)
> >>>>> - distill algorithm updated to record size of embedded struct/union
> >>>>>   by recording it as a 0-vlen STRUCT/UNION with size preserved
> >>>>>   (Andrii, patch 2)
> >>>>> - verify size match on relocation for such STRUCT/UNIONs (Andrii,
> >>>>>   patch 9)
> >>>>> - with embedded STRUCT/UNION recording size, we can have bpftool
> >>>>>   dump a header representation using .BTF.base + .BTF sections
> >>>>>   rather than special-casing and refusing to use "format c" for
> >>>>>   that case (patch 5)
> >>>>> - match enum with enum64 and vice versa (Andrii, patch 9)
> >>>>> - ensure that resolve_btfids works with BTF without .BTF.base
> >>>>>   section (patch 7)
> >>>>> - update tests to cover embedded types, arrays and function
> >>>>>   prototypes (patches 3, 12)
> >>>>>
> >>>>> One change not made yet is adding anonymous struct/unions that the split
> >>>>> BTF references in base BTF to the module instead of adding them to the
> >>>>> .BTF.base section.  That would involve having to maintain two pipes for
> >>>>> writing BTF, one for the .BTF.base and one for the split BTF.  It would
> >>>>> be possible, but there are I think some edge cases that might make it
> >>>>> tricky.  For example consider a split BTF reference to a base BTF
> >>>>> ARRAY which in turn referenced an anonymous STRUCT as type.  In such a
> >>>>> case, it wouldn't make sense to have the array in the .BTF.base section
> >>>>> while having the STRUCT in the module.  The general concern is that once
> >>>>
> >>>> Hm.. not really? ARRAY is a reference type (and anonymous at that), so
> >>>> it would have to stay in module's BTF, no? I'll go read the patch
> >>>> series again, but let me know if I'm missing something.
> >>>>
> >>
> >> The way things currently work, we preserve all relationships prior to
> >> distilling base BTF. That is, if a type was in split BTF prior to
> >> calling btf__distill_base(), it will stay in split BTF afterwards. Ditto
> >> for base types. This is true for reference types as well as named types.
> >> So in the case of the above array for example, prior to distilling types
> >> it is in base BTF. If it in turn then referred to a base anonymous
> >> struct, both would be in the base and thus the distilled base BTF. In
> >> the above case, I was suggesting the array itself was referred to from
> >> split BTF, but not in split BTF, sorry if that wasn't clearer.
> >>
> >> So the problem comes if we moved the anon struct to the module; then we
> >> also need to move types that depend on it there. This means we'd need to
> >> make the move recursive. That seems doable; the only question is around
> >
> > Yep, it should be very doable. We just mark everything used from
> > "to-be-moved-to-new-split-BTF" types recursively, unless it's
> > "qualified named type", where we stop. You have a pass to mark
> > embedded types, here it might be another pass to mark
> > "used-by-split-BTF-types-but-not-distillable" types.
> >
> >> the logistics and the effects of doing so. At one extreme we might end
> >> up with something that resembles standalone BTF (many/most types in the
> >
> > My hypothesis is that it is very unlikely that there will be a lot of
> > types that have to be copied into split BTF.
> >
> >> split BTF). That seems unlikely in most cases. I examined one module's
> >> BTF base for example, and the only anon structs arose from typedef
> >> references possible_net_t, sockptr_t, rwlock_t and atomic_t. These in
> >> turn were only referenced once elsewhere in distilled base BTF; a
> >> sockptr was in a FUNC_PROTO, but aside from that the typedefs were not
> >> otherwise referenced in distilled base BTF, they were referenced in
> >> split BTF as embeeded struct field types.
> >>
> >> So moving all of this to the split BTF seems possible; what I think we
> >> probably need to think on a bit is how to handle relocation.  Is there a
> >> need to relocate these module types too, or can we live with having
> >> duplicate atomic_t/sockptr_t typedefs in the module? Currently
> >> relocation is simplified by the fact that we only need to relocate the
> >> types prior to the module's start id. All we need to do is rewrite type
> >> references in split BTF to base ids. If we were relocating split types
> >> too we'd need to remove them from split BTF.
> >
> > I think anything that is not in distilled base should not be
> > relocated, so current simplicity is remapping distilled BTF IDs will
> > remain. It's ok to have clones/copies of some simple typedefs,
> > probably.
> >
> > We have a few somewhat competing goals here and we need to make a
> > tradeoff between them:
> >
> >   a) minimizing split BTF size (or rather not making it too large)
> >   b) making sure PTR_TO_BTF_ID types work (so module kfuncs can accept
> > task_struct and others)
> >   c) keeping relocation simple, fast, and reliable/unambiguous
> >
> > By copying anonymous types we potentially hurt a) (but presumably not
> > a lot to worry about), and we significantly improve c) by making
> > relocation simple/fast/reliably (to the extent possible with "by name"
> > lookups). And we (presumably) don't change b), it still works for all
> > existing and future cases.
> >
>
> Yeah, case b) is the only lingering concern I have, but in practice it
> seems unlikely to arise. One point of clarification - we've discussed so
> far mostly anonymous STRUCTs and UNIONs; do you think there are other
> anonymous types we should consider, ARRAYs for example?

Everything is technically possible, but I'd be surprised if anything
but STRUCT/UNION is referred to by PTR_TO_BTF_ID for kfunc. But let's
get there first.

> > If we ever need to pass anonymous typedef'ed types to kfunc, we'll
> > need to think how to represent them in distilled base BTF. But it most
> > probably won't be TYPEDEF -> STRUCT chain, but rather empty STRUCT
> > with the name of original TYPEDEF + some bit to specify that we are
> > looking for a TYPEDEF in real base BTF; I think we have a pass forward
> > here, and that's the main thing, but I don't think it's a problem
> > worth solving now (or ever).
> >
> > WDYT?
>
> Agreed. I think (hope) it's unlikely to arise.
>
> >
> >>
> >>>>> we move a type to the module we would need to also ensure any base types
> >>>>> that refer to it move there too.  For now it is I think simpler to
> >>>>> retain the existing split/base type classifications.
> >>>>
> >>>> We would have to finalize this part before landing, as it has big
> >>>> implications on the relocation process.
> >>>
> >>> Ran out of time, sorry, will continue on Monday. But please consider,
> >>> meanwhile, what I mentioned about only having named
> >>> structs/unions/enums in distilled base BTF.
> >>>
> >>
> >> Sure, I'll dig into it further. FWIW I agree with the goal of moving
> >> anonymous structs/unions if it's doable. I can't see any blocking issues
> >> thus far.
> >
> > Yep, please give it a go, and I'll try to finish the review today, thanks.
> >
> >>
> >>>>
> >>>>
> >>>>>
> >>>>> [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/
> >>>>> [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/
> >>>>>
> >>>>>
> >>>>>
> >>>>> Alan Maguire (13):
> >>>>>   libbpf: add support to btf__add_fwd() for ENUM64
> >>>>>   libbpf: add btf__distill_base() creating split BTF with distilled base
> >>>>>     BTF
> >>>>>   selftests/bpf: test distilled base, split BTF generation
> >>>>>   libbpf: add btf__parse_opts() API for flexible BTF parsing
> >>>>>   bpftool: support displaying raw split BTF using base BTF section as
> >>>>>     base
> >>>>>   kbuild,bpf: switch to using --btf_features for pahole v1.26 and later
> >>>>>   resolve_btfids: use .BTF.base ELF section as base BTF if -B option is
> >>>>>     used
> >>>>>   kbuild, bpf: add module-specific pahole/resolve_btfids flags for
> >>>>>     distilled base BTF
> >>>>>   libbpf: split BTF relocation
> >>>>>   module, bpf: store BTF base pointer in struct module
> >>>>>   libbpf,bpf: share BTF relocate-related code with kernel
> >>>>>   selftests/bpf: extend distilled BTF tests to cover BTF relocation
> >>>>>   bpftool: support displaying relocated-with-base split BTF
> >>>>>
> >>>>>  include/linux/btf.h                           |  32 +
> >>>>>  include/linux/module.h                        |   2 +
> >>>>>  kernel/bpf/Makefile                           |   8 +
> >>>>>  kernel/bpf/btf.c                              | 227 +++++--
> >>>>>  kernel/module/main.c                          |   5 +-
> >>>>>  scripts/Makefile.btf                          |  12 +-
> >>>>>  scripts/Makefile.modfinal                     |   4 +-
> >>>>>  .../bpf/bpftool/Documentation/bpftool-btf.rst |  15 +-
> >>>>>  tools/bpf/bpftool/bash-completion/bpftool     |   7 +-
> >>>>>  tools/bpf/bpftool/btf.c                       |  20 +-
> >>>>>  tools/bpf/bpftool/main.c                      |  14 +-
> >>>>>  tools/bpf/bpftool/main.h                      |   2 +
> >>>>>  tools/bpf/resolve_btfids/main.c               |  22 +-
> >>>>>  tools/lib/bpf/Build                           |   2 +-
> >>>>>  tools/lib/bpf/btf.c                           | 561 +++++++++++-----
> >>>>>  tools/lib/bpf/btf.h                           |  61 ++
> >>>>>  tools/lib/bpf/btf_common.c                    | 146 ++++
> >>>>>  tools/lib/bpf/btf_relocate.c                  | 630 ++++++++++++++++++
> >>>>>  tools/lib/bpf/libbpf.map                      |   3 +
> >>>>>  tools/lib/bpf/libbpf_internal.h               |   2 +
> >>>>>  .../selftests/bpf/prog_tests/btf_distill.c    | 298 +++++++++
> >>>>>  21 files changed, 1864 insertions(+), 209 deletions(-)
> >>>>>  create mode 100644 tools/lib/bpf/btf_common.c
> >>>>>  create mode 100644 tools/lib/bpf/btf_relocate.c
> >>>>>  create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c
> >>>>>
> >>>>> --
> >>>>> 2.31.1
> >>>>>
> >>>

      reply	other threads:[~2024-04-29 18:02 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-24 15:47 [PATCH v2 bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
2024-04-24 15:47 ` [PATCH v2 bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64 Alan Maguire
2024-04-26 22:56   ` Andrii Nakryiko
2024-04-24 15:47 ` [PATCH v2 bpf-next 02/13] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
2024-04-26 22:57   ` Andrii Nakryiko
2024-04-30 23:06   ` Eduard Zingerman
2024-05-01 17:29     ` Alan Maguire
2024-05-01 17:43       ` Eduard Zingerman
2024-05-02 11:51         ` Alan Maguire
2024-04-24 15:47 ` [PATCH v2 bpf-next 03/13] selftests/bpf: test distilled base, split BTF generation Alan Maguire
2024-04-30 23:50   ` Eduard Zingerman
2024-04-30 23:55   ` Eduard Zingerman
2024-05-01 17:31     ` Alan Maguire
2024-04-24 15:47 ` [PATCH v2 bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
2024-04-29 23:40   ` Andrii Nakryiko
2024-05-01 17:42     ` Alan Maguire
2024-05-01 17:47       ` Andrii Nakryiko
2024-05-01  0:07   ` Eduard Zingerman
2024-04-24 15:47 ` [PATCH v2 bpf-next 05/13] bpftool: support displaying raw split BTF using base BTF section as base Alan Maguire
2024-04-29 23:42   ` Andrii Nakryiko
2024-04-24 15:47 ` [PATCH v2 bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later Alan Maguire
2024-04-29 23:43   ` Andrii Nakryiko
2024-05-01 17:22     ` Alan Maguire
2024-04-24 15:48 ` [PATCH v2 bpf-next 07/13] resolve_btfids: use .BTF.base ELF section as base BTF if -B option is used Alan Maguire
2024-04-29 23:45   ` Andrii Nakryiko
2024-05-01 20:39   ` Eduard Zingerman
2024-05-02 14:53     ` Alan Maguire
2024-04-24 15:48 ` [PATCH v2 bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for distilled base BTF Alan Maguire
2024-04-24 15:48 ` [PATCH v2 bpf-next 09/13] libbpf: split BTF relocation Alan Maguire
2024-04-30  0:14   ` Andrii Nakryiko
2024-04-30 16:56     ` Alan Maguire
2024-04-30 17:41       ` Andrii Nakryiko
2024-05-02 16:39         ` Eduard Zingerman
2024-04-24 15:48 ` [PATCH v2 bpf-next 10/13] module, bpf: store BTF base pointer in struct module Alan Maguire
2024-04-24 15:48 ` [PATCH v2 bpf-next 11/13] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
2024-04-24 15:48 ` [PATCH v2 bpf-next 12/13] selftests/bpf: extend distilled BTF tests to cover BTF relocation Alan Maguire
2024-04-24 15:48 ` [PATCH v2 bpf-next 13/13] bpftool: support displaying relocated-with-base split BTF Alan Maguire
2024-04-26 22:56 ` [PATCH v2 bpf-next 00/13] bpf: support resilient " Andrii Nakryiko
2024-04-27  0:24   ` Andrii Nakryiko
2024-04-29 15:25     ` Alan Maguire
2024-04-29 17:05       ` Andrii Nakryiko
2024-04-29 17:31         ` Alan Maguire
2024-04-29 18:02           ` Andrii Nakryiko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzY__EKpsdxqw+CiUQ=oe8CLtsVwP3nQgqTZ7eCBzxTr4w@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=acme@redhat.com \
    --cc=alan.maguire@oracle.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=masahiroy@kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mykolal@fb.com \
    --cc=nathan@kernel.org \
    --cc=quentin@isovalent.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).