BPF Archive mirror
 help / color / mirror / Atom feed
From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Quentin Monnet <qmo@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com,
	Mykyta Yatsenko <yatsenko@meta.com>
Subject: Re: [PATCH bpf-next] bpftool: introduce btf c dump sorting
Date: Thu, 9 May 2024 00:35:46 +0100	[thread overview]
Message-ID: <83829b9e-d561-4209-b4c7-9ef3e35708b0@gmail.com> (raw)
In-Reply-To: <CAEf4Bzar7k4P+JE2FVrcMzaXMHd6OZwvPjf7QUm5rJWoizYXZw@mail.gmail.com>

On 5/9/24 00:21, Andrii Nakryiko wrote:
> On Wed, May 8, 2024 at 4:07 PM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>> On 5/7/24 22:02, Andrii Nakryiko wrote:
>>> On Mon, May 6, 2024 at 6:45 AM Mykyta Yatsenko
>>> <mykyta.yatsenko5@gmail.com> wrote:
>>>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>>>
>>>> Provide a way to sort bpftool c dump output, to simplify vmlinux.h
>>>> diffing and forcing more natural definitions ordering.
>>>>
>>>> Use `normalized` argument in bpftool CLI after `format c` for example:
>>>> ```
>>>> bpftool btf dump file /sys/kernel/btf/fuse format c normalized
>>>> ```
>>>>
>>>> Definitions are sorted by their BTF kind ranks, lexicographically and
>>>> typedefs are forced to go right after their base type.
>>>>
>>>> Type ranks
>>>>
>>>> Assign ranks to btf kinds (defined in function btf_type_rank) to set
>>>> next order:
>>>> 1. Anonymous enums
>>>> 2. Anonymous enums64
>>>> 3. Named enums
>>>> 4. Named enums64
>>>> 5. Trivial types typedefs (ints, then floats)
>>>> 6. Structs
>>>> 7. Unions
>>>> 8. Function prototypes
>>>> 9. Forward declarations
>>>>
>>>> Lexicographical ordering
>>>>
>>>> Definitions within the same BTF kind are ordered by their names.
>>>> Anonymous enums are ordered by their first element.
>>>>
>>>> Forcing typedefs to go right after their base type
>>>>
>>>> To make sure that typedefs are emitted right after their base type,
>>>> we build a list of type's typedefs (struct typedef_ref) and after
>>>> emitting type, its typedefs are emitted as well (lexicographically)
>>>>
>>>> There is a small flaw in this implementation:
>>>> Type dependencies are resolved by bpf lib, so when type is dumped
>>>> because it is a dependency, its typedefs are not output right after it,
>>>> as bpflib does not have the list of typedefs for a given type.
>>>>
>>>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>>>> ---
>>>>    tools/bpf/bpftool/btf.c | 264 +++++++++++++++++++++++++++++++++++++++-
>>>>    1 file changed, 259 insertions(+), 5 deletions(-)
>>>>
> [...]
>
>>> can we avoid all this complexity with TYPEDEFs if we just rank them
>>> just like the type they are pointing to? I.e., TYPEDEF -> STRUCT is
>>> just a struct, TYPEDEF -> TYPEDEF -> INT is just an INT. Emitting the
>>> TYPEDEF type will force all the dependent types to be emitted, which
>>> is good.
>>>
>>> If we also use this "pointee type"'s name as TYPEDEF's sort name, it
>>> will also put it in the position where it should be, right? There
>>> might be some insignificant deviations, but I think it would keep the
>>> code much simpler (and either way we are striving for something that
>>> more-or-less works as expected in practice, not designing some API
>>> that's set in stone).
>>>
>>> WDYT?
>>>
>> I don't think this will guarantee for each type all typedefs follow
>> immediately.
>> For example:
>>
>> With this patch next output is generated:
>>       typedef s64 aaa; /* aaa is the smallest first level child of s64 */
>>       typedef aaa ccc; /* ccc immediately follows aaa as child */
>>       typedef s64 bbb; /* bbb is first level child of s64 following aaa */
>>       typedef s32 xxx; /* xxx follows bbb lexicographically */
>>
>> Option 2: I we apply flat sorting by rank and then name, we'll get next
>> order:
>>       typedef s64 aaa;
>>       typedef s64 bbb;
>>       typedef aaa ccc;
>>       typedef s32 xxx;
>>
>> Here order just follows aaa - bbb - ccc - xxx. Type ccc does not immediately
>> follow its parent aaa.
>>
>> Option3: If we use pointee name as sort name, next output is expected:
>>       typedef s64 aaa; /* dependency of the next line */
>>       typedef aaa ccc; /* sort name aaa */
>>       typedef s32 xxx; /* sort name s32 */
>>       typedef s64 bbb; /* sort name s64 */
>>
>> I think Option 2 will have the simplest implementation, but we are
>> getting BFS
>> ordering instead of DFS. I'm not entirely sure, but it looks to me, that we
>> can't achieve DFS ordering with sort-based simple implementation, let me
>> know if
>> I'm missing anything here.
>> If DFS ordering is not required, I'm happy to scrap it.
> I'd go with Option3, but I'd resolve ccc -> aaa -> s64 as sort name
> (so all the way to non-reference type), and use INT as rank for that
> typedef. We'd have ordering ambiguity between `ccc -> aaa -> s64`
> chain and `bbb -> s64`, to resolve that we'd need to be able to
> compare entire chains, but that would require more bookkeeping. So
> maybe let's remember both resolved name and "original" name (i.e., for
> ccc resolved would be "s64", original is "ccc"), and if the resolved
> name is the same, then compare original name. That will give the
> ordering more stability. And it's just an extra u32 to keep track per
> type in this extra sort_datum thingy. WDYT?
This will do.
> I think simple trumps whatever ideal ordering we come up with. In any
> case it's going to be pretty stable and easy to diff, so I'd go with
> that.
Agreed.
>>>> +static int *sort_btf_c(const struct btf *btf)
>>>> +{
>>>> +       int total_root_types;
>>>> +       struct sort_datum *datums;
>>>> +       int *sorted_indexes = NULL;
>>>> +       int *type_index_to_datum_index;
>>> nit: most of these names are unnecessarily verbose. It's one
>>> relatively straightforward function, just use shorter names "n",
>>> "idxs", "idx_to_datum", stuff like this. Cooler and shorter C names
>>> :))
>>>
>>>> +
>>>> +       if (!btf)
>>>> +               return sorted_indexes;
>>> this would be a horrible bug if this happens, don't guard against it here
>>>
>>>> +
>>>> +       total_root_types = btf__type_cnt(btf);
>>>> +       datums = malloc(sizeof(struct sort_datum) * total_root_types);
>>>> +
>>>> +       for (int i = 1; i < total_root_types; ++i) {
>>>> +               struct sort_datum *current_datum = datums + i;
>>>> +
>>>> +               current_datum->index = i;
>>>> +               current_datum->name = btf_type_sort_name(btf, i);
>>>> +               current_datum->type_rank = btf_type_rank(btf, i, false);
>>>> +               current_datum->emitted = false;
>>> btf_dump__dump_type() keeps track of which types are already emitted,
>>> you probably don't need to do this explicitly?
>> I use `emitted` to indicate whether type index has been copied into output
>> `sorted_indexes` array. This is needed because type (if it is a typedef)
>> can be put into output out of order by its parent base type, if base has
>> been processed earlier. It helps to avoid putting the same type twice in
>> the output array preventing buffer overrun.
> Would this still be needed if we do this sorting only approach?
Right, most of this code can be removed.
> [...]



      reply	other threads:[~2024-05-08 23:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-06 13:44 [PATCH bpf-next] bpftool: introduce btf c dump sorting Mykyta
2024-05-07 21:02 ` Andrii Nakryiko
2024-05-07 23:30   ` Quentin Monnet
2024-05-08 16:21     ` Andrii Nakryiko
2024-05-08 23:07   ` Mykyta Yatsenko
2024-05-08 23:21     ` Andrii Nakryiko
2024-05-08 23:35       ` Mykyta Yatsenko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83829b9e-d561-4209-b4c7-9ef3e35708b0@gmail.com \
    --to=mykyta.yatsenko5@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=kafai@meta.com \
    --cc=kernel-team@meta.com \
    --cc=qmo@kernel.org \
    --cc=yatsenko@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).