Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Kyle Lippincott <spectral@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: libification: how to avoid symbol collisions?
Date: Mon, 12 Feb 2024 18:48:26 -0800	[thread overview]
Message-ID: <CAO_smVizKLL2NHFBpszJn+ieJhCEZyvvOT-BWv6Oz5pGiafPVg@mail.gmail.com> (raw)
In-Reply-To: <xmqqil2ximxq.fsf@gitster.g>

On Fri, Feb 9, 2024 at 9:10 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Kyle Lippincott <spectral@google.com> writes:
>
> > If I'm right that this is an issue, does this imply that we'd need to
> > rename every non-static function in the git codebase that's part of a
> > library to have a `git_` prefix, even if it won't be used outside of
> > the git project's own .c files? Is there a solution that doesn't
> > involve making it so that we have to type `git_` a billion times a day
> > that's also maintainable? We could try to guess at how likely a name
> > collision would be and only do this for ones where it's obviously
> > going to collide, but if we get it wrong, I'm concerned that we'll run
> > into subtle ODR violations that *at best* erode confidence in our
> > library, and can actually cause outages, data corruption, and
> > security/privacy issues.
>
> If you end up with a helper function name "foo" that is defined in
> our X.c and used by our Y.c but is not part of the published "git
> library API", we may want to rename it so that such a common name
> can be used by programs that link with the "git library".  We may
> choose to rename it to "GitLib_foo".

If it's internal, we may want to name it with a different prefix than
GitLib, if we expect the exposed API of the library to have this
prefix, just as a signal to readers where the internal/external
boundaries are.

>
> Do we want to keep the source of our library code, which defines the
> helper function "foo" in X.c and calls it in Y.c, intact so that the
> helper is still named "foo" as far as we are concerned?  Or do we
> "bite the bullet" and bulk modify both the callers and the callee?
>
> I'd imagine that we would rather avoid such a churn at all cost [*].
> After all, "foo" is *not* supposed to be callable by any human
> written code, and that is why we rename it to a name "GitLib_foo"
> that is unlikely to overlap with any sane human would use.
>
>         Side note: if a public API function that we want our library
>         clients to call is misnamed, we want to rename it so that we
>         would both internally and externally use the same public
>         name, I would imagine.
>
> The mechanics to rename should be a solved problem, I think, as we
> are obviously not the first project that wants to build a library.
>
> If the names are all simple, we could do this in CPP,

At first I thought you meant C++, and I was like "Yeah, that's a
possible solution: when building a library, compile it as C++ with
name mangling, except for the symbols we intend to export!" -- this
was not what you meant, though. Kind of amusingly, that idea might
work, and might even be maintainable once we got to that state, but
getting to that state would be a lot of cleanup because of C++'s
stricter type system (`char *p = ptr;`, where `ptr` is a `void*` for
example; maybe this is a call to malloc or similar). Since the git
libraries don't exist yet, there's technically no worries about
backwards compatibility with requiring a C++ compiler.

> i.e. invent a
> header file that has bunch of such renames like
>
>     #define foo GitLib_foo
>
> and include it in both X.c and Y.c.  But "foo" may also appear as
> the name of a type, a member in a structure, etc., where we may not
> want to touch, so in a project larger than toy scale, this approach
> may not work well.

Glancing at the tags file, it looks like there's a small number of
cases where this would be problematic, and they're mostly things where
there's a function named the same thing as either a struct variable
storing the result of the function. So it could work, but there's over
3,500 symbols (if I did my filtering of the tags file correctly) that
are not scoped to a specific file (i.e. static), or
struct/enum/typedef/union names. That's going to be quite annoying to
maintain; even if we don't end up having to do all 3,500 symbols, for
the files that are part of some public library, we'd add maintenance
burden because we'd need to remember to either make every new function
be static, or add it to this list. I assume we could create a test
that would enforce this ("static, named with <prefix>, or added to
<list>"), so the issue is catchable, but it will be exceedingly
annoying every time one encounters this.

>
> "objcopy --redefine-sym" would probably be a better way.  I haven't
> written a linker script, but I heard rumors that there is RENAME()
> to rename symbols, and using that might be another avenue.
>
>

I'd thought of linker scripts, but rejected the idea due to
assumptions I made about their portability - this could be mitigated
by having a linker-script-generator step in the build process, but
this seems difficult to maintain. It also implies the same maintenance
burden as the #defines, where when introducing a new function to X.c
that is called from Y.c we'd have to edit the list of "symbols to
rename".

  reply	other threads:[~2024-02-13  2:48 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-09  2:30 libification: how to avoid symbol collisions? Kyle Lippincott
2024-02-09 13:31 ` rsbecker
2024-02-13  1:02   ` Kyle Lippincott
2024-02-09 17:09 ` Junio C Hamano
2024-02-13  2:48   ` Kyle Lippincott [this message]
2024-02-13 16:50     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAO_smVizKLL2NHFBpszJn+ieJhCEZyvvOT-BWv6Oz5pGiafPVg@mail.gmail.com \
    --to=spectral@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).