From: "Robin H. Johnson" <robbat2@gentoo.org>
To: Patrick Steinhardt <ps@pks.im>, Git Mailing List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
"Robin H. Johnson" <robbat2@gentoo.org>
Subject: Re: Feature request: secondary index by path fragment
Date: Tue, 7 May 2024 05:38:34 +0000
Message-ID: <robbat2-20240507T053331-859497691Z@orbis-terrarum.net>
In-Reply-To: <ZjmtJFF7rv7B8Nhj@tanuki>
On Tue, May 07, 2024 at 06:25:08AM +0200, Patrick Steinhardt wrote:
> On Mon, May 06, 2024 at 04:22:11PM -0700, Junio C Hamano wrote:
> > "Robin H. Johnson" <robbat2@gentoo.org> writes:
> >
> > > Gentoo has some tooling that boils down to repeated runs of 'git log -- somepath/'
> > > via cgit as well as other shell tooling.
> > > ...
> > > I was wondering if Git could gain a secondary index of commits, based on
> > > path prefixes, that would speed up the 'git log' run.
> >
> > Perhaps the bloom filters are good fit for the use case?
>
> Yes, Bloom filters are the first thing that pops into my mind here as
> they are designed to solve exactly this problem. So if you rewrite your
> commit graphs with `git commit-graph write --changed-paths --reachable`
> you should hopefully see a significant speedup.
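A changed-path Bloom filter, as suggested above, can be sketched per commit as follows. This is a minimal illustration, not Git's implementation. The class and function names are hypothetical. SHA-256 with double hashing stands in for the murmur3 hashing Git uses. The 7-hash, 10-bits-per-path settings are assumed to mirror Git's defaults. The sketch also assumes, as I understand Git does, that every leading directory of a changed path is inserted as well. That is what lets a directory query like "sys-apps/pv" be answered. The file names in the demo are made up.

```python
import hashlib
from typing import Iterable

# Assumed settings, intended to mirror Git's defaults (7 hashes, 10 bits/path).
# The hash function below is NOT Git's; it is a stand-in for illustration.
NUM_HASHES = 7
BITS_PER_ENTRY = 10


def with_leading_dirs(paths: Iterable[str]) -> set[str]:
    """Return each changed path plus all of its leading directories,
    e.g. 'a/b/c' -> {'a', 'a/b', 'a/b/c'}."""
    out: set[str] = set()
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            out.add("/".join(parts[:i]))
    return out


class ChangedPathFilter:
    """Hypothetical per-commit Bloom filter over the paths a commit changed."""

    def __init__(self, changed_paths: Iterable[str]) -> None:
        keys = with_leading_dirs(changed_paths)
        # Keep at least 8 bits so a commit with no changes still has a filter.
        self.num_bits = max(8, len(keys) * BITS_PER_ENTRY)
        self.bits = bytearray((self.num_bits + 7) // 8)
        for key in keys:
            for pos in self._positions(key):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _positions(self, key: str) -> list[int]:
        # Double hashing: position_i = h1 + i * h2 (mod number of bits).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        return [(h1 + i * h2) % self.num_bits for i in range(NUM_HASHES)]

    def maybe_contains(self, path: str) -> bool:
        """False means 'definitely not changed'; True means 'maybe changed'."""
        return all((self.bits[p // 8] >> (p % 8)) & 1
                   for p in self._positions(path))


if __name__ == "__main__":
    f = ChangedPathFilter(["sys-apps/pv/pv-1.0.ebuild", "sys-apps/pv/Manifest"])
    print(f.maybe_contains("sys-apps/pv"))     # True: inserted keys never miss
    print(f.maybe_contains("dev-libs/other"))  # usually False; may be a false positive
```

A False answer lets a history walk skip that commit's tree diff entirely; a True answer still requires the real diff, because a Bloom filter can return false positives but never false negatives.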
Good news & bad news.
Using "git log -- sys-apps/pv >/dev/null" as my test case from before:
The fast system (2.45.0) went from 11 seconds to ~1 second!
The slow system (2.44.0) went from 45 seconds to 49 seconds :-(.
I'll try to track down why one system slowed down.
Time to run the commit-graph command:
fast: 1m10s
slow: 3m43s
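The ~1 second result on the fast system is consistent with the filters ruling out most commits before any tree diff is computed. Here is a rough estimate using the standard Bloom-filter approximation. It assumes 7 hashes and 10 bits per path, believed to be Git's defaults. The commit counts are purely hypothetical and are not measurements of the Gentoo repository.

```python
import math


def false_positive_rate(num_hashes: int, bits_per_entry: float) -> float:
    """Standard Bloom-filter approximation p ~= (1 - e^(-k/b))^k,
    where k is the number of hashes and b the bits allotted per entry."""
    return (1.0 - math.exp(-num_hashes / bits_per_entry)) ** num_hashes


def expected_tree_diffs(total_commits: int, touching_commits: int,
                        fp_rate: float) -> float:
    """Commits that still need a real tree diff: every commit that truly
    touches the path, plus the false positives among the rest. Ignores
    commits without a usable filter, which would always need a diff."""
    return touching_commits + fp_rate * (total_commits - touching_commits)


if __name__ == "__main__":
    p = false_positive_rate(num_hashes=7, bits_per_entry=10)
    # Hypothetical history: 1,000,000 commits, 200 of which touch the path.
    diffs = expected_tree_diffs(1_000_000, 200, p)
    print(f"false-positive rate ~ {p:.4%}")
    print(f"tree diffs needed ~ {diffs:,.0f} of 1,000,000 commits")
```

With these assumptions the false-positive rate comes out below 1%, so nearly all non-matching commits skip the diff. This does not explain the slow system's regression; it only estimates how much work the filters should save when they are actually consulted.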
> It makes me wonder whether we can maybe enable generation of Bloom
> filters by default. The biggest downside is of course that writing
> commit graphs becomes slower. But that should happen in the background
> for normal users anyway, and most forges probably hand-roll maintenance
> and thus wouldn't care.
Most repos are also MUCH smaller than this, so it should be safe to
enable by default.
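Much of the extra write time presumably comes from diffing each commit against its first parent. The extra disk space, however, is easy to estimate. The sketch below assumes the layout described in Git's commit-graph format documentation, as I understand it:

- a 4-byte index entry per commit
- a 12-byte header on the data chunk
- roughly 10 bits per changed path, where leading directories count as paths
- a 1-byte placeholder for commits with no changes or with more than an assumed 512-path limit

The constant and function names are hypothetical, and the result is an approximation only.

```python
import math
from typing import Iterable

BITS_PER_ENTRY = 10        # assumed default bits per path
MAX_CHANGED_PATHS = 512    # assumed default "too large" threshold
BIDX_BYTES_PER_COMMIT = 4  # assumed: one 32-bit offset per commit in the index chunk
BDAT_HEADER_BYTES = 12     # assumed: three 32-bit integers heading the data chunk


def filter_bytes(num_keys: int) -> int:
    """Approximate size of one commit's filter in bytes, where num_keys is
    the number of changed paths including their leading directories."""
    if num_keys == 0 or num_keys > MAX_CHANGED_PATHS:
        return 1  # assumed 1-byte placeholder for empty / too-large filters
    return math.ceil(num_keys * BITS_PER_ENTRY / 8)


def estimate_bloom_chunk_bytes(keys_per_commit: Iterable[int]) -> int:
    """Approximate total bytes the Bloom filter chunks add to a commit graph."""
    counts = list(keys_per_commit)
    data = sum(filter_bytes(n) for n in counts)
    return BDAT_HEADER_BYTES + data + BIDX_BYTES_PER_COMMIT * len(counts)


if __name__ == "__main__":
    # Hypothetical repository: 10,000 commits, each touching ~6 paths
    # once leading directories are counted.
    print(estimate_bloom_chunk_bytes([6] * 10_000))
```

For a repository with few commits or small per-commit changes, the result stays small. That fits the point that most repositories are much smaller than this one.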
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136