Git Mailing List Archive mirror
From: "Robin H. Johnson" <robbat2@gentoo.org>
To: Patrick Steinhardt <ps@pks.im>, Git Mailing List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	"Robin H. Johnson" <robbat2@gentoo.org>
Subject: Re: Feature request: secondary index by path fragment
Date: Tue, 7 May 2024 05:38:34 +0000
Message-ID: <robbat2-20240507T053331-859497691Z@orbis-terrarum.net>
In-Reply-To: <ZjmtJFF7rv7B8Nhj@tanuki>


On Tue, May 07, 2024 at 06:25:08AM +0200, Patrick Steinhardt wrote:
> On Mon, May 06, 2024 at 04:22:11PM -0700, Junio C Hamano wrote:
> > "Robin H. Johnson" <robbat2@gentoo.org> writes:
> > 
> > > Gentoo has some tooling that boils down to repeated runs of 'git log -- somepath/'
> > > via cgit as well as other shell tooling.
> > > ...
> > > I was wondering if Git could gain a secondary index of commits, based on
> > > path prefixes, that would speed up the 'git log' run.
> > 
> > Perhaps the bloom filters are good fit for the use case?
> 
> Yes, Bloom filters are the first thing that pop into my mind here as
> they are exactly designed to solve this problem. So if you rewrite your
> commit graphs with `git commit-graph write --changed-paths --reachable`
> you should hopefully see a significant speedup.

Good news and bad news, using "git log -- sys-apps/pv >/dev/null" as my
test case from before:
The fast system (2.45.0) went from 11 seconds to ~1 second!
The slow system (2.44.0) went from 45 seconds to 49 seconds :-(.

I'll try to track down why one system slowed down.

Time taken by the commit-graph write itself:
fast: 1m10s
slow: 3m43s
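
For anyone who wants to repeat the comparison, here is a minimal sketch of the before/after measurement. It builds a small throwaway repository so that it is self-contained; the repository contents, the `g` and `timed` helpers, and the trace file name are assumptions made for illustration, and only the `git log` and `git commit-graph write` invocations come from this thread. Timings on such a tiny repository are meaningless, so point `repo` at a real clone to see a difference.

```shell
#!/bin/sh
# Sketch only: the demo repository and its paths are made up for illustration.
# Point `repo` at a real clone (and drop the setup) for meaningful timings.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT

# Wrapper so every git call targets the demo repo with a fixed identity.
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Print wall-clock seconds for a command, discarding its stdout.
timed() {
    start=$(date +%s)
    "$@" >/dev/null
    end=$(date +%s)
    echo "$((end - start))s: $*"
}

g init -q
mkdir -p "$repo/sys-apps/pv" "$repo/dev-vcs/git"
for i in 1 2 3; do
    echo "rev $i" >"$repo/dev-vcs/git/file"
    g add -A
    g commit -q -m "dev-vcs/git: rev $i"
done
echo pv >"$repo/sys-apps/pv/file"
g add -A
g commit -q -m "sys-apps/pv: add"

timed g log -- sys-apps/pv                       # before: no Bloom filters
before=$(g rev-list --count HEAD -- sys-apps/pv)

timed g commit-graph write --changed-paths --reachable --no-progress

timed g log -- sys-apps/pv                       # after: filters available
after=$(g rev-list --count HEAD -- sys-apps/pv)

# Optional: trace2 perf output includes Bloom filter statistics, which may
# help explain a slowdown (for example, many commits without a filter).
GIT_TRACE2_PERF="$repo/trace.txt" git -C "$repo" log -- sys-apps/pv >/dev/null
grep -i bloom "$repo/trace.txt" || echo "no bloom statistics in trace"

echo "commits touching sys-apps/pv: before=$before after=$after"
```

The history returned for the path should be identical before and after; only the time taken should change. On the slow system, comparing the Bloom statistics in the trace against the fast system might be a reasonable first step.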

> It makes me wonder whether we can maybe enable generation of Bloom
> filters by default. The biggest downside is of course that writing
> commit graphs becomes slower. But that should happen in the background
> for normal users anyway, and most forges probably hand-roll maintenance
> and thus wouldn't care.

Most repos are also MUCH smaller than this one, so it should be safe to
enable by default.
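
Until something like that is the default, the opt-in is at least sticky: per git-commit-graph(1), once `--changed-paths` has been given, later commit-graph writes assume it was intended. A minimal sketch of checking this in a throwaway repository follows. The demo repository, the `g` helper, and the `kept` variable are assumptions for illustration; looking for the `BDAT` chunk ID in the file is just a crude way to see whether Bloom filter data is present.

```shell
#!/bin/sh
# Sketch only: checks that changed-path data survives a later write that
# does not pass --changed-paths. The demo repository is hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT

# Wrapper so every git call targets the demo repo with a fixed identity.
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

g init -q
echo one >"$repo/file"
g add file
g commit -q -m one

# Opt in once.
g commit-graph write --reachable --changed-paths --no-progress

echo two >"$repo/file"
g commit -q -a -m two

# Later write without the flag, as a maintenance job might do.
g commit-graph write --reachable --no-progress

# Crude check: the Bloom data chunk ID appears in the graph file.
graph="$repo/.git/objects/info/commit-graph"
if LC_ALL=C grep -q BDAT "$graph"; then kept=yes; else kept=no; fi
echo "changed-path data kept: $kept"
```

Passing `--no-changed-paths` to a later write is the documented way to stop storing the data again.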


-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


Thread overview: 4+ messages
2024-05-06 23:11 Feature request: secondary index by path fragment Robin H. Johnson
2024-05-06 23:22 ` Junio C Hamano
2024-05-07  4:25   ` Patrick Steinhardt
2024-05-07  5:38     ` Robin H. Johnson [this message]
