All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Palka <patrick@parcs.ath.cx>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
Date: Thu, 18 Jun 2015 23:54:09 -0400	[thread overview]
Message-ID: <20150619035408.GA23679@peff.net> (raw)
In-Reply-To: <20150618212356.GA20271@peff.net>

On Thu, Jun 18, 2015 at 05:23:56PM -0400, Jeff King wrote:

> +# Return the individual diff-able items of the hunk, but with any
> +# diff or color prefix/suffix for each line split out (we assume that the
> +# prefix/suffix for each line will be the same).
> +sub split_hunk {
> +	my ($prefix, $suffix, @r);
> +	foreach my $line (@_) {
> +		$line =~ /^($COLOR*[+-]$COLOR*)(.*)($COLOR*)/
> +			or die "eh, this is supposed to match everything!";

This isn't quite right. We'll never match the suffix, because it gets
sucked up by the greedy (.*). But making that non-greedy matches
nothing, unless we also add "$" at the end.

_But_, there is still something else weird going on. I replaced this
split:

> +		push @r, split(//, $2), "\n";

with:

  push @r, split(/([[:space:][:punct:]])/, $2), "\n";

which behaves much better. With that, 99a2cfb shows:

  -       <if> (!peel_ref(path, peeled)) {
  -               <is>_annotated = !!<hashcmp>(<sha1>, peeled);
  +       <if> (!peel_ref(path, peeled<.hash>)) {
  +               <is>_annotated = !!<oidcmp>(<oid>, <&>peeled);

The latter half of both lines looks perfect. But what's that weird
highlighting of the initial "if" and "is" on those lines?

It turns out that the colored output we produce is quite odd:

  $ git show --color 99a2cfb | grep peel_ref | cat -A
  ^[[31m-^Iif (!peel_ref(path, peeled)) {^[[m$
  ^[[32m+^[[m^I^[[32mif (!peel_ref(path, peeled.hash)) {^[[m$

For the pre-image, we print the color, the "-", and then the line data.
Makes sense.

For the post-image, we print the color, "+", a reset, then the initial
whitespace, then the color again!

So of course the diff algorithm says "hey, there's this weird color in
here". The original implementation of diff-highlight didn't care,
because it skipped leading whitespace and colors as "boring". But this
one cannot do that. It pulls the strict prefix out of each line (and it
must, because it must get the same prefix for each line of a hunk).

I think git is actually wrong here; it's mingling the ANSI colors and
the actual content. Nobody ever noticed because it looks OK to a human,
and who would be foolish enough to try machine-parsing colorized diff
output?

The fix is:

diff --git a/diff.c b/diff.c
index 87b16d5..a80b5b4 100644
--- a/diff.c
+++ b/diff.c
@@ -501,9 +501,9 @@ static void emit_line_checked(const char *reset,
 		emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
+		emit_line_0(ecbdata->opt, set, "", sign, "", 0);
 		ws_check_emit(line, len, ecbdata->ws_rule,
-			      ecbdata->opt->file, set, reset, ws);
+			      ecbdata->opt->file, "", reset, ws);
 	}
 }
 

But I'm a little worried it may interfere with the way the
whitespace-checker emits colors (e.g., it may emit its own colors for
the leading spaces, and we would need to re-assert our color before
showing the rest of the line). So I think you could also argue that
because of whitespace-highlighting, colorized diffs are fundamentally
going to have colors intermingled with the content and should not be
parsed this way.

All the more reason to try to move this inside diff.c, I guess.

-Peff

  parent reply	other threads:[~2015-06-19  3:54 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-15 17:20 [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks Patrick Palka
2015-06-18 15:50 ` Junio C Hamano
2015-06-18 16:28   ` Patrick Palka
2015-06-18 18:08     ` Junio C Hamano
2015-06-18 19:04       ` Jeff King
2015-06-18 20:14         ` Patrick Palka
2015-06-18 20:45           ` Jeff King
2015-06-18 21:23             ` Jeff King
2015-06-18 21:39               ` Junio C Hamano
2015-06-18 22:25               ` Patrick Palka
2015-06-19  3:54               ` Jeff King [this message]
2015-06-19  4:49                 ` Junio C Hamano
2015-06-19  5:32                   ` Jeff King
2015-06-19  7:34                     ` Jeff King
2015-06-19 11:38                       ` Jeff King
2015-06-19 17:20                     ` Junio C Hamano
2015-06-18 23:06             ` Patrick Palka
2015-06-18 20:23       ` Patrick Palka
2015-06-18 19:08     ` Jeff King
2015-06-18 20:27       ` Patrick Palka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150619035408.GA23679@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=patrick@parcs.ath.cx \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.