* [PATCH v2 0/3] Fix how for-each-ref handles broken loose references
@ 2015-06-02 15:57 Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 1/3] t6301: new tests of for-each-ref error handling Michael Haggerty
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Michael Haggerty @ 2015-06-02 15:57 UTC (permalink / raw
To: Junio C Hamano
Cc: Anders Kaseorg, Stefan Beller, Jeff King, git, Michael Haggerty
This is a reroll of [1]. Thanks to Peff and Junio for their comments
about v1.
Changes since v1:
* Use $_z40 to initialize $ZEROS in test.
* Test that "git for-each-ref --format="%(objectname) %(refname)"
*doesn't* notice references that point at missing objects.
This patch series is also available from my GitHub account [2] as
branch for-each-ref-errors.
[1] http://thread.gmane.org/gmane.comp.version-control.git/270429
[2] https://github.com/mhagger/git
Michael Haggerty (3):
t6301: new tests of for-each-ref error handling
for-each-ref: report broken references correctly
read_loose_refs(): treat NULL_SHA1 loose references as broken
builtin/for-each-ref.c | 5 ++++
refs.c | 7 ++++++
t/t6301-for-each-ref-errors.sh | 56 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+)
create mode 100755 t/t6301-for-each-ref-errors.sh
--
2.1.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/3] t6301: new tests of for-each-ref error handling
2015-06-02 15:57 [PATCH v2 0/3] Fix how for-each-ref handles broken loose references Michael Haggerty
@ 2015-06-02 15:57 ` Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 2/3] for-each-ref: report broken references correctly Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken Michael Haggerty
2 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2015-06-02 15:57 UTC (permalink / raw
To: Junio C Hamano
Cc: Anders Kaseorg, Stefan Beller, Jeff King, git, Michael Haggerty
Add tests that for-each-ref correctly reports broken loose reference
files and references that point at missing objects. In fact, two of
these tests fail, because (1) NULL_SHA1 is not recognized as an
invalid reference value, and (2) for-each-ref doesn't respect
REF_ISBROKEN. Fixes to come.
Note that when for-each-ref is run with a --format option that doesn't
require the object to be looked up, then we should still notice if a
loose reference file is corrupt or contains NULL_SHA1, but we don't
notice if it points at a missing object because we don't do an object
lookup. This is OK.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
Notes (discussion):
Note that a reference that points at NULL_SHA1 is reported as "broken"
rather than "missing". This is because NULL_SHA1 is manifestly bogus,
whereas we have no systematic basis for rejecting any other
40-character hexadecimal string without doing an object lookup.
t/t6301-for-each-ref-errors.sh | 56 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
create mode 100755 t/t6301-for-each-ref-errors.sh
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
new file mode 100755
index 0000000..cf25244
--- /dev/null
+++ b/t/t6301-for-each-ref-errors.sh
@@ -0,0 +1,56 @@
+#!/bin/sh
+
+test_description='for-each-ref errors for broken refs'
+
+. ./test-lib.sh
+
+ZEROS=$_z40
+MISSING=abababababababababababababababababababab
+
+test_expect_success setup '
+ git commit --allow-empty -m "Initial" &&
+ git tag testtag &&
+ git for-each-ref >full-list &&
+ git for-each-ref --format="%(objectname) %(refname)" >brief-list
+'
+
+test_expect_failure 'Broken refs are reported correctly' '
+ r=refs/heads/bogus &&
+ : >.git/$r &&
+ test_when_finished "rm -f .git/$r" &&
+ echo "warning: ignoring broken ref $r" >broken-err &&
+ git for-each-ref >out 2>err &&
+ test_cmp full-list out &&
+ test_cmp broken-err err
+'
+
+test_expect_failure 'NULL_SHA1 refs are reported correctly' '
+ r=refs/heads/zeros &&
+ echo $ZEROS >.git/$r &&
+ test_when_finished "rm -f .git/$r" &&
+ echo "warning: ignoring broken ref $r" >zeros-err &&
+ git for-each-ref >out 2>err &&
+ test_cmp full-list out &&
+ test_cmp zeros-err err &&
+ git for-each-ref --format="%(objectname) %(refname)" >brief-out 2>brief-err &&
+ test_cmp brief-list brief-out &&
+ test_cmp zeros-err brief-err
+'
+
+test_expect_success 'Missing objects are reported correctly' '
+ r=refs/heads/missing &&
+ echo $MISSING >.git/$r &&
+ test_when_finished "rm -f .git/$r" &&
+ echo "fatal: missing object $MISSING for $r" >missing-err &&
+ test_must_fail git for-each-ref 2>err &&
+ test_cmp missing-err err &&
+ (
+ cat brief-list &&
+ echo "$MISSING $r"
+ ) | sort -k 2 >missing-brief-expected &&
+ git for-each-ref --format="%(objectname) %(refname)" >brief-out 2>brief-err &&
+ test_cmp missing-brief-expected brief-out &&
+ test_must_be_empty brief-err
+'
+
+test_done
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/3] for-each-ref: report broken references correctly
2015-06-02 15:57 [PATCH v2 0/3] Fix how for-each-ref handles broken loose references Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 1/3] t6301: new tests of for-each-ref error handling Michael Haggerty
@ 2015-06-02 15:57 ` Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken Michael Haggerty
2 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2015-06-02 15:57 UTC (permalink / raw
To: Junio C Hamano
Cc: Anders Kaseorg, Stefan Beller, Jeff King, git, Michael Haggerty
If there is a loose reference file with invalid contents, "git
for-each-ref" incorrectly reports the problem as being a missing
object with name NULL_SHA1:
$ echo '12345678' >.git/refs/heads/nonsense
$ git for-each-ref
fatal: missing object 0000000000000000000000000000000000000000 for refs/heads/nonsense
With an explicit "--format" string, it can even report that the
reference validly points at NULL_SHA1:
$ git for-each-ref --format='%(objectname) %(refname)'
0000000000000000000000000000000000000000 refs/heads/nonsense
$ echo $?
0
This has been broken since
b7dd2d2 for-each-ref: Do not lookup objects when they will not be used (2009-05-27)
, which changed for-each-ref from using for_each_ref() to using
git_for_each_rawref() in order to avoid looking up the referred-to
objects unnecessarily. (When "git for-each-ref" is given a "--format"
string that doesn't include information about the pointed-to object,
it does not look up the object at all, which makes it considerably
faster. Iterating with DO_FOR_EACH_INCLUDE_BROKEN is essential to this
optimization because otherwise for_each_ref() would itself need to
check whether the object exists as part of its brokenness test.)
But for_each_rawref() includes broken references in the iteration, and
"git for-each-ref" doesn't itself reject references with REF_ISBROKEN.
The result is that broken references are processed *as if* they had
the value NULL_SHA1, which is the value stored in entries for broken
references.
Change "git for-each-ref" to emit warnings for references that are
REF_ISBROKEN but to otherwise skip them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
builtin/for-each-ref.c | 5 +++++
t/t6301-for-each-ref-errors.sh | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 83f9cf9..13d2172 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -851,6 +851,11 @@ static int grab_single_ref(const char *refname, const unsigned char *sha1, int f
return 0;
}
+ if (flag & REF_ISBROKEN) {
+ warning("ignoring broken ref %s", refname);
+ return 0;
+ }
+
if (*cb->grab_pattern) {
const char **pattern;
int namelen = strlen(refname);
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index cf25244..72d2397 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -14,7 +14,7 @@ test_expect_success setup '
git for-each-ref --format="%(objectname) %(refname)" >brief-list
'
-test_expect_failure 'Broken refs are reported correctly' '
+test_expect_success 'Broken refs are reported correctly' '
r=refs/heads/bogus &&
: >.git/$r &&
test_when_finished "rm -f .git/$r" &&
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken
2015-06-02 15:57 [PATCH v2 0/3] Fix how for-each-ref handles broken loose references Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 1/3] t6301: new tests of for-each-ref error handling Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 2/3] for-each-ref: report broken references correctly Michael Haggerty
@ 2015-06-02 15:57 ` Michael Haggerty
2015-06-02 17:28 ` Stefan Beller
2015-06-02 20:11 ` Junio C Hamano
2 siblings, 2 replies; 8+ messages in thread
From: Michael Haggerty @ 2015-06-02 15:57 UTC (permalink / raw
To: Junio C Hamano
Cc: Anders Kaseorg, Stefan Beller, Jeff King, git, Michael Haggerty
NULL_SHA1 is never a valid value for a reference. If a loose reference
has that value, mark it as broken.
Why check NULL_SHA1 and not the nearly 2^160 other SHA-1s that are
also invalid in a given repository? Because (a) it is cheap to test
for NULL_SHA1, and (b) NULL_SHA1 is often used as a "SHA-1 is invalid"
value inside of Git client source code (not only ours!), and
accidentally writing it to a loose reference file would be an easy
mistake to make.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
refs.c | 7 +++++++
t/t6301-for-each-ref-errors.sh | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/refs.c b/refs.c
index 47e4e53..c28fde1 100644
--- a/refs.c
+++ b/refs.c
@@ -1321,6 +1321,13 @@ static void read_loose_refs(const char *dirname, struct ref_dir *dir)
hashclr(sha1);
flag |= REF_ISBROKEN;
}
+
+ if (!(flag & REF_ISBROKEN) && is_null_sha1(sha1)) {
+ /* NULL_SHA1 is never a valid reference value. */
+ hashclr(sha1);
+ flag |= REF_ISBROKEN;
+ }
+
if (check_refname_format(refname.buf,
REFNAME_ALLOW_ONELEVEL)) {
hashclr(sha1);
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index 72d2397..cdb67a0 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -24,7 +24,7 @@ test_expect_success 'Broken refs are reported correctly' '
test_cmp broken-err err
'
-test_expect_failure 'NULL_SHA1 refs are reported correctly' '
+test_expect_success 'NULL_SHA1 refs are reported correctly' '
r=refs/heads/zeros &&
echo $ZEROS >.git/$r &&
test_when_finished "rm -f .git/$r" &&
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken
2015-06-02 15:57 ` [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken Michael Haggerty
@ 2015-06-02 17:28 ` Stefan Beller
2015-06-02 21:10 ` Michael Haggerty
2015-06-02 20:11 ` Junio C Hamano
1 sibling, 1 reply; 8+ messages in thread
From: Stefan Beller @ 2015-06-02 17:28 UTC (permalink / raw
To: Michael Haggerty
Cc: Junio C Hamano, Anders Kaseorg, Jeff King, git@vger.kernel.org
On Tue, Jun 2, 2015 at 8:57 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> NULL_SHA1 is never a valid value for a reference. If a loose reference
> has that value, mark it as broken.
>
> Why check NULL_SHA1 and not the nearly 2^160 other SHA-1s that are
> also invalid in a given repository? Because (a) it is cheap to test
> for NULL_SHA1, and (b) NULL_SHA1 is often used as a "SHA-1 is invalid"
I don't quite agree with the reasoning here. Just because it's cheap doesn't
mean it's right. ;) But I fully agree with (b) so this still makes sense.
> value inside of Git client source code (not only ours!), and
> accidentally writing it to a loose reference file would be an easy
> mistake to make.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
> refs.c | 7 +++++++
> t/t6301-for-each-ref-errors.sh | 2 +-
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/refs.c b/refs.c
> index 47e4e53..c28fde1 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1321,6 +1321,13 @@ static void read_loose_refs(const char *dirname, struct ref_dir *dir)
> hashclr(sha1);
> flag |= REF_ISBROKEN;
> }
> +
> + if (!(flag & REF_ISBROKEN) && is_null_sha1(sha1)) {
Why do we do the extra check for !(flag & REF_ISBROKEN) here?
> + /* NULL_SHA1 is never a valid reference value. ...
... *by our definition*, because we believe it helps detecting
errors/mistakes in the future.
*/
> + hashclr(sha1);
While this code looks consistent to the rest around, at closer inspection
this feels a bit redundant to me. If is_null_sha1(sha1) is true, then
hashclr(sha1); doesn't change the state. Or did I miss a subtle side effect?
> + flag |= REF_ISBROKEN;
> + }
> +
> if (check_refname_format(refname.buf,
> REFNAME_ALLOW_ONELEVEL)) {
> hashclr(sha1);
> diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> index 72d2397..cdb67a0 100755
> --- a/t/t6301-for-each-ref-errors.sh
> +++ b/t/t6301-for-each-ref-errors.sh
> @@ -24,7 +24,7 @@ test_expect_success 'Broken refs are reported correctly' '
> test_cmp broken-err err
> '
>
> -test_expect_failure 'NULL_SHA1 refs are reported correctly' '
> +test_expect_success 'NULL_SHA1 refs are reported correctly' '
> r=refs/heads/zeros &&
> echo $ZEROS >.git/$r &&
> test_when_finished "rm -f .git/$r" &&
> --
> 2.1.4
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken
2015-06-02 15:57 ` [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken Michael Haggerty
2015-06-02 17:28 ` Stefan Beller
@ 2015-06-02 20:11 ` Junio C Hamano
1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2015-06-02 20:11 UTC (permalink / raw
To: Michael Haggerty; +Cc: Anders Kaseorg, Stefan Beller, Jeff King, git
Michael Haggerty <mhagger@alum.mit.edu> writes:
> NULL_SHA1 is never a valid value for a reference. If a loose reference
> has that value, mark it as broken.
>
> Why check NULL_SHA1 and not the nearly 2^160 other SHA-1s that are
> also invalid in a given repository? Because (a) it is cheap to test
> for NULL_SHA1, and (b) NULL_SHA1 is often used as a "SHA-1 is invalid"
> value inside of Git client source code (not only ours!), and
> accidentally writing it to a loose reference file would be an easy
> mistake to make.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
> refs.c | 7 +++++++
> t/t6301-for-each-ref-errors.sh | 2 +-
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/refs.c b/refs.c
> index 47e4e53..c28fde1 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1321,6 +1321,13 @@ static void read_loose_refs(const char *dirname, struct ref_dir *dir)
> hashclr(sha1);
> flag |= REF_ISBROKEN;
> }
> +
> + if (!(flag & REF_ISBROKEN) && is_null_sha1(sha1)) {
> + /* NULL_SHA1 is never a valid reference value. */
> + hashclr(sha1);
Do you need to clear after checking with is_null_sha1()?
> + flag |= REF_ISBROKEN;
> + }
> +
> if (check_refname_format(refname.buf,
> REFNAME_ALLOW_ONELEVEL)) {
> hashclr(sha1);
> diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> index 72d2397..cdb67a0 100755
> --- a/t/t6301-for-each-ref-errors.sh
> +++ b/t/t6301-for-each-ref-errors.sh
> @@ -24,7 +24,7 @@ test_expect_success 'Broken refs are reported correctly' '
> test_cmp broken-err err
> '
>
> -test_expect_failure 'NULL_SHA1 refs are reported correctly' '
> +test_expect_success 'NULL_SHA1 refs are reported correctly' '
> r=refs/heads/zeros &&
> echo $ZEROS >.git/$r &&
> test_when_finished "rm -f .git/$r" &&
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken
2015-06-02 17:28 ` Stefan Beller
@ 2015-06-02 21:10 ` Michael Haggerty
2015-06-03 9:09 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2015-06-02 21:10 UTC (permalink / raw
To: Stefan Beller
Cc: Junio C Hamano, Anders Kaseorg, Jeff King, git@vger.kernel.org
On 06/02/2015 07:28 PM, Stefan Beller wrote:
> On Tue, Jun 2, 2015 at 8:57 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> NULL_SHA1 is never a valid value for a reference. If a loose reference
>> has that value, mark it as broken.
>>
>> Why check NULL_SHA1 and not the nearly 2^160 other SHA-1s that are
>> also invalid in a given repository? Because (a) it is cheap to test
>> for NULL_SHA1, and (b) NULL_SHA1 is often used as a "SHA-1 is invalid"
>
> I don't quite agree with the reasoning here. Just because it's cheap doesn't
> mean it's right. ;) But I fully agree with (b) so this still makes sense.
Its cheapness improves the cost/benefit ratio of adding this check.
>> value inside of Git client source code (not only ours!), and
>> accidentally writing it to a loose reference file would be an easy
>> mistake to make.
>>
>> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
>> ---
>> refs.c | 7 +++++++
>> t/t6301-for-each-ref-errors.sh | 2 +-
>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/refs.c b/refs.c
>> index 47e4e53..c28fde1 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1321,6 +1321,13 @@ static void read_loose_refs(const char *dirname, struct ref_dir *dir)
>> hashclr(sha1);
>> flag |= REF_ISBROKEN;
>> }
>> +
>> + if (!(flag & REF_ISBROKEN) && is_null_sha1(sha1)) {
>
> Why do we do the extra check for !(flag & REF_ISBROKEN) here?
That was an attempt to avoid calling is_null_sha1() unnecessarily. I
think I can make this go away and make the code clearer in general by
restructuring the logic a little bit. I will do that in the next round.
>> + /* NULL_SHA1 is never a valid reference value. ...
>
> ... *by our definition*, because we believe it helps detecting
> errors/mistakes in the future.
It's not even by our definition. It is just astronomically more likely
that NULL_SHA1 got set there due to an error than that it is the SHA-1
of legitimate content. In fact it is so unlikely that we use NULL_SHA1
throughout our code to indicate "invalid SHA-1", ignoring the
theoretical possibility that it could appear some day as a real SHA-1.
I'll try to explain this point better in the comment.
> */
>> + hashclr(sha1);
>
> While this code looks consistent to the rest around, at closer inspection
> this feels a bit redundant to me. If is_null_sha1(sha1) is true, then
> hashclr(sha1); doesn't change the state. Or did I miss a subtle side effect?
You're right, there is no reason to call hashclr() here.
>> + flag |= REF_ISBROKEN;
>> + }
>> +
>> [...]
Thanks for your review!
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken
2015-06-02 21:10 ` Michael Haggerty
@ 2015-06-03 9:09 ` Jeff King
0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2015-06-03 9:09 UTC (permalink / raw
To: Michael Haggerty
Cc: Stefan Beller, Junio C Hamano, Anders Kaseorg,
git@vger.kernel.org
On Tue, Jun 02, 2015 at 11:10:22PM +0200, Michael Haggerty wrote:
> >> +
> >> + if (!(flag & REF_ISBROKEN) && is_null_sha1(sha1)) {
> >
> > Why do we do the extra check for !(flag & REF_ISBROKEN) here?
>
> That was an attempt to avoid calling is_null_sha1() unnecessarily. I
> think I can make this go away and make the code clearer in general by
> restructuring the logic a little bit. I will do that in the next round.
If you get rid of the useless hashclr(), then this just becomes:
if (!(flag & REF_ISBROKEN) && is_null_sha1(sha1))
flag |= REF_ISBROKEN;
The reason for the initial check seems pretty obvious then (but it would
also be OK without it; is_null_sha1 is not that expensive).
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-06-03 9:09 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-02 15:57 [PATCH v2 0/3] Fix how for-each-ref handles broken loose references Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 1/3] t6301: new tests of for-each-ref error handling Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 2/3] for-each-ref: report broken references correctly Michael Haggerty
2015-06-02 15:57 ` [PATCH v2 3/3] read_loose_refs(): treat NULL_SHA1 loose references as broken Michael Haggerty
2015-06-02 17:28 ` Stefan Beller
2015-06-02 21:10 ` Michael Haggerty
2015-06-03 9:09 ` Jeff King
2015-06-02 20:11 ` Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.