From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Elijah Newren <newren@gmail.com>
Cc: git@vger.kernel.org
Subject: Replaying merges
Date: Sat, 18 May 2024 02:35:07 +0200 (CEST)
Message-ID: <ee35f3b2-bf20-6fcc-2c71-38499aa592fe@gmx.de>

Hi Elijah,

I took to heart the suggestion that you explained to me a couple of times:
to replay merge commits (including their merge conflict resolutions) by
using the _remerged_ commit as merge base, the original merge commit as
merge head, and the newly-created merge (with conflicts and all) as HEAD.

I noodled on this idea a bit until I got it into a usable shape that I
applied to great effect when working on the recent embargoed releases.

Here it is, the script [*1*] that I used (basically replacing all the
`merge -C` instances in the rebase script with `replay-merge.sh`):

-- snip --
#!/bin/sh

die () {
	echo "$*" >&2
	exit 1
}

test $# = 2 ||
die "Usage: $0 <original-merge> <rewritten-merge-head>"

original_merge="$(git rev-parse --verify "$1")" ||
die "Not a revision? $1"
test ' ' = "$(git show -s --format=%P "$original_merge" | tr -dc ' ')" ||
die "Not a merge? $1"
rewritten_merge_head="$(git rev-parse --verify "$2" 2>/dev/null)" ||
rewritten_merge_head="$(git rev-parse --verify "refs/rewritten/$2")" ||
die "Not a revision? $2"

# Already merged?
if test 0 -eq $(git rev-list --count HEAD..$rewritten_merge_head)
then
	echo "Already merged: $2" >&2
	exit 0
fi

# Can we fast-forward instead?
if test "$(git rev-parse HEAD $rewritten_merge_head)" = "$(git rev-parse $original_merge^ $original_merge^2)"
then
	echo "Fast-forwarding to $1" >&2
	git merge --no-stat --ff-only $original_merge ||
	die "Could not fast-forward to $original_merge"
	exit 0
fi

# Only Git v2.45 and newer can handle the `--merge-base=<tree>` invocation
validate_git_version () {
	empty_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904
	git merge-tree --merge-base=$empty_tree $empty_tree $empty_tree >/dev/null 2>&1 ||
	die "Need a Git version that understands --merge-base=<tree-ish>"
}
validate_git_version

do_merge () {
	git update-ref refs/tmp/head $1 &&
	git update-ref refs/tmp/merge_head $2 &&
	{ result="$(git merge-tree refs/tmp/head refs/tmp/merge_head)"; res=$?; } &&
	echo "$result" | head -n 1 &&
	return $res
}

remerge_original=$(do_merge $original_merge^ $original_merge^2)
test -n "$remerge_original" || die "Could not remerge $original_merge"
merge_new=$(do_merge HEAD $rewritten_merge_head)
test -n "$merge_new" || die "Could not merge $rewritten_merge_head"
new_tree=$(git merge-tree --merge-base=$remerge_original $original_merge $merge_new | head -n 1)
test -n "$new_tree" || die "Could not create new merge"

# Even though there might be merge conflicts, the `merge-tree` command might
# succeed with exit code 0! The reason is that the merge conflict may originate
# from one of the previous two merges.

files_with_conflicts="$(git diff $original_merge..$new_tree |
	sed -ne '/^diff --git /{
		# store the first file name in the hold area
		s/^diff --git a\/\(.*\) b\/.*$/\1/
		x
	}' -e '/^+<<<<<<< refs\/tmp\/head$/{
		# found a merge conflict
		:1
		# read all lines until the ==== line
		n
		/^+=======$/b2
		b1
		:2
		# read all lines until the >>>> line
		/+>>>>>>> refs\/tmp\/merge_head$/{
			# print file name
			x
			p
			# skip to next file
			:3
			n
			/^diff --git/{
				# store the first file name in the hold area
				s/^diff --git a\/\(.*\) b\/.*$/\1/
				x
				b
			}
			b3
		}
		n
		b2
	}')"

# Is it a "Sync with <version>" merge? Then regenerate the log
sync_info="$(git cat-file commit $original_merge |
	sed -n '/^$/{N;s/^\n//;/^Sync with 2\./{N;N;s/^\(.*\)\n\n\* \([^:]*\).*/\1,\2/p};q}')"
merge_msg=
if test -n "$sync_info"
then
	merge_msg="$(printf '%s\t\t%s\n' $rewritten_merge_head "${sync_info#*,}" |
		git fmt-merge-msg --log -m "${sync_info%,*}" |
		grep -v '^#')"
fi

if test -z "$files_with_conflicts"
then
	# No conflicts
	committer="$(git var GIT_COMMITTER_IDENT)" ||
	die "Could not get committer ident"
	new_commit="$(git cat-file commit "$original_merge")" ||
	die "Could not get commit message of $original_merge"
	new_commit="$(echo "$new_commit" |
		sed '1,/^$/{
			s/^tree .*/tree '"$new_tree"'/
			s/^committer .*/committer '"$committer"'/
			/^parent /{
				:1
				N
				s/.*\n//
				/^parent /b1
				i\
parent '"$(git rev-parse HEAD)"'\
parent '"$(git rev-parse $rewritten_merge_head)"'
			}
		}')"
	if test -n "$merge_msg"
	then
		new_commit="$(printf '%s\n\n%s\n' \
			"$(echo "$new_commit" | sed '/^$/q')" \
			"$merge_msg")"
	fi
	new_commit="$(echo "$new_commit" | git hash-object -t commit -w --stdin)" ||
	die "Could not transmogrify commit object"
	git merge --no-stat -q --ff-only "$new_commit"
else
	echo "no-ff" >"$(git rev-parse --git-path MERGE_MODE)"
	git rev-parse "$rewritten_merge_head" >"$(git rev-parse --git-path MERGE_HEAD)"
	if test -n "$merge_msg"
	then
		echo "$merge_msg"
	else
		git cat-file commit "$original_merge" |
		sed '1,/^$/d'
	fi >"$(git rev-parse --git-path MERGE_MSG)"

	git read-tree -u --reset "$new_tree" ||
	die "Could not update to $new_tree"

	echo "$files_with_conflicts" |
	while IFS= read -r file
	do
		echo "Needs merge: $file"
		mode="$(git ls-tree $new_tree "$file" | sed 's/ .*//')" &&
		a=$(git show "$new_tree:$file" |
			sed -e '/^<<<<<<< refs\/tmp\/head$/d' \
			    -e '/^=======$/,/>>>>>>> refs\/tmp\/merge_head$/d' |
			git hash-object -w --stdin) &&
		b=$(git show "$new_tree:$file" |
			sed -e '/^<<<<<<< refs\/tmp\/head$/,/^=======$/d' \
			    -e '/>>>>>>> refs\/tmp\/merge_head$/d' |
			git hash-object -w --stdin) &&
		printf "%s %s %s\t%s\n" \
			0 $a 0 "$file" \
			$mode $(git rev-parse HEAD:"$file") 1 "$file" \
			$mode $a 2 "$file" \
			$mode $b 3 "$file" |
		git update-index --index-info ||
		die "Could not update the index with '$file'"
	done
	die "There were merge conflicts"
fi
-- snap --

For the most part, this worked beautifully.

However. The devil lies in the detail. You will see that the majority of
the script is concerned with recreating the stages that need to be put
into the index. The reason is that the merge conflicts are already part of
the merge base and hence the `merge-tree` arguments do not reflect the
stages.
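
To illustrate the stage reconstruction in isolation, here is a minimal
sketch that splits a blob containing conflict markers back into its two
sides, using the same `sed` expressions as the script. The helper names
`ours_side` and `theirs_side` and the sample text are hypothetical, made up
for this illustration; the marker labels are the temporary refs that
`do_merge` uses.

```shell
# Hypothetical helpers, not part of the script above.

# Keep the "ours" side: drop the opening marker and everything from the
# separator through the closing marker.
ours_side () {
	sed -e '/^<<<<<<< refs\/tmp\/head$/d' \
	    -e '/^=======$/,/^>>>>>>> refs\/tmp\/merge_head$/d'
}

# Keep the "theirs" side: drop everything from the opening marker through
# the separator, and the closing marker.
theirs_side () {
	sed -e '/^<<<<<<< refs\/tmp\/head$/,/^=======$/d' \
	    -e '/^>>>>>>> refs\/tmp\/merge_head$/d'
}

# Made-up sample input with a single, non-nested conflict.
sample='context
<<<<<<< refs/tmp/head
ours
=======
theirs
>>>>>>> refs/tmp/merge_head
more context'

printf '%s\n' "$sample" | ours_side
printf '%s\n' "$sample" | theirs_side
```

This only works for flat conflicts; with nested markers, the range
expressions pair up the wrong lines.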

But it gets even worse. The biggest complication is not even addressed in
this script: when I realized what was going on, I understood immediately
that it was time to abandon the shell script and start implementing this
logic in C (which I can currently only do on my own time, which is
scarce). That complication is the scenario where a merge conflict had been
addressed in the original merge commit, but the replayed merge has no
conflict. In such a scenario, this script _will create not one, but two
merge conflicts, nested ones_!

I still do think that your idea has merit, but I fear that it won't ever
be as easy as performing multiple three-way merges in succession. To
address the observed problem, the code will always have to be aware of
unresolved conflicts in the provided merge base, so that it can handle
them appropriately, and not treat them as plain text, so that no nested
conflicts need to be created.
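
As a stop-gap for the shell prototype, nested conflicts could at least be
detected before trying to recreate the stages. The following is only a
sketch under assumptions: the helper name `max_conflict_depth` is
hypothetical, and the sample text is a made-up shape of a nested conflict,
not output captured from an actual replay.

```shell
# Hypothetical helper, not part of the script above: print the maximum
# nesting depth of conflict markers in the text read from stdin.
max_conflict_depth () {
	awk '
		/^<<<<<<< / { depth++; if (depth > max) max = depth }
		/^>>>>>>> / { if (depth > 0) depth-- }
		END { print max + 0 }
	'
}

# Made-up example of a conflict nested inside another conflict.
nested='<<<<<<< refs/tmp/head
<<<<<<< refs/tmp/head
a
=======
b
>>>>>>> refs/tmp/merge_head
=======
c
>>>>>>> refs/tmp/merge_head'

printf '%s\n' "$nested" | max_conflict_depth	# prints 2
```

A depth greater than 1 would mean that the `sed`-based stage reconstruction
cannot be trusted for that file.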

Unfortunately, I did not document properly in what precise circumstances
those nested conflicts were generated (I was kind of busy trying to
coordinate everything around the security bug-fix releases), but I hope to
find some time soon to do so, and to turn them into a set of test cases
that we can play with.

Ciao,
Johannes

Footnote *1*: You'd think that I'd learn from past experiences _not_ to
prototype in Bash when I want to eventually implement it in C. Honestly, I
thought I could get away with it because I failed to anticipate the many
complications, not the least of which is that there is currently no
_actually_ correct way to generate the stages. So basically I thought that
the script would consist of the part before the code comment starting with
"Even though there might be merge conflicts"...
