Git repo corruption on macOS with git-mirrors experiment

For a while now we’ve been seeing issues where a git-mirror directory becomes corrupted. I haven’t been able to narrow down exactly what conditions trigger it, but when it happens, I see errors like these (sorry about the redactions/obfuscations, we’re a bit security paranoid here :-) ):

2023-02-14 22:17:45 UTC	Preparing working directory
2023-02-14 22:17:45 UTC	$ cd /Users/ci/buildkite/builds/mac-buildkite-10-xxx-xx-xx-1/org-name/repo-redacted
2023-02-14 22:17:45 UTC	# Host "github.org-name.net" already in list of known hosts at "/Users/ci/.ssh/known_hosts"
2023-02-14 22:17:45 UTC	# Using git-mirrors experiment 🧪
2023-02-14 22:17:45 UTC	$ cd /Users/ci/buildkite/repos
2023-02-14 22:17:45 UTC	# Commit "0ea2865f5505f6abf85faf24ce50bb1aabebdf88" exists in mirror
2023-02-14 22:17:45 UTC	$ cd /Users/ci/buildkite/builds/mac-buildkite-10-xxx-xx-xx-1/org/repo-redacted
2023-02-14 22:17:45 UTC	$ git remote set-url origin git@github.org-name.net:project/project-mobile.git
2023-02-14 22:17:45 UTC	$ git submodule foreach --recursive "git clean -ffxdq"
2023-02-14 22:17:45 UTC	$ git clean -ffxdq
2023-02-14 22:17:45 UTC	# Fetch and checkout pull request head from GitHub
2023-02-14 22:17:45 UTC	$ git fetch -v --prune -- origin refs/pull/11379/head
2023-02-14 22:17:45 UTC	error: refs/remotes/origin/branch-redaccted-phase-1.2-part2 does not point to a valid object!
2023-02-14 22:17:46 UTC	error: refs/remotes/origin/branch-redaccted-phase-1.2-part2 does not point to a valid object!
2023-02-14 22:17:46 UTC	fatal: bad object HEAD
2023-02-14 22:17:46 UTC	error: github.org-name.net:project/project-mobile.git did not send all necessary objects
2023-02-14 22:17:46 UTC	
2023-02-14 22:17:46 UTC	Auto packing the repository in background for optimum performance.
2023-02-14 22:17:46 UTC	See "git help gc" for manual housekeeping.
2023-02-14 22:17:46 UTC	warning: The last gc run reported the following. Please correct the root cause
2023-02-14 22:17:46 UTC	and remove .git/gc.log.
2023-02-14 22:17:46 UTC	Automatic cleanup will not be performed until the file is removed.
2023-02-14 22:17:46 UTC	
2023-02-14 22:17:46 UTC	fatal: bad object refs/remotes/origin/branch-redaccted-phase-1.2-part2
2023-02-14 22:17:46 UTC	fatal: failed to run repack

Then, getting on the box and trying some git commands myself, I see that the repo seems corrupt beyond saving:

ci@buildkite-10-xxx-xx-xx dirname % git status
git status
fatal: bad object HEAD
ci@buildkite-10-xxx-xx-xx dirname % git fsck
git fsck
Checking object directories: 100% (256/256), done.
Checking object directories: 100% (256/256), done.
Checking objects: 100% (2830123/2830123), done.
error: refs/remotes/origin/branch-redacted-phase-1.2-part2: invalid sha1 pointer 36e4292bb7e145c21bafede7e770744fecc1cc7d
error: HEAD: invalid sha1 pointer ce9065576e659bdce9bfc430268f323f2fa4cdec
error: b28a9e7189249d4dd8ef266498bce370334eb53c: invalid sha1 pointer in cache-tree
dangling commit 2200e0e8e4bfd38c21f9420e4d1c10d5a9dca024
dangling commit c0002093f93f5971b1bc8be9ace9bc6ebcc35db3
... [Many many lines of dangling commit or dangling tag...]
dangling tag 53c5df929794b62a3bf8a79f2443ae8bda61dd24
... [Many many lines of dangling commit or dangling tag...]

I’m not sure what the best approach is, other than just turning off the git-mirrors experiment. For now I’ve just been deleting the mirror directory, but it happens often enough that that’s not really sustainable.

Hello, @rthille! Welcome to the community!
Let me look into what’s going on here and try to reproduce the issue.
The redactions/obfuscations are the way to do it :+1:.

On it. Cheers!

Hello again, @rthille! Have you tried cloning your main repo afresh and mirroring it? Does the issue with your mirror directories persist in that case? Also, do you use force pushes? They can sometimes cause your main repo and the mirror to go out of sync.

A number of other things can cause this issue, so if it persists on a fresh repository and its mirrors, you can reach out to the support email with the build details.

I hope this helps. Best!

I’ll dig in if I get a chance. I manage the Macs used for the builds, but I’m not very familiar with the projects that build on them. We have several repos using the git-mirrors experiment, and so far I think we’ve only seen the corruption with one of them. My approach up to now has been to manually delete the mirror directory when a team notifies me that they’re seeing corruption. After the deletion it’s recreated automatically and works fine, for a while.

I now have a way to script the detection of corrupt mirror directories, so this isn’t a super high priority, but I’m curious whether you can think of what might cause the corruption. I’m pretty certain we don’t have multiple agents running on the same host and sharing the git-mirror directories, which I did think could cause the issue. It also doesn’t seem to be related to disk/filesystem corruption or to the Macs rebooting at “inopportune” times :slight_smile:
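
For anyone who wants to script a similar check, here is a minimal sketch in Python. It is not the script mentioned above, just one possible approach: it runs `git fsck` in each mirror directory and keeps only the output lines that suggest real damage, ignoring the harmless `dangling commit` / `dangling tag` noise. The function names and the list of line prefixes are assumptions made for illustration, and the prefixes may need adjusting for your git version.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Assumption: fsck lines starting with these prefixes indicate real damage.
# "dangling ..." lines are normal in a long-lived mirror and are ignored.
DAMAGE_PREFIXES = ("error:", "fatal:", "missing ", "broken link")


def damage_lines(fsck_output: str) -> list[str]:
    """Return the lines of `git fsck` output that point at corruption."""
    return [
        line.strip()
        for line in fsck_output.splitlines()
        if line.strip().startswith(DAMAGE_PREFIXES)
    ]


def check_mirror(mirror: Path) -> list[str]:
    """Run `git fsck` inside one mirror directory and return its damage lines."""
    result = subprocess.run(
        ["git", "-C", str(mirror), "fsck", "--no-dangling", "--no-progress"],
        capture_output=True,
        text=True,
    )
    # fsck reports problems on both stdout and stderr, so inspect both.
    return damage_lines(result.stdout + "\n" + result.stderr)


def find_corrupt_mirrors(root: Path) -> dict[Path, list[str]]:
    """Check every subdirectory of `root` and map corrupt ones to their errors.

    A subdirectory that is not a git repository is also reported, because
    git prints a "fatal:" line for it.
    """
    corrupt: dict[Path, list[str]] = {}
    for mirror in sorted(p for p in root.iterdir() if p.is_dir()):
        problems = check_mirror(mirror)
        if problems:
            corrupt[mirror] = problems
    return corrupt
```

Calling `find_corrupt_mirrors(Path("/Users/ci/buildkite/repos"))` (the mirrors path shown in the build log above) would return the mirrors that need attention. A full `git fsck` on a repository with millions of objects is slow, so this is better suited to a periodic job than to a per-build hook.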

Hello, @rthille!
In the case you’re describing, the most likely culprit is a problem with the repo that’s being mirrored, which is why I recommend cloning it afresh and running a mirror off that fresh clone to see whether the issue emerges again. Another usual suspect is force pushes, which can cause the repo and its mirror to go out of sync. Once these causes have been ruled out, we’ll need the build details for deeper troubleshooting. You can send those to the support email.

Happy to help! Cheers!