Question about how git clones happen

Hello! My company is a pretty well-established customer with many pipelines. For a recent project, I have been working on a pipeline for one of our repos that lints files that were changed as part of the pull request.

I searched the forums for anyone else running into this problem, to no avail. I have seen references to external plugins that pull a list of changed files, but they all seem to do more or less what the git command below does, unless I am missing something.

For a long time, we’ve been using this command to get a list of files changed in the PR (in theory):

git diff --name-only --diff-filter=ACMRT $(git merge-base HEAD main) | grep '.libsonnet\|.jsonnet'

This works great on our laptops: for the PR I am working on, running the command locally shows the one libsonnet file that changed. However, when it runs in the Buildkite pipeline, it lists a ton of files that were not changed in this PR but were changed semi-recently.

So, essentially since we started using BK, we’ve sometimes been running the linters on the wrong files.

I did some digging to see if I could figure out what differs between the BK environment and my laptop, and found some differences, but I’m not sure what they mean.

Here are the “preparing working directory” commands that run in my pipeline:

> Preparing working directory
$ git clean -ffxdq
# Fetch and checkout pull request head from GitHub

$ git fetch -v --prune -- origin refs/pull/4927/head
Warning: Permanently added the ECDSA host key for IP address to the list of known hosts.
From github.com:persona-id/repo
 * branch                refs/pull/4927/head -> FETCH_HEAD
# FETCH_HEAD is now `c2aa574191ba559e0c9301240427454abaa15262`

$ git fetch -v --prune -- origin c2aa574191ba559e0c9301240427454abaa15262
From github.com:persona-id/repo
 * branch                c2aa574191ba559e0c9301240427454abaa15262 -> FETCH_HEAD

$ git checkout -f c2aa574191ba559e0c9301240427454abaa15262
Warning: you are leaving 1 commit behind, not connected to
any of your branches:
  09343b78c chore(stacks): replica server deploy for stack-0000
If you want to keep it by creating a new branch, this may be a good time
to do so with:
 git branch <new-branch-name> 09343b78c
HEAD is now at c2aa57419 Merge branch 'main' into kuzmik/add-dd-linter

# Cleaning again to catch any post-checkout changes
$ git clean -ffxdq

# Checking to see if git commit information needs to be sent to Buildkite...
$ buildkite-agent meta-data exists buildkite:git:commit

After that we have a plugin that runs:

> Running plugin buildkite-persona-plugin pre-command hook
$ /etc/buildkite-agent/plugins/buildkite-runner-pipeline-builder-amd64-std-m67x/github-com-persona-id-buildkite-persona-plugin-v1-0-15/hooks/pre-command
running: git fetch origin main
From github.com:persona-id/repo
 * branch                main       -> FETCH_HEAD

And then the rest of the steps run.

I paired with some of the team, and the only thing we can think of is that when BK does the initial clone, it’s doing something… “weird” (for lack of a better adjective) where it loses commit history, branch state… or something.

As a quick fix, I am now calling the GitHub API to get a canonical list of files changed in a PR, but I’d much rather rely on the tools built into the pipelines than on a third-party API if possible.
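For reference, the API route can be sketched with the GitHub CLI and GitHub’s “list pull request files” REST endpoint. This is an assumption-laden sketch, not necessarily how the quick fix above was implemented: it assumes `gh` is installed and authenticated on the agent, that `BUILDKITE_PULL_REQUEST` holds the PR number (Buildkite sets it to `false` on non-PR builds), and `pr_jsonnet_files` is a hypothetical name. GitHub documents a cap of 3,000 files on this endpoint.

```shell
# Hypothetical helper: list *.libsonnet / *.jsonnet files touched by a PR,
# using GitHub's "list pull request files" REST endpoint via the gh CLI.
# Assumptions: gh is installed and authenticated on the agent, and the PR
# number comes from $1 or Buildkite's BUILDKITE_PULL_REQUEST ("false" on
# non-PR builds).
pr_jsonnet_files() {
  local pr="${1:-${BUILDKITE_PULL_REQUEST:-false}}"
  if [ "${pr}" = "false" ]; then
    return 0  # not a pull request build, so there is nothing to lint
  fi
  # gh fills in {owner}/{repo} from the current checkout's git remote.
  # Dropping status "removed" roughly mirrors --diff-filter=ACMRT.
  gh api "repos/{owner}/{repo}/pulls/${pr}/files" --paginate \
    --jq '.[] | select(.status != "removed") | .filename' \
    | { grep -E '\.(libsonnet|jsonnet)$' || true; }
}
```

Filtering on the file extension with an anchored `grep -E` avoids matching names that merely contain “jsonnet” somewhere in the path.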

Anyone else run into weirdness like this, or have a clever way they are enumerating changed files in a pipeline?

Thanks!

Hey @kuzmik,

Welcome to the community!

Are you using mirrored repositories that might not be up to date? Another possible cause could be not performing a clean checkout, which might explain why you’re seeing more file changes than locally.

Also, when you say it was tested locally, do you mean running an agent locally or just checking out the repo and running the git command?

If you’d like to share more detailed information, such as a build link, feel free to email us at support@buildkite.com.

Hi Stephanie,

Ah, good question. When I said I tested locally, I mean that when I run:

git diff --name-only --diff-filter=ACMRT $(git merge-base HEAD main) | grep '.libsonnet\|.jsonnet'

I get the expected output of one file locally.

However, as an experiment, I also ran all the commands that the BK runner runs (from my original post) and got the expected output as well.

I feel like there’s something weird about whatever copy of the repo the runner has when the pipeline runs. But we’re on self-hosted runners, and they cycle enough that I am pretty sure it’s not using an old copy of the git repo that has been around for a while.
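A quick way to test that suspicion on an agent is to compare the local `main` branch with the remote-tracking `origin/main` ref. The plugin’s `git fetch origin main` updates `FETCH_HEAD` (and, with the default fetch refspec, `origin/main`) but never moves a local `main` branch, so if the agent reuses its checkout directory between builds, local `main` could lag far behind. The sketch below is a hypothetical helper (`compare_bases` is not a git or Buildkite command) and assumes the remote is named `origin`; for each ref it prints where the ref points, its merge-base with `HEAD`, and how many files `git diff --name-only` reports from that merge-base.

```shell
# Hypothetical diagnostic helper (not a git or Buildkite command).
# For the local branch and the remote-tracking ref of <base>, print where the
# ref points, its merge-base with HEAD, and how many files
# `git diff --name-only <merge-base>` reports.
# Assumptions: the remote is "origin" and <base> defaults to "main".
compare_bases() {
  local base="${1:-main}" ref mb count
  for ref in "refs/heads/${base}" "refs/remotes/origin/${base}"; do
    # --verify --quiet: print nothing and fail if the ref does not exist.
    if ! git rev-parse --verify --quiet "${ref}" >/dev/null; then
      echo "${ref}: missing"
      continue
    fi
    mb=$(git merge-base HEAD "${ref}") || { echo "${ref}: no merge-base with HEAD"; continue; }
    count=$(git diff --name-only "${mb}" | wc -l | tr -d ' ')
    echo "${ref}: $(git rev-parse --short "${ref}") merge-base=$(git rev-parse --short "${mb}") files=${count}"
  done
}
```

Run it from the build’s checkout directory as `compare_bases main`; if the two `files=` counts differ, the local branch is stale relative to `origin/main`.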

I will email support@, because even though I’ve solved this with the github api, I hate a lingering mystery and it’s been eating at me :rofl:

When and if we figure out what the issue is, I will update the post in case anyone in the future has the same weird issue!

Update!

I reached out to support@ and they were incredibly helpful. We think we’ve solved the issue by changing the command from:

git diff --name-only --diff-filter=ACMRT $(git merge-base HEAD #{base_branch}) | grep '.libsonnet\|.jsonnet'

to

git diff --name-only --diff-filter=ACMRT origin/main | grep '.libsonnet\|.jsonnet'

I ran some tests and it now seems to work properly. I am somewhat embarrassed that I didn’t try that, because I tried a LOT of different permutations :D Ah well, it works now.
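One caveat with the single-argument form: `git diff origin/main` compares the working tree with the tip of `origin/main`, which matches the PR’s changes here because the PR head is a merge of `main`. If `main` moves on after the PR last merged it, files modified only on `main` would show up as `M` again. Diffing from the merge-base with the refreshed remote-tracking ref avoids that; `git diff A...B` is shorthand for diffing `$(git merge-base A B)` against `B`. A minimal sketch, assuming the remote is `origin`; `changed_jsonnet_files` is a hypothetical name, and falling back to `main` when `BUILDKITE_PULL_REQUEST_BASE_BRANCH` is empty is also an assumption:

```shell
# Hypothetical helper: print added/copied/modified/renamed/type-changed
# *.libsonnet and *.jsonnet files between the PR's merge-base with
# origin/<base> and HEAD.
# Assumptions: the remote is "origin"; <base> defaults to Buildkite's
# BUILDKITE_PULL_REQUEST_BASE_BRANCH, then to "main".
changed_jsonnet_files() {
  local base="${1:-${BUILDKITE_PULL_REQUEST_BASE_BRANCH:-main}}"

  # Refresh the remote-tracking ref explicitly, so this does not depend on
  # remote.origin.fetch or on a possibly stale local branch.
  git fetch --quiet origin "+refs/heads/${base}:refs/remotes/origin/${base}"

  # "A...B" diffs merge-base(A, B) against B, so commits that landed on the
  # base branch after the PR branched off (or last merged it) are excluded.
  # The quoted pathspecs also match files in subdirectories.
  git diff --name-only --diff-filter=ACMRT "origin/${base}...HEAD" \
    -- '*.libsonnet' '*.jsonnet'
}
```

Using pathspecs instead of piping through `grep` also means an empty result exits cleanly rather than with `grep`’s non-zero “no match” status.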

Thanks again, Stephanie and the BK support team!