Hello! My company is a pretty well established customer with many pipelines. For a recent project, I have been working on a pipeline for one of our repos that lints files that were changed as part of the pull request.
I searched the forums for other people running into this problem, to no avail. I have seen references to external plugins that pull a list of changed files, but they all seem to do more or less what the git command below does, unless I am missing something.
For a long time, we’ve been using this command to get a list of files changed in the PR (in theory):
git diff --name-only --diff-filter=ACMRT $(git merge-base HEAD main) | grep -E '\.(libsonnet|jsonnet)$'
This works great on our laptops, and for the PR I am working on, the command run locally shows the one libsonnet file that changed. However, when run in the Buildkite pipeline, it shows a ton of files that were not changed in this PR but were changed semi-recently.
So essentially, since we started using BK, we’ve sometimes been running the linters on the wrong files.
I did some digging to figure out what is different between the BK environment and my laptop, and did find some differences, but I’m not sure what they mean.
Here’s the “preparing working directory” commands that are run in my pipeline:
> Preparing working directory
$ git clean -ffxdq
# Fetch and checkout pull request head from GitHub
$ git fetch -v --prune -- origin refs/pull/4927/head
Warning: Permanently added the ECDSA host key for IP address to the list of known hosts.
From github.com:persona-id/repo
* branch refs/pull/4927/head -> FETCH_HEAD
# FETCH_HEAD is now `c2aa574191ba559e0c9301240427454abaa15262`
$ git fetch -v --prune -- origin c2aa574191ba559e0c9301240427454abaa15262
From github.com:persona-id/repo
* branch c2aa574191ba559e0c9301240427454abaa15262 -> FETCH_HEAD
$ git checkout -f c2aa574191ba559e0c9301240427454abaa15262
Warning: you are leaving 1 commit behind, not connected to
any of your branches:
09343b78c chore(stacks): replica server deploy for stack-0000
If you want to keep it by creating a new branch, this may be a good time
to do so with:
git branch <new-branch-name> 09343b78c
HEAD is now at c2aa57419 Merge branch 'main' into kuzmik/add-dd-linter
# Cleaning again to catch any post-checkout changes
$ git clean -ffxdq
# Checking to see if git commit information needs to be sent to Buildkite...
$ buildkite-agent meta-data exists buildkite:git:commit
After that we have a plugin that runs:
> Running plugin buildkite-persona-plugin pre-command hook
$ /etc/buildkite-agent/plugins/buildkite-runner-pipeline-builder-amd64-std-m67x/github-com-persona-id-buildkite-persona-plugin-v1-0-15/hooks/pre-command
running: git fetch origin main
From github.com:persona-id/repo
* branch main -> FETCH_HEAD
And then the rest of the steps run.
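One thing that stands out in the log above: `git fetch origin main` does not move a local `main` branch, so `git merge-base HEAD main` may be resolving against whatever `main` was left on the agent by an earlier build. A minimal sketch of the same command with the base ref made explicit follows. The function name `changed_jsonnet_files` and the default of `origin/main` are assumptions for illustration; `origin/main` is only the right choice if the fetch actually updates that remote-tracking ref on the agent.

```shell
# Sketch: list changed .jsonnet/.libsonnet files relative to a base ref.
# changed_jsonnet_files is a hypothetical helper, not part of Buildkite.
# The default base ref (origin/main) is an assumption; pass FETCH_HEAD or
# another ref if that is what the agent really has up to date.
changed_jsonnet_files() {
  base_ref="${1:-origin/main}"

  # Fail loudly if the base ref is missing or shares no history with HEAD.
  merge_base="$(git merge-base HEAD "$base_ref")" || return 1

  # Same diff as the original command; grep exits non-zero when nothing
  # matches, so "|| true" keeps an empty result from failing the step.
  git diff --name-only --diff-filter=ACMRT "$merge_base" \
    | grep -E '\.(libsonnet|jsonnet)$' || true
}
```

Calling `changed_jsonnet_files FETCH_HEAD` right after the plugin's `git fetch origin main` would diff against the commit that fetch just retrieved, since the log shows `main -> FETCH_HEAD`.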
I paired with some of the team and the only thing we can think of is that when BK does the initial clone, it’s doing something… “weird” for lack of a better adjective, where it loses commit history, branch state… or something.
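A minimal diagnostic sketch for checking that theory, meant to be run both on a laptop and in a pipeline step so the two outputs can be compared. The function name `describe_refs` and the list of refs are assumptions for illustration; it only prints what each candidate base ref resolves to and whether the checkout is shallow.

```shell
# Diagnostic sketch: show what the refs used by the diff resolve to.
# describe_refs is a hypothetical helper; the ref list is an assumption.
describe_refs() {
  # Prints "true" for a shallow clone, "false" otherwise.
  echo "shallow: $(git rev-parse --is-shallow-repository)"

  for ref in HEAD main origin/main FETCH_HEAD; do
    # --verify --quiet prints nothing and exits non-zero for a missing ref.
    sha="$(git rev-parse --verify --quiet "${ref}^{commit}")" || sha="(missing)"
    echo "${ref}: ${sha}"
  done
}
```

If `main` and `origin/main` print different commits on the agent, or `main` is far behind `FETCH_HEAD`, that would be consistent with the merge base being computed against a stale branch rather than with lost history.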
As a quick fix, I am now calling the GitHub API to get a canonical list of files that were changed in a PR, but I’d much rather rely on the tools that are built into the pipelines than on a third-party API if possible.
Anyone else run into weirdness like this, or have a clever way they are enumerating changed files in a pipeline?
Thanks!