We upgraded our Buildkite Elastic CI Stack from v6.17.0 to v6.18.0. This in turn upgraded the agents from v3.66.0 to v3.67.0.
Now our Buildkite builds have bogus headings taken from a completely different branch than the one that was merged into master/main.
I don’t know how to diagnose this…
For example this build: https://buildkite.com/healthengineau/megatron/builds/187326
has the title: NOOT-640 Is this finally the end of scope creep?
But the branch/PR has the title: PK-1071 Update user data request list page
I’m just guessing, but looking at the changes in 3.67.0, it seems like this patch might be the cause?
Based on the information you’ve provided and your current pipeline step setup using the skip-checkout plugin, it seems the root of the issue lies in how metadata, specifically the buildkite:git:commit key, is being handled.
As you’ve noted, version 3.67.0 of the Buildkite agent introduces functionality where the agent sets a special meta-data key, buildkite:git:commit, during the checkout process. This key carries vital information about the commit, such as the commit message, author, and hash (details are in the linked pull request). The skip-checkout plugin inherently skips the standard checkout process, and as I can see from the build you have shared, this value is being set via buildkite-agent meta-data set buildkite:git:commit < /dev/stdin, with the value read from STDIN.
This is likely the cause of your issue. I do understand that you would like to perform a minimal, focused checkout of only the necessary files, and then use those files to update or configure the pipeline dynamically. It is quite an interesting approach, as the use of a temporary directory ensures that these operations are performed in isolation and neither interfere with the rest of the workspace nor persist beyond the lifecycle of the build. However, I would suggest temporarily disabling the skip-checkout plugin in a test environment to confirm the behaviour. This will help establish whether the plugin’s omission of the checkout process is the definitive cause of the issue.
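If it helps to confirm things from the build side: buildkite-agent meta-data get reads a key back within a job, so a temporary debug step along these lines (a sketch) would show exactly what the agent stored for the build:

# Temporary debug step: print the commit details the server used for the build message.
buildkite-agent meta-data get 'buildkite:git:commit'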
We are skipping the default checkout and doing a sparse checkout of just the .buildkite/ directory and its contents because the repository is > 400 MB, while the pipeline upload step only needs .buildkite/pipeline.yml, which is only a few kilobytes.
With your previous build using skip-checkout, the agent is reusing a previously cloned/checked-out copy of the repository, whose HEAD is the commit with the “NOOT…” message. Since the agent won’t do any of the standard checkout steps, like updating the repo to BUILDKITE_COMMIT, the repo is left as-is. The agent then looks at the current HEAD and sets that into the special buildkite:git:commit meta-data, which our server uses to set the build message. This is why the message is wrong.
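You can see the stale state directly on the agent host; a quick sketch (substitute the agent/org/pipeline segments of the default builds path, which also appears later in this thread):

# On the agent host, the reused checkout's HEAD is still the previous job's commit:
cd /var/lib/buildkite-agent/builds/<agent-name>/<org-slug>/<pipeline-slug>
git log -1 --format=%s   # prints the old "NOOT-640 ..." subject, not the new commit's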
So, to fix this, you’ll need to implement your own agent-level checkout hook, which runs before the agent sets the git commit meta-data. You are currently doing the sparse checkout as part of your command, but that is too late. You will need to run the sparse-checkout logic as part of your agent checkout hook and add buildkite-agent meta-data set 'buildkite:git:commit' "$BUILDKITE_COMMIT" to the hook.
Also note that custom logic in an agent-level hook will impact all pipelines that run on that agent, so make sure to guard your changes with a condition.
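To illustrate, a minimal sketch of such a hook, assuming the default hooks location of a Linux agent install and using this thread’s pipeline slug as the guard (both are assumptions):

#!/bin/bash
# Hypothetical agent-level checkout hook (e.g. /etc/buildkite-agent/hooks/checkout).
# When this file exists it replaces the agent's built-in checkout for every
# pipeline on this agent, hence the guard below.
set -euo pipefail

if [[ "${BUILDKITE_PIPELINE_SLUG}" == "megatron" ]]; then
  # ... run the sparse-checkout logic from your script here ...

  # Record the commit this build actually targets, so the server doesn't
  # derive the build message from a stale HEAD.
  buildkite-agent meta-data set 'buildkite:git:commit' "${BUILDKITE_COMMIT}"
else
  # Every other pipeline still needs a working checkout; roughly mirror the
  # default behaviour with a plain clone (first run) or fetch (reused dir).
  if [[ -d .git ]]; then
    git fetch -v origin
  else
    git clone -v -- "${BUILDKITE_REPO}" .
  fi
  git checkout -f "${BUILDKITE_COMMIT}"
fi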
You’ve had us scratching our heads at this one for a while, hence it taking a while to respond. But I think we see the full problem now.
The agent PR you linked originally looks to be the culprit here. But at the same time, I think the PR has introduced better logic that has actually highlighted a problem with the design of your sparse checkout anyway: that it could lead to rebuilds targeting different commits over time. It’s not likely to cause an issue when builds are triggered from GitHub webhooks (I think), but more so for manual builds.
Regardless, I think the way forward here is to not use the “skip checkout” plugin and to instead create a “sparse-checkout” plugin.
I’ve got the team looking into whether we can help you with a plugin for this, as I think it makes sense for us to provide a sparse-checkout plugin for use with a pipeline upload step, but I’ve also found this one that exists already: GitHub - pragmaplatform/sparse-checkout-buildkite-plugin.
Can you give that a go with the .buildkite directory and let us know if it works?
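Something along these lines in the pipeline, as a sketch (the version tag is a guess, and paths matches the option the hook shown later in this thread reads):

steps:
  - label: ":pipeline: upload"
    command: buildkite-agent pipeline upload
    plugins:
      - pragmaplatform/sparse-checkout#v1.0.0:  # version tag is a guess
          paths:
            - .buildkite/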
I guess I should point out that because we’ve had our own checkout for the first step that prepares the pipeline upload, the heading the build gets seems to come from the 2nd step, where we don’t do that.
And as athreya pointed out above, it’s using buildkite-agent meta-data set buildkite:git:commit < /dev/stdin at that point.
The second step in the build I linked in the first post has the following output, which is where it is picking up the warning about the branch being left behind and using that for the heading.
$ git checkout -f 5d12580388cbcbe8370f66f5cbd25b3833f1cee2
Warning: you are leaving 1 commit behind, not connected to
any of your branches:
e78bcc55cb0 NOOT-640 Is this finally the end of scope creep?
If you want to keep it by creating a new branch, this may be a good time
to do so with:
git branch <new-branch-name> e78bcc55cb0
HEAD is now at 5d12580388c PK-1071 Update user data request list page (#13078)
When it should instead be picking up the “HEAD is now at 5d12580388c PK-1071 Update user data request list page (#13078)” message.
I don’t think this would apply to your use-case, as your builds are coming from GitHub where the specific commit SHA to check out is known. So if you hit the rebuild button on such a build, it would still use that commit SHA.
The issue is more around manual builds that target HEAD. As HEAD moves from commit to commit, if you rebuilt a build that had targeted HEAD, it would end up running on whatever the latest HEAD was after a fresh pull. Whereas with this change to the agent, which sends the actual commit metadata to Buildkite, we’ll know that a rebuild of a HEAD build should target the same commit SHA.
I’ll share how this whole process works in the default flow because I think things are getting mixed up.
- A build starts, and the very first step to run will pull the repo to the given commit, then extract the commit metadata.
- The agent then checks if the buildkite:git:commit meta-data key is set on the server:
  - if it is, it doesn’t overwrite it
  - if it’s not, it will send it up
- The server parses that special meta-data and sets the build message to the commit message from that meta-data key.
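In rough shell terms, the flow looks like this sketch (the exact git log format the agent uses is an assumption; the set-from-STDIN part matches the command visible in your job log):

# Approximation of what the agent does right after the checkout hook:
if ! buildkite-agent meta-data exists 'buildkite:git:commit'; then
  # The commit details are piped from a git log of HEAD into the key,
  # which is where the "< /dev/stdin" in the job log comes from.
  git log -1 --format=fuller HEAD |
    buildkite-agent meta-data set 'buildkite:git:commit'
fi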
This happens immediately after the checkout hook, so the state of the repository needs to be correct at that point. The issue with your current approach is that the checkout hook is being skipped and the state of the repo is instead being set by the command hook.
So when an agent goes to run this upload step (with a sparse checkout) after previously running a “standard” step, it doesn’t update the repository during the checkout hook, because it’s skipped. So the repository state is still pointing at the previous step the agent ran (the one with the NOOT… message). And so now, when the agent uploads the meta-data, it reads it from that stale repo state.
Which leads us to how to fix this: the state of the repository needs to be set from the checkout hook, as this is the last point before the agent sets the meta-data/build message. So basically, you will need to write a plugin with the custom sparse-checkout logic you wish to achieve.
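For reference, a plugin that supplies a checkout hook is just a repository shaped roughly like this (sketch of the standard Buildkite plugin layout):

sparse-checkout-buildkite-plugin/
  plugin.yml        # declares the plugin and its configuration (e.g. a paths option)
  hooks/
    checkout        # runs in place of the agent's default checkout for steps using the plugin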
The logic in our script for the sparse checkout follows:
# Set some defaults, some may come from a Buildkite pipeline.
BRANCH="${BUILDKITE_BRANCH:-master}"
DIR="."
REPO="$BUILDKITE_REPO"
These defaults can be overridden via command line options…
The PATHS array is built up from command line parameters.
Then there’s the code that runs ssh-keyscan.
Then finally, the sparse checkout itself:
if [[ ! -d "$DIR/.git" ]]
then
    echo "- Performing empty clone of the $BRANCH branch of $REPO"
    # If the Buildkite git-mirrors feature is being used, clone from that.
    git clone \
        --branch="$BRANCH" \
        --depth 1 \
        --filter=blob:none \
        --no-checkout \
        ${BUILDKITE_REPO_MIRROR:+--reference "$BUILDKITE_REPO_MIRROR"} \
        "$REPO" \
        "$DIR"
elif ! git -C "$DIR" config --get remote.origin.url | grep -Eiq "^$REPO\$"
then
    echo >&2 "Error: A different repository is already checked out to $DIR - Exiting..."
    exit 1
else
    echo "- Repo already checked out. Fetching updates"
    git -C "$DIR" fetch --depth 1 origin "$BRANCH"
fi

echo "- Configuring git sparse checkout"
git -C "$DIR" sparse-checkout set --no-cone "${PATHS[@]}"

echo "- Checking out selected files in the repository"
git -C "$DIR" checkout
The --filter=blob:none is needed because otherwise the sparse checkout still ends up pulling the whole repo over the network, even though it checks no files out.
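The difference is easy to see; a quick sketch with any large repository as $REPO:

# Full clone with no checkout: every blob is still transferred over the network.
git clone --no-checkout "$REPO" full && du -sh full/.git

# Blobless clone: only commits and trees come down up front; blobs are
# fetched on demand when files are actually checked out.
git clone --no-checkout --filter=blob:none "$REPO" partial && du -sh partial/.git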
I think the presence of the < /dev/stdin you see in the logs might be throwing you off here. It’s just the stdin fed by the git log process run by the agent.
Ah yes, the current plugin hook doesn’t do a keyscan, so it might hang there waiting for human input to accept the keys. That will need updating.
It also doesn’t handle mirror repositories. I think the best thing would be for you to fork and apply these changes to the hook so that it works for you; then we can work out how that would best fit into a community plugin.
In the 2nd step (the one after the pipeline upload one), which is labelled “Upload mjml html to artifacts”, in the “Preparing working directory” section of that step.
The sparse checkout of your current plugin downloads a lot more than is needed.
It still ends up pulling essentially all the files of the repository down.
I forked it and tried my own version, which really does check out just the files needed.
That works fine, BUT because it isn’t in its own separate directory below /var/lib/buildkite-agent/builds/$BUILDKITE_AGENT_NAME/$BUILDKITE_ORGANIZATION_SLUG/$BUILDKITE_PIPELINE_SLUG, the sparse-checkout changes it makes affect future git clones…
So when the next step comes along that performs the default git clone steps, it still only pulls the files configured by the git sparse-checkout command.
If I add a git sparse-checkout disable at the end of the plugin, that also defeats the purpose, because at that point all the files are downloaded.
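That’s because git stores the sparse state inside the checkout itself, so it survives into later jobs that reuse the same build directory. A quick way to see it (sketch):

# Inside the build directory after the plugin has run:
git config core.sparseCheckout      # -> true
git sparse-checkout list            # -> the paths the plugin configured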
At the moment the hooks/checkout command in my branch looks as follows:
#!/bin/bash
set -euo pipefail

DIR="$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)"

# shellcheck source=lib/plugin.bash
. "$DIR/../lib/plugin.bash"

if plugin_read_list_into_result PATHS; then
    CHECKOUT_PATHS=("${result[@]}")
else
    echo "Missing 'paths' option in the plugin"
    exit 1
fi

echo "Scanning SSH keys for remote git repository"
[[ -d ~/.ssh ]] || mkdir -p ~/.ssh
ssh-keyscan "${BUILDKITE_REPO_SSH_HOST}" >> ~/.ssh/known_hosts

echo "Creating sparse-checkout with paths: ${CHECKOUT_PATHS[*]}"

# Clone the repo without checking out files if it does not exist already.
if [[ ! -d .git ]]; then
    git clone \
        --depth 1 \
        --filter=blob:none \
        --no-checkout \
        ${BUILDKITE_REPO_MIRROR:+--reference "$BUILDKITE_REPO_MIRROR"} \
        -v \
        "${BUILDKITE_REPO}" .
fi

git clean -ffxdq

# Fetch the branch if the commit is HEAD, otherwise fetch the exact commit.
if [[ ${BUILDKITE_COMMIT} = "HEAD" ]]; then
    git fetch --depth 1 origin "${BUILDKITE_BRANCH}"
else
    git fetch --depth 1 origin "${BUILDKITE_COMMIT}"
fi

git sparse-checkout set --no-cone "${CHECKOUT_PATHS[@]}"
git checkout
I could potentially add steps to use mktemp and have the sparse checkout work with the content in there; that would probably solve the issue…
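Something like this, roughly (an untested sketch of the mktemp idea; copying the checked-out paths back into the build directory is my assumption about how the files would reach the step):

# Do the sparse checkout in a throwaway directory instead of the build dir,
# so no sparse-checkout config is left behind for later steps.
TMP_CHECKOUT="$(mktemp -d)"
trap 'rm -rf "$TMP_CHECKOUT"' EXIT

git clone --depth 1 --filter=blob:none --no-checkout \
    "${BUILDKITE_REPO}" "$TMP_CHECKOUT"
git -C "$TMP_CHECKOUT" sparse-checkout set --no-cone "${CHECKOUT_PATHS[@]}"
git -C "$TMP_CHECKOUT" checkout

# Copy just the requested paths into the build directory.
for p in "${CHECKOUT_PATHS[@]}"; do
    mkdir -p "$(dirname "$p")"
    cp -R "$TMP_CHECKOUT/$p" "$(dirname "$p")/"
done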
I’m not sure then if this plugin will be something you want or not? Should I raise a PR even though it is broken?
As it is now, future checkouts don’t work, because they still use the sparse settings and so they don’t check out everything that they need to.
Yeah, I think create a PR from the current state of your fork. It will be much easier to work together on a working solution if we do it from GitHub.
I think I have a solution for the subsequent-checkouts problem, but I’ll leave that for collaboration on the PR.
As you collaborate with Jarryd on this matter, I’ve also raised this with our Pipelines team in parallel to see if they can advise us of some alternatives.
Jordan, an engineer from the Pipelines team, here. We’ll catch up on this thread and then see if we can figure something out to resolve this. In the meantime, continue to work with @paula, as it sounds like he has a way forward.