Hi James, welcome to the community and thank you for posting a message. When you say the device isn’t always available, is that just for a short period of time and you want the job to wait until it is available?
Maybe you could write a short script that checks for the availability of the device and add that as a step before continuing the job, so something like:
I should have included more detail, sorry about that.
For example, I also don’t want the step that determines the device is unavailable to show a red fail tick in GitHub. More or less what you wrote is what I have attempted.
But the key thing that did work out for me in the end was to use soft_fail: true on the step that checks for device availability, and I made the next step depend on that one.
If I make the device check step a soft fail step, that’s fine, but the next step, which depends on the device check step, just sits there waiting for an agent. I hoped it would just not run.
If I could use a condition in the second step that tests whether the device check step (which is soft_fail) succeeded or not, that would work. I can’t see a way to pass the success or failure of that step to its dependent.
An if: device_check_failed or similar would be how I imagine it.
The key thing to know here is whether the device being unavailable is just for a short period and if you want to wait for it. I was thinking that device_heartbeat.sh would be a script that loops until it gets the sign of life from the device (maybe wrapped in a longer timeout so that you’re not stuck forever).
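To make the idea concrete, here is a rough sketch of what a device_heartbeat.sh could look like. The function name wait_for_device, the timeout and interval arguments, and the ping target in the usage note are all my own placeholders, so treat it as an illustration rather than a tested script for your setup.

```shell
#!/bin/bash
# Hypothetical sketch of device_heartbeat.sh: poll a liveness check until it
# succeeds or a deadline passes. All names here are illustrative.

# Usage: wait_for_device TIMEOUT_SECS INTERVAL_SECS CHECK_COMMAND [ARGS...]
wait_for_device() {
  local timeout_secs="$1" interval_secs="$2"
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started
  local deadline=$(( SECONDS + timeout_secs ))
  while true; do
    # "$@" is the liveness check; its exit status decides availability
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Give up once the deadline has passed so the job is not stuck forever
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval_secs"
  done
}
```

Something like wait_for_device 300 5 ping -c 1 my-device.local would then wait up to five minutes, checking every five seconds, where my-device.local stands in for whatever address your device has.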
The device could be unavailable for long periods.
I don’t want to wait for it to come back.
If the device is not available, I want the build to behave as if the step that would run a job on it did not exist, and I want the build to succeed. If the device is available, then the step can run and fail or succeed as usual.
Ah gotcha. To reiterate:
You would like to run a job on a particular agent but only if the agent is available. If the agent is not available you just want the pipeline to continue.
You run a script that checks for the availability of the device (e.g. with ping) or the agent (via the Agents API).
The outcome of the script determines the next pipeline step.
To ensure that the job runs on a specific agent you can use Agent targeting.
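As a rough illustration of the check-then-decide flow described above, here is a minimal sketch of the ping variant. The function name device_steps, the step label, run_on_device.sh and the host name in the usage comment are placeholders I made up; the only Buildkite feature it relies on is piping generated YAML into buildkite-agent pipeline upload.

```shell
#!/bin/bash
# Hypothetical sketch: print the device step only when a check command
# succeeds, and print nothing otherwise so no extra step is added.

# Usage: device_steps CHECK_COMMAND [ARGS...]
device_steps() {
  if "$@" >/dev/null 2>&1; then
    # Quoted delimiter: the YAML is emitted exactly as written
    cat <<'YAML'
steps:
  - label: "Run on device"
    command: "./run_on_device.sh"
    agents:
      queue: "special"
YAML
  fi
}

# Possible use inside a pipeline step (placeholder host name):
#   out=$(device_steps ping -c 1 my-device.local)
#   if [[ -n "$out" ]]; then echo "$out" | buildkite-agent pipeline upload; fi
```

Because the checking step itself exits successfully whether or not the device answers, it should not show up as a failure, and when the device is away nothing is uploaded at all.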
I ran a test to see if it works. Here’s what I did:
I have a niktest.yml in my .buildkite folder
I have a script.sh in my src folder
I started an agent with hostname “swarm-1-1” and a queue tag so that I can target it: buildkite-agent start --tags "queue=special"
Following the dynamic pipelines doc I put the following in niktest.yml:
#!/bin/bash
# Note that we don't enable the 'e' option, which would cause the script to
# immediately exit if it fails
set -uo pipefail
TOKEN="123123mysecrettoken12312312"
# Query all agents and set the SUB string to the desired agent hostname
STR=$(curl -s -H "Authorization: Bearer $TOKEN" "https://api.buildkite.com/v2/organizations/niks-playground/agents")
SUB="swarm-1-1"
# IFS= keeps the leading indentation of the first line; read returns non-zero
# at end of input, which is harmless because 'e' is not set
IFS= read -r -d '' VAR << EOM
  - command: echo "It worked!"
    label: ":sparkles:"
    agents:
      queue: "special"
EOM
# begin the pipeline.yml file
echo "steps:"
# add a new command step based on agent availability
if [[ "$STR" == *"$SUB"* ]]; then
  echo "$VAR"
else
  echo "  - command: echo \"Skipped\""
fi
VAR contains the steps that are injected into the pipeline if the agent is available.
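The substring match on the raw API response is the quick and dirty part of this. If jq is installed, a stricter check could look roughly like the sketch below. I am assuming each agent object in the response has hostname and connection_state fields and that a running agent reports "connected"; the function name agent_connected is just a placeholder.

```shell
#!/bin/bash
# Hypothetical sketch: decide availability from the parsed JSON rather than
# from a substring match. Requires jq. The field names and the "connected"
# value are assumptions about the Agents API response.

# Usage: agent_connected JSON_ARRAY HOSTNAME
agent_connected() {
  # jq -e exits non-zero when the filter's result is false or null
  jq -e --arg h "$2" \
    'any(.[]; .hostname == $h and .connection_state == "connected")' \
    <<<"$1" >/dev/null
}
```

In the script above, the condition would then become if agent_connected "$STR" "$SUB"; then, which avoids a false match when the hostname merely appears somewhere else in the response.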
This is what I got when the agent was not running: