How would you make a step conditional on an agent being online?

I have a job that can only run on a certain device.
The device isn’t always available and I don’t want the step to fail if it can’t be run.

I’m guessing it would be something like a first step that gets a list of the online agents over the API, but I’m not sure how to tie that into the second step.

Any suggestions?

Hi James, welcome to the community and thank you for posting a message. When you say the device isn’t always available, is that just for a short period of time and you want the job to wait until it is available?

Maybe you could write a short script that checks for the availability of the device and add that as a step before continuing the job, so something like:

steps:
  - command: "device_heartbeat.sh"
    key: "device_check"
  - command: "build.sh"
    key: "build"
    depends_on: "device_check"

Do you think that could work? You can read more about defining explicit dependencies here https://buildkite.com/docs/pipelines/dependencies#defining-explicit-dependencies.

I should have included more detail, sorry about that.
For example, I also don’t want the step that determines the device is unavailable to show a red fail tick in GitHub. More or less what you wrote is what I have attempted.
But the key thing that did work out for me in the end was to use soft_fail: true on the step that checks for device availability, and I made the next step depend on that one.

Thanks!

Oh I spoke too soon.

If I make the device check step a soft fail step, that’s fine, but the next step, which depends on the device check step, just sits there waiting for an agent. I hoped it would just not run.

If I could use a condition in the second step that tests whether the device check step (which is soft_fail) succeeded or not, that would work. I can’t see a way to pass the success or failure of that step to its dependent.

An if: device_check_failed or similar would be how I imagine it.
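One way to get close to that idea is an intermediate step that looks at how the check step ended and only then uploads the build step. The sketch below assumes the agent CLI's `buildkite-agent step get "outcome" --step <key>` call reports values such as `passed` or `soft_failed`; the function name `steps_for_outcome` and the skip message are made up for illustration, and the step keys are the ones from the YAML earlier in the thread.

```shell
#!/bin/bash
# steps_for_outcome OUTCOME
# Prints the follow-up pipeline for a given outcome of the device check step.
# Only a "passed" outcome produces the real build step; anything else
# (soft_failed, hard_failed, ...) produces a harmless placeholder step.
steps_for_outcome() {
  echo "steps:"
  case "$1" in
    passed)
      printf '  - command: "build.sh"\n    key: "build"\n'
      ;;
    *)
      printf '  - command: echo "Device unavailable, skipping build"\n'
      ;;
  esac
}

# Assumed usage inside a step that depends on "device_check":
# steps_for_outcome "$(buildkite-agent step get "outcome" --step "device_check")" \
#   | buildkite-agent pipeline upload
```

The step running this would itself need to run on an agent that is always online, not on the device.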

The key thing to know here is whether the device being unavailable is just for a short period and if you want to wait for it. I was thinking that device_heartbeat.sh would be a script that loops until it gets the sign of life from the device (maybe wrapped in a longer timeout so that you’re not stuck forever).
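A rough sketch of what such a device_heartbeat.sh could look like, assuming the sign of life is any command that exits 0 when the device responds. The function name `wait_for_device`, the timeout values, and the hostname in the usage comment are placeholders, not anything from a real setup.

```shell
#!/bin/bash
# wait_for_device TIMEOUT_SECONDS INTERVAL_SECONDS PROBE [ARGS...]
# Re-runs the probe command until it succeeds or the timeout has passed.
# Returns 0 as soon as the probe succeeds, 1 if the deadline is reached first.
wait_for_device() {
  wfd_timeout="$1"
  wfd_interval="$2"
  shift 2
  wfd_deadline=$(( $(date +%s) + wfd_timeout ))
  while :; do
    # The probe's own output is discarded; only its exit status matters
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$wfd_deadline" ]; then
      return 1
    fi
    sleep "$wfd_interval"
  done
}

# Assumed usage: wait up to 10 minutes, probing every 10 seconds
# wait_for_device 600 10 ping -c 1 my-device.example
```

Passing the probe in as arguments keeps the loop independent of how the device is actually checked (ping, an API call, or something else).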

Ah right. Again, sorry for the lack of clarity.

The device could be unavailable for long periods.
I don’t want to wait for it to come back.
If the device is not available, I want the build to behave as if the step that would run a job on it did not exist, and to succeed. If the device is available then the step can run and fail or succeed as per usual.

Ah gotcha. To reiterate:
You would like to run a job on a particular agent but only if the agent is available. If the agent is not available you just want the pipeline to continue.

To do that you can use dynamic pipelines.

Here’s how it would work:

  1. You run a script that checks for the availability of the device (e.g. with ping) or the agent (via the Agents API)
  2. The outcome of the script determines the next pipeline step
    To ensure that the job runs on a specific agent you can use Agent targeting.

I ran a test to see if it works. Here’s what I did:
I have a niktest.yml in my .buildkite folder
I have a script.sh in my src folder
I started an agent with hostname “swarm-1-1” and a tag so that I can target it with:
buildkite-agent start --tags "queue=special"

Following the dynamic pipelines doc I put the following in niktest.yml:

steps:
  - command: src/script.sh | buildkite-agent pipeline upload
    label: ":hammer:"
  - wait
  - command: echo "Final step in YAML file"
    label: ":smile: Done!"

This is the content of script.sh:

#!/bin/bash

# Note that we don't enable the 'e' option, which would cause the script to
# immediately exit if it fails
set -uo pipefail
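# Placeholder API token; in a real pipeline, read it from a secret instead of hard-coding it
TOKEN="123123mysecrettoken12312312"

# Query all agents and set the SUB string to the desired agent hostname
# (-s hides curl's progress meter in the job log)
STR=$(curl -s -H "Authorization: Bearer $TOKEN" "https://api.buildkite.com/v2/organizations/niks-playground/agents")
SUB="swarm-1-1"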
read -r -d '' VAR << EOM
- command: echo "It worked!"
  label: ":sparkles:"
  agents:
      queue: "special"
EOM

# begin the pipeline.yml file
echo "steps:"

# add a new command step based on agent availability
if [[ "$STR" == *"$SUB"* ]]; then
    echo "$VAR"
else
    echo "  - command: echo \"Skipped\""
fi

VAR contains the steps that are injected into the pipeline if the agent is available.
This is what I got when the agent was not running:

This is what I got when it was:

That should do it, right?
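For the ping variant mentioned in step 1, the same idea can be sketched with the availability check passed in as a command, so the YAML generation doesn't care how the check is done. This is only an illustration: the function name `emit_steps` is made up, and using swarm-1-1 as a pingable address in the usage comment is an assumption.

```shell
#!/bin/bash
# emit_steps PROBE [ARGS...]
# Prints pipeline YAML on stdout: the targeted step if the probe command
# succeeds, otherwise a placeholder step that just echoes "Skipped".
emit_steps() {
  echo "steps:"
  if "$@" >/dev/null 2>&1; then
    cat <<'YAML'
  - command: echo "It worked!"
    label: ":sparkles:"
    agents:
      queue: "special"
YAML
  else
    echo '  - command: echo "Skipped"'
  fi
}

# Assumed usage in the first pipeline step:
# emit_steps ping -c 1 swarm-1-1 | buildkite-agent pipeline upload
```

Note that a successful ping only says the machine is reachable, not that the agent on it is running, so the Agents API check is the stricter of the two.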


Ah, that’s a clever use of dynamic pipelines.
Thanks for the solution!