How would you make a step conditional on an agent being online?

James · April 24, 2020, 2:10pm

I have a job that can only run on a certain device.
The device isn’t always available and I don’t want the step to fail if it can’t be run.

I’m guessing it would be something like a first step that gets a list of the online agents over the API, but I’m not sure how to tie that into the second step.

Any suggestions?

anon17095254 · April 24, 2020, 4:33pm

Hi James, welcome to the community and thank you for posting a message. When you say the device isn’t always available, is that just for a short period of time and you want the job to wait until it is available?

Maybe you could write a short script that checks for the availability of the device and add that as a step before continuing the job, so something like:

steps:
  - command: "device_heartbeat.sh"
    key: "device_check"
  - command: "build.sh"
    key: "build"
    depends_on: "device_check"

Do you think that could work? You can read more about defining explicit dependencies here https://buildkite.com/docs/pipelines/dependencies#defining-explicit-dependencies.

James · April 25, 2020, 3:09pm

I should have included more detail, sorry about that.
For example, I also don’t want the step that determines the device is unavailable to show a red failure mark in GitHub. What you wrote is more or less what I attempted.
The key thing that worked for me in the end was to use soft_fail: true on the step that checks for device availability, and to make the next step depend on that one.

Thanks!

James · April 25, 2020, 3:48pm

Oh I spoke too soon.

If I make the device check step a soft_fail step, that part is fine, but the next step, which depends on the device check step, just sits there waiting for an agent. I had hoped it would simply not run.

James · April 27, 2020, 8:16am

If I could use a condition in the second step that tests whether the device check step (which is soft_fail) succeeded or not, that would work. I can’t see a way to pass the success or failure of that step to its dependent.

An if: device_check_failed or similar would be how I imagine it.

anon17095254 · April 28, 2020, 4:16am

The key thing to know here is whether the device being unavailable is just for a short period and if you want to wait for it. I was thinking that device_heartbeat.sh would be a script that loops until it gets the sign of life from the device (maybe wrapped in a longer timeout so that you’re not stuck forever).
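
A rough sketch of that kind of heartbeat loop, in case it helps. The function name, the timings, and the hostname in the example are all made up for illustration; the probe is whatever command tells you the device is alive.

```shell
#!/bin/bash
# Hypothetical device_heartbeat.sh helper (names and timings are assumptions).

# wait_for_device TIMEOUT_SECONDS INTERVAL_SECONDS CHECK_COMMAND [ARGS...]
# Runs CHECK_COMMAND repeatedly. Returns 0 on the first success, or 1 once
# TIMEOUT_SECONDS have elapsed without a success.
wait_for_device() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while true; do
    # Discard the probe's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Example (assumed hostname): wait up to 10 minutes, probing every 15 seconds.
# wait_for_device 600 15 ping -c 1 my-device.local
```

The deadline is checked after each failed probe, so the probe always runs at least once even with a very short timeout.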

James · April 28, 2020, 7:53am

Ah right. Again, sorry for the lack of clarity.

The device could be unavailable for long periods.
I don’t want to wait for it to come back.
If the device is not available, I want the build to behave as if the step that runs a job on it did not exist, and the build should still succeed. If the device is available, the step can run and fail or succeed as usual.

anon17095254 · April 28, 2020, 8:17pm

Ah gotcha. To reiterate:
You would like to run a job on a particular agent but only if the agent is available. If the agent is not available you just want the pipeline to continue.

To do that you can use dynamic pipelines.

Here’s how it would work:

You run a script that checks for the availability of the device (e.g. with ping) or the agent (via the Agents API)
The outcome of the script determines the next pipeline step
To ensure that the job runs on a specific agent you can use Agent targeting.
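
As a minimal sketch of those three points together: the function below takes a queue name and an arbitrary check command, and prints either a step targeted at that queue or a placeholder step. The function name, ./run_on_device.sh, the labels, and the hostname in the example are hypothetical; only the agents/queue targeting and buildkite-agent pipeline upload come from this thread.

```shell
#!/bin/bash
# emit_device_steps QUEUE CHECK_COMMAND [ARGS...]
# Prints pipeline YAML on stdout: a step targeted at QUEUE if CHECK_COMMAND
# succeeds, otherwise a placeholder step that can run on any agent.
emit_device_steps() {
  local queue="$1"
  shift
  echo "steps:"
  # The check's own output is discarded so that stdout contains only YAML.
  if "$@" >/dev/null 2>&1; then
    echo "  - command: \"./run_on_device.sh\""   # hypothetical device job
    echo "    label: \"Run on device\""
    echo "    agents:"
    echo "      queue: \"$queue\""
  else
    echo "  - command: \"echo Device offline, skipping\""
    echo "    label: \"Device step skipped\""
  fi
}

# Example (assumed hostname):
# emit_device_steps special ping -c 1 my-device.local | buildkite-agent pipeline upload
```

Because the targeted step is only uploaded when the check passes, nothing is left waiting for an agent that is offline.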

I ran a test to see if it works. Here’s what I did:
I have a niktest.yml in my .buildkite folder
I have a script.sh in my src folder
I started an agent with hostname “swarm-1-1” and a tag so that I can target it with:
buildkite-agent start --tags "queue=special"

Following the dynamic pipelines doc I put the following in niktest.yml:

steps:
  - command: src/script.sh | buildkite-agent pipeline upload
    label: ":hammer:"
  - wait
  - command: echo "Final step in YAML file"
    label: ":smile: Done!"

This is the content of script.sh:

#!/bin/bash

# Note that we don't enable the 'e' option, which would cause the script to
# immediately exit if it fails
set -uo pipefail
TOKEN="123123mysecrettoken12312312"

# Query all agents and set the SUB string to the desired agent hostname
STR=$(curl -s -H "Authorization: Bearer $TOKEN" https://api.buildkite.com/v2/organizations/niks-playground/agents)
SUB="swarm-1-1"
read -r -d '' VAR << EOM
- command: echo "It worked!"
  label: ":sparkles:"
  agents:
      queue: "special"
EOM

# begin the pipeline.yml file
echo "steps:"

# add a new command step based on agent availability
if [[ "$STR" == *"$SUB"* ]]; then
    echo "$VAR"
else
    echo "  - command: echo \"Skipped\"" 
fi

VAR contains the steps that are injected into the pipeline if the agent is available.
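
One caveat on the check itself: the substring test matches "swarm-1-1" anywhere in the response, so an agent called "swarm-1-10", or the same text in another field, would also count. A slightly stricter variant is sketched below. It assumes each agent object in the response carries a "hostname" field, and the function name is made up.

```shell
# agent_connected HOSTNAME
# Reads the Agents API JSON response on stdin and succeeds only if some
# "hostname" field is exactly HOSTNAME.
# Note: HOSTNAME is used inside a regular expression, so a "." in it matches
# any character.
agent_connected() {
  grep -Eq "\"hostname\":[[:space:]]*\"$1\""
}

# Example, reusing STR and SUB from script.sh:
# if echo "$STR" | agent_connected "$SUB"; then echo "$VAR"; fi
```

A JSON-aware tool would be more robust if one is installed on the agent; the grep version just avoids adding a dependency.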
This is what I got when the agent was not running:

This is what I got when it was:

That should do it, right?

James · April 29, 2020, 7:54pm

Ah that’s a clever use of the dynamic pipelines.
Thanks for the solution!
