Stopping the pipeline on first failure

I have three shell scripts that execute in order. Each uses ‘set -e’ so that it exits with a non-zero return code as soon as a command fails. I expected the pipeline to stop at the first failure, but it does not. I found envato/stop-the-line-buildkite-plugin (“Stop Buildkite pipelines based on build meta-data values”), but I cannot figure out how to configure it for a non-YAML pipeline.

Of course, I have a workaround - I can delete two of the .sh scripts and put their contents in a single monolithic one. Defeats the elegance of a build pipeline, of course :-(

Advice?

  • Paul

Hey Paul, by the sounds of it - I think you should be able to split your 3 shell scripts into 3 separate build steps, with a “wait” step in between.

In YAML, it’d look something like this:

steps:
  - command: "script1.sh"
  - wait
  - command: "script2.sh"
  - wait
  - command: "script3.sh"

In the Step UI, the equivalent is a wait step added between the command steps.

With this setup, the pipeline will fail and stop running if script1 or script2 fails.
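
If the step list is produced by a script rather than written by hand, the same layout can be generated in shell and piped to `buildkite-agent pipeline upload`. This is only a sketch: the function name `emit_sequential_pipeline` is made up for illustration, and the script names are the ones from the example above.

```shell
#!/bin/bash

# Hypothetical helper: print a pipeline in which every script is its own
# command step, separated from the next one by a wait step.
emit_sequential_pipeline() {
  local first=1 script
  echo "steps:"
  for script in "$@"; do
    # A wait step goes between command steps, not before the first one.
    if [[ "$first" -eq 0 ]]; then
      echo "  - wait"
    fi
    echo "  - command: \"$script\""
    first=0
  done
}

# Usage inside a running job (needs the agent, so it is left commented out):
# emit_sequential_pipeline script1.sh script2.sh script3.sh | buildkite-agent pipeline upload
```

Called with the three script names, the function prints the same YAML as the hand-written pipeline above.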

Does something like that work?

Trying that now Keith. I was looking in the wrong place!! (user error)

No worries, let me know how you go!

This “solution” has a big drawback.
If you have two steps, StepA and StepB, which are supposed to run in parallel (on different machines, say for deploy), and these steps create substeps connected with ‘wait’, the waits will force the substeps of StepA to wait for the substeps of StepB and kill all parallelism.

So inconvenient.

So inconvenient.

It’s also a massive waste of time to wait for each to complete.

The key bit of this “solution” was the fact that they had to be run in order (no parallelism).

Wait steps are a perfect fit for this sort of use case, and they are indeed the right answer as far as the pipeline is concerned.

Having said that @Dzenly, I’m super keen to hear more about the sorts of pipelines you’re building where wait steps aren’t a good fit - perhaps you can create another topic? If you create one and list out kinda what you wanna see, I’ll see what I can do to make it happen! (Also, please do the same @nathan.pierce - would love to hear your thoughts too)

Thank you for the answer.

I have already created topics with feature requests. What I actually need is an argument, like agents, for wait steps.

I will try to clarify the issue here.

I have pipelineA.
It has a build step which can run on 3 agents in parallel. Everything is fine with the build step.
Then I have many deploy steps (in pipelineA) which run on separate machines and use branch filtering. There are cases where the same build is deployed to several machines. Say I have machineA1 and machineA2 to deploy branch branchA.
These deploy machines have buildkite-agents: agent machineA1-1 deploys to machineA1 and agent machineA2-1 deploys to machineA2.

Of course the deploy steps on a given machine should run in sequence (not in parallel): download artifacts, unzip them, copy, restart the application. And of course the next step must not run if the previous step failed. So I used wait.

But the deploys on machineA1 and on machineA2 should run in parallel because they are independent.

But the “wait” steps for machineA1 also wait for the steps on machineA2, which makes no sense for my workflow.

I could use triggers and many deploy pipelines, but then I would lose my structure, where one parent build pipeline contains the child deploy pipelines.

In my workflow “wait” steps were inappropriate.

So here is my workaround (for steps on the same agent; if they are not, meta-data could be used):
For the first step, I set a variable in the Buildkite GUI: FIRST_STEP=yes.

My pre-command hook:

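#!/bin/bash

echo "FIRST_STEP: ${FIRST_STEP:-}"

# The first step clears any marker left behind by an earlier failure.
if [[ -n "${FIRST_STEP:-}" ]]; then
  rm -f /opt/ci/inner/skip
  exit 0
fi

# Later steps fail immediately if an earlier step left the marker.
if [[ -f "/opt/ci/inner/skip" ]]; then
  SKIP_STEPS=$(cat /opt/ci/inner/skip)
  export SKIP_STEPS
  echo "SKIP_STEPS: $SKIP_STEPS"
  exit 1
fi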

My post-command hook:

#!/bin/bash

echo "BUILDKITE_COMMAND_EXIT_STATUS ${BUILDKITE_COMMAND_EXIT_STATUS:-}"

# On failure, leave a marker so the pre-command hook fails the remaining steps.
if [[ "${BUILDKITE_COMMAND_EXIT_STATUS:-0}" -ne 0 ]]; then
  echo "Steps are skipped because $BUILDKITE_AGENT_NAME:$BUILDKITE_LABEL returned $BUILDKITE_COMMAND_EXIT_STATUS" > /opt/ci/inner/skip
fi
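
To see how the two hooks interact without a Buildkite agent, the logic can be exercised locally. The sketch below is a stand-in built on assumptions: `SKIP_FILE`, `pre_command`, `post_command`, and `run_step` are illustrative names, the marker lives in a temp directory instead of /opt/ci/inner/skip, and the functions return a status where the real hooks call exit.

```shell
#!/bin/bash

# Illustrative stand-in for /opt/ci/inner/skip.
SKIP_FILE="${SKIP_FILE:-$(mktemp -d)/skip}"

# Mirrors the pre-command hook: the first step clears the marker,
# later steps refuse to run while the marker exists.
pre_command() {
  local first_step="${1:-}"
  if [[ -n "$first_step" ]]; then
    rm -f "$SKIP_FILE"
    return 0
  fi
  if [[ -f "$SKIP_FILE" ]]; then
    echo "SKIP_STEPS: $(cat "$SKIP_FILE")"
    return 1
  fi
  return 0
}

# Mirrors the post-command hook: a failed command leaves the marker behind.
post_command() {
  local status="$1" label="$2"
  if [[ "$status" -ne 0 ]]; then
    echo "Steps are skipped because $label returned $status" > "$SKIP_FILE"
  fi
}

# Runs one simulated step: pre-command, the command itself, post-command.
# Usage: run_step <label> <first-step flag or ""> <command...>
run_step() {
  local label="$1" first_step="$2" status=0
  shift 2
  pre_command "$first_step" || return 1
  "$@" || status=$?
  post_command "$status" "$label"
  return "$status"
}
```

For example, `run_step download yes true` succeeds, `run_step unzip "" false` fails and writes the marker, and a following `run_step copy "" true` is refused before its command runs.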

So if some step fails, the other steps on the same agent will also fail.
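
For steps that do not share an agent, the meta-data route mentioned earlier might look roughly like the sketch below. It relies on the `buildkite-agent meta-data set`, `exists`, and `get` subcommands; everything else is an assumption made for illustration, in particular the `DEPLOY_TARGET` variable (imagined as set per step, e.g. machineA1), the `skip-` key prefix, and the function names.

```shell
#!/bin/bash

# Hypothetical per-target key, e.g. DEPLOY_TARGET=machineA1 set on each step.
skip_key() {
  echo "skip-${DEPLOY_TARGET:?DEPLOY_TARGET must be set}"
}

# For a post-command hook: record the failure in the build's meta-data.
record_failure() {
  local status="${BUILDKITE_COMMAND_EXIT_STATUS:-0}"
  if [[ "$status" -ne 0 ]]; then
    buildkite-agent meta-data set "$(skip_key)" \
      "${BUILDKITE_LABEL:-unknown step} returned $status"
  fi
}

# For a pre-command hook: fail early if an earlier step recorded a failure
# for the same target. Other targets use other keys and are unaffected.
check_skip() {
  local key
  key="$(skip_key)"
  if buildkite-agent meta-data exists "$key"; then
    echo "Skipping: $(buildkite-agent meta-data get "$key")"
    return 1
  fi
  return 0
}
```

Because meta-data belongs to a single build, this variant would not need the FIRST_STEP reset. It still assumes the steps for one target run one after another, as they do when a single agent serves each machine.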

I tried setting export SKIP_STEPS=yes in the pre-command hook and using skip: "$SKIP_STEPS" and skip: "$$SKIP_STEPS" in my command steps, but it does not work.