Add a `bail_early` option for steps that fails all sibling steps that are still running

chaseadamsio · August 29, 2019, 10:08pm

Currently, if a build has multiple parallel steps and one of them fails in a way that will fail the build, there’s no way for that step to kill the whole build. Potentially long-running sibling steps keep going and cause unnecessary resource contention while they finish, even though the build will fail once the last step is done.

Ideally, I would be able to set up my pipeline so that any (or all) steps, based on a configuration option, would notify the Buildkite step manager that the step failed and will cause the whole build to fail, so that the step manager kills all sibling steps.

I know that there’s an option to put a "wait" between critical steps as an alternative, but then I have to fine-tune my "wait" sequences to get this behavior. For example, the first stage before our "wait" can fail in as little as 3 minutes, but the longest-running step in that stage (a build used by steps in the next stage) can take anywhere from 10 to 15 minutes. So the feedback loop that would ideally be 3 minutes when the shortest step fails won’t signal the user until the longest step (10-15 min) finishes.

If there were a `bail_early` option that signaled the job manager to stop all sibling steps as “neutral” (or failed), that would help us a lot, both with faster feedback and with reducing resource contention by releasing nodes that are being held unnecessarily.
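
To make the behavior I’m asking for concrete, here is a rough simulation in plain Python. It is not Buildkite code and does not use any Buildkite API: `run_step` and `run_with_bail_early` are made-up names, and the steps are just timed sleeps standing in for real jobs. It only sketches the semantics: as soon as one parallel step fails, every sibling that is still running gets cancelled instead of being left to finish.

```python
import asyncio


async def run_step(name, seconds, ok):
    """Hypothetical stand-in for a pipeline step: sleeps, then passes or fails."""
    await asyncio.sleep(seconds)
    if not ok:
        raise RuntimeError(f"step {name} failed")
    return name


async def run_with_bail_early(steps):
    """Run steps in parallel and cancel running siblings on the first failure.

    `steps` is a list of (name, seconds, ok) tuples.
    Returns (failed, finished, cancelled), each a sorted list of step names.
    """
    tasks = {asyncio.create_task(run_step(*step)): step[0] for step in steps}
    pending = set(tasks)
    failed, finished = [], []

    # Wake up each time a step completes; stop waiting once any step has failed.
    while pending and not failed:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            if task.exception() is not None:
                failed.append(tasks[task])
            else:
                finished.append(tasks[task])

    # Whatever is still pending here is a sibling that should be killed.
    cancelled = sorted(tasks[task] for task in pending)
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    return sorted(failed), sorted(finished), cancelled


if __name__ == "__main__":
    # Durations are scaled-down placeholders, not measurements from our pipeline.
    demo = [("lint", 0.05, False), ("build", 1.0, True), ("unit", 0.01, True)]
    print(asyncio.run(run_with_bail_early(demo)))
```

In this sketch the short failing step ends the run after its own duration, and the long step is cancelled rather than holding its node until it finishes.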

anon85971592 · August 30, 2019, 2:42am

I really dig this idea @chaseadamsio! Let me brew on it and we’ll see what we can do.

patmantru · September 3, 2019, 3:08pm

I’d also like to see a ‘bail early’ feature…but allow setting the build result to fail or pass. Granted, bailing with a pass makes less sense in context with canceling sibling steps, but it would be very handy for a singleton step to be able to terminate early but still report success. Kind of like running make when the targets are up to date.

I was working on an automated VERSION file update step, which works great…except that when BK does the git push, that triggers another build…which updates the VERSION file again: lather, rinse, repeat. One way to get around that would be to have the first job push the update to the VERSION file and then bail out with success on the rest of the steps, and let the second job run to completion.

chaseadamsio · September 3, 2019, 3:50pm

If there are any other insights I can give (or if you need a second set of ears to kick the idea around with), please let me know.

evanrelf · March 25, 2021, 8:18pm

This feature would be really valuable for my company as well. We generate our Buildkite config dynamically at runtime, based on the Nix dependency graph, so we have a lot of jobs running in parallel. I think auto-cancelling jobs using something like bail_early could save a lot of wasted CI time.
