Currently, if a build has multiple parallel steps and one of them fails in a way that will fail the whole build, there’s no way to tell the other steps to stop. This means that potentially long-running sibling steps keep running to completion and cause unnecessary resource contention, even though the build is already certain to fail once the last step is done.
Ideally, I would be able to set up my pipeline so that any (or all) steps, based on a configuration option, would notify the Buildkite step manager that the step failed and will cause the whole build to fail, and the step manager would then kill all sibling steps.
I know that there’s an option to add a "wait" between critical steps as an alternative, but that means fine-tuning my "wait" sequences, and even then the feedback is slow. For example, our first stage (before a "wait") can currently fail in as little as 3 minutes, but the longest-running step in that stage (a build used by steps in the next stage) takes anywhere from 10 to 15 minutes. So a feedback loop that would ideally take 3 minutes when the shortest step fails won’t signal the user until the longest step (10-15 min) has finished.
If there were a bail_early option to signal the job manager to stop all sibling steps as “neutral” (or failed), that would help us a lot, both by giving faster feedback and by reducing resource contention through releasing unnecessarily acquired nodes.
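To make the requested behavior concrete, here is a minimal Python sketch that simulates one parallel stage with and without a bail_early switch. It is only an illustration under stated assumptions: `Step`, `run_step`, `run_stage`, and the `bail_early` parameter are hypothetical names, not part of Buildkite's API or agent, and the durations are scaled down so that 0.03 s stands in for the 3-minute failing step and 0.15 s for the 15-minute build.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical pipeline step: name, simulated runtime (seconds), outcome."""
    name: str
    duration: float
    fails: bool = False


async def run_step(step: Step) -> str:
    # Simulate the step's work; a real agent would run a command here.
    await asyncio.sleep(step.duration)
    if step.fails:
        raise RuntimeError(f"{step.name} failed")
    return step.name


async def run_stage(steps: list[Step], bail_early: bool) -> tuple[dict[str, str], float]:
    """Run a stage's steps in parallel; return (status per step, seconds until the stage ends)."""
    start = time.monotonic()
    tasks = {asyncio.create_task(run_step(s)): s.name for s in steps}
    # Hypothetical bail_early: stop waiting as soon as any step raises.
    # Without it, wait for every step, as a stage before a "wait" does today.
    mode = asyncio.FIRST_EXCEPTION if bail_early else asyncio.ALL_COMPLETED
    done, pending = await asyncio.wait(list(tasks), return_when=mode)
    for task in pending:
        task.cancel()  # stop still-running siblings so their resources are freed
    # Let the cancellations finish; CancelledError is collected, not raised.
    await asyncio.gather(*pending, return_exceptions=True)
    elapsed = time.monotonic() - start

    statuses = {}
    for task, name in tasks.items():
        if task.cancelled():
            statuses[name] = "canceled"
        elif task.exception() is not None:
            statuses[name] = "failed"
        else:
            statuses[name] = "passed"
    return statuses, elapsed


if __name__ == "__main__":
    stage = [Step("quick-check", 0.03, fails=True), Step("build", 0.15)]
    for bail in (True, False):
        statuses, elapsed = asyncio.run(run_stage(stage, bail_early=bail))
        print(f"bail_early={bail}: {statuses} after {elapsed:.2f}s")
```

With the switch on, the stage ends as soon as the quick step fails and the long build is canceled. With it off, the stage runs for the full duration of the build before the failure is reported, which is the 3-minute vs. 10-15-minute gap described above.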