Automatically retry failed steps on AGENT_STOP

noizwaves · February 3, 2021, 6:21pm

Hey folks,

I’d like to submit a feature request where steps that fail because of an AGENT_STOP exit code are automatically retried.

When an agent is stopped, we see Received cancellation signal, interrupting in the log. As this can happen at any time, this can interrupt any hook, plugin, or command that runs. This makes it tricky to implement with the automatic_retry logic at a step level.

Looking at the timeline of such affected steps, it looks like Buildkite knows when a agent is stopped:

Command Exit Status	1
Command Exit Signal	None: The process intercepted the signal and exited with the above status
Exit Signal Reason	AGENT_STOP: The agent sent the signal to the process because the agent was stopped

Given that Buildkite seems to know that the agent was stopped, it would help us a lot if these steps could be automatically retried.

Jason · February 5, 2021, 12:20am

Thanks for the feature request!

This is something that is on our radar, but we currently don’t have a timeline as yet. We will update this thread as we know more!

Topic		Replies	Views
How can we retry jobs only when the agent is gracefully terminated? Pipelines	10	2616	August 9, 2022
Reschedule builds on other agents rather than Fail builds when agents time out or are killed (machine shut down or put to sleep) Features Requests	5	1762	December 19, 2020
Restart step in a different agent Features Requests	9	1663	March 21, 2025
Rerun the failed step on the same agent Features Requests	2	25	May 13, 2025
Auto-restart job on terminated spot instance? Elastic CI Stack for AWS	6	1692	June 29, 2021

Automatically retry failed steps on AGENT_STOP

Related topics