Automatically retry failed steps on AGENT_STOP

Hey folks,

I’d like to submit a feature request: steps that fail with an AGENT_STOP exit signal reason should be retried automatically.

When an agent is stopped, we see “Received cancellation signal, interrupting” in the log. Because this can happen at any time, it can interrupt any hook, plugin, or command that is running. The interrupted process then exits with an ordinary status such as 1, which makes this tricky to handle with the automatic_retry logic at the step level.

Looking at the timeline of affected steps, it looks like Buildkite knows when an agent is stopped:

Command Exit Status	1
Command Exit Signal	None: The process intercepted the signal and exited with the above status
Exit Signal Reason	AGENT_STOP: The agent sent the signal to the process because the agent was stopped

Given that Buildkite seems to know that the agent was stopped, it would help us a lot if these steps could be automatically retried.
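
To make the request concrete, here is a rough sketch of the decision I have in mind. It is only an illustration, not Buildkite's API: `StepResult`, `should_retry`, and the `limit` parameter are hypothetical names, and the only values taken from the real timeline are the exit status and the AGENT_STOP signal reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    """Hypothetical summary of a finished step, mirroring the timeline fields."""
    exit_status: int
    signal_reason: str  # e.g. "AGENT_STOP", or "NONE" when no signal was sent


def should_retry(result: StepResult, attempts: int, limit: int = 2) -> bool:
    """Retry only when the agent was stopped and retries are not exhausted.

    The exit status alone is not enough: a step interrupted by an agent stop
    and a genuinely failing step can both exit with status 1.
    """
    return result.signal_reason == "AGENT_STOP" and attempts < limit


stopped = StepResult(exit_status=1, signal_reason="AGENT_STOP")
failed = StepResult(exit_status=1, signal_reason="NONE")

print(should_retry(stopped, attempts=0))  # agent was stopped, so retry
print(should_retry(failed, attempts=0))   # ordinary failure, so do not retry
```

The point is that both results have exit status 1, so a rule keyed on exit status alone would either retry real failures or miss the stopped ones.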

Thanks for the feature request! :yellow_heart:

This is on our radar, but we don’t have a timeline yet. We’ll update this thread as we know more!