Restart step in a different agent

tiangolo · November 22, 2019, 7:04am

Hello Buildkite team!

Just a feature request:

As a software developer checking the CI results of the branch I’m working on, without necessarily having access to more advanced/admin configuration, I would like to be able to restart a job for a pipeline step on a different agent than the one it just ran on.

This is because sometimes the failure is caused by the current state of the machine the tests ran on, so it’s specific to that machine’s agent: for example, a full disk or insufficient memory.

When I restart a job, it always restarts on the same agent it just ran on.

The other option I have is to restart the entire build for the whole pipeline. But that takes a lot of time, since some of the steps take several minutes, and the specific step I currently care about could still end up running on the same problematic agent/machine.

SeanR · November 27, 2019, 3:26pm

We have a similar need, but for different reasons. In our case, we have some jobs that deploy resources to machines in China.

The Great Firewall, however, sometimes decides that the dynamic IP address assigned to our cloud build agent isn’t allowed to talk to China. Thus, retries on this machine will never work, but running the same job from a different machine would.

Right now, the best answer we have is to shut down the cloud agent that’s blacklisted and start up a new instance that gets a new IP, and just hope that one isn’t also blacklisted. This usually works, but is obviously far from ideal.

ccarpita-butterfly · June 9, 2020, 4:09pm

For us, simply knowing whether a retry also fails on a different agent would be a high-quality signal for distinguishing code-specific failures from host-specific ones. While local caches might be a useful counterpoint, we clear all local caches for each build for reproducibility.

mycarrysun · February 18, 2022, 7:00pm

Are there any updates here? We have the same use case as OP, where agents can run out of disk space and we need a way to target a different agent on retries.

jeremy · February 19, 2022, 12:51am

Hi again @mycarrysun, and welcome to the community!

Thank you for bumping these threads. I’ve reached out to the team to see where things stand with this request, and we’ll update the thread once we have more information.

Jason · February 21, 2022, 6:14am

Hey folks,

We raised this with the team and it’s on the roadmap, but there is no timeline for it yet. We will keep y’all updated in this thread when there is more movement on it.

alexf · March 14, 2024, 3:49am

+1. @Jason, do you have a timeline on this? We’ve just discussed building this feature client-side after some issues where a dodgy agent failed many jobs despite retries. I’d love to know if this is going to happen in the next few months, in which case I’ll tell my team to hold off.

benmc · March 14, 2024, 3:59am

Hey @alexf!

Thanks for posting and welcome to the community!

I’ve taken a look at this in the backlog, and it is being investigated; however, it’s not something that will be released within the next few months while we work on other features and additions to Buildkite.

I’ll add your +1 to the request, though, as it helps us track how often requests come up and how they should be prioritised.

Cheers

samsternatretool · March 21, 2025, 10:31pm

+1 to this feature request. We’d really like to configure retries so they don’t use the same agent they just failed on.

suma · March 21, 2025, 10:47pm

Hey @samsternatretool,

This is Suma from Buildkite support team. Thank you for reaching out and sharing your feedback on this.

This is currently in our backlog and the team is looking into it. We don’t have an ETA on this yet, but I’ll update it with your feedback.

Thank you again for sharing this feedback.

Thanks,
Suma
