Rerun the failed step on the same agent

zenogueira · May 9, 2025, 9:20am

It would be helpful for debugging purposes if we could rerun the failed step on the same agent. This would help determine whether we have flaky build steps.

Sometimes the cause of a failed step lies in the infra provisioning, and retrying the step just runs it in another container/machine, where it passes. This masks the issue, so it never gets addressed. The next time the same step runs on that agent, it fails again and is dismissed again as a flaky test, and nothing ever calls attention to the infra problem.

This is something super easy to implement and would be quite helpful for building more robust pipelines.

paula · May 9, 2025, 2:53pm

Thanks @zenogueira for your suggestion.

Often, retrying the job places it back on the same agent (given the same targeting rules), even if there are other agents available. This is partly because we give preference to agents that have most recently run a job, because they’re more likely to have warm caches etc.
If a step failed, we considered that something was off with that agent, and it’s also possible that an agent with a warmer cache takes preference.
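To make the "most recently active agent wins" preference concrete, here is a minimal Python sketch of a hypothetical scheduler. The `Agent` record, the tag-based `matches` rule, and `pick_agent` are illustrative assumptions, not the platform's actual scheduling code; the sketch only shows why a retry can land on the same agent when that agent was the last one to finish a job.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Agent:
    # Hypothetical agent record: name, targeting tags, and when it last finished a job.
    name: str
    tags: Dict[str, str]
    last_job_finished: Optional[datetime] = None  # None = has never run a job


def matches(agent: Agent, rules: Dict[str, str]) -> bool:
    # Eligible only if every targeting rule (key=value) matches one of the agent's tags.
    return all(agent.tags.get(key) == value for key, value in rules.items())


def pick_agent(agents: List[Agent], rules: Dict[str, str]) -> Optional[Agent]:
    eligible = [a for a in agents if matches(a, rules)]
    if not eligible:
        return None
    # Prefer the agent that most recently finished a job (assumed to have the warmest cache).
    # Agents that have never run a job sort last via datetime.min.
    return max(eligible, key=lambda a: a.last_job_finished or datetime.min)


agents = [
    Agent("agent-1", {"queue": "default"}, datetime(2025, 5, 9, 9, 0)),
    Agent("agent-2", {"queue": "default"}, datetime(2025, 5, 9, 9, 5)),
    Agent("agent-3", {"queue": "gpu"}, datetime(2025, 5, 9, 9, 10)),
]

# Suppose agent-2 just ran (and failed) the step: the retry is picked for agent-2 again.
retry_target = pick_agent(agents, {"queue": "default"})
print(retry_target.name)  # agent-2
```

Under this model, if agent-1 finished some other job after agent-2 and before the retry was scheduled, the retry would move to agent-1 instead, so "same agent" is a side effect of timing rather than a guarantee.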

Some discussions are happening around how we can allow more granular rules for assigning jobs to particular agents, and situations like this one are part of that discussion.
We haven’t scheduled the work yet, but we appreciate your feedback.

zenogueira · May 13, 2025, 9:45am

Often, retrying the job places it back on the same agent (that has the same targeting rules), even if there are other agents available.

This is not my experience. It might have something to do with our specific setup, but I don’t think I have ever seen this happen. I see that job assignment tends to favour certain bots over others, but I don’t see bots running the same job twice.

I think this feature would be quite helpful for debugging purposes, please add this topic to the discussions if possible.

Thank you

