Global Retry Settings

pecigonzalo · July 2, 2019, 10:45am

Given some ephemeral and “dynamic” environments (Kubernetes, spot instances, etc.), a job sometimes fails because its agent was lost, and in most cases we want to retry it.
It would be great to have a global/pipeline-level setting (maybe with env vars we can set on the agent as a default) for automatically retrying these cases, as otherwise we need to add this setting to every step.
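Until something like this exists, one workaround is to generate the pipeline dynamically and inject the retry rule into every command step before uploading it. Below is a minimal sketch in Python. It assumes Buildkite's per-step `retry.automatic` rule with `exit_status: -1`, which Buildkite uses to signal a lost agent. It also assumes the printed JSON is piped to `buildkite-agent pipeline upload`, which accepts JSON as well as YAML. `with_default_retry` and `LOST_AGENT_RETRY` are hypothetical names, and the limit of 2 is an arbitrary choice.

```python
import copy
import json

# Assumption: Buildkite reports a lost agent as exit status -1.
# The limit of 2 retries is an arbitrary illustrative value.
LOST_AGENT_RETRY = {"exit_status": -1, "limit": 2}


def with_default_retry(steps, rule=LOST_AGENT_RETRY):
    """Hypothetical helper: return a copy of `steps` in which every command
    step automatically retries on agent loss. Steps that already define
    `retry.automatic` keep their own rules."""
    result = []
    for step in steps:
        # Shorthand steps such as "wait" are plain strings; keep them as-is.
        if not isinstance(step, dict):
            result.append(step)
            continue
        step = copy.deepcopy(step)
        if "group" in step and "steps" in step:
            # Group steps nest their own list of steps; recurse into it.
            step["steps"] = with_default_retry(step["steps"], rule)
        elif "command" in step or "commands" in step:
            retry = step.setdefault("retry", {})
            retry.setdefault("automatic", [dict(rule)])
        result.append(step)
    return result


pipeline = {
    "steps": [
        {"label": "test", "command": "make test"},
        "wait",
        {
            "label": "deploy",
            "command": "make deploy",
            # An existing rule like this one is left untouched.
            "retry": {"automatic": [{"exit_status": 1, "limit": 1}]},
        },
    ]
}
pipeline["steps"] = with_default_retry(pipeline["steps"])
print(json.dumps(pipeline, indent=2))
```

Leaving existing `retry.automatic` entries alone means steps with deliberately tuned rules are not overridden. The trade-off is that those steps will not also pick up the lost-agent rule unless it is added to them explicitly.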

benmonkey · November 4, 2019, 3:12am

This alludes to spot instances, but to add one clear use case that’s impacting us: we use Google Cloud with Preemptible instances.

These instances are cycled every 24 hours or less, so lost hosts are a widespread symptom across all of our pipelines and all of our steps. Being able to set a standard retry, either at the pipeline level or across all pipelines in a project, would be helpful. As it is, we’ll need to go through and add this to every step in all of our pipelines.

Sam · December 2, 2019, 1:06am

Yeah we would also love this

onsails · October 22, 2020, 7:45pm

Yes, we also run agents in a Kubernetes cluster running on preemptible instances.
It’s very redundant to specify the same retry options for each step in each pipeline.
