Given some ephemeral and “dynamic” environments (kubernetes, spot instances, etc) sometimes a job may fail due to agent lost
, which in most cases we wish to retry.
It would great to have a global/pipeline level setting (maybe with env vars we can set on the agent as default) for automatically retying this cases as otherwise we need to past this setting to every step.
This alludes to spot instances, but to add one clear use case that’s impacting us is that we use Google Cloud and Preemptible instances.
These instances are cycled in 24 hours or less, so the symptom of losing hosts is rather prolific across all of our pipelines, and all of our steps. Having an ability to set a standard retry either on the pipeline level or across all pipelines in a project would be helpful. As is, we’ll need to go through and ad this to every step in all of our pipelines.
Yeah we would also love this
Yes, we also run agents in kubernetes cluster which is running on preemptible instances.
It’s very redundant to specify the same retry options for each step in each pipeline.