Global Pipeline retry rules don't apply

I use Buildkite with preemptible VM instances, which means agents can disappear in the middle of a build.

Looking through the thread "Top level pipeline env empty when queried through API", it seemed that I could set global settings in the YAML editor, so I went ahead and added the following in the web interface:

retry:
  automatic:
    - exit_status: -1 # Agent was lost
      limit: 2

The web interface seems to accept it, but steps aren’t retried when their agents disappear.

I first commented on that thread but haven’t received any feedback, so I’m opening a new one here.

Hey @flokli, sorry it’s taken us so long to get back to you, and thanks so much for persevering :bowing_woman:t2:

There are only three attributes that can be set at the top level: env, notify, and agents. This is definitely something we need to add to the docs; thanks for pointing it out.

For the moment, retry needs to be set on each step that you want to retry automatically. I’ve had a quick chat with @keithpitt about adding it as a top-level attribute; he was keen, so it could totally be a thing at some point! Sorry I can’t give you a timeline for it, but please let me know if it’s a real blocker for you and we can sort something out :+1:t2:
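Until a top-level retry attribute exists, one workaround is to inject the rule into every command step programmatically before the pipeline is uploaded. Below is a minimal Python sketch, assuming the steps are built as Python data and serialized to JSON; `with_default_retry` and `DEFAULT_RETRY` are hypothetical names, not part of any Buildkite API, and the rule simply mirrors the `exit_status: -1` example from the first post. Steps that already set `retry`, and non-command steps such as `"wait"`, are left untouched.

```python
import copy
import json

# Hypothetical default rule mirroring the one from the first post:
# retry a step up to 2 times when its agent is lost (exit status -1).
DEFAULT_RETRY = {"automatic": [{"exit_status": -1, "limit": 2}]}


def with_default_retry(steps, retry=DEFAULT_RETRY):
    """Return a new list where every command step lacking its own `retry`
    gets a deep copy of `retry`. The input steps are not modified."""
    result = []
    for step in steps:
        # Only command steps (those with `command` or `commands`) support retry.
        is_command = isinstance(step, dict) and ("command" in step or "commands" in step)
        if is_command and "retry" not in step:
            # Build a new dict so the caller's step is not mutated.
            step = {**step, "retry": copy.deepcopy(retry)}
        result.append(step)
    return result


if __name__ == "__main__":
    pipeline = {
        "steps": with_default_retry([
            {"label": "build-file-verification", "command": "go/ci/build-file-verification.sh"},
            "wait",
            {"label": "vendor-directory-verification", "command": "go/ci/vendor-verify.sh"},
        ])
    }
    print(json.dumps(pipeline, indent=2))
```

Because the default is only applied where `retry` is missing, a single step can still opt into a different rule by setting its own.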

Hey @harriet, thanks for getting back to me!

Adding these 4 lines to each of our pipeline steps (more than 10) would be possible, but it would make the pipeline code very hard to read. Also, when constructing dynamic pipelines, having to remember to always add the retry attribute will be pretty clunky.

It’d be really nice if this top level attribute could be implemented :tada:

In the meantime, the way we handle this ourselves to make it less clunky is to use YAML named anchors, which we include in all of our steps, like so:

retries: &retries
  retry:
    automatic:
    - exit_status: 125
      limit: 3

common: &common    
  - <<: *retries       # defined above
  - <<: *agent_config  # defined above
  - <<: *env           # defined above

<snip>

- label: "build-file-verification"
  command: "go/ci/build-file-verification.sh"
  <<: *common

- label: "vendor-directory-verification"
  command: "go/ci/vendor-verify.sh"
  <<: *common

etc.

It’s by no means perfect (as you said, generated steps can’t benefit from these reusable fragments, since YAML anchors can’t be referenced outside the document where they’re defined), but it’s made our step definitions far nicer.
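For generated steps, the same idea as the `*common` anchor can be expressed as ordinary data in the generator script. The following is a minimal Python sketch, not a Buildkite-provided tool: `COMMON`, `CHECKS`, `make_step`, and `build_pipeline` are hypothetical names, and the `queue` value is a placeholder. It assumes the script's JSON output is piped to `buildkite-agent pipeline upload`, which reads a pipeline (YAML or JSON) from standard input.

```python
import copy
import json
import sys

# Hypothetical shared fragment playing the role of the `*common` YAML anchor.
COMMON = {
    "retry": {"automatic": [{"exit_status": 125, "limit": 3}]},
    "agents": {"queue": "default"},  # placeholder queue name
}

# Hypothetical (label, script) pairs; a real generator might discover these
# by scanning the repository at build time.
CHECKS = [
    ("build-file-verification", "go/ci/build-file-verification.sh"),
    ("vendor-directory-verification", "go/ci/vendor-verify.sh"),
]


def make_step(label, command, **overrides):
    # Start from a deep copy of the shared keys, then let step-specific keys
    # replace them wholesale (a shallow override, like keys written next to
    # `<<: *common` in YAML).
    return {**copy.deepcopy(COMMON), "label": label, "command": command, **overrides}


def build_pipeline(checks):
    return {"steps": [make_step(label, command) for label, command in checks]}


if __name__ == "__main__":
    # e.g. python3 generate_pipeline.py | buildkite-agent pipeline upload
    json.dump(build_pipeline(CHECKS), sys.stdout, indent=2)
    print()
```

Unlike an anchor, `COMMON` is visible to every step the script emits, however many are generated.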

Yeah, anchors can work in some cases, but I’d really prefer global retry rules to be added, as they would fix this in a nice and consistent way.