We’ve been using Buildkite for a few weeks, and have gotten a lot of benefit out of it for our CI pipeline. We have a remaining source of build flakiness that we want to figure out how to better handle.
We have kind of of timeouts or response failure that seems to happen when we’re interacting with Docker Hub through the docker-login plugin. Authentication sometimes appears to timeout when an agent is logging in with the docker-login plugin. Is there a way to have that plugin step and only that step retry if it fails. I don’t want to enable global job retries, but this has been happening a non-trivial amount in our build pipeline.
We’re using the Elastic CI stack with no customizations that I know of that should affect this. Is there anything we can do?