I’d like to retry jobs automatically when the agent is gracefully terminated (
exit_status: 255). However, timed-out jobs also have
exit_status: 255, and they will be retried automatically when we have a step like this:
steps: - command: ... retry: automatic: - exit_status: 255 limit: 1
Graceful termination of agents can happen exceptionally (like maintenance of infrastructure) and it’s a kind of accidental failure. On the other hand, timeouts are caused by the inappropriate value of
timeout_in_minutes in most cases and it’s a reproducible failure. I think they should be treated differently.
Is there any good way to exclude timed-out jobs from the retry condition?
Or we might need a new feature for example:
- Use a special exit status (e.g. -2) for timed-out jobs.
- Add a new attribute like
state(the one in the REST API response) to the automatic retry attributes.