Right now, my company splits our test suite (RSpec + Cypress) across a number of runners, so each one runs, say, 10 tests. Sometimes we hit flakes, where a single test fails. We currently have automatic retry logic that re-runs every test assigned to that runner. It would be great if we could instead retry only the tests that failed.
From looking around, it doesn’t seem like Buildkite supports this out of the box, so I’m exploring how I could do it with code. The List Builds for a Pipeline API does expose whether a job was a retry and which job it was retried from (see below for an example from one of our builds):
"retries_count": 1,
"retry_source": {
"job_id": "018ecacc-253b-4901-b63a-6cae18db9aed",
"retry_type": "automatic"
},
So, my question is: from inside a script that’s being run by a Command step, is there a way for us to know (or be passed) a) whether this job is a retry and, most importantly, b) what the retry_source is? With that information, I could use the Buildkite API to fetch the source job’s logs, parse out which tests failed, and then run just those tests.
Is this possible right now?
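To make the idea concrete, here is a rough sketch of the "parse which tests failed" half, in Python. It assumes the source job's log has already been fetched as plain text (for example through the REST API's job log endpoint), and that the log contains RSpec's default "Failed examples:" summary. The names `failed_examples` and `rerun_command` are hypothetical, and the Cypress side is not covered.

```python
import re

# Terminal colour codes (ANSI CSI sequences) that typically wrap RSpec output in CI logs.
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

# RSpec's failure summary lists one line per failure, e.g.
#   rspec ./spec/models/user_spec.rb:42 # User is valid
# or, when several examples share a line, an id such as ./spec/a_spec.rb[1:2].
FAILED_LINE = re.compile(r"rspec (\S+_spec\.rb(?::\d+|\[[\d:]+\]))")


def failed_examples(log_text: str) -> list[str]:
    """Return the failed example locations listed after the last 'Failed examples:' marker."""
    clean = ANSI_CSI.sub("", log_text)
    _, marker, tail = clean.rpartition("Failed examples:")
    if not marker:
        # No failure summary in this log, so there is nothing to narrow down.
        return []
    locations: list[str] = []
    for match in FAILED_LINE.finditer(tail):
        location = match.group(1)
        if location not in locations:  # keep first-seen order, drop duplicates
            locations.append(location)
    return locations


def rerun_command(locations: list[str]) -> list[str]:
    """Build an argv list that re-runs only the given examples (assumes Bundler is used)."""
    return ["bundle", "exec", "rspec", *locations]
```

If `failed_examples` returns an empty list (no summary found, or the log was unavailable), the retry would presumably need to fall back to running the runner's full set of tests.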
I guess a related question would be: do we have any control over how automatic retries work at all? That is, does a retried job re-run the Command step’s script, giving us a chance to change what it does, or is the job input cached and re-run without actually stepping into the script again?