Notebook for investigating build failures, especially flaky tests

I’ve been investigating some flaky tests on one of the projects I work on. I wanted to make sure I’d got them all, and what better way than getting a computer to check? We upload our test results for every build step as (JUnit) XML artifacts, so it wasn’t too hard to use the Buildkite API and pybuildkite to write a Jupyter notebook that examines each of them to find flaky tests in two ways:

  • failures within a build that passed, meaning that a human retried some of the steps and it eventually worked, without the code changing
  • multiple builds of a single commit, where some builds passed and some failed (this works best if the configuration between builds isn’t too different)
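
For anyone who wants the gist without opening the notebook, here is a rough sketch of those two checks. It is not the notebook's actual code: the build records are plain dicts in a shape I made up for illustration (`commit`, `state`, `junit_xml`), and the helper names are hypothetical. In practice you'd fill these records from the Buildkite API and the downloaded artifacts. It also assumes the common JUnit layout where a failed `<testcase>` contains a `<failure>` or `<error>` child.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def failed_tests(junit_xml):
    """Return ids ('classname::name') of test cases with a failure or error."""
    root = ET.fromstring(junit_xml)
    failed = set()
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(f"{case.get('classname')}::{case.get('name')}")
    return failed


def flaky_within_passed_builds(builds):
    """Way 1: tests that failed in some step of a build that passed overall."""
    flaky = set()
    for build in builds:
        if build["state"] == "passed":
            for xml in build["junit_xml"]:
                flaky |= failed_tests(xml)
    return flaky


def commits_with_mixed_results(builds):
    """Way 2: commits that have both a passed and a failed build."""
    states = defaultdict(set)
    for build in builds:
        states[build["commit"]].add(build["state"])
    return {c for c, seen in states.items() if {"passed", "failed"} <= seen}


# Hypothetical sample data, standing in for real builds and artifacts.
PASS_XML = '<testsuite><testcase classname="t.A" name="test_ok"/></testsuite>'
FAIL_XML = (
    '<testsuite><testcase classname="t.A" name="test_net">'
    '<failure message="timeout"/></testcase></testsuite>'
)
builds = [
    # A step failed, was retried, and the build ended up passing.
    {"commit": "abc", "state": "passed", "junit_xml": [FAIL_XML, PASS_XML]},
    # The same commit built twice with different outcomes.
    {"commit": "def", "state": "failed", "junit_xml": [FAIL_XML]},
    {"commit": "def", "state": "passed", "junit_xml": [PASS_XML]},
    # A plain failure: not evidence of flakiness on its own.
    {"commit": "123", "state": "failed", "junit_xml": [FAIL_XML]},
]

print(flaky_within_passed_builds(builds))
print(commits_with_mixed_results(builds))
```

The second check only flags commits; to get down to individual tests you'd then compare the failed tests between the passing and failing builds of each flagged commit.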

I’ve uploaded the notebook at: https://gist.github.com/huonw/5b15172499251ce88ac42a6a926e6162 including example output from https://github.com/stellargraph/stellargraph. (It may need edits to work with other projects or variants of JUnit XML.)

To run on another project, it can be:

Hope this helps someone!
