Notebook for investigating build failures, especially flaky tests

huon · June 15, 2020, 12:02am

I’ve been investigating some flaky tests on one of the projects I worked on. I wanted to make sure I’d got them all, and what better way to do that than getting a computer to check? We upload our test results for every build step as (JUnit) XML artifacts, so it wasn’t too hard to use the Buildkite API and pybuildkite to write a Jupyter notebook that examines each of them to find flaky tests in two ways:

failures within a build that passed, meaning that a human retried some of the steps and they eventually passed, without the code changing
multiple builds of a single commit, where some builds passed and some failed (this works best if the configuration between builds isn’t too different)
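The two checks can be sketched roughly like this, assuming the per-test results for each build have already been downloaded and parsed. The `Build` class, its fields and both function names are hypothetical illustrations, not taken from the notebook or from pybuildkite:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Build:
    """Hypothetical summary of one CI build (not a Buildkite/pybuildkite type)."""
    number: int
    commit: str
    passed: bool
    # test name -> outcome of every run of that test in this build, including
    # retried steps (True = passed, False = failed)
    results: dict = field(default_factory=dict)


def flaky_within_passed_builds(builds):
    """Check 1: tests that failed at least once in a build that ultimately passed."""
    flaky = defaultdict(set)
    for build in builds:
        if not build.passed:
            continue
        for test, outcomes in build.results.items():
            if not all(outcomes):
                flaky[test].add(build.number)
    return dict(flaky)


def flaky_across_builds_of_commit(builds):
    """Check 2: tests that passed in one build of a commit and failed in another."""
    # commit -> test -> set of per-build outcomes
    # (True if the test never failed in that build, False otherwise)
    by_commit = defaultdict(lambda: defaultdict(set))
    for build in builds:
        for test, outcomes in build.results.items():
            by_commit[build.commit][test].add(all(outcomes))
    flaky = defaultdict(set)
    for commit, tests in by_commit.items():
        for test, per_build in tests.items():
            if per_build == {True, False}:
                flaky[test].add(commit)
    return dict(flaky)


# Made-up example data: build 1 needed a retry of test_a; commit def456 was
# built twice, with test_c failing once and passing once.
builds = [
    Build(1, "abc123", passed=True, results={"test_a": [False, True], "test_b": [True]}),
    Build(2, "def456", passed=False, results={"test_c": [False]}),
    Build(3, "def456", passed=True, results={"test_c": [True]}),
]
print(flaky_within_passed_builds(builds))     # {'test_a': {1}}
print(flaky_across_builds_of_commit(builds))  # {'test_c': {'def456'}}
```

The second check collapses each build to "did this test ever fail here?" before comparing builds. That way a single build with an internal retry is only reported by the first check, not double-counted by the second.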

I’ve uploaded the notebook at: https://gist.github.com/huonw/5b15172499251ce88ac42a6a926e6162 including example output from https://github.com/stellargraph/stellargraph. (It may need edits to work with other projects or variants of JUnit XML.)
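As a rough idea of the parsing side, here is a sketch that uses only Python's standard `xml.etree.ElementTree` to turn one JUnit XML artifact into pass/fail outcomes. It handles both a `<testsuites>` root and a bare `<testsuite>` root, two common variants. The `classname::name` key format and the function name are hypothetical choices for illustration. Other producers may lay things out differently.

```python
import xml.etree.ElementTree as ET


def parse_junit_xml(xml_text):
    """Map each test case to True (passed or skipped) or False (failure/error).

    Works whether the root element is <testsuites> or a single <testsuite>.
    """
    root = ET.fromstring(xml_text)
    results = {}
    # iter() searches the whole tree, so the nesting depth doesn't matter
    for case in root.iter("testcase"):
        # hypothetical key format: "<classname>::<name>"
        name = f"{case.get('classname', '')}::{case.get('name', '')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        results[name] = not failed
    return results


# Made-up artifact contents in the common pytest-style layout
example = """
<testsuites>
  <testsuite name="pytest">
    <testcase classname="tests.test_graph" name="test_nodes"/>
    <testcase classname="tests.test_graph" name="test_edges">
      <failure message="assert 1 == 2"/>
    </testcase>
  </testsuite>
</testsuites>
"""
print(parse_junit_xml(example))
# {'tests.test_graph::test_nodes': True, 'tests.test_graph::test_edges': False}
```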

To run on another project, it can be:

downloaded via “Raw”: https://gist.githubusercontent.com/huonw/5b15172499251ce88ac42a6a926e6162/raw/9b335f7a8b5f04e8a16f15bceaa63697065d41fa/flaky%20tests.ipynb
run online in Google Colab: https://colab.research.google.com/gist/huonw/5b15172499251ce88ac42a6a926e6162/flaky-tests.ipynb

Hope this helps someone!
