Cancel_on_build_failing with fail counter

Ya, I think this is fundamentally different from that. We already use automatic retries on the flakiest tests, and we want to use automatic retries sparingly in general. On top of that, without something like "Restart step on a different agent", there is nothing stopping a single rogue agent/job from cancelling a build.

I think there is much more signal in N different steps each failing at least once than in a single step failing N times.
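To make the distinction concrete, here is a minimal sketch of the two counting policies. It assumes a hypothetical list of failure events where each entry is the key of the step that failed; the function names and the `threshold` parameter are illustrative only, not an existing Buildkite API. The proposed counter cancels only once `threshold` distinct steps have failed, so one rogue agent repeatedly failing the same step cannot trigger cancellation by itself.

```python
from collections import Counter


def distinct_failed_steps(failures: list[str]) -> int:
    """Number of distinct steps that failed at least once (hypothetical event list)."""
    return len(set(failures))


def max_failures_single_step(failures: list[str]) -> int:
    """Highest failure count for any one step (the 'one step failing N times' signal)."""
    return max(Counter(failures).values(), default=0)


def should_cancel(failures: list[str], threshold: int) -> bool:
    # Hypothetical policy: cancel only when `threshold` distinct steps have each
    # failed at least once; repeated failures of the same step count only once.
    return distinct_failed_steps(failures) >= threshold


# One step failing three times does not cancel; three different steps failing does.
print(should_cancel(["lint", "lint", "lint"], 3))
print(should_cancel(["lint", "unit-tests", "e2e"], 3))
```

Counting distinct steps rather than raw failures is what makes the counter robust to a single misbehaving agent: its repeated failures collapse into one entry.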