Allow manually retrying a job with fast fail

noahtallen · January 9, 2025, 1:06am

We have a very large generated pipeline, most of which is set up to fast fail using cancel_on_build_failing. If you attempt to manually retry the job which failed the build, it immediately gets cancelled due to the fast fail settings. Is it possible to bypass that so you can manually retry a job even when fast fail is on?
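For context, a generated pipeline like the one described might look roughly like the minimal Python sketch below. The shard count, step labels, and the ./scripts/run-e2e.sh command are hypothetical. Only cancel_on_build_failing, the Buildkite command-step attribute mentioned above, comes from the post. The script emits JSON, which is also valid YAML, so its output could be piped to `buildkite-agent pipeline upload`.

```python
"""Hypothetical sketch: generate a Buildkite pipeline in which every e2e shard
opts into fast fail via cancel_on_build_failing."""
import json


def build_pipeline(shard_count: int) -> dict:
    """Return a pipeline dict with one command step per test shard.

    The shard layout and the ./scripts/run-e2e.sh command are illustrative
    assumptions, not part of the original setup.
    """
    steps = []
    for shard in range(1, shard_count + 1):
        steps.append(
            {
                "label": f"e2e shard {shard}/{shard_count}",
                "command": f"./scripts/run-e2e.sh --shard {shard} --total {shard_count}",
                # Fast fail: cancel this job as soon as the build is marked as failing.
                "cancel_on_build_failing": True,
            }
        )
    return {"steps": steps}


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be piped to `buildkite-agent pipeline upload`.
    print(json.dumps(build_pipeline(10), indent=2))
```

Because every generated step carries the same flag, a retried job inherits it too. That matches the behaviour described in this thread.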

Ultimately, I want to, say, take a video of a failing test if you manually retry that specific job. (That avoids the recording overhead on normal runs, but makes the video easy to get when it's needed.) But I still want fast fail enabled normally.

lizette · January 9, 2025, 2:21am

Hi @noahtallen, there is a Retry failed jobs button (next to Rebuild) that reruns the failed jobs for you. This should let you rerun those failed jobs without them being cancelled immediately.

noahtallen · January 9, 2025, 5:44pm

Unfortunately, that’s where I’m experiencing the problem! Let’s say there are 10 jobs, and one of them fails. This causes all other jobs to get cancelled via fast failure.

When I then go to Retry failed jobs (or retry that individual job), that new retry is instantly cancelled. I’m guessing this is due to the cancel_on_build_failing flag: that setting applies to every job in the build, and the job settings probably don’t change when you retry it.

I’m wondering if there’s a workaround so that the job can normally be cancelled by fast failure, but still allow manual retries.

amna · January 10, 2025, 12:45am

Hi @noahtallen and thank you for raising this!

The cancel_on_build_failing flag just doesn’t let failed jobs be retried: while the build is failing, it cancels jobs immediately, even retries. Unfortunately, it looks like there’s no real workaround for this. If you want manual retries to work, you’d probably have to disable fast fail entirely.

noahtallen · January 10, 2025, 3:06am

That’s unfortunate! Can I change this to a feature request?

stephanie.atte · January 11, 2025, 12:15am

Hey @noahtallen, we can certainly raise feedback on this issue. To understand it a little better, we would like to clarify a few things, as this helps us dissect the issue and how it currently affects you:

Can you describe the problem you’re trying to solve?
How is this issue impacting your workflow or team’s productivity?
How are you currently working around this issue? Is there anything that partially meets your needs?

lizette · January 14, 2025, 4:43am

Hi @noahtallen, just an update on this. We have raised feedback with the product team about this issue and let them know that the attribute can be limiting when you only want to retry one job from the build.

noahtallen · January 15, 2025, 9:05pm

Thank you! Fast fail and job retry are both useful features. Fast failure helps avoid running extra workflows when they aren’t strictly needed, and job retry lets you see if a specific job might pass on a re-run due to a flaky test. Currently, you can’t get both of these benefits together.

The specific workflow I’ve been looking at is a little different: I want to allow devs to retry an individual job to get some extra functionality. For example, let’s say I have a massive CI pipeline with multiple jobs which each run a few e2e tests. When a test fails, it could be useful to auto-cancel all the other jobs to help decrease CI resource utilization.

At the same time, I want a developer to be able to click “retry job,” which then enables video recording for the e2e test framework. (Video recording in, e.g., Cypress has performance overhead that isn’t needed almost all of the time… unless a test fails and a developer needs to investigate it!) This flow (manually retry a job → enable video recording in that specific job) is totally possible, but the new job is cancelled instantly if fast failure is enabled.
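The retry → video flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the poster's actual script. It assumes Buildkite's BUILDKITE_RETRY_COUNT environment variable (0 on the first attempt, incremented on each retry) and Cypress's `cypress run --config video=...` override. The function names are hypothetical. Note that the retry count also increases on automatic retries, so it cannot distinguish a manual retry from an automatic one. None of this helps while fast fail still cancels the retried job.

```python
"""Hypothetical sketch: turn on Cypress video only when a Buildkite job is a retry."""
import os
import shlex
from typing import List, Mapping


def is_retry(env: Mapping[str, str]) -> bool:
    """True if BUILDKITE_RETRY_COUNT (assumed: 0 on the first attempt) is above 0.

    Automatic retries are counted too; this cannot tell them apart from manual ones.
    """
    try:
        return int(env.get("BUILDKITE_RETRY_COUNT", "0")) > 0
    except ValueError:
        # Treat a malformed value as a first attempt (no video).
        return False


def cypress_command(env: Mapping[str, str]) -> List[str]:
    """Build a `cypress run` command that overrides the `video` config option."""
    video = "true" if is_retry(env) else "false"
    return ["npx", "cypress", "run", "--config", f"video={video}"]


if __name__ == "__main__":
    # Print the command for this job; a CI script could run it with subprocess.run.
    print(shlex.join(cypress_command(os.environ)))
```

Keeping the decision in a pure function of the environment makes it easy to check locally by passing a plain dict instead of os.environ.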

Doing this via “retry job” instead of “retry build” is useful because the entire build has a large number of pipeline steps. Running the individual job (the smallest possible unit of the build) is much more efficient, since no new build is needed to enable video recording.

IMO, fast failure shouldn’t impact manually retried jobs in CI – the main point of fast failure is to cancel automatically started jobs which don’t necessarily need to continue. But when a developer manually retries a job, that’s a strong signal they don’t want it to be auto-cancelled.

lizette · January 15, 2025, 10:08pm

Hi @noahtallen ,

Thanks for this added context on the feedback. We will pass it on to our product team.

Cheers!

Related topics

Topic	Category	Replies	Views	Activity
Cancel_on_build_failing with fail counter	Features Requests	6	732	January 24, 2023
Manual retry limit	Features Requests	3	417	August 31, 2021
Option to retry failed jobs within a group	Features Requests	1	12	May 20, 2025
Button to retry all failed steps in a build	Features Requests	3	1443	April 27, 2021
Retry all failed	Features Requests	2	563	August 22, 2021