Cancel multiple jobs from Web UI

We often need to cancel multiple jobs in the Web UI, e.g. when we expect jobs to fail based on changes to some scripts, or when iterating on our CI system.

In those cases, it would be helpful to be able to select multiple jobs and cancel them at the same time. The same applies to retries: a retry-all function would be very helpful.

Hey @kai!

Thanks for reaching out to us :wave:

Thanks also for the suggestion. From a usability perspective, is the suggestion about jobs (inside a build) that run at the same time? I'm just getting some context, since the current Cancel button appears once a job is accepted by an agent, and clicking each one individually is currently the way to go about it.

Cheers!

Hi!

Yeah, it's within a single build. At the moment I have to click on, e.g., 10 different build jobs. And because every cancel opens a dialog asking whether I'm sure, this takes some time. I think the dialog makes a lot of sense, but it would be ideal if I could just select all the jobs I want to cancel (e.g. with a checkbox), cancel them, and confirm only once.

Thanks @kai :)

As for cancelling, would cancelling the whole build help here? It depends on how many jobs you actually cancel in that build. As you mentioned, if it's up to 10 each time, that's a fair point (e.g. 10 out of, say, 15 jobs need to be cancelled).

I also assume your steps don't really depend on each other, but thanks for the context.

Cheers :+1:

Cancelling the whole build would also cancel the jobs I'm interested in, which can take about 30 minutes to run, so unfortunately it's not an option. In my current case, I would want to cancel 21 of 24 jobs.

Thanks again @kai :+1:

There is also a fail-fast option: set the cancel_on_build_failing attribute to true on the steps you want cancelled as soon as their build starts to fail. However, I believe in your case it could vary, and you'd like to keep them running to see the outcome.
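For reference, a minimal sketch of what that looks like in a pipeline definition, assuming a YAML pipeline with a command step. The step label and the command script are hypothetical placeholders; only the cancel_on_build_failing attribute comes from the option described above.

```yaml
steps:
  # Hypothetical step: label and command are placeholders.
  - label: "Integration tests"
    command: "scripts/run-integration-tests.sh"
    # Cancel this job as soon as the build is marked as failing.
    cancel_on_build_failing: true
```

Note that this changes the pipeline configuration itself, so it applies to every build of the pipeline, not just a single PR.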

Let me know if that's correct; it will be good feedback to pass on to our product folks to assess.

Cheers and thanks for all the feedback!

Hi @kai , I am from the product team at Buildkite. Thanks for raising this feedback with us.

Can you explain the use case of the 3 steps that still need to run?

What are the 10 jobs that are running? Are they parallel jobs being run? Are they all under the same group?

For retries, do you mostly retry all jobs because of flaky tests? Or are there other cases where multiple jobs have failed and you would want to retry them all?

In this concrete case, we have a "builder job" that kicks off the downstream jobs. One of those jobs (here, concretely, a Python wheel build) is the one I'm iterating on in my development; it's the job I'm interested in. The other jobs that are kicked off (e.g. integration tests) are not relevant to me. I don't want to change the main pipeline configuration, as it would affect other PR builds, and since it's managed by Terraform, changing it just to iterate on a single PR is too much overhead. I could comment out parts of the script that generates the downstream jobs, but that is also not ideal. Generally, I would prefer not to change our build scripts or infrastructure when iterating on a single job. Of course we can work around it, but being able to abort multiple jobs in the Web UI would be a very straightforward solution for this case.

To answer your other questions:

The jobs are integration and unit tests for different parts of the code base. There are also other, longer-running jobs (up to 120), but because the builder job for those takes several minutes, I can cancel it before it kicks off the downstream jobs.

The jobs are run in parallel.

We don’t use groups for that pipeline at the moment, but if we did, they would be under the same group.

For retries, since I have to run the retry on the latest commit of the PR branch, I just push to the PR branch, which starts a new build (and cancels the old one), and then cancel the uninteresting jobs manually. As you can imagine, after some iterations this gets a bit old when cancelling many jobs :-)

Thanks @kai this now makes a lot of sense.

I was asking whether they are under the same group because that helps me understand whether you could cancel the whole group instead. For the tests in this case, it seems that you could.

I just want to check one last thing before I add this to our backlog: are you cancelling the builds for cost purposes? Or to free up agents for other PRs in the same pipeline? Or for another reason I cannot think of?

Thanks heaps for patiently answering!

It's mostly for cost purposes. The jobs are rather expensive to run, and for the specific set in question we only have a limited number of agents.