Input/Block steps usability improvements

Use case:
I have a validation job in my CI pipeline that runs the new code on historical prod inputs and compares the new outputs to the historical prod outputs. This is extremely useful for catching unintended effects that might have a drastic impact on a complex financial product. When differences are observed, the job requires manual approval, since it is impossible to automate the question “is this behavior change expected?”. This seems like a perfect match for a block/input step.
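
For concreteness, here is a rough sketch of what the comparison amounts to. This is illustrative only, not my actual job: `find_differences`, `run_new_code` and the record layout (`id`, `input`, `historical_output`) are hypothetical names made up for this post.

```python
from typing import Any, Callable, Iterable, List, Mapping


def find_differences(
    records: Iterable[Mapping[str, Any]],
    run_new_code: Callable[[Any], Any],
) -> List[str]:
    """Return the ids of records whose new output differs from the historical one."""
    differing_ids = []
    for record in records:
        # Re-run the new code on the historical prod input.
        new_output = run_new_code(record["input"])
        if new_output != record["historical_output"]:
            differing_ids.append(record["id"])
    return differing_ids


# Toy stand-ins: the "new code" doubles its input, and the historical
# output for record "b" was produced by different behavior.
historical_records = [
    {"id": "a", "input": 1, "historical_output": 2},
    {"id": "b", "input": 2, "historical_output": 5},
]
differences = find_differences(historical_records, lambda x: x * 2)
print(differences)
```

A non-empty result is the case where a human has to decide whether the change is expected, which is where the block/input step would come in.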

Unfortunately, as it stands, Buildkite cannot support this use case because:

  1. input/block steps put a build in a “blocked” state, but the build is treated as fixed instead of pending
  2. input/block steps are disabled if any other step in the build has failed

The first issue means builds requiring manual review would mistakenly be treated as successful by downstream systems and allowed to be deployed, which would be catastrophic.

The second issue means that my hacky workaround doesn’t actually work. The idea was to trigger an extra step after unblocking that retries the failed build and short-circuits that retry as a success.

I could simply allow a direct retry of the failed build to mark it as a success, but that removes the ability to add metadata to the unblocking.

I could also keep a busy loop running on one agent to artificially keep the overall build pending, but I would really rather not waste EC2 resources that way…

Would BK be open to addressing either of these issues, or offering a different way to support my use case?