I have a pipeline that’s composed of many ordered steps arranged as a DAG. Steps publish and consume metadata and artifacts to avoid repeating work.
If a step fails late in the pipeline (towards the right-hand end of the DAG), my only current option for iterating on a fix is to push the fix, rerun the whole pipeline, and wait to see what happens. For game builds, that wait could be several hours.
I would like to be able to restart a failed build from the point (or points, I suppose) of failure, with a newly chosen revision (probably defaulting to HEAD of the same branch), so that I don’t have to wait for all the previously green steps to go green again.
I’d like the restarted build to have access to the same build state (artifacts, metadata, and environment) as the original build had at the point of failure.
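To make the request concrete, here is a minimal Python sketch of the restart semantics described above: rerun only the failed steps and whatever depends on them, and reuse the saved state of everything else. The DAG representation, the step names, and the `steps_to_rerun` helper are all hypothetical and made up for illustration; none of this is any particular CI system's API.

```python
from collections import deque


def steps_to_rerun(dag, failed):
    """Return the failed steps plus every step downstream of them.

    `dag` maps each step name to the list of steps that depend on it.
    """
    rerun = set(failed)
    queue = deque(failed)
    while queue:
        step = queue.popleft()
        for child in dag.get(step, []):
            if child not in rerun:
                rerun.add(child)
                queue.append(child)
    return rerun


# Hypothetical game-build pipeline; the step names are assumptions.
dag = {
    "checkout": ["compile"],
    "compile": ["cook_assets", "unit_tests"],
    "cook_assets": ["package"],
    "unit_tests": ["package"],
    "package": ["upload"],
    "upload": [],
}

rerun = steps_to_rerun(dag, failed=["package"])
reused = set(dag) - rerun  # steps whose artifacts/metadata would be reused

print(sorted(rerun))   # ['package', 'upload']
print(sorted(reused))  # ['checkout', 'compile', 'cook_assets', 'unit_tests']
```

In this sketch, a failure in `package` would rerun only `package` and `upload` against the new revision, while the four earlier steps would keep their state from the original build.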
I’d be totally fine with not being able to restart an already-restarted pipeline. If the first restart cleared the first failure and exposed a second, I’d still have saved time, and I appreciate that the semantics could get complex.
I would not want a restarted build to change how commit statuses are published to things like SCM providers, because I would absolutely still want only an end-to-end green build to be canonical.
I have thought about ways to pull the artifacts from previously successful builds of the pipeline, but this is fiddly, and it isn’t necessarily suitable if the failure relates to how the artifacts themselves are handled.
Does anyone have any different ideas about how this set of problems could be solved today?