ExternalCommand job step for asynchronous operations (GitOps/Kube use case)

I would love to have something like an “External” job type. It’d be a job type that on the UI would be represented similarly to a Command type, but the difference is that the job isn’t scheduled via agents.

Instead, the job could be accepted, updated, and finished via the GraphQL API.

This could allow for external operations, such as callbacks from an “eventually consistent system” like a Kube deploy, to influence the Buildkite pipeline, without having to “hog” an actual agent that would just sit there in a tight loop asking for work to be completed.

Think of a command step today that does something like:

trigger_external_operation()

while external_operation_not_completed():
  wait

In these situations, some external system is already responsible for observing the task. It would be wasteful to have an agent just sit there in a loop and repeatedly ask “are you done yet?”
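To make the cost concrete, here is a minimal runnable sketch of the polling loop above. The function name `wait_for_external_operation` and its parameters (`is_done`, `poll_interval_s`, `timeout_s`, `sleep`) are made up for illustration and are not part of any Buildkite API; the point is only that the agent process stays occupied for every poll.

```python
import time
from typing import Callable


def wait_for_external_operation(
    is_done: Callable[[], bool],
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_done() until it returns True and return the number of polls.

    Hypothetical helper: it models what a command step does today while
    an agent sits idle waiting on an external system.
    Raises TimeoutError once timeout_s seconds of waiting have elapsed.
    """
    waited = 0.0
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if waited >= timeout_s:
            raise TimeoutError(f"gave up after {polls} polls")
        # The agent is blocked here and cannot pick up other jobs.
        sleep(poll_interval_s)
        waited += poll_interval_s
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting; in a real step it would be the default `time.sleep`.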

Additionally, there are situations where the Buildkite agents only have indirect access to the external system in question. Consider a use case where the Buildkite agent just pushes some Kubernetes manifests to a GitOps repo, and now has to wait for the GitOps engine (something like Argo CD) to reconcile the change to a target cluster. Buildkite agents may not actually have access to query Argo CD directly.

Having a step like this External Job Step could allow a system like Argo CD to asynchronously notify Buildkite of progress. It could “accept” the job once work starts, push arbitrary log output, and also finalize a job with an exit code.
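The accept / log / finish lifecycle described above could be modeled roughly like this. This is only an in-memory sketch of the idea: `ExternalJob`, `JobState`, and the method names are hypothetical and do not correspond to any existing Buildkite GraphQL or Agent API call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class JobState(Enum):
    SCHEDULED = "scheduled"  # waiting for the external system to pick it up
    ACCEPTED = "accepted"    # external system has started work
    FINISHED = "finished"    # external system reported an exit code


@dataclass
class ExternalJob:
    """Hypothetical model of the proposed external job (not a real API)."""

    job_id: str
    state: JobState = JobState.SCHEDULED
    log: List[str] = field(default_factory=list)
    exit_code: Optional[int] = None

    def accept(self) -> None:
        # Only a scheduled job can be accepted, and only once.
        if self.state is not JobState.SCHEDULED:
            raise ValueError(f"cannot accept a job in state {self.state.value}")
        self.state = JobState.ACCEPTED

    def append_log(self, chunk: str) -> None:
        # Log output is only meaningful while the job is running.
        if self.state is not JobState.ACCEPTED:
            raise ValueError(f"cannot log to a job in state {self.state.value}")
        self.log.append(chunk)

    def finish(self, exit_code: int) -> None:
        # Finalizing records the exit code and closes the job.
        if self.state is not JobState.ACCEPTED:
            raise ValueError(f"cannot finish a job in state {self.state.value}")
        self.exit_code = exit_code
        self.state = JobState.FINISHED

    @property
    def passed(self) -> bool:
        return self.state is JobState.FINISHED and self.exit_code == 0
```

In the Argo CD scenario, the GitOps engine (or a small notifier next to it) would call the equivalent of `accept()` when the sync starts, `append_log()` as reconciliation progresses, and `finish(0)` or `finish(1)` once the rollout is healthy or has failed. No agent is held for the duration.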

I realize that this would potentially conflate the duties between the Agent API and the GraphQL API. Maybe this could be implemented on the Agent API instead of GraphQL?

And because I love mockups, here’s a mockup ;-)

Hi @moensch!

:open_mouth: That’s very interesting! Not sure where this could fit on our product roadmap though :thinking: I’ll share it with our product team.

Cheers!

This feature would be so helpful for my builds. I have asynchronous bare-metal provisioning that I’d love to offload from agents to a workflow engine better suited to waiting for a long job to complete.