Declare dependencies between jobs


#1

Hi there! :wave:

We have a lot of pipelines that are dynamically generated based on dependency (and “what changed”) information from elsewhere. :sparkles::mage: Here’s one such high-level pipeline, which mostly just triggers other pipelines and waits for their results:

The actual dependencies here look something like…

…but we have to flatten them into a straight line, because Buildkite doesn’t (yet? :pray:) support declaring dependencies between jobs in any other way. This requires “shaking out” the dependency graph into groups of jobs that can be run together — something like tsort. Easy. But the result will always leave something to be desired, and sometimes that something is really significant.
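For anyone following along, here is a minimal sketch of that “shaking out” step: repeatedly peel off every job whose dependencies are already satisfied, and put a wait step between the resulting groups. The job names and the `deps` mapping are hypothetical stand-ins (not our real graph), and the sketch assumes every job appears as a key in `deps`.

```python
def group_into_levels(deps):
    """Split a dependency graph into groups that can run in parallel.

    deps maps each job name to the set of job names it depends on.
    Returns a list of groups; a wait step would go between groups.
    """
    remaining = {job: set(d) for job, d in deps.items()}
    levels = []
    while remaining:
        # Jobs with no unsatisfied dependencies can run in this group.
        ready = sorted(job for job, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for job in ready:
            del remaining[job]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


# Hypothetical graph, for illustration only.
deps = {
    "build backend": set(),
    "build web client": set(),
    "test backend": {"build backend"},
    "test web client": {"build web client"},
    "deploy": {"test backend", "test web client"},
}
levels = group_into_levels(deps)
```

Each inner list corresponds to one block of parallel steps in the flattened pipeline.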

Consider in the above example if “build web client” takes 15 minutes longer than any other job in its group, and “test backend” takes 15 minutes longer than any other job in its group. From that alone, the overall build will take 15 minutes longer than necessary. We have a lot of such scenarios in different parts of our pipelines, and they’re really starting to add up to a lot of unnecessary build latency.
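To put numbers on that, here is a small sketch comparing the flattened pipeline with what a dependency-aware scheduler could achieve. The graph and the durations (in minutes) are made up to match the scenario above, and the dependency-aware figure assumes there are always enough agents to start a job as soon as its dependencies finish.

```python
def linearised_time(levels, durations):
    # With wait steps, each group takes as long as its slowest job.
    return sum(max(durations[job] for job in group) for group in levels)


def dag_time(deps, durations):
    # With real dependencies, a job starts when its own dependencies finish.
    finish = {}

    def finish_time(job):
        if job not in finish:
            start = max((finish_time(d) for d in deps[job]), default=0)
            finish[job] = start + durations[job]
        return finish[job]

    return max(finish_time(job) for job in deps)


# Hypothetical graph and timings, for illustration only.
deps = {
    "build backend": set(),
    "build web client": set(),
    "test backend": {"build backend"},
    "test web client": {"build web client"},
}
durations = {
    "build backend": 5,
    "build web client": 20,
    "test backend": 20,
    "test web client": 5,
}
levels = [
    ["build backend", "build web client"],
    ["test backend", "test web client"],
]

flat = linearised_time(levels, durations)   # 20 + 20
ideal = dag_time(deps, durations)           # longest chain: 5 + 20
wasted = flat - ideal
```

With these numbers the flattened build takes 40 minutes against a 25-minute critical path, so 15 minutes are spent just waiting.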

Now, I can imagine some ways to work around this, e.g., a pipeline that manages its own graph of jobs internally and keeps adding a new “poll until ready to spawn new job” job to the end each time it realises another dependency has been satisfied. But then I’d effectively be reimplementing what I think of as being some of the core functionality of Buildkite — i.e. job queueing and scheduling — at which point I’d end up asking myself some deeper questions that I’d rather not have to think about. :thinking: :sweat_smile:

So what would work really well for me?

I can understand that you’d be hesitant to add any kind of explicit DAG support to Buildkite unless it was something you believed you could support with a great user experience through the UI. I’d like to suggest something that I believe will really help power users in the short term, and that doesn’t preclude or interfere with a grander/friendlier design further down the line.

My problem would be solved if, in addition to the existing wait steps, I could declare “tags” and “wait for tags” on jobs, like so:

{
    label: "Build web-client",
    command: "...",
    tags: ["build-web-client"],
},
{
    label: "Test web-client",
    waitForTags: ["build-web-client"]
}

A job with “wait for tags” would not start until, for each of those tags, either (1) no job provides that tag, or (2) all jobs that provide that tag have finished.
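As a sketch of that rule, here is how the readiness check might look. This is only an illustration of the proposed semantics, not anything Buildkite implements; the `finished` field is a hypothetical way of recording job state, while `tags` and `waitForTags` are the proposed keys from the snippet above.

```python
def is_ready(job, jobs):
    """Return True if every tag the job waits on is satisfied.

    A tag is satisfied if no job provides it, or if every job that
    provides it has finished.
    """
    for tag in job.get("waitForTags", []):
        providers = [j for j in jobs if tag in j.get("tags", [])]
        # any() over an empty list is False, so a tag that nobody
        # provides never blocks the job.
        if any(not p.get("finished", False) for p in providers):
            return False
    return True


build = {"label": "Build web-client", "tags": ["build-web-client"], "finished": False}
test = {"label": "Test web-client", "waitForTags": ["build-web-client"]}
orphan = {"label": "Lint", "waitForTags": ["tag-nobody-provides"]}
jobs = [build, test, orphan]

ready_before = is_ready(test, jobs)   # build has not finished yet
build["finished"] = True
ready_after = is_ready(test, jobs)    # build has now finished
```

Case (1) is what makes this work for dynamically generated pipelines: a job can name a dependency that happens to be absent from this particular build without getting stuck.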

As far as the UI is concerned, I’d personally be happy enough just having jobs float around mysteriously if they have any unsatisfied dependencies. But I guess that could be a bit confusing, so how about marking jobs with unsatisfied dependencies with some tiny tag, and then allow hovering over them to reveal what they’re waiting on:

(Maybe with a tool-tip as well, explaining exactly what’s going on?)

This might be a bit of extra clutter in the UI, but it would only ever show up for power-users who have explicitly opted in to this complexity by declaring dependencies in their pipelines. For extra cleverness, you could make it automatically infer “wait steps” where jobs can be precisely linearised, just to tidy up the display of it where possible.

What do you reckon? :man_shrugging: I’d be overjoyed to have access to a feature like this, even without any UI attached to it; DAGs are often the natural way (and always the most flexible way) to express dependencies between tasks, and we’re going to keep suffering a lot of wasted time in our pipelines without some kind of support for them.

If you have anything similar to this in the works, or some other idea that solves the same problem, any chance of getting alpha/beta access? Or if it’s not yet that far along, would you be willing to share some ideas about what the design is likely to look like, so that we can have a think about how it might fit into our world?

Thanks! :bowing_man:


P.S.

Sorry, new users can only put one image in a post.

:anguished:

Sorry, new users can only put 2 links in a post.

:anguished: :anguished:

Are you able to make me not a “new user” so that I can re-upload those other images to display inline?

UPDATE: Hurrah! I’ve replaced the imgur links with inline images. Muchas gracias!


#2

I just increased your user level, and upped those default limits. Sorry about that! :blush:


#3

Thank you! :smiley: I’ve replaced the imgur links with inline images. :tada:


#4

Big +1 for this. We’re at the point where not having a way to express dependencies as a first-class feature means we’re looking at a choice between incurring considerable costs in terms of extra build time or writing our own implementation of DAGs, which will not only be costly in terms of developer time, but will also likely complicate our pipeline and make it a lot harder for us to use.

Jeff’s suggestion of a config-centric implementation with minimal UI changes sounds excellent.


#5

This is something we’re very keen on too. We probably won’t support a DAG initially; the present thinking is that we’ll support groups of steps, where multiple groups can run in parallel.


#6

Here’s a more complete view of the dependencies in our top-level pipeline:

Note that depending on what code has changed since {last build from which we can re-use artefacts} different jobs in this graph may be present or absent. Our “subordinate” pipelines that are triggered by this one are currently “dumber” by comparison (we just build everything in them if they run at all) but we’d like to be able to make them a little bit more sophisticated, too — preferably using the same solution we end up with for this top-level pipeline.

As you can see in the diagram above, there are a few diamonds going on. This means that whether we’re using builds the way they are defined today, or with the proposed “groups” feature, to get the least bad build times we’d need to:

  • Query for historical timings of different kinds of builds.
  • Use those to search (randomly or exhaustively) for linearisations of the dependency graph that minimise the expected “dead time”.

In practice this will still leave nontrivial amounts of “dead time” in the graph, so from our perspective, this would introduce even more complexity than we have now for not a whole lot of benefit.
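To make the search concrete, here is a brute-force sketch for a tiny graph. It tries every assignment of jobs to groups, discards those that violate a dependency, and keeps the cheapest, where a linearisation costs the sum of the slowest job in each group. The four jobs and their durations (minutes) are invented for illustration, and exhaustive search like this only scales to a handful of jobs.

```python
from itertools import product


def best_linearisation(deps, durations):
    jobs = sorted(deps)
    best_cost, best_groups = None, None
    # Try every assignment of jobs to at most len(jobs) groups.
    for assignment in product(range(len(jobs)), repeat=len(jobs)):
        level = dict(zip(jobs, assignment))
        # A job must sit in a strictly later group than its dependencies.
        if any(level[d] >= level[j] for j in jobs for d in deps[j]):
            continue
        groups = {}
        for j in jobs:
            groups.setdefault(level[j], []).append(j)
        cost = sum(max(durations[j] for j in g) for g in groups.values())
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_groups = [groups[k] for k in sorted(groups)]
    return best_cost, best_groups


# Hypothetical graph: A -> C and B -> D, with mismatched durations.
deps = {"A": set(), "B": set(), "C": {"A"}, "D": {"B"}}
durations = {"A": 1, "B": 10, "C": 10, "D": 1}

cost, groups = best_linearisation(deps, durations)
```

Here the obvious grouping (A and B, then C and D) costs 20 minutes, the best linearisation found (A, then B and C, then D) costs 12, and the longest dependency chain is only 11, so even the optimal straight line leaves some dead time.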

As a workaround, we’re thinking of implementing something like this:

  • Instead of sorting out dependencies up-front, dynamically build the pipeline as we go.
  • To allow this, wrap every step in a bit of code that runs after the step’s own code to see if there are any pending jobs that can now start. This will:
    1. Take out a lock (DynamoDB or whatever) for modifications to this job.
    2. Use buildkite-agent pipeline upload to replace the rest of the pipeline with any steps whose dependencies are now satisfied, and new “dummy steps” (see below).
  • Purely for operator convenience/sanity, each time we update the pipeline we would also include a list of “dummy” steps at the end representing each job that can’t yet start, prefixed by a block step to make sure they never actually run. (Maybe with a special mark in their name, e.g. “… The Job” or “:hourglass:The job”.) This would at least make it clear what will eventually run in this build, even if you can’t see what they’re waiting on.
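A rough sketch of the “what should the rest of the pipeline look like now?” part of that wrapper follows. It only builds the pipeline document; taking the lock and actually handing the JSON to buildkite-agent pipeline upload are left out. The `deps` graph, the `./run-wrapped.sh` script and the “exit 1” dummy command are all hypothetical placeholders.

```python
import json
import shlex


def next_pipeline(deps, finished, started):
    """Build the steps to upload after a job finishes.

    deps maps job -> set of jobs it depends on; finished and started
    are sets of job names (started includes finished jobs).
    """
    ready = sorted(j for j in deps if j not in started and deps[j] <= finished)
    pending = sorted(j for j in deps if j not in started and j not in ready)

    # Real steps for everything whose dependencies are now satisfied.
    steps = [
        {"label": j, "command": "./run-wrapped.sh " + shlex.quote(j)}
        for j in ready
    ]
    if pending:
        # The block step keeps the dummy steps below from ever running.
        steps.append({"block": "Waiting on dependencies"})
        steps.extend(
            {"label": ":hourglass: " + j, "command": "exit 1"} for j in pending
        )
    return {"steps": steps}


# Hypothetical graph and state, for illustration only.
deps = {
    "build backend": set(),
    "build web client": set(),
    "test backend": {"build backend"},
    "test web client": {"build web client"},
    "deploy": {"test backend", "test web client"},
}
pipeline = next_pipeline(
    deps,
    finished={"build backend"},
    started={"build backend", "build web client"},
)
print(json.dumps(pipeline, indent=2))
```

In this state only “test backend” becomes a real step; “deploy” and “test web client” show up after the block step as hourglass placeholders.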

Based on the dependencies above, one sequence might end up looking something like this (depending on what jobs happen to finish first):

This might sound like a bit of a hack, but I think it would actually be relatively easy to implement, and it decently approximates the idea in my original post.

I’m keen to know whether anybody has already built something like this, or has been thinking about it.