Reuse an agent between subsequent steps in a pipeline?

I have a project where I want to run a lengthy build process, then execute some suite of tests on the output of that build, inside the same execution environment.

In short: I build a big beefy suite of C++ programs in a containerized environment with a large number of dependencies. After the build, I want to execute a large suite of integration tests on the same software. I want each integration test to be an individual step in buildkite.

In other systems such as CircleCI, this is trivial, because you can spin up a single runner that will exist for the entire duration of the pipeline, and the filesystem sticks around between individual jobs in the pipeline.

Buildkite, on the other hand, has a much “cleaner” paradigm in that jobs within a pipeline execute in a completely isolated environment. This helps guide you towards clean CI/CD architecture, but makes it slow/overcomplicated to run a quick step within a larger pipeline.

The options I have now, and why they aren’t great:

  • Use build artifacts: keeps steps isolated, but at the cost of spinning up and tearing down a large number of extra containers and repeatedly re-downloading the same artifacts, which imposes a very high time overhead
  • Use a single job/step and a bash script: now I lose per-test job status in Buildkite, though I could potentially use the Buildkite CLI/API to push custom steps onto the build as it runs?
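For concreteness, the artifact-based version of this pipeline might look roughly like the following sketch (the script paths, labels, and parallelism count are placeholders, not from my actual setup):

```yaml
steps:
  - label: ":hammer: Build"
    command: ./scripts/build.sh          # placeholder build script
    artifact_paths: "build/**/*"         # upload the build output as artifacts

  - wait                                 # block until the build step has finished

  - label: ":test_tube: Integration test %n"
    parallelism: 10                      # example shard count; %n is the shard index
    command: |
      # each test job re-downloads the artifacts before running its shard
      buildkite-agent artifact download "build/**" .
      ./scripts/run-integration-test.sh "$BUILDKITE_PARALLEL_JOB"
```

The `buildkite-agent artifact download` in every test job is exactly the repeated-read overhead I'm describing above.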

I wish I could somehow define a group of jobs which all run in sequence on a single agent and share the filesystem / execution environment, executing as subshells in that environment. The benefit would be less time spent waiting for jobs to spin up and copying artifacts around, plus the ability to define the pipeline declaratively in the Buildkite config instead of in a big shell script.

What is the canonical way of doing this, am I missing something conceptually that would make this style of pipeline work better in Buildkite, or should I be thinking about this job another way?

Hello, thanks for your interest in Buildkite! You’re right that the Buildkite paradigm does try to steer users away from the model you’re describing. Philosophically, our feeling is that teams benefit from the speed gains that come from breaking pipelines down into the smallest components that can run in parallel.

As for typical solutions in a scenario like yours, the two you’ve identified are definitely the most common: either use artifacts to pass outputs from job to job, or write a build script that orchestrates everything that needs to happen on a single agent, so it can run as one job.

Another common pattern is to have your build step generate a container image, and then have all your test steps consume that image and run their tests within it. If your container registry is close to your agents (e.g. AWS ECR in the same region), the overhead of pulling the image down for each test step can be surprisingly low.
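As a rough sketch of that pattern (the `$IMAGE` registry path, script names, and plugin version below are illustrative assumptions, not a prescribed setup):

```yaml
steps:
  - label: ":docker: Build image"
    command: |
      # $IMAGE is a hypothetical registry path, e.g. an ECR repository URL,
      # set in the pipeline's environment
      docker build -t "$IMAGE:$BUILDKITE_BUILD_NUMBER" .
      docker push "$IMAGE:$BUILDKITE_BUILD_NUMBER"

  - wait

  - label: ":test_tube: Integration tests"
    plugins:
      - docker#v5.8.0:                   # docker-buildkite-plugin; pin whichever version you use
          image: "$IMAGE:$BUILDKITE_BUILD_NUMBER"
    command: ./scripts/run-integration-tests.sh
```

Because the test steps only pull an image rather than re-materialising a build tree from artifacts, the per-step startup cost is mostly the image pull, which is fast when the registry is nearby.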

There is one other solution which might be an easier path for you - it’s what we call “node affinity.” This is a way of configuring your agents and pipelines which causes all steps uploaded by a given agent to run on that agent. Take a look at the docs for that and see if it would work for you - you’ll see we’ve included a short note about potential drawbacks of this approach.
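One way this kind of setup is sometimes wired together (the `nodename` tag below is a hypothetical example — the exact configuration is what the docs describe): start each agent with a unique tag, then have an initial step upload the rest of the pipeline targeting that tag, so every subsequent step lands on the same machine and shares its filesystem.

```yaml
# Uploaded by an initial step via `buildkite-agent pipeline upload`.
# Assumes each agent was started with a unique tag, e.g.:
#   buildkite-agent start --tags "nodename=$(hostname)"
# The uploading agent's tags are exposed to its jobs as
# BUILDKITE_AGENT_META_DATA_* variables, interpolated at upload time.
steps:
  - label: ":hammer: Build"
    command: ./scripts/build.sh
    agents:
      nodename: "$BUILDKITE_AGENT_META_DATA_NODENAME"

  - wait

  - label: ":test_tube: Integration tests"
    command: ./scripts/run-integration-tests.sh
    agents:
      nodename: "$BUILDKITE_AGENT_META_DATA_NODENAME"
```

The obvious trade-off, as noted above, is that pinning work to one agent sacrifices parallelism and makes that agent a single point of failure for the build.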

I hope this is helpful, please let us know if you have more questions about this!