I’m working on migrating from the CF stack to agent-stack-k8s on a local cluster. I’m still interested in using ECR because I’m not sure I’ll ever fully retire the CF stack. My builds rely on the docker-compose plugin: the first step builds a container specific to that branch/sha, and each subsequent step pulls that container and runs within it.
Unfortunately, because I’m no longer running in AWS next to ECR, this significantly increases my network delay. I think most of that delay would disappear (and in fact improve upon the CF stack) if I could reuse pulled images across agents.
Is there any way to make that work without updating my in-repo pipeline definitions (i.e., can I reconfigure either my cluster or agent-stack-k8s to have a shared, persistent image and layer cache)?
I hear your concern about sticking with ECR to serve both the CF stack and the K8s cluster. Could you clarify whether you are currently running your k8s cluster in EKS? What image registry is currently being used with K8s?
If you’re sticking with a non-ECR registry, I’m wondering whether you could use ECR pull-through caches to reduce the latency, and also look at establishing a node-level container cache. Do you have an imagePullPolicy currently defined as part of your pod spec?
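For context, a node-level cache only helps if the kubelet is allowed to reuse images it already has. A minimal sketch of what I mean (the image name is a placeholder):

```yaml
# Sketch only: let the kubelet reuse an image already present on the node
# instead of re-pulling it on every run. The image reference is a placeholder.
containers:
  - name: build
    image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/builder:latest
    imagePullPolicy: IfNotPresent   # with a :latest tag, the default policy is Always (always re-pull)
```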
We can dig deeper once we understand a bit more about your current setup.
I’m not running the cluster in ~k8s~ EKS; I’m running it on-prem. Both my k8s stack and my CloudFormation stack push to and pull from a private ECR registry. If I’m reading your link correctly, ECR pull-through caches are for using ECR as a cache, which is valuable when your builders are on AWS infrastructure, but not particularly valuable when your builders and runners are elsewhere.
I’ve been trying to set up a pull-through cache in the cluster that caches images coming from ECR, but so far, the credentials needed for a private registry have gotten in my way. I also tried to set up node-level caching, but since I’ll be running multiple agents on the same node and potentially multiple agents will be doing builds, all indications are that this will cause issues with the Docker layer cache.
Hi @ianwremmel Wanted to check if you had a chance to review this document, as it could give some insights into the approach you could try here; please let me know if this helps. Just another note: the image cache is handled by Kubernetes with imagePullPolicy in the Pod spec (https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy). For build layer caching, I believe you can create a Persistent Volume Claim and mount it at /var/lib/docker in container-0 using config.pod-spec-patch in the controller’s configuration, which would let you avoid updating the in-repo pipeline YAML files.
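To illustrate, a rough sketch of what that might look like in the controller’s values.yaml (assuming the Helm chart; the PVC name is a placeholder and would need to be created separately):

```yaml
# Rough sketch of the agent-stack-k8s controller configuration (Helm values.yaml).
# "docker-cache" is a hypothetical, pre-created PVC; note the caveat discussed
# below about multiple daemons sharing one /var/lib/docker data root.
config:
  pod-spec-patch:
    containers:
      - name: container-0              # the command container created by the controller
        volumeMounts:
          - name: docker-cache
            mountPath: /var/lib/docker
    volumes:
      - name: docker-cache
        persistentVolumeClaim:
          claimName: docker-cache
```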
Yeah, I’m currently using the DinD approach. Everything I was reading suggested that mounting /var/lib/docker is discouraged by the Docker folks, because the cache wasn’t intended for multiple writers and corruption is likely.
Does the imagePullPolicy apply to the image used by the agents, or does it also apply to Docker containers started by those agents? (I’m admittedly rather fuzzy on where the boundaries between Docker and Kubernetes are across the various configurations I’m juggling.)
Hi @ianwremmel Looking at your scenario and after reviewing it a bit further, I see your point: multiple daemons pointing to the same data root could cause corruption. Regarding imagePullPolicy, I believe it only applies to the agent pod images themselves (the Buildkite agent container, DinD sidecar, etc.) and not to images pulled by docker-compose inside the DinD container. Since you’re using DinD, there’s a separate Docker daemon running inside your pod, and that daemon pulls images completely independently of Kubernetes, so imagePullPolicy won’t help with caching the images your docker-compose plugin pulls from ECR. If you stick with your current DinD approach, you could add an in-cluster registry that acts as a caching proxy for ECR and then point your DinD daemons at it so they pull through the cache. The main thing you’d need to handle is keeping the ECR credentials refreshed (you could use a CronJob) so the proxy can continue authenticating to your registry.
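To sketch that out, assuming the CNCF Distribution registry (registry:2) as the in-cluster proxy, with placeholder account ID, region, and names:

```yaml
# config.yml for a registry:2 instance running in-cluster as a pull-through
# cache in front of ECR. The remoteurl and credentials are placeholders; the
# password is an ECR token, which expires after 12 hours.
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://123456789012.dkr.ecr.us-east-1.amazonaws.com
  username: AWS
  password: "<output of `aws ecr get-login-password`>"
```

A CronJob along these lines could keep the token fresh (the image and secret names are hypothetical, and the registry would still need to pick up the refreshed secret, e.g. via a restart):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-token-refresh
spec:
  schedule: "0 */8 * * *"                       # ECR tokens last 12h; refresh every 8h
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ecr-token-refresh  # needs RBAC permission to update the secret
          restartPolicy: OnFailure
          containers:
            - name: refresh
              image: my-registry/aws-cli-with-kubectl:latest   # hypothetical image with aws + kubectl
              command:
                - /bin/sh
                - -c
                - |
                  TOKEN=$(aws ecr get-login-password --region us-east-1)
                  kubectl create secret generic ecr-proxy-credentials \
                    --from-literal=username=AWS \
                    --from-literal=password="$TOKEN" \
                    --dry-run=client -o yaml | kubectl apply -f -
```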
I’ve been trying to set up a caching proxy for ECR for the last few days, and it keeps either failing to authenticate or making things slower.
From what I’ve read, if I switch to buildx, I’ll get the shared build cache I’m looking for, but I think that still leaves the docker-compose plugin without a cache. Is docker-compose still recommended for k8s, or do y’all have a different approach entirely at this point?
Hi @ianwremmel There are other approaches aside from DinD. I believe you could explore using Kaniko, as it also supports ECR and could work for your use case, or any of the other approaches described here for building container images. For ECR authentication, you can review this documentation for guidance.
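For example, a standalone pod along these lines (just a sketch, not specific to agent-stack-k8s; the repository URLs, git context, and Secret name are placeholders — kaniko’s registry cache keeps layers in ECR so they can be reused across builders):

```yaml
# Rough sketch of running the kaniko executor against ECR. All names/URLs are
# placeholders; credentials come from a Docker config.json mounted from a Secret
# (ECR also works with kaniko's ecr-login credential helper).
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --dockerfile=Dockerfile
        - --context=git://github.com/my-org/my-app.git#refs/heads/main
        - --destination=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
        - --cache=true                                                  # push/pull layer cache to the registry
        - --cache-repo=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app/cache
      volumeMounts:
        - name: docker-config
          mountPath: /kaniko/.docker/
  volumes:
    - name: docker-config
      secret:
        secretName: ecr-docker-config   # hypothetical Secret containing config.json for ECR
```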