Hi Buildkite (and other community members),
I’m looking for advice on how to increase the speed of our ‘Prepare working directory’ step.
At best it takes 2-5 seconds, but sometimes it slows to more than 30 seconds.
A bit about our configuration/setup:
Our biggest repo is ~120 MB.
We are using the git-mirrors feature and running agents in ASGs on EC2. We have several queues: small (20 agents per instance, to run pipeline expansion and wait on AWS CLI calls), medium (1 agent per instance, to run Docker builds), and large (1 agent per instance, to run tests).
Our agents, on EC2, come up when jobs are available and terminate after a couple of minutes of being idle. It is common for an agent to be ‘fresh’, although we do get some agent re-use when several PRs are being built at once.
We use an EFS mount to share the ‘git-mirrors’ folder across agents. We wanted to reduce ‘internet traffic’ (downloads from GitHub), as this incurs a high NAT bytes cost. Once an agent has updated the mirror (so EFS now contains the desired commits), it’s available to the other agents. I believe there is a filesystem lock, so the other agents block until the lock is released.
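To check whether lock contention contributes to the slowdown, the time spent waiting on a lock can be measured directly. The following is only a minimal sketch of that idea using an exclusive `flock`; whether the agent uses this exact mechanism is an assumption, and `timed_lock` and the lock file path are hypothetical names for illustration, not part of Buildkite.

```python
import fcntl
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def timed_lock(lock_path):
    """Take an exclusive flock on lock_path and yield the seconds spent waiting."""
    started = time.perf_counter()
    with open(lock_path, "w") as handle:
        # Blocks here while another process holds the lock on the same file.
        fcntl.flock(handle, fcntl.LOCK_EX)
        waited = time.perf_counter() - started
        try:
            yield waited
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical lock file; on an agent this would sit next to the mirror on EFS.
    demo_lock = os.path.join(tempfile.gettempdir(), "mirror-demo.lock")
    with timed_lock(demo_lock) as waited:
        print(f"waited {waited:.3f}s for the lock")
```

Running this from several jobs at once against the same lock file would show how much of the 30 seconds is spent blocked rather than copying data.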
Our usual pattern is that the first few pipeline steps expand a set of jobs (dynamic pipeline). These jobs run simultaneously, and each does its own checkout on a (possibly new) agent/instance. These parallel ‘checkouts’ seem to be the ones exhibiting the majority of the slowdown.
Is the issue to do with lots of little files being transferred from EFS to local disk when cloning from the mirror into the ‘job’ directory?
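One way to test the small-files theory outside of git is to time a plain directory copy from the EFS mount to local disk. Below is a rough sketch, not a definitive benchmark: `time_copy` is a hypothetical helper, and the demo uses generated stand-in files in a temp directory rather than a real mirror. It measures a raw file copy, which is not necessarily what git does during checkout.

```python
import os
import shutil
import tempfile
import time


def time_copy(src_dir, dst_dir):
    """Copy src_dir to dst_dir and return (file_count, total_bytes, seconds)."""
    started = time.perf_counter()
    shutil.copytree(src_dir, dst_dir)  # dst_dir must not exist yet
    elapsed = time.perf_counter() - started

    # Count what was copied so the timing can be related to file count and size.
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(dst_dir):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes, elapsed


if __name__ == "__main__":
    # Stand-in data: 200 small files. On an agent, src would be a mirror
    # directory on the EFS mount and dst a path on local disk (both assumed).
    src = tempfile.mkdtemp()
    for i in range(200):
        with open(os.path.join(src, f"obj_{i}"), "wb") as f:
            f.write(b"x" * 1024)
    dst = os.path.join(tempfile.mkdtemp(), "copy")
    count, size, seconds = time_copy(src, dst)
    print(f"{count} files, {size} bytes, {seconds:.3f}s")
```

Comparing the result for an EFS source against a local-disk source of the same size would indicate whether per-file latency on EFS is the bottleneck.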
I’ll include a screenshot of our EFS mounts and spec.
Happy to answer any questions. Interested to hear any advice.
Thanks,
Michael