Shallow clones?

Hey all,

I’ve got a use case where we have a pipeline that will only ever need the current version of a repo, but the buildkite-agent will always do a full clone anyway. I see that there was an FR posted a while back about this (https://github.com/buildkite/agent/issues/437) and I was wondering if this was still on the roadmap somewhere.

For some context, in our case, this results in us having to do a full 2GB repo clone, where a --depth 1 clone weighs in at around 170MB. I’d really like to avoid doing the slow, wasteful full clone where we don’t need 90%+ of the actual clone we’re retrieving.

Thanks!
-Sean

It looks like this may have been added to the agent configurations as part of this PR: https://github.com/buildkite/agent/pull/957/files

Would it be possible to allow this configuration in pipeline steps as well?

This would really help my team’s use of BK, as we don’t require the full codebase history in certain pipelines, but other teams may require the full history in their pipeline(s).

Thanks for pointing out that this has been added!

Since this is an env var, I think that you should be able to set it on a per-step basis like so (but I have not yet tested this):

steps:
  - label: "Test things"
    commands:
      - "run-tests.sh"
    branches: "master"
    env:
      BUILDKITE_GIT_FETCH_FLAGS: "--depth=1"

Any update on this topic? We are currently dealing with the same issue. Besides BUILDKITE_GIT_FETCH_FLAGS there is also BUILDKITE_GIT_CLONE_FLAGS.

It seems that the clone only happens at the beginning, when our YAML gets loaded, and after that the agent only fetches the repo. The question is: do I have to add the env var to every pipeline step, or is there a way to enable it by default for every step?

Edit: never mind, I found the solution and implemented it.
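
For anyone landing here with the same question: the post above does not say which solution was used. One way this might be done, as an untested sketch, is a top-level env block in pipeline.yml, which Buildkite applies to every step defined in that file. The step label and script name are reused from the earlier example, and the -v and --prune flags are, as far as I know, the agent defaults, kept here so that only the depth changes.

```yaml
# Untested sketch: a top-level env block is applied to every step in this file.
env:
  # Used when the agent has no existing checkout and has to clone.
  BUILDKITE_GIT_CLONE_FLAGS: "-v --depth=1"
  # Used on later builds, when the agent fetches into an existing checkout.
  BUILDKITE_GIT_FETCH_FLAGS: "-v --prune --depth=1"

steps:
  - label: "Test things"
    commands:
      - "run-tests.sh"
    branches: "master"
```

The job that uploads this file has already checked out the repo before the file is read, so it would not pick up these values. Setting the equivalent options in the agent configuration (the change in the PR linked earlier) is probably the way to cover that case too.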

GitHub has since published "Get up to speed with partial clone and shallow clone" on The GitHub Blog.

Blobless clones definitely feel much faster than full clones, and don’t have the broken-history problem that a shallow clone has (blobs are fetched on-demand, but the full commit history is present).

Treeless clones are mentioned as being useful for CI builds in which you’re throwing away the clone after the build (for example, if Buildkite nodes are configured to fully self-scrub after any build step). Their performance is noticeably worse right now, though, because of redundant work when doing history operations or checkouts of other commits: treeless clones are not optimized at this time, and the server apparently ends up sending a lot of trees that the client already has, since there is no great protocol support for this clone mode.

At least from my own limited tinkering, blobless clones are almost always as good as treeless clones in the cases most favorable to treeless, but much better than both treeless and full clones in almost every other case. Blobless seems like an easy, transparent speedup with no real downsides, at least for git hosting services that support it (such as GitHub).
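
If you want to try a blobless clone on Buildkite, a minimal sketch might look like the following. It assumes the agent passes BUILDKITE_GIT_CLONE_FLAGS straight through to git clone, that the git version on the agent supports partial clone, and that the hosting service does too. I have not tested this on a real agent; the step itself is just the example from earlier in the thread.

```yaml
# Untested sketch: request a blobless partial clone instead of a shallow one.
env:
  # --filter=blob:none fetches all commits and trees up front,
  # and downloads file contents (blobs) only when a checkout needs them.
  BUILDKITE_GIT_CLONE_FLAGS: "-v --filter=blob:none"
  # For a treeless clone, the filter would be --filter=tree:0 instead.

steps:
  - label: "Test things"
    commands:
      - "run-tests.sh"
```

Later fetches into the same checkout should inherit the filter from the remote configuration that git writes at clone time, so the fetch flags are left alone here.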
