Git sparse checkout

Hi there,

Is there a way to do a git sparse checkout? We have a monorepo and I want to build just one directory of it without downloading the whole repo.

Hey @chomey! :wave:

Unfortunately, we don’t have that functionality at the moment, but it’s a really great suggestion! I’m going to pass this as a feature request to our Product team :slightly_smiling_face:

Alternatively, I think you may be able to prototype it with an agent pre-checkout hook in bash, by blowing away any checkout cache that exists, doing a new bare checkout, running the git sparse-checkout commands, and then letting the default agent checkout run its course.
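To make that idea a bit more concrete, here is a minimal sketch of what such a pre-checkout hook might do. It rests on some assumptions: `prepare_sparse_checkout` is a made-up helper name, the agent is assumed to reuse a repository it finds at the checkout path rather than cloning again, and `git sparse-checkout` needs a reasonably recent Git on the agent.

```shell
#!/usr/bin/env bash
# Sketch of a pre-checkout hook helper (hypothetical name, not a Buildkite API).

# prepare_sparse_checkout <checkout_path> <repo_url> <path>...
prepare_sparse_checkout() {
  local checkout_path="$1" repo_url="$2"
  shift 2

  # Refuse to run with an empty path so rm -rf can never hit the wrong place.
  [ -n "${checkout_path}" ] || return 1

  # Blow away any previous checkout so a full clone is never reused.
  rm -rf -- "${checkout_path}"

  # Create a fresh, empty repository and point it at the pipeline's remote.
  git init --quiet "${checkout_path}"
  git -C "${checkout_path}" remote add origin "${repo_url}"

  # Restrict the working tree to the requested directories.
  git -C "${checkout_path}" sparse-checkout set "$@"
}
```

In a real hook the agent supplies the variables, so the call would look something like `prepare_sparse_checkout "${BUILDKITE_BUILD_CHECKOUT_PATH}" "${BUILDKITE_REPO}" relevant-folder`.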

Hope this helps!

Cheers!

@paula thank you for your quick response!

We had a few questions (@chomey and I work together):

  1. Do I simply place the pre-checkout hook under .buildkite/hooks in my infrastructure repository?
  2. How does the pre-checkout hook run before the git clone command? Our issue is that we’re running out of disk space during git clone (the repo is 133 GB, of which about 500 MB is relevant to the build). We don’t want to provision a large disk for irrelevant build data, and cloning everything would also slow down the checkout on the machine.

We mainly want to translate this to run on the buildkite build agent instead of the git clone:

git clone --depth 1 --filter=blob:none --sparse git@github.com:MYOrg/huge-repo.git
cd huge-repo
git sparse-checkout set relevant-folder

Any clarity you could give us would be much appreciated!

Thanks!

Hey @tmendez! :wave:

Thanks for the detailed information! That’s a really interesting use case! :hugs:

So, what I think you could do is something like this:

  1. Customize your agent(s) using the git-clone-flags setting, and set it to the flags from your command: --depth 1 --filter=blob:none --sparse.

  2. The second line (cd huge-repo) is not necessary because the agent is already in the repo directory. That leaves the sparse-checkout command, for which you should create a pre-checkout agent hook. Each agent installer comes with a hooks directory containing a set of sample hooks. You can find the location of your agent hooks directory in your platform’s installation documentation. To get started with agent hooks, copy the relevant example script and remove the .sample file extension.

Something to take into consideration is that you should dedicate particular agent(s) to this pipeline by setting a queue. Otherwise, you would have this configuration across all your agents (unless that is something you want).
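As a rough sketch, the clone flags could also be set from an agent environment hook instead of the agent config file. This assumes the agent honours the BUILDKITE_GIT_CLONE_FLAGS environment variable and that its default clone flag is `-v`; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of an agent "environment" hook helper (hypothetical name).

add_sparse_clone_flags() {
  # Assumption: "-v" is the agent's default clone flag. Keep whatever is already set.
  local current="${BUILDKITE_GIT_CLONE_FLAGS:--v}"

  # Append the shallow, blobless, sparse flags from the original git clone command.
  export BUILDKITE_GIT_CLONE_FLAGS="${current} --depth 1 --filter=blob:none --sparse"
}
```

Calling `add_sparse_clone_flags` only on the agents in the dedicated queue keeps the other agents on their normal full clone.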

Hope this helps!

Cheers!

Hi @paula, thanks for getting back to us.

I don’t think that will quite work. We have a number of different pipelines that all have different directories that they should be checking out. We can’t scale linearly the number of Buildkite agents we have with the number of pipelines we have.
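One possible way around dedicated agents, sketched here only as an illustration, is to let a single shared hook pick the directories from the pipeline slug. BUILDKITE_PIPELINE_SLUG is set by the agent; the slugs, directory names, and the function name below are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map a pipeline slug to the directories it should check out.
# All slugs and directory names here are made-up examples.

sparse_paths_for_pipeline() {
  case "$1" in
    frontend-build) echo "frontend shared" ;;
    backend-build)  echo "backend shared" ;;
    *)              echo "" ;;  # empty result: caller falls back to a full checkout
  esac
}
```

A hook could then run `paths="$(sparse_paths_for_pipeline "${BUILDKITE_PIPELINE_SLUG}")"` and skip the sparse setup when `paths` comes back empty.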

I’m exploring the plugin route as advised in another thread, and noticed from the documentation that:

Add plugins to command steps in your YAML pipeline to add functionality to Buildkite.

The key words there are "command steps". We have a multi-part dynamic build. Are we supposed to add this plugin stanza to every single command? There could be dozens. That is a huge amount of duplication for a simple checkout, and it could really be make or break for us as we explore other CI options.

Conceptually, it also doesn’t make sense when the checkout step is associated with the whole pipeline, and for some reason I’m writing a checkout hook per step, unless I really am understanding this wrong.
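Since the build is dynamic anyway, one way to keep the duplication in a single place might be to generate the steps from a script. The sketch below is only an illustration: the `paths` option name is a guess at the plugin's configuration, and the labels and commands are invented.

```shell
#!/usr/bin/env bash
# Sketch: emit pipeline steps that all share the same plugin stanza.
# The "paths" option name is an assumption about the plugin's config.

# emit_step <label> <command> <paths>
emit_step() {
  local label="$1" command="$2" paths="$3"
  cat <<YAML
  - label: "${label}"
    command: "${command}"
    plugins:
      - pragmaplatform/sparse-checkout#v1.1.2:
          paths: "${paths}"
YAML
}

generate_pipeline() {
  echo "steps:"
  emit_step "Build dir1" "make -C dir1" "dir1"
  emit_step "Test dir1" "make -C dir1 test" "dir1"
}
```

The output would then be piped to the agent, for example `generate_pipeline | buildkite-agent pipeline upload`, so the plugin stanza is written once in `emit_step` rather than per step.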

FWIW, here’s my attempt at a plugin: GitHub - pragmaplatform/sparse-checkout-buildkite-plugin

Hey!

We love it when folks build plugins, and this is definitely an interesting one.

So, we had an internal discussion about this, and sparse checkout is super experimental at the moment and complicated to implement.
The pre-checkout and post-checkout hooks are ideal for this type of situation, so we recommend using them to prepare a workflow that fits your specific use case.

Here is a rough idea of a pre-checkout hook. We don’t have experience with sparse checkout, so it’s not fully working, but it should give you a starting point:

# Prepare the repository to be sparse
git init "${BUILDKITE_BUILD_CHECKOUT_PATH}"
git -C "${BUILDKITE_BUILD_CHECKOUT_PATH}" remote add origin "${BUILDKITE_REPO}"
# Use -C so Git treats the checkout path as the work tree
# (with --git-dir alone, the current directory would be used)
git -C "${BUILDKITE_BUILD_CHECKOUT_PATH}" sparse-checkout set frontend

# Make sure you're using a shallow fetch
export BUILDKITE_GIT_FETCH_FLAGS="${BUILDKITE_GIT_FETCH_FLAGS:-} --depth 1"

This article is really interesting, and maybe you can find some ideas on how to do it.

Cheers!

Great, thank you for the information.

Hi there, picking this up after a while.

I’m trying to implement a sparse checkout that looks like this:

I see this in my build log:

Preparing plugins
[2021-08-04T18:14:06Z] # Plugin "github.com/pragmaplatform/sparse-checkout-buildkite-plugin" will be checked out to "/etc/buildkite-agent/plugins/github-com-pragmaplatform-sparse-checkout-buildkite-plugin-v1-1-2"
[2021-08-04T18:14:06Z] $ cd /etc/buildkite-agent/plugins/github-com-pragmaplatform-sparse-checkout-buildkite-plugin-v1-1-2
[2021-08-04T18:14:06Z] # Switching to the plugin directory
[2021-08-04T18:14:06Z] $ git clone -v -- https://github.com/pragmaplatform/sparse-checkout-buildkite-plugin .
[2021-08-04T18:14:06Z] Cloning into '.'...
[2021-08-04T18:14:06Z] POST git-upload-pack (415 bytes)
[2021-08-04T18:14:06Z] remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (29/29), done.
[2021-08-04T18:14:06Z] remote: Total 40 (delta 12), reused 27 (delta 6), pack-reused 0
Unpacking objects: 100% (40/40), 10.56 KiB | 1.51 MiB/s, done.
[2021-08-04T18:14:06Z] # Checking out `v1.1.2`
[2021-08-04T18:14:06Z] $ git checkout -f v1.1.2
[2021-08-04T18:14:06Z] Note: switching to 'v1.1.2'.
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z] You are in 'detached HEAD' state. You can look around, make experimental
[2021-08-04T18:14:06Z] changes and commit them, and you can discard any commits you make in this
[2021-08-04T18:14:06Z] state without impacting any branches by switching back to a branch.
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z] If you want to create a new branch to retain commits you create, you may
[2021-08-04T18:14:06Z] do so (now or later) by using -c with the switch command. Example:
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z]   git switch -c <new-branch-name>
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z] Or undo this operation with:
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z]   git switch -
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z] Turn off this advice by setting config variable advice.detachedHead to false
[2021-08-04T18:14:06Z]
[2021-08-04T18:14:06Z] HEAD is now at 730b91a Debug
[2021-08-04T18:14:06Z] $ cd /var/lib/buildkite-agent/builds
Running plugin sparse-checkout pre-checkout hook
[2021-08-04T18:14:06Z] $ /etc/buildkite-agent/plugins/github-com-pragmaplatform-sparse-checkout-buildkite-plugin-v1-1-2/hooks/pre-checkout
PATHS='dir1 dir2'
echo 'Sparse checking out the following paths: dir1 dir2'
[2021-08-04T18:14:06Z] Sparse checking out the following paths: dir1 dir2
git clone --no-checkout git@github.com:pragmaplatform/myrepo.git
[2021-08-04T18:14:06Z] fatal: destination path 'myrepo' already exists and is not an empty directory.
[2021-08-04T18:14:06Z] 🚨 Error: The plugin sparse-checkout pre-checkout hook exited with status 128

When I check on the buildkite agent, I see this:

ubuntu@ip-10-200-1-232:/var/lib/buildkite-agent/builds$ ls
myrepo

Note the cd /var/lib/buildkite-agent/builds. Why is the pre-checkout hook being executed from this directory? Should it be doing it in /var/lib/buildkite-agent/builds/<my ip address>/myrepo?

Hey!

Good to hear there are some improvements on this plugin :slightly_smiling_face:
The pre-checkout hook runs just before your pipeline’s source code is checked out from your SCM provider. The checkout hook, by contrast, replaces the default checkout routine of the bootstrap.sh script, so you can use it to implement your own SCM checkout behavior.
As a suggestion, using the cd command in your script makes the plugin brittle and tied to your use case.
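For illustration, a checkout hook along these lines might look like the sketch below. It is not tested against the agent: `sparse_checkout_commit` is a hypothetical helper, and it assumes the remote allows fetching the requested commit. It addresses the repository with `git -C` instead of `cd`, and it only clones when nothing is there yet, which avoids the "destination path already exists" failure from the log above.

```shell
#!/usr/bin/env bash
# Sketch of a "checkout" hook helper (hypothetical name, not a Buildkite API).

# sparse_checkout_commit <checkout_path> <repo_url> <commit> <path>...
sparse_checkout_commit() {
  local checkout_path="$1" repo_url="$2" commit="$3"
  shift 3

  # Clone only when no repository exists yet, so reruns on the same agent
  # do not fail because the destination is already populated.
  if [ ! -d "${checkout_path}/.git" ]; then
    git clone --quiet --no-checkout --filter=blob:none \
      "${repo_url}" "${checkout_path}" || return 1
  fi

  # Enable sparse checkout explicitly; newer Git versions also do this
  # as part of "sparse-checkout set".
  git -C "${checkout_path}" config core.sparseCheckout true || return 1
  git -C "${checkout_path}" sparse-checkout set "$@" || return 1

  # Fetch and check out the requested commit without changing directory.
  git -C "${checkout_path}" fetch --quiet origin "${commit}" || return 1
  git -C "${checkout_path}" checkout --quiet -f "${commit}"
}
```

Inside the hook, the call would be something like `sparse_checkout_commit "${BUILDKITE_BUILD_CHECKOUT_PATH}" "${BUILDKITE_REPO}" "${BUILDKITE_COMMIT}" dir1 dir2`, which keeps the plugin independent of whatever directory the agent happens to run it from.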