Bumping versions in infrastructure repository

Hey!

Was wondering if y’all have any creative ways to solve a problem that I’m not seeing much prior art for on The Internets, or maybe I’m failing to type the correct keywords into the search box :stuck_out_tongue:. We currently have a terrible, terrible solution that’s one of my least favorite parts of our code.

In a GitOps workflow, one would have (for example) two repositories, the app repository and the infrastructure repository. The latter would have the versions of the app that are currently deployed, and you’d presumably have something monitoring the infrastructure repository, making the current live state match the state in the repository, right?

Okay so, how are y’all bumping the version in the infrastructure repository when you make changes to the app repository? I think in an ideal situation, you’d build/release the app repository to a version, and percolate the version that was chosen into the infrastructure repository. Some challenges off the top of my head:

  1. Getting Buildkite to play with another repository in the same pipeline
  2. Presumably your infrastructure repository requires a +1 approval for compliance reasons, right? So getting the actual version bump into the main branch can be… annoying.

Some fine but maybe not great ideas I’ve had:

For resolving 1, maybe the answer is having Buildkite trigger another pipeline that’s for the infrastructure repository, but number 2 is just painful.

For resolving both, you could just have the deployed versions be in an external system (if we assume this is Kubernetes and you have ArgoCD, you could store the version in the ArgoCD App), but then the infrastructure repository doesn’t have a full view of the desired state.

For resolving 2, you could have the user making the commits be a special superuser (in GitHub parlance, Admin) and disable the +1 approval validation for that class of user… That works but feels icky, and falls down if actual humans are Admin too.

For resolving 2, you could have the user create a pull request and have some relatively complex orchestration that auto-approves the pull request using an alternate user, waits for status checks, and merges it…

Anyways, been mulling this over and trying to research it on my own for a while, and have not come up with an answer I like, so figured I’d reach out for halp :slightly_smiling_face: :heart:

I have a similar problem. Not related to compliance exactly, but related to solving privilege escalation when you want to deploy the new version of the app, via the secondary infrastructure codebase/pipeline.

If the artifact for your application (e.g. a docker image) got built/published from a shared set of BK agents (anyone in the organisation can use these agents and release artifacts), then you need to get the reference for the new version into your infrastructure repo.

You could clone the infrastructure repo, write the new version reference to a file, and push it back. Because the agent is shared, this means ANYONE can write to the infrastructure code base (by orchestrating an attack using the BK agent’s git credentials).
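For what it’s worth, the "write the new version reference to a file" part can be tiny. Here is a rough Python sketch, assuming the infrastructure repo keeps the version as an `image: name:tag` line in a manifest. The function name `bump_image_tag` and the registry path are made up for illustration; the clone, commit, and push would happen around it with ordinary git commands.

```python
import re


def bump_image_tag(manifest_text: str, image: str, new_tag: str) -> str:
    """Return manifest_text with the tag on `image` replaced by new_tag.

    Assumes (hypothetically) that the version lives on a line such as
    `image: registry.example.com/web:1.4.2`.
    """
    pattern = re.compile(r"(image:\s*" + re.escape(image) + r":)[\w.\-]+")
    # A callable replacement avoids new_tag being read as a regex backreference.
    updated, count = pattern.subn(lambda m: m.group(1) + new_tag, manifest_text)
    if count == 0:
        # Fail loudly rather than pushing a commit that changes nothing.
        raise ValueError(f"no image reference for {image!r} found")
    return updated


manifest = (
    "containers:\n"
    "  - name: web\n"
    "    image: registry.example.com/web:1.4.2\n"
)
bumped = bump_image_tag(manifest, "registry.example.com/web", "1.5.0")
```

The security concern is unchanged by this: whichever agent runs it still needs push access to the infrastructure repo.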

If you don’t want to give BK agents write access to code, you could instead trigger the infrastructure pipeline passing the new version reference as metadata. Now ANYONE can deploy a new version, but at least they can’t manipulate arbitrary code in the infrastructure repo.
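As a rough sketch of that trigger approach: the app pipeline could generate a trigger step and pipe it to `buildkite-agent pipeline upload`, which accepts JSON as well as YAML. The pipeline slug `infrastructure-deploy`, the meta-data key `app-version`, and the function name are assumptions for illustration only.

```python
import json


def trigger_pipeline(pipeline_slug: str, version: str) -> dict:
    """Build a pipeline with one trigger step that hands `version`
    to the triggered build as meta-data."""
    return {
        "steps": [
            {
                "trigger": pipeline_slug,  # hypothetical slug of the infra pipeline
                "label": f"Deploy {version}",
                "build": {
                    "branch": "main",
                    # "app-version" is an assumed key name, not a Buildkite convention.
                    "meta_data": {"app-version": version},
                },
            }
        ]
    }


pipeline = trigger_pipeline("infrastructure-deploy", "1.5.0")

if __name__ == "__main__":
    # e.g. python trigger.py | buildkite-agent pipeline upload
    print(json.dumps(pipeline, indent=2))
```

The infrastructure pipeline would then read the value with `buildkite-agent meta-data get app-version`. The agent in the shared pool never needs git credentials for the infrastructure repo.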

If you have decent smoke tests in place, then this new version of the app can at least be verified before being deployed. The trade-off is that it’s now a bit more difficult to roll back or see what version of an app was deployed at any one time, as the version reference is never in code.

The most common GitOps pattern I’ve seen Buildkite customers use is to have the app pipeline commit changes to the infrastructure/deploy-config repo.

When compliance allows it folks prefer committing directly to main/master, but I’ve seen it implemented as opening pull requests as well. The pull request approach definitely complicates the workflow and makes CD harder, but then I guess that’s the point of compliance? GitHub does allow auto merging of PRs when checks pass now, which is neat. Maybe that’d help?
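If you go the pull request route, a minimal sketch of the auto-merge idea with the GitHub CLI might look like the following. It only builds the `gh` commands; the branch name, title, and function name are placeholders, and it assumes auto-merge is enabled in the repository settings.

```python
def version_bump_pr_commands(branch: str, version: str, base: str = "main") -> list:
    """Return the gh invocations that open a version-bump PR and
    queue it to merge once its requirements are met."""
    create = [
        "gh", "pr", "create",
        "--base", base,
        "--head", branch,
        "--title", f"Bump app to {version}",
        "--body", "Automated version bump.",
    ]
    # --auto queues the merge; GitHub performs it only after required
    # checks and required reviews are satisfied.
    merge = ["gh", "pr", "merge", branch, "--auto", "--squash"]
    return [create, merge]


commands = version_bump_pr_commands("bump-web-1.5.0", "1.5.0")
# Each entry could be passed to subprocess.run(cmd, check=True) on an agent
# that has the gh CLI installed and authenticated.
```

Note that auto-merge still waits for any required approval, so this removes the "watch the checks and click merge" step rather than the +1 itself.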

If the pool of agents that build the applications are shared by multiple teams with different levels of trust, then controlling who is allowed to make changes that flow through to the infrastructure repository will be a challenge.

One possibility that I’ve never seen anyone try but might work pretty well is to use GPG signing of commits. GPG isn’t very user friendly, but imagine a setup like this:

  1. all commits to the app repository are signed
  2. there’s a shared pool of agents that run build steps, and can push new docker images to a registry. They don’t have permission to push to the infrastructure repo
  3. there’s a separate pool of agents per team with a pre-command hook that checks the gpg signature against a whitelist of staff who are allowed to deploy code for that team
  4. the final step in the application pipeline runs on the team-specific agent pool and (assuming the hook passes) it will commit the change to the infrastructure repo
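To make step 3 a bit more concrete, here is an untested-in-production sketch of the check such a hook might call. It relies on `git verify-commit --raw`, which prints GnuPG status lines to stderr, and compares the `VALIDSIG` fingerprint against the whitelist. The function names and the fingerprint are invented for illustration, and where the whitelist is stored is left open.

```python
import subprocess
from typing import Optional, Set


def signer_fingerprint(gpg_status: str) -> Optional[str]:
    """Return the fingerprint from the first VALIDSIG status line, if any."""
    for line in gpg_status.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "[GNUPG:]" and parts[1] == "VALIDSIG":
            return parts[2]
    return None


def is_allowed_signer(gpg_status: str, whitelist: Set[str]) -> bool:
    """True only if the commit has a valid signature from a whitelisted key."""
    fingerprint = signer_fingerprint(gpg_status)
    return fingerprint is not None and fingerprint in whitelist


def commit_gpg_status(commit: str = "HEAD") -> str:
    """Run git verify-commit and return the raw GnuPG status output."""
    result = subprocess.run(
        ["git", "verify-commit", "--raw", commit],
        capture_output=True,
        text=True,
    )
    return result.stderr  # --raw writes the status lines to stderr


# Hypothetical fingerprint and status output, for illustration only.
ALICE = "0123456789ABCDEF0123456789ABCDEF01234567"
sample_status = (
    "[GNUPG:] NEWSIG\n"
    f"[GNUPG:] GOODSIG 89ABCDEF01234567 Alice <alice@example.com>\n"
    f"[GNUPG:] VALIDSIG {ALICE} 2024-01-01 1704067200 0 4 0 1 10 00 {ALICE}\n"
)
allowed = is_allowed_signer(sample_status, {ALICE})
```

In a hook you would call `commit_gpg_status` with the commit being built (Buildkite exposes it as `BUILDKITE_COMMIT`) and exit non-zero when `is_allowed_signer` returns false. This compares the signing key’s fingerprint, so a team signing with subkeys would need to whitelist those or compare the primary key fingerprint instead.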