Unable to cache after migrating from stack v5.11.0 to v6.22.1

Hi, I am a newbie and I have been fighting a cascade of issues in our pipeline, which uses the Buildkite Elastic CI Stack for AWS.

Our pipeline is a bit old: it uses stack version v5.11.0 and a bunch of old plugins. An issue with one of those plugins made me upgrade the stack to v6.22.1. The stack deployed successfully using the AWS CLI, so that’s great!

Now I am migrating a pipeline that was running jobs on the old stack (unclustered agents) to a new one (clustered). Unfortunately there are gaps in Buildkite’s documentation on how to set this up correctly, so here’s what I tried:

  1. Added a new cluster.
  2. Defined two queues: default and 16cores-1agent (the latter named to match the stack’s BuildkiteQueue parameter; see the minimal step right after this list).
    • I picked self-hosted infrastructure there.
  3. Switched the pipeline from the unclustered area to this cluster.
  4. Tried building a branch that is known to build on the old stack.
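
Just to spell out what I mean by matching the queue name: the value in the pipeline’s agents block is the same string I passed as the stack’s BuildkiteQueue parameter. A minimal step as a sketch (the label and command here are only illustrative):

steps:
  - label: "queue smoke test"
    command: "echo hello from the new cluster"
    agents:
      # must match the BuildkiteQueue parameter the new stack was deployed with
      queue: "16cores-1agent"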

I get this strange error, and I don’t know where it is coming from or why:

Have I configured something wrong? Does the nienbo/cache-buildkite-plugin require some special parameter or configuration?

Here’s a snippet from the pipeline upload YAML:

cache: &cache
  id: my_app
  backend: s3
  key: "v1-cache-{{ id }}-{{ git.commit }}"
  restore-keys:
    - "v1-cache-{{ id }}-{{ git.commit }}"
  s3:
    bucket: my-test-bucket
  paths:
    - .

steps:
  - label: ":buildkite: meta-data"
    command: .buildkite/scripts/set_version_meta_data.sh my_app

  - wait

  - label: ":cpp: build my_app"
    command: .buildkite/scripts/build_my_app.sh
    plugins:
      - chronotc/metadata-env#v1.0.0:
          keys:
            - my-app-release-version=MY_APP_RELEASE_VERSION
      - docker-login#v2.1.0:
      - docker#v5.3.0:
          image: "docker-registry.something.com/images/ubuntu-22.04:0.2.0"
          environment:
            - MY_APP_RELEASE_VERSION
      - nienbo/cache#v2.4.16: *cache
      - artifacts#v1.8.0:
          upload: "build/my_app/my_app-*"
    agents:
      queue: "16cores-1agent"

I have studied the plugin’s documentation and all the logs in CloudWatch, and nothing hints at the cause. Can anyone tell me what the issue could be?

Hey Gaur :wave:

Welcome to the Buildkite community :)

That’s an interesting one! Would you be able to send the build details over to support@buildkite.com so we can look into this further? Also, here’s a document that might help you with the migration process: Manage clusters | Buildkite Documentation

Cheers!
Athreya

Thank you, Athreya! I just now sent the build details to support@buildkite.com.

Cheers for sharing those details, Gaur!

We are currently looking into it and will get back to you over email with our findings :)

Regards

Athreya

To round out this post with the answer we worked out via email:
The problem is that the stack upgrade brought with it an agent version upgrade that supports hosted agents, and hosted agents also support cache volumes. As a result, cache is now a reserved top-level key in the pipeline YAML on the new agent version, and our custom cache block caused the pipeline parsing failure.
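
For context: on the newer agent, a top-level cache key is interpreted as the built-in cache volumes configuration rather than as an arbitrary holder for a YAML anchor. Going by my reading of the cache volumes docs (so treat this as a rough sketch, and the path here is only an illustration), the parser now expects that key to look more like this, which is why our plugin-shaped block failed to parse:

cache:
  paths:
    - "node_modules"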

To fix this, simply rename the top-level key from cache to cache_plugin.

E.g.:

- cache: &cache
+ cache_plugin: &cache
  id: my_app
  backend: s3
  key: "v1-cache-{{ id }}-{{ git.commit }}"
  restore-keys:
    - "v1-cache-{{ id }}-{{ git.commit }}"
  s3:
    bucket: my-test-bucket
  paths:
    - .
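
Note that the build step consuming the anchor doesn’t need to change at all: the &cache anchor keeps its name, only the top-level key is renamed, so the alias in the step still resolves as before:

    plugins:
      - nienbo/cache#v2.4.16: *cache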

Thank you, Jarryd and Athreya!

It worked like a charm! :+1: :slight_smile: I didn’t suspect it at all.

Have a great day!