Does agent-stack-k8s require podSpec?

I’m evaluating using agent-stack-k8s for Kubernetes pipelines. The documentation makes it seem like the “podSpec” field is required, and that it would ignore any “command” steps that specify a script, although that isn’t explicitly stated.

However, I ran the following test pipeline, and it succeeded with the expected output.

steps:
  - label: test
    command:
    - echo "Hello world top level" && sleep 60
    agents:
      queue: staging-k8s
    plugins:
    - kubernetes:
        gitEnvFrom:
        - secretRef:
            name: agent-stack-ssh

I also ran the following:

steps:
  - label: test
    command:
    - echo "Hello world top level"
    agents:
      queue: staging-k8s
    plugins:
      - kubernetes:
          gitEnvFrom:
            - secretRef:
                name: agent-stack-ssh
          podSpec:
            containers:
            - image: busybox:latest
              command:
              - /bin/sh
              args:
              - "-c"
              - "'echo Hello world in container'"

and only got the output “Hello world in container”, seemingly ignoring the “Hello world top level” from the command script.

I note that in both cases the corresponding pod manifest set the environment variable BUILDKITE_SCRIPT_PATH to contain the script I provided.

So I have the following questions:

  • Is podSpec actually required by the kubernetes plugin and agent-stack-k8s? Or is the first example’s behavior expected and officially supported?
  • If podSpec is provided, is ignoring the command script the expected behavior, or is running both scripts what’s expected? (i.e. is this a bug?)
  • If either behavior is expected, should the agent-stack-k8s documentation and/or examples be updated to note this use case? I would love for at least the former to be supported since that would make converting existing steps simple (just add the plugins: [kubernetes: {gitEnvFrom: agent-stack-ssh}] field), as opposed to completely rewriting the steps to embed their commands inside of the podSpec.

Hello again @mbarrien!

Thanks for the question - hope you’ve been well :wave:

Questions #1 and #2 can fortunately both be answered by looking at how the scheduler in the K8s plugin parses plugin definitions. Upon creation, a job wrapper is created and the plugins specified on the job are parsed. Further down, the Kubernetes job is then built: specifically, if there is no podSpec defined in the kubernetes plugin, defaults are used (the agent container and the substituted step (job) level command, respectively). That is why your first example ran. The second case, where you had both a podSpec and a step (job) level command, is explained by the same code: since there was a podSpec definition in the kubernetes plugin, its config was parsed, which is why its echo command ran from within the busybox container (as parsed and created by the controller) and the step-level command did not appear in the output.
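To make the order of operations concrete, here is a minimal Python sketch of the fallback as described in this thread. It is a model of the observed behavior, not the controller's actual code: the names `build_containers`, `Container`, and `DEFAULT_AGENT_IMAGE` are hypothetical, and wrapping the step command in `/bin/sh -c` is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder; the real default agent image is not stated in this thread.
DEFAULT_AGENT_IMAGE = "example.invalid/agent:latest"


@dataclass
class Container:
    image: str
    command: List[str] = field(default_factory=list)
    args: List[str] = field(default_factory=list)


def build_containers(step_command: str, pod_spec: Optional[dict]) -> List[Container]:
    """Model the fallback described in this thread (not the controller's real code)."""
    if pod_spec and pod_spec.get("containers"):
        # A podSpec was supplied: its containers are used as written,
        # so the step-level command does not show up in the output.
        return [
            Container(c["image"], list(c.get("command", [])), list(c.get("args", [])))
            for c in pod_spec["containers"]
        ]
    # No podSpec: fall back to a default container running the step command.
    # Wrapping the command in /bin/sh -c is an assumption for illustration.
    return [Container(DEFAULT_AGENT_IMAGE, ["/bin/sh", "-c"], [step_command])]


# First pipeline in the question: no podSpec, so the step command is used.
no_pod_spec = build_containers('echo "Hello world top level" && sleep 60', None)

# Second pipeline: podSpec present, so the busybox container is used instead.
with_pod_spec = build_containers(
    'echo "Hello world top level"',
    {
        "containers": [
            {
                "image": "busybox:latest",
                "command": ["/bin/sh"],
                "args": ["-c", "'echo Hello world in container'"],
            }
        ]
    },
)
```

In this model, `no_pod_spec` holds a single default container carrying the step command, while `with_pod_spec` holds only the busybox container, which matches the two outcomes reported in the question.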

The folks behind the Kubernetes stack are always keen on improvements that can be made to it, so you’re welcome to submit an issue on the repo. In tandem, I’ll raise this with the team so it can feed into improvements (and potentially clearer explanations of the order of config operations, especially when there is both a top-level command and a podSpec definition).

Thanks for pointing me at the code and showing me the behavior is hardcoded in. (And I see that the behavior was added in Pull Request #36, “Run command jobs with no plugin config” by benmoss, in buildkite/agent-stack-k8s.) My biggest question in that case is: can we rely on this behavior (falling back on “command” if no podSpec is provided) officially, or could it be removed in future versions of agent-stack-k8s, since it isn’t officially documented? I don’t want to rely on undocumented features based on examining what the code does, and then have them ripped away later on if your team decides to go a different route.

Thanks!

No worries! :slight_smile:

I’ve raised this functionality with the repo owners. Since it’s included in the code, you’re definitely encouraged to rely on it. It’s just a case where the documentation for the feature and its PR was missed (along with some examples, which is why I’ve asked for a specific example or two to make it clear).