BUILDKITE_AGENT_JOB_API_SOCKET empty or undefined

Hi.

I upgraded our Buildkite Elasticstack from version v6.41.0 to v6.41.2.
This in turn upgrades the agent from v3.103.1 to v3.105.0.

We have build steps that run the following to fetch an OIDC token so we can assume some other IAM roles later using it.

_OIDC_TOKEN="$(buildkite-agent oidc request-token --audience sts.amazonaws.com)"

After the upgrade, this command is throwing the error:

buildkite-agent: fatal: failed to create Job API client: BUILDKITE_AGENT_JOB_API_SOCKET empty or undefined

On reverting back to the older Elasticstack again, the error is gone.

I did not try the v3.104.0 stack in case the problem was there as well, as I didn’t want to risk more downtime for our developers.

I’m also running the v3.105.0 agent standalone in some Kubernetes pods for performing deployments within the cluster.
These also run the buildkite-agent oidc request-token --audience sts.amazonaws.com command, but do not trigger the same error, so I guess it might be Elasticstack specific?

When I review the release notes for the stack and the agents, I see that the agent v3.104.0 adds some OIDC token redaction, so I don’t know if that could be somehow related?

In case it helps, this is a build that encountered the problem

https://buildkite.com/healthengineau/megatron/builds/214299/steps/table?jid=01993c4e-6d56-45f0-b879-4b6d5c7789f6

Hey @Jim ,

Thanks for sharing all the details. We are taking a look at this and will keep you posted on any updates.

Thanks,

Dahtey

Hey @Jim,

This results from a change in this PR on the Agent codebase: https://github.com/buildkite/agent/pull/3450.

The Job API has been around for a while and is enabled by default within the agent. Previously it was only used by a few less popular commands. The Job API allows sub-commands to communicate directly with the agent process running the job via a socket. That is required for redaction, because buildkite-agent oidc request-token runs within the job in a sub-process, but redaction occurs in the agent process.

To run buildkite-agent oidc request-token with redaction (the default), you’ll need to make BUILDKITE_AGENT_JOB_API_SOCKET and BUILDKITE_AGENT_ACCESS_TOKEN available as environment variables within the runtime environment, and you’ll also need to mount the per-job socket specified in $BUILDKITE_AGENT_JOB_API_SOCKET into that environment.
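As a rough sketch of what that could look like, assuming the step runs its commands inside a container started with plain docker run: the helper name run_with_job_api and the image name used below are made up for illustration, and the image is assumed to already contain the buildkite-agent binary. The docker-buildkite-plugin may pass additional variables, so treat this as a starting point rather than the plugin's actual behaviour.

```shell
# Hypothetical helper (not part of the agent or any plugin): runs a command in
# a container with the Job API socket and the agent access token passed through.
run_with_job_api() {
  image="$1"
  shift

  # Without the socket path there is nothing to mount, so stop early.
  if [ -z "${BUILDKITE_AGENT_JOB_API_SOCKET:-}" ]; then
    echo "BUILDKITE_AGENT_JOB_API_SOCKET is empty or undefined" >&2
    return 1
  fi

  # --env NAME (no value) forwards the variable from the calling environment.
  # The socket is mounted at the same path so the forwarded variable stays valid.
  docker run --rm \
    --env BUILDKITE_AGENT_JOB_API_SOCKET \
    --env BUILDKITE_AGENT_ACCESS_TOKEN \
    --volume "${BUILDKITE_AGENT_JOB_API_SOCKET}:${BUILDKITE_AGENT_JOB_API_SOCKET}" \
    "$image" "$@"
}
```

It could then be called as run_with_job_api my-image:latest buildkite-agent oidc request-token --audience sts.amazonaws.com, where my-image:latest is a placeholder.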

There’s a good example within the docker-buildkite-plugin that could be a helpful guide. We also added --skip-redaction, which preserves the old behaviour (and doesn’t require the Job API), but the benefit of redaction is that it keeps those tokens from leaking into build logs.

As such, you could modify your command as follows to resolve this issue:

_OIDC_TOKEN="$(buildkite-agent oidc request-token --audience sts.amazonaws.com --skip-redaction)"
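If the same script has to run both where the Job API socket is available and where it is not, one possible approach is to add --skip-redaction only when the variable is missing. This is a sketch under that assumption; request_oidc_token is a hypothetical wrapper name, and falling back this way gives up the log-redaction benefit on the affected agents.

```shell
# Hypothetical wrapper: request an OIDC token, skipping redaction only when
# the Job API socket variable is empty or undefined.
request_oidc_token() {
  audience="$1"

  if [ -n "${BUILDKITE_AGENT_JOB_API_SOCKET:-}" ]; then
    # Job API looks available: keep the default behaviour (redaction on).
    buildkite-agent oidc request-token --audience "$audience"
  else
    # No socket configured: fall back to the pre-upgrade behaviour.
    echo "Job API socket not set; requesting token without redaction" >&2
    buildkite-agent oidc request-token --audience "$audience" --skip-redaction
  fi
}
```

The original assignment would then become _OIDC_TOKEN="$(request_oidc_token sts.amazonaws.com)".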
