Support a download origin env var for `artifact download`

Background

In several of our pipelines, we need to share an artifact between multiple pipelines or builds. Using the artifact uploader, we can specify a path so that these artifacts are uploaded to a deterministic location. However, there is no easy way to download them again: we either have to use the artifact REST API to search for the build that originally uploaded the artifact and then download it from there, or interact with the GCS APIs directly.

We currently work around this by using the GCS REST API directly, but we would love to see this workflow supported by the artifact API.
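For context, our workaround looks roughly like the sketch below (the bucket, object path, and credential handling are placeholders, not a prescription):

```bash
# Fetch an access token for whatever service account the job already uses for the bucket.
ACCESS_TOKEN="$(gcloud auth print-access-token)"

# Download the object via the GCS JSON API; the object name must be URL-encoded
# (slashes become %2F). The bucket and path here are placeholders.
curl -sfL \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -o "artifact-<hash>.tar.gz" \
  "https://storage.googleapis.com/storage/v1/b/bucket-name/o/path%2Fwithin%2Fbucket%2Fartifact-<hash>.tar.gz?alt=media"
```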

Specific Ask

Allow an arbitrary object to be downloaded from GCS/Artifactory/S3 by setting a `BUILDKITE_ARTIFACT_DOWNLOAD_ORIGIN` env var, mirroring the behavior of `BUILDKITE_ARTIFACT_UPLOAD_DESTINATION`.

For example, an artifact uploaded with `BUILDKITE_ARTIFACT_UPLOAD_DESTINATION='gs://bucket-name/path/within/bucket' buildkite-agent artifact upload artifact-<hash>.tar.gz` could then be downloaded with `BUILDKITE_ARTIFACT_DOWNLOAD_ORIGIN='gs://bucket-name/path/within/bucket' buildkite-agent artifact download artifact-<hash>.tar.gz`.
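Concretely, the proposed flow would look something like this (`BUILDKITE_ARTIFACT_DOWNLOAD_ORIGIN` is the variable being requested here and does not exist today; the bucket and paths are placeholders):

```bash
# Upload to a deterministic location - this part works today.
BUILDKITE_ARTIFACT_UPLOAD_DESTINATION='gs://bucket-name/path/within/bucket' \
  buildkite-agent artifact upload "artifact-<hash>.tar.gz"

# Proposed: download the same object from another build or pipeline, without
# knowing the job id that uploaded it. This env var is not currently supported.
BUILDKITE_ARTIFACT_DOWNLOAD_ORIGIN='gs://bucket-name/path/within/bucket' \
  buildkite-agent artifact download "artifact-<hash>.tar.gz" .
```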

Note: a requirement for the solution is that the downloader should not need to provide the job id under which the artifact was originally uploaded.

Hi! :wave:

Artifact downloads do honour the upload destination already. But the agent can’t search for artifacts outside of its current build, sorry. We’ve talked about adding pipeline or organization scoping to artifact searching, but that raises some interesting questions about security boundaries within an agent.

Could you use the Buildkite REST API to search for the artifacts you need? You could generate an API token scoped to artifacts and pipelines, and use that within your builds to find and download the artifacts you need.
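For example, something like the rough sketch below (the org and pipeline slugs, token scopes, and filename filter are placeholders, not anything prescriptive):

```bash
# Placeholders: an API access token with read_builds and read_artifacts scopes,
# plus the org and pipeline slugs to search within.
BK_TOKEN="..."
ORG="my-org"
PIPELINE="other-pipeline"

# Find the most recent passed build on master for that pipeline.
BUILD_NUMBER="$(curl -sf -H "Authorization: Bearer ${BK_TOKEN}" \
  "https://api.buildkite.com/v2/organizations/${ORG}/pipelines/${PIPELINE}/builds?branch=master&state=passed&per_page=1" \
  | jq -r '.[0].number')"

# List that build's artifacts and pick out the download URL for the one we want.
DOWNLOAD_URL="$(curl -sf -H "Authorization: Bearer ${BK_TOKEN}" \
  "https://api.buildkite.com/v2/organizations/${ORG}/pipelines/${PIPELINE}/builds/${BUILD_NUMBER}/artifacts" \
  | jq -r '.[] | select(.filename == "artifact-<hash>.tar.gz") | .download_url' | head -n1)"

# The download endpoint responds with a redirect to the stored object, so grab
# the redirect target first and then fetch it without the API auth header.
SIGNED_URL="$(curl -sf -o /dev/null -w '%{redirect_url}' \
  -H "Authorization: Bearer ${BK_TOKEN}" "${DOWNLOAD_URL}")"
curl -sfL -o "artifact-<hash>.tar.gz" "${SIGNED_URL}"
```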

Cheers,
Sam

Hi Sam,

Yes, we could use the search API - that’s one of the workarounds. However, it feels silly to do extra work to search for something when we already know exactly where it was uploaded - that’s why we’ve switched to using the respective storage APIs directly.

Can you elaborate a bit on the security boundaries you mentioned?

In our case, we already have to provide specific credentials for the buckets we’re uploading to - it seems like that security concern can be handled by the bucket owners, and wouldn’t need to be the concern of the agent - but maybe I’m misunderstanding.

> Yes, we could use the search API - that’s one of the workarounds. However, it feels silly to do extra work to search for something when we already know exactly where it was uploaded - that’s why we’ve switched to using the respective storage APIs directly.

Ah, if you know exactly which other build it was uploaded within, you can use `buildkite-agent artifact download --build ${BUILD_ID} ...` to download on the agent. But you can’t search for a build id; you need to know it already.
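For example, a minimal sketch assuming the uploading build’s UUID is already known (the variable name is just illustrative):

```bash
# Download an artifact that was uploaded by a different build of this pipeline.
buildkite-agent artifact download --build "${OTHER_BUILD_ID}" "artifact-<hash>.tar.gz" .
```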

If you only have the URL to the artifact in your bucket, note that it includes the build id; the layout is roughly:

`${BUILDKITE_ARTIFACT_UPLOAD_DESTINATION}/{org id}/{pipeline id}/{build id}/{job id}/{artifact path}`

So the third UUID should be the build id, and you could supply this to the download command.
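As a rough sketch (the object URL below is entirely a placeholder; only the layout above is assumed):

```bash
# Hypothetical artifact URL following the layout above.
UPLOAD_DESTINATION='gs://bucket-name/path/within/bucket'
ARTIFACT_URL="${UPLOAD_DESTINATION}/ORG_UUID/PIPELINE_UUID/BUILD_UUID/JOB_UUID/artifact-<hash>.tar.gz"

# Strip the destination prefix, then take the third path component (the build id).
RELATIVE_PATH="${ARTIFACT_URL#"${UPLOAD_DESTINATION}/"}"
BUILD_ID="$(printf '%s\n' "${RELATIVE_PATH}" | cut -d/ -f3)"

# BUILD_ID can now be passed to the --build flag shown above.
echo "${BUILD_ID}"
```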

> We’ve talked about adding pipeline or organization scoping to artifact searching, but that raises some interesting questions about security boundaries within an agent.

> Can you elaborate a bit on the security boundaries you mentioned?

Sure! We’d love to add the ability to do something like `buildkite-agent artifact download --pipeline other --build latest --branch master --state passed some/artifact/path.rb`, but this is a brand new capability that folks may not have modelled their agents and pipelines around, so we’d want to do it carefully. But it sounds like maybe you don’t need this, if you already know which build has your artifact, or where it is stored. :slightly_smiling_face:

Sorry, to clarify: we don’t know which build the artifact was uploaded in. The artifacts have a deterministic path that contains no reference to the build or job id.

I assumed the search API would allow us to find things across builds, but double-checking that reference, that does not look to be the case, so using the REST API is also not an option.