Historical cost view per step and pipeline

We use the Elastic CI Stack for AWS (https://github.com/buildkite/elastic-ci-stack-for-aws) to run Buildkite agents in our own AWS account. We have a few different stacks, the main split being one stack for on-demand machines and another for spot machines. These machines are shared across a multitude of steps and pipelines, which helps us avoid the cost of waiting for new machines to scale in and out. The question we keep facing is: how much does a pipeline cost per day? We can view the price of a whole stack, with its associated tags, in AWS Cost Explorer, but we would love to have the same metrics with more insight. In the past we have done some analysis by calling the Buildkite API to get build and job runtimes. However, this feels very heavy-handed and is not presented as nicely as AWS Cost Explorer.
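For anyone curious, the runtime-based analysis mentioned above can be sketched roughly like this. It is a minimal sketch, not our actual tooling: the flattened job record shape (`pipeline`, `started_at`, `finished_at`) and the function names are assumptions for illustration, and the hourly rate is something you would have to supply yourself. It only attributes time that a job was actually running, so idle instance time is not allocated to any pipeline.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Assumes ISO 8601 UTC timestamps such as "2021-03-01T10:00:00Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def cost_per_pipeline(jobs, hourly_rate_usd):
    """Attribute cost to pipelines in proportion to job runtime.

    `jobs` is a list of dicts with hypothetical keys "pipeline",
    "started_at" and "finished_at" (flattened from whatever the API returns).
    `hourly_rate_usd` is an assumed flat price per instance-hour.
    """
    totals = defaultdict(float)
    for job in jobs:
        # Skip jobs that never started or have not finished yet.
        if not job.get("started_at") or not job.get("finished_at"):
            continue
        runtime = parse_ts(job["finished_at"]) - parse_ts(job["started_at"])
        totals[job["pipeline"]] += runtime.total_seconds() / 3600 * hourly_rate_usd
    return dict(totals)
```

With several agents per instance, a flat per-job rate would over-count the instance cost, so the rate would need to be divided by the number of agents sharing the machine (or similar).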

We thought about creating a separate scheduler that tags each EC2 instance with the pipeline it is running and then removes the tag when the agent has no job running. However, that becomes more difficult with multiple agents per EC2 instance. I think it would be really great to have some insight into the cost per pipeline rather than only the cost at the agent level!

Thanks!
P.S. Buildkite has been really amazing!

These sound like great ideas!

Sorry, I’m not familiar enough with Cost Explorer to know what options are technically feasible. Can you amortize running costs for an instance partially to a tag, somehow? And is that pro-rated based on the period of time for which a tag is present?

If so, it sounds like some agent hooks which add and remove tags during a job’s lifecycle might work?
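As a rough illustration of that idea, here is a minimal sketch of a helper that a pre-command hook could call to tag the instance and a pre-exit hook could call to untag it. The tag keys, function names, and the idea of calling a Python script from the hooks are all assumptions for illustration. It uses the instance metadata service (IMDSv2) to find the instance ID and boto3's `create_tags` / `delete_tags`, and it assumes the instance role allows `ec2:CreateTags` and `ec2:DeleteTags` and that a region is configured for boto3.

```python
import os
import urllib.request

IMDS = "http://169.254.169.254/latest"
# Hypothetical tag keys; pick whatever matches your tagging strategy.
TAG_KEYS = ("buildkite-pipeline", "buildkite-job")


def instance_id() -> str:
    # IMDSv2: fetch a short-lived token, then read the instance ID with it.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()
    id_req = urllib.request.Request(
        f"{IMDS}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urllib.request.urlopen(id_req, timeout=2).read().decode()


def job_tags(env):
    # Build the tag list from the environment the agent gives each job.
    return [
        {"Key": "buildkite-pipeline", "Value": env["BUILDKITE_PIPELINE_SLUG"]},
        {"Key": "buildkite-job", "Value": env["BUILDKITE_JOB_ID"]},
    ]


def tag_instance(env=os.environ):
    import boto3  # imported lazily so job_tags() works without boto3 installed

    boto3.client("ec2").create_tags(Resources=[instance_id()], Tags=job_tags(env))


def untag_instance():
    import boto3

    # Passing only the key removes the tag whatever its current value is.
    boto3.client("ec2").delete_tags(
        Resources=[instance_id()], Tags=[{"Key": key} for key in TAG_KEYS]
    )
```

This only makes sense with one agent per instance, since a second agent on the same machine would overwrite the first agent's tags. The tags would presumably also need to be activated as cost allocation tags in the billing console before they show up in Cost Explorer.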

@sj26 Using the Buildkite agent hooks seems like a promising start for the single-agent-per-instance strategy! I think we could potentially build this into a plugin that we add to all of our steps as well.

Can you amortize running costs for an instance partially to a tag, somehow?

Exactly. You can filter by time period and by tags such as buildkite-build or buildkite-job, depending on your tagging strategy.

And is that pro-rated based on the period of time for which a tag is present?

Edit: I’m not completely sure at what frequency it prorates. EC2 billing is per second, but it may take some time for the tag-based cost to propagate. I will look into this.

I was mainly hoping to get this on Buildkite’s radar in case others have this use case. I could see this being a fairly niche request, given the different places that people run agents.