We’re having problems with autoscaling our agents. We are not running builds as often as we used to, so we would like EC2 instances to spin up to support Buildkite agents only when they are needed. I can see in the CloudWatch logs that there are six scheduled jobs and six unfinished jobs. Could that be causing the problem?
I found the cause of this problem. We had six very old jobs that were still running. I eventually found out how to see them via the URL https:/buildkite/organizations//builds?state=running, thanks to the GitHub issue "UnfinishedJobsCount stuck at >0" (Issue #72 in buildkite/buildkite-agent-metrics). Once I’d cancelled these old jobs, autoscaling started working again!
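For anyone who wants to check for this without clicking through the web UI, here is a minimal sketch in Python. It assumes the Buildkite REST API's organization-level builds endpoint with a `state=running` filter and a `created_at` timestamp on each build, which is how I understand the API to work. The organization slug, the API token, and the one-day threshold are placeholders, and `fetch_running_builds` and `stale_builds` are hypothetical helper names, not part of any Buildkite tooling.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumed base URL of the Buildkite REST API (v2).
API_ROOT = "https://api.buildkite.com/v2"


def fetch_running_builds(org_slug, api_token):
    """Fetch one page of builds in the 'running' state for an organization.

    org_slug and api_token are placeholders you must supply yourself.
    """
    url = f"{API_ROOT}/organizations/{org_slug}/builds?state=running&per_page=100"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp that ends in 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_builds(builds, max_age, now=None):
    """Return the builds whose created_at is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [
        build
        for build in builds
        if now - parse_timestamp(build["created_at"]) > max_age
    ]
```

Calling `stale_builds(fetch_running_builds("my-org", token), timedelta(days=1))` would then list the running builds that are more than a day old, which are the likely candidates for keeping the unfinished jobs count above zero. I cancelled mine by hand in the web UI, so this sketch only reports them.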
Also, I should note that I incorrectly assumed that this was within the scope of the Elastic CI Stack for AWS. Whilst we use AWS, our Buildkite configuration is customised via Terraform.