Queue wait times metric

The buildkite-agent-metrics application monitors queue metrics such as RunningJobCount, ScheduleJobsCount, and UnfinishedJobsCount. These metrics can indicate an issue with the system, but more often they just show that the system is under load and running normally. My team uses these metrics to alert us when the system is having an issue, and those alerts are frequently false alarms.

Long wait times, on the other hand, always indicate a problem for our users. I believe it would be useful to track queue wait times as a metric, similar to the ones buildkite-agent-metrics already reports.

Hey @mat! Sorry for such a delay getting back to you on this one. Did you come up with any workarounds in the meantime?

I just wanted to get a bit more info on your idea, if that’s alright 😄 By ‘queue wait times’ do you mean the time a job sits in a queue before the agent picks it up? And where/how would you want to consume this info? Would you need it from an API, or would something in the UI suit? Would you be pulling it into your own dashboards?
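To make that definition concrete, here is a minimal sketch of how wait time could be computed from job timestamps. It assumes each job record is a dict with ISO 8601 `runnable_at` and `started_at` fields; those field names, and the helper names `queue_wait_seconds` and `summarize_waits`, are assumptions for illustration and not something buildkite-agent-metrics is confirmed to expose.

```python
from datetime import datetime, timezone
from statistics import median


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2021-03-01T10:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def queue_wait_seconds(job, now=None):
    """Seconds a job waited (or has waited so far) for an agent.

    Assumes hypothetical 'runnable_at' and 'started_at' fields.
    Returns None if the job is not runnable yet.
    """
    runnable_at = job.get("runnable_at")
    if not runnable_at:
        return None
    start = parse_ts(runnable_at)
    started_at = job.get("started_at")
    # A job that has not started yet is still waiting, so measure up to "now".
    end = parse_ts(started_at) if started_at else (now or datetime.now(timezone.utc))
    return max(0.0, (end - start).total_seconds())


def summarize_waits(jobs, now=None):
    """Aggregate per-job wait times into values suitable for alerting."""
    waits = [w for w in (queue_wait_seconds(j, now) for j in jobs) if w is not None]
    if not waits:
        return {"count": 0, "max": 0.0, "median": 0.0}
    return {"count": len(waits), "max": max(waits), "median": median(waits)}


if __name__ == "__main__":
    now = parse_ts("2021-03-01T10:05:00Z")
    jobs = [
        {"runnable_at": "2021-03-01T10:00:00Z", "started_at": "2021-03-01T10:00:30Z"},
        {"runnable_at": "2021-03-01T10:00:00Z", "started_at": None},  # still queued
        {"runnable_at": None, "started_at": None},  # blocked, not runnable yet
    ]
    print(summarize_waits(jobs, now=now))
```

An alert on the maximum or median of these values would fire only when jobs are actually stuck waiting, rather than whenever the queue is merely busy.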

+1 to this.

On a related note, how is WaitJobCount defined?

Hey @clambertops, hmm, I’m not sure about that value. Whereabouts are you seeing WaitJobCount?

I see it in CloudWatch after upgrading to 5.1.0 from 4.5.0:

Skimming the release notes for the stack and the agent, I don’t see any mention of this. Nor do I see any mention of it here:

Additionally, the Total and Idle count metrics are not working. You can see all of this here:

Shared with CloudApp

I’m quite at a loss. Hoping someone here can point me in some helpful direction.

Thanks,
Chris

Ahh, I see it! You are correct - it was added into the V5 release.

It is in the V5 changelog (buildkite-agent-metrics/CHANGELOG.md at c8145a178990ff59994eb45e4e1cd4c91fc411e1, in buildkite/buildkite-agent-metrics on GitHub). It refers to jobs that are waiting behind a wait step, and it is used for pre-emptive scaling.

You can see it in the code here: buildkite-agent-metrics/collector.go at c8145a178990ff59994eb45e4e1cd4c91fc411e1, in buildkite/buildkite-agent-metrics on GitHub.

@clambertops If you’re having any difficulty with your stack, we can take a deeper look for you if you send us a message at support@buildkite.com.