The build-agent-metrics application monitors queue metrics such as RunningJobCount, ScheduleJobsCount, and UnfinishedJobsCount. These metrics can indicate an issue with the system, but more often they only show that the system is under load and running normally. My team uses these metrics to alert us when the system is having an issue, so the alerts are frequently false alarms.
Long wait times, by contrast, always indicate a problem for our users. I believe a useful feature would be to track queue wait times, in the same way that build-agent-metrics tracks the queue metrics.
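To make the idea concrete, here is a minimal sketch of what a wait-time metric might look like. It is not based on how build-agent-metrics actually works. The `Job` type, its `queued_at` and `started_at` fields, and the 10-minute threshold are all hypothetical names and values I made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions, not a real API."""
    queued_at: datetime
    started_at: Optional[datetime] = None  # None while the job is still waiting


def wait_seconds(job: Job, now: datetime) -> float:
    """Time spent in the queue: until the job started, or until now if it has not."""
    end = job.started_at if job.started_at is not None else now
    return max(0.0, (end - job.queued_at).total_seconds())


def longest_wait_seconds(jobs: Iterable[Job], now: datetime) -> float:
    """Longest queue wait across the given jobs (0.0 when there are none)."""
    return max((wait_seconds(job, now) for job in jobs), default=0.0)


def should_alert(jobs: Iterable[Job], now: datetime,
                 threshold_seconds: float = 600.0) -> bool:
    """Alert when any job has waited longer than the (assumed) threshold."""
    return longest_wait_seconds(jobs, now) > threshold_seconds
```

Alerting on the longest wait rather than on a job count means a busy but healthy queue would stay quiet, while a single stuck job would still trigger an alert.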