Agent Dashboard Sorting (version, job count, status)

It would be very useful for folks with hundreds or thousands of agents to have sorting capabilities on the Agent Dashboard (buildkite.com/organizations/:org/agents).

  • Sorting by version.
  • Sorting by OS.
  • Sorting by job count.
  • Sorting by status (connected).
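Sorting by version is slightly trickier than it looks, because a plain string sort puts "3.10.0" before "3.9.1". Below is a minimal sketch of client-side sorting, assuming agent records are plain dicts with `name`, `version` and `connection_state` keys. Those field names are an assumption modeled on Buildkite's REST API agent objects, and `version_key` / `sort_agents` are hypothetical helpers, not part of any Buildkite tool.

```python
import re
from typing import Any


def version_key(version: str) -> tuple[int, ...]:
    """Split a dotted version into integers so "3.10.0" sorts after "3.9.1".

    Any non-numeric suffix on a component (e.g. "-beta") is ignored in this sketch.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def sort_agents(agents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Oldest version first; ties broken by connection state, then by name.

    The keys "name", "version" and "connection_state" are assumed field names.
    """
    return sorted(
        agents,
        key=lambda a: (
            version_key(a.get("version", "")),
            a.get("connection_state", ""),
            a.get("name", ""),
        ),
    )


if __name__ == "__main__":
    sample = [
        {"name": "a1", "version": "3.10.0", "connection_state": "connected"},
        {"name": "a2", "version": "3.9.1", "connection_state": "connected"},
        {"name": "a3", "version": "3.9.1", "connection_state": "disconnected"},
    ]
    for agent in sort_agents(sample):
        print(agent["name"], agent["version"], agent["connection_state"])
```

Putting the oldest versions first surfaces stragglers at the top of the list, which is usually what you want when reviewing thousands of agents.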

These suggestions are great!

At this scale, I can see more filtering and sorting becoming important. Are there particular problems you’re trying to solve with these options?

We run hundreds or thousands of agents and want to be able to passively review their state.

As the team responsible for owning and managing thousands of agents, we obviously have no shortage of active monitoring that alerts on anomalies (too few agents, spikes in volume, et cetera). That monitoring won’t be perfect, though, and will need continuous adjustment.

What we really want is a view of the overall health of our agents, more so than sorting itself, but sorting would help if it is simpler to implement.

  • Do we have any agents behind on versions? Why? How many?
  • Are we running any unexpected OS versions? Did our OS bump miss any hosts?
  • Any hosts not taking jobs for some reason?
  • Any anomalies in status that our monitoring didn’t pick up or that we need to add to a monitor?
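Some of these questions can be answered from a periodic export of agent data. Below is a minimal sketch of a health summary, assuming agent records are dicts with `name`, `version` and `connection_state` keys (field names modeled on Buildkite's REST API agent objects, but treat them as an assumption) and that your team knows the single `target_version` it expects everywhere. `health_summary` is a hypothetical helper for illustration only.

```python
from collections import Counter
from typing import Any


def health_summary(agents: list[dict[str, Any]], target_version: str) -> dict[str, Any]:
    """Summarize agent versions and connection states.

    Assumes each agent dict has "name", "version" and "connection_state" keys;
    missing values are counted as "unknown".
    """
    # How many agents run each version: answers "how many are behind?"
    by_version = Counter(a.get("version", "unknown") for a in agents)
    # How many agents are in each connection state: highlights status anomalies.
    by_state = Counter(a.get("connection_state", "unknown") for a in agents)
    # Names of agents not on the version we expect to have rolled out everywhere.
    off_target = sorted(a["name"] for a in agents if a.get("version") != target_version)
    return {
        "by_version": dict(by_version),
        "by_state": dict(by_state),
        "off_target": off_target,
    }


if __name__ == "__main__":
    sample = [
        {"name": "a1", "version": "3.10.0", "connection_state": "connected"},
        {"name": "a2", "version": "3.9.1", "connection_state": "connected"},
        {"name": "a3", "version": "3.9.1", "connection_state": "disconnected"},
    ]
    print(health_summary(sample, target_version="3.10.0"))
```

Comparing against an explicit target version avoids guessing which version is "current"; the same counts could just as easily be shipped to an observability product as metrics.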

An argument could be made for exporting all of this to an observability product instead and building the view there.


Yeah, we’ve had lots of folks do this sort of work via Datadog or similar platforms, and there’s ongoing work to improve it. We also have an AWS EventBridge integration which allows ingesting a lot of this stuff into the AWS family of tools for analysis. That might be the best way for now. But this feedback is wonderful, and will help us figure out better built-in tools.

Makes sense. We’ll keep going down that route and improving that path for now.

Thank you!