Hi, I’m using a scheduled job to do periodic maintenance on agents, such as deleting downloaded dependencies: the sort of data that would be slow and wasteful to recreate from scratch on every build, but that becomes a problem if it churns and grows indefinitely.
Previously this cleanup was implemented as a cron job local to each agent, but if it ran at the same time as a build it would cause spurious failures. The current solution is a cleanup pipeline on its own branch that names every agent individually, with OS-specific cleanup steps for each. Each agent finishes what it’s doing, runs the cleanup job, and is then ready for the next job.
This cleanup pipeline is a maintenance headache as agents come and go. Is there any way I can use Buildkite’s tools to make every online agent matching a particular tag run a set of steps? Or otherwise solve this problem differently? Thanks!
That’s a tricky question! It’s not possible at the moment to achieve exactly that, but it is something that was discussed internally with the team.
Alternatively, what other users do is one or a combination of the following:
schedule a build that targets the agents, e.g. one that runs every 6/12/24 hours to clean up all resources: Scheduled Builds | Buildkite Documentation
create a step at the end of the pipeline that runs at the end of every build even if the other step fails (except if the user initiates a cancellation), like:
steps:
  - command: exit 1
    key: "a"
  - command: echo "do this thing first"
    key: "b"
    depends_on: "a"
  - command: echo "clean up the things"
    label: "clean up task"
    depends_on: "b"
    allow_dependency_failure: true
Hello, have you had any new developments that would help with maintenance on the agent hosts?
We have a group of unclustered agents that run random jobs from random pipelines as builds start, and we’d like to run maintenance scripts at regular intervals on those agents. We want each host to be running only that one job while maintenance is in progress.
We are looking for a solution that allows us to queue these maintenance jobs with high priority, each assigned to a specific agent, e.g. one maintenance job per agent, with high priority (so current work can finish gracefully and maintenance then runs as the next task).
Is this possible to do with Buildkite? Maybe we can add specific steps through some pipeline upload?
Yes, you can prioritize maintenance jobs in Buildkite by setting a high priority value (e.g., priority: 100) so that they are picked up ahead of other waiting jobs once the agent finishes its current job. You can target specific agents using tags or queues, and prevent overlapping maintenance jobs by using a concurrency_group with a limit of 1. This helps the agent handle maintenance and guarantees it won’t pick up new jobs until maintenance completes. More details are available in our documentation: Job prioritization | Buildkite Documentation
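As a rough sketch of how those three attributes could fit together, a maintenance pipeline might look something like the following. The hostname tag, the host names, and the script path are assumptions for illustration only; substitute whatever tags your agents actually register with.

```yaml
# Sketch only: one high-priority maintenance step per agent.
# Assumes each agent registers a unique "hostname" tag (hypothetical values below)
# and that ./scripts/maintenance.sh (hypothetical) exists in the repository.
steps:
  - label: "maintenance: build-host-01"
    command: "./scripts/maintenance.sh"
    priority: 100                      # higher than the default of 0
    agents:
      hostname: "build-host-01"        # only an agent with this tag takes the job
    concurrency: 1                     # at most one job in the group at a time
    concurrency_group: "maintenance/build-host-01"

  - label: "maintenance: build-host-02"
    command: "./scripts/maintenance.sh"
    priority: 100
    agents:
      hostname: "build-host-02"
    concurrency: 1
    concurrency_group: "maintenance/build-host-02"
```

Note that concurrency and concurrency_group have to be set together on a step. If several agents share one host, each would need its own distinguishing tag for this kind of targeting to work.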
Hope this helps! Feel free to contact us at support@buildkite.com if you have any questions.