Timeout waiting for agent

We had an issue recently where our agents weren’t starting and the jobs were sitting around for days waiting on agents. While we can set up monitoring for this, I’d love to have an option to time out while waiting for agents.

This would be a great feature to have. We occasionally update tool versions that are in the tags, and if people try to run builds on older branches, we don’t discover for a while that some builds have just been waiting forever.

Echoing this sentiment. There’s already a timeout_in_minutes that limits a job’s run time once it has started. I’d suggest either including agent-matching time in this or specifying a new configuration value (like agent_assignment_timeout_in_minutes).

Me too. This would be very helpful for working with experimental agents (e.g. new hardware), which may not always be online.

Hi @simonbyrne, we recently implemented default timeouts on all jobs (Default timeouts for command steps), which you can specify at the organization level on your organization’s Pipeline Settings page.

@jeremy That doesn’t appear to limit the wait time though?

@simonbyrne my apologies! You’re absolutely correct - timeouts on jobs waiting for an agent still aren’t possible. I’ll pass this feedback along to the team though, as I think it would still be a great thing to have.

Hi folks!

There is a global limit of 30 days on how long a job can wait for an agent.

We built it in a way that might be customisable in future, but it isn’t yet, sorry.

Anyone choosing to run their agents on spot instances and running any sort of agent maintenance routines is vulnerable to a particularly annoying version of this issue – namely where you schedule things for specific EC2 identifiers (that then get reclaimed), leading to jobs that go on for days.

+1 for introducing an agent allocation timeout. +2 if we can monitor that metric on an account-wide basis and set up alerting when it crosses some thresholds.

Hey @spaul!

Welcome to the community and thanks for your message!

As Sam mentioned above, we have noted that customising this time frame is desired, but it’s not something we’re able to work on right now.

Re: metrics, this too is something we’re keeping an eye on. Have you investigated setting up your own alerting in AWS, or another cloud provider? For example, you could use services such as EventBridge and Lambda to measure agent events.
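As a rough illustration of that kind of self-managed alerting, here is a minimal Python sketch of the check itself: given a list of job records, it picks out the ones that have been waiting for an agent longer than a threshold. The record shape (`id`, `state`, `scheduled_at` as an ISO 8601 string with a UTC offset) and the function name `find_stale_jobs` are assumptions made for the example, not a confirmed API; how you fetch the records and where you send the alert is up to you.

```python
from datetime import datetime, timedelta, timezone


def find_stale_jobs(jobs, max_wait, now=None):
    """Return the jobs that have been waiting for an agent longer than max_wait.

    `jobs` is a list of dicts with hypothetical keys:
      - "state": only jobs in the "scheduled" state count as waiting
      - "scheduled_at": ISO 8601 timestamp with an explicit UTC offset
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for job in jobs:
        # Skip anything that already has an agent or has finished.
        if job.get("state") != "scheduled":
            continue
        scheduled_at = datetime.fromisoformat(job["scheduled_at"])
        if now - scheduled_at > max_wait:
            stale.append(job)
    return stale
```

A scheduled function (for example a Lambda on a timer) could call `find_stale_jobs(jobs, timedelta(hours=2))` and raise an alert, or cancel the jobs, whenever the result is non-empty.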

Cheers,

Ben

I could use this feature too.

I’m building AWS AMIs in Buildkite, and have steps to spin up a new instance using the new AMI and then run a job on it to verify that it actually works. Today it failed (got stuck waiting for the agent to come online) because resizing the disk failed and the instance got wedged.

Hi @pme

Thank you for the feedback and for sharing your interest in this feature request. At the moment we do have an open feature request for this, but it is not part of our immediate roadmap. I have shared your feedback with our product team on this request.

Thanks,
Suma

Hi everyone,

I am from the product team at Buildkite. Thanks for raising the feedback. As @suma said, it is not on our immediate roadmap.

However, I just wanted to jump in and ask some questions.

  • The first one is about the 30-day limit. Is that too long? Would you rather have a shorter global limit? If yes, how would you define this?
  • Secondly, do you have timelines that require a shorter timeout and some that require a longer timeout?

Thanks,
Oz

The 30-day limit seems far too long; in most cases something is wrong if there’s no agent within a few hours.

I would prefer a much shorter timeout for optional steps, e.g. those with soft_fail enabled. It’d be best if this was configurable per step, just like the normal timeout.

Hey @rcheu

Welcome to our Forum!

This feature is on the roadmap for next year. We are going to make the timeout 7 days, and we will send comms out when this is done.

Cheers!

Hi everyone,

We just released a setting that lets you configure a shorter timeout for scheduled jobs for your organisation. You can read about it here.

Let us know if you have questions.

Have a great day!