Buildkite Elastic CI Stack for AWS v5.0.0 released

Hot on the heels of our v5.0.0-beta1 release, we’ve just released Elastic CI Stack for AWS v5.0.0. This has been in the works for a long time, and it gave us the chance to tidy several things up in one go and make the stack easier for newcomers to Buildkite to work with. You should be able to perform an in-place CloudFormation update over your existing stack (tested with v4.5.0, but most older versions should work too). There are a number of small but possibly breaking changes, depending on how you use the agents, so we recommend you migrate carefully.

Check out the GitHub release for v5.0.0 here!

If you find any bugs or have questions, feel free to reply to this topic or email us at support@buildkite.com.

What’s new?

Here are the headline features, but do check the release notes and changelog for more details.

Previously experimental Lambda-based scaler is the new default :rocket:

Our previously experimental Lambda-based autoscaler, which scales up much faster, is now always used. Instances scale automatically to demand using the stack defaults, and you should see good results without needing to change any parameters.

Experimental Windows support is now available

A new AMI built on Windows Server 2019 is optionally available to use in the stack for Windows builds. You can enable this by changing the InstanceOperatingSystem setting from the default linux to windows. This release includes recent updates to Windows stability on Buildkite agent v3.25.0. Huge thanks to @jeremiahsnapp and @tduffield for their contributions here!
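As a rough sketch of what that parameter change looks like when scripted, the snippet below builds a CloudFormation parameter list that overrides InstanceOperatingSystem and keeps every other value as it was. The helper name `build_parameters` and the `MaxSize` key are illustrative placeholders, not part of the stack's tooling; the resulting list has the shape that boto3's `update_stack` accepts for its `Parameters` argument, if that is how you apply updates.

```python
def build_parameters(existing_keys, overrides):
    """Build a CloudFormation Parameters list.

    Keys listed in `overrides` get a new value; every other existing key
    is carried over unchanged with UsePreviousValue.
    """
    params = []
    for key in sorted(set(existing_keys) | set(overrides)):
        if key in overrides:
            params.append({"ParameterKey": key, "ParameterValue": overrides[key]})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


# Hypothetical stack with two parameters; only the operating system changes.
params = build_parameters(
    existing_keys=["InstanceOperatingSystem", "MaxSize"],
    overrides={"InstanceOperatingSystem": "windows"},
)
print(params)
```

The list could then be passed to whatever mechanism you already use for stack updates (console, CLI, or an SDK call).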

Note: There is a known issue with graceful handling of spot instances on Windows. Agents may not disconnect gracefully and may still appear in the Buildkite UI for a few minutes after they terminate (see issue #752). We recommend using on-demand instances for Windows for now.

Summary of parameter changes

If you were using them, the following parameters have been removed or reworked:

  • EnableExperimentalLambdaBasedAutoscaling was removed (it’s the default now)
  • BuildkiteOrgSlug was removed – the information reported by buildkite-agent-scaler makes it redundant, but consider buildkite-agent-metrics if you need more detailed metric monitoring that supports multiple metric backends
  • BuildkiteTerminateInstanceAfterJobTimeout, ScaleDownPeriod and ScaleCooldownPeriod are replaced by the more concise ScaleInIdlePeriod, which lets agents self-terminate to scale in once they’ve been idle for a set period
  • BuildkiteTerminateInstanceAfterJobDecreaseDesiredCapacity and ScaleDownAdjustment were removed - instances will now always try to decrement the ASG desired count when their waiting period for new jobs has elapsed
  • ScaleUpAdjustment is replaced by ScaleOutFactor, as the new Lambda scaler calculates how many agents are needed at the time. ScaleOutFactor lets you multiply the relative quantity provisioned to the ASG, either to slow scaling down or speed it up
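If you keep your stack parameters in a file or script, a small check like the one below can flag anything affected before you update. This is only a sketch: the parameter names come from the list above, while the function name `check_parameters` and the sample values are made up for illustration. It does not try to convert old values into new ones, since the replacement parameters have different meanings.

```python
# Parameters removed outright in v5.0.0.
REMOVED = {
    "EnableExperimentalLambdaBasedAutoscaling",
    "BuildkiteOrgSlug",
    "BuildkiteTerminateInstanceAfterJobDecreaseDesiredCapacity",
    "ScaleDownAdjustment",
}

# Parameters superseded by a differently named parameter.
REPLACED = {
    "BuildkiteTerminateInstanceAfterJobTimeout": "ScaleInIdlePeriod",
    "ScaleDownPeriod": "ScaleInIdlePeriod",
    "ScaleCooldownPeriod": "ScaleInIdlePeriod",
    "ScaleUpAdjustment": "ScaleOutFactor",
}


def check_parameters(old_params):
    """Split v4-era parameters into those safe to keep and notes on the rest."""
    kept, notes = {}, []
    for name, value in old_params.items():
        if name in REMOVED:
            notes.append(f"{name}: removed, drop it")
        elif name in REPLACED:
            notes.append(f"{name}: replaced by {REPLACED[name]}, review by hand")
        else:
            kept[name] = value
    return kept, notes


# Hypothetical v4 parameter set.
kept, notes = check_parameters(
    {"ScaleUpAdjustment": "5", "BuildkiteOrgSlug": "acme", "MaxSize": "10"}
)
for note in notes:
    print(note)
```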

We’ve released v5.0.1, a bugfix release that fixes the following issue on Windows instances:

Fixed

  • Allow retrieval of agent token from parameter store on Windows agents #762 (chrisfowles)