Elastic CI Stack for AWS v5.0.0 released

Elastic CI Stack for AWS v5.0.0 has been released :tada:

What's new?

Previously experimental Lambda-based scaler is the new default 🚀

Our previously experimental autoscaler, which scales up much faster, is now always used. Instances automatically scale to demand using the stack defaults, and you should see good results without changing any parameters.

Experimental Windows support is now available

A new AMI built on Windows Server 2019 is optionally available in the stack for Windows builds. You can enable it by changing the InstanceOperatingSystem setting from the default linux to windows. This release includes the recent Windows stability updates in Buildkite agent v3.25.0. Huge thanks to @jeremiahsnapp and @tduffield for their contributions here!

Note: There is a known issue with graceful handling of spot instances on Windows. Agents may not disconnect gracefully, and may still appear in the Buildkite UI for a few minutes after their instances terminate (see issue #752). We recommend using Windows on-demand instances for now.

Added

Changed

  • Docker configuration is now isolated per-step #678 (patrobinson) #756 (yob)
  • Use EC2 LaunchTemplate instead of a LaunchConfiguration #589 (lox)
  • InstanceType default is now t3.large (was t2.nano) #699 (pda)
  • Disable AZRebalancing to prevent running instances being terminated unnecessarily #751

Dependencies updated

And much, much more. We recommend you view the full changelog for this release.

Upgrading

In most cases, you should be able to upgrade older versions of the stack in place with a CloudFormation stack update using the following template URL:
https://s3.amazonaws.com/buildkite-aws-stack/v5.0.0/aws-stack.yml
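As a rough sketch of what such an in-place update request could look like, the helper below builds keyword arguments in the shape CloudFormation's UpdateStack call expects. The function name, the REMOVED_PARAMETERS set (copied from the removed-or-reworked parameters listed in this post) and the capabilities list are assumptions for illustration, so check them against your own stack before relying on them.

```python
TEMPLATE_URL = "https://s3.amazonaws.com/buildkite-aws-stack/v5.0.0/aws-stack.yml"

# Parameters this post lists as removed or replaced in v5.0.0.
REMOVED_PARAMETERS = {
    "EnableExperimentalLambdaBasedAutoscaling",
    "BuildkiteOrgSlug",
    "BuildkiteTerminateInstanceAfterJobTimeout",
    "ScaleDownPeriod",
    "ScaleCooldownPeriod",
    "BuildkiteTerminateInstanceAfterJobDecreaseDesiredCapacity",
    "ScaleDownAdjustment",
    "ScaleUpAdjustment",
}


def build_update_request(stack_name, existing_keys, overrides=None):
    """Build UpdateStack keyword arguments (hypothetical helper).

    existing_keys: parameter names currently set on the stack.
    overrides: parameter names mapped to new values.
    """
    overrides = dict(overrides or {})
    stale = sorted(set(overrides) & REMOVED_PARAMETERS)
    if stale:
        raise ValueError(f"parameters no longer exist in v5.0.0: {stale}")

    parameters = []
    # Keep the previous value for everything that still exists and is not overridden.
    for key in sorted(set(existing_keys) - REMOVED_PARAMETERS - set(overrides)):
        parameters.append({"ParameterKey": key, "UsePreviousValue": True})
    # Explicit overrides carry their new value as a string.
    for key, value in sorted(overrides.items()):
        parameters.append({"ParameterKey": key, "ParameterValue": str(value)})

    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL,
        "Parameters": parameters,
        # Assumption: adjust to whatever capabilities the template requires.
        "Capabilities": [
            "CAPABILITY_IAM",
            "CAPABILITY_NAMED_IAM",
            "CAPABILITY_AUTO_EXPAND",
        ],
    }
```

If you use boto3, the result could be passed along as `boto3.client("cloudformation").update_stack(**request)`. For example, `build_update_request("my-ci-stack", current_keys, {"InstanceOperatingSystem": "windows"})` would keep existing values and switch the stack to the Windows AMI.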

If you want to launch a new stack, you can use this link (make sure not to use your production AWS account; create a new one for CI):

Launch Buildkite AWS Stack

If you were using them, the following parameters have been removed or reworked:

  • EnableExperimentalLambdaBasedAutoscaling was removed (it's the default now)
  • BuildkiteOrgSlug was removed – the information reported by buildkite-agent-scaler makes it redundant, but consider buildkite-agent-metrics if you need more detailed metric monitoring that supports multiple metric backends
  • BuildkiteTerminateInstanceAfterJobTimeout, ScaleDownPeriod and ScaleCooldownPeriod are replaced by the more concise ScaleInIdlePeriod #586 (jeremiahsnapp), which lets agents self-terminate to scale in after they've been idle for a set period
  • BuildkiteTerminateInstanceAfterJobDecreaseDesiredCapacity and ScaleDownAdjustment were removed - instances will now always try to decrement the ASG desired count when their waiting period for new jobs has elapsed
  • ScaleUpAdjustment is replaced by ScaleOutFactor, because the new Lambda scaler calculates how many agents are needed at the time. ScaleOutFactor lets you multiply the relative quantity provisioned to the ASG, either to slow scale-out down or speed it up
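
To make the effect of ScaleOutFactor more concrete, here is a minimal sketch of the arithmetic. It assumes the factor is applied to the increase over the current desired count and that the result is rounded up. The function and argument names are hypothetical, and the real buildkite-agent-scaler may compute this differently.

```python
import math


def scaled_desired_count(current, needed, scale_out_factor=1.0):
    """Return a new ASG desired count after applying a scale-out factor.

    Assumption: the factor multiplies only the increase (needed - current).
    Scale-in is not handled here, since agents self-terminate when idle.
    """
    if needed <= current:
        return current
    increase = needed - current
    # Round up so a small factor still makes progress towards the target.
    return current + math.ceil(increase * scale_out_factor)
```

Under these assumptions, with 2 instances running and 12 needed, a factor of 0.5 would ask for 7 instances on this pass (slower scale-out) and a factor of 1.5 would ask for 17 (faster scale-out).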

Documentation

See the Readme for this release.

Love the scaler, just got it working (in Terraform). There's an outstanding PR that I'd love to see merged. It's about having a buffer amount. I was literally thinking of writing this, went and looked, and there was the PR, sitting unacknowledged since 2022.10.