In case others run into something similar in the future:
For some reason, setting `disconnect-after-idle-timeout` did not fully work for me (at least as I understood its behavior from this forum post). It stopped the buildkite-agent service on the underlying instance, but it did not ‘terminate the instance and decrement the autoscaling group desired count atomically’ as stated above.
I’m not sure why, but here is a workaround:
In your Buildkite agent’s cloud-init (or similar) script, after starting the agent, launch a background script along these lines:
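```bash
#!/bin/bash
# Give the agent time to start before watching it.
sleep 300
# Poll until the agent service stops (e.g. after its idle timeout).
# The 'Running' match depends on what `service ... status` prints on your distro.
while service buildkite-agent status | grep -q 'Running'; do
  sleep 60
done
# Look up this instance's ID and region from the EC2 instance metadata service.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AWS_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" --region "$AWS_REGION"
```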
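Depending on your ASG configuration, you may need additional commands before the terminate call to decrement the ASG’s desired capacity; otherwise the ASG will just launch a replacement instance to get back to the desired count. It feels pretty janky, but it works.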
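If your agents run in an Auto Scaling group, one option (untested sketch, not something I’ve run as-is) is to swap the plain EC2 terminate for `aws autoscaling terminate-instance-in-auto-scaling-group --should-decrement-desired-capacity`, which terminates the instance and lowers the desired capacity in a single call. The sketch below also fetches metadata with an IMDSv2 token and checks the agent with `systemctl is-active`. It assumes a systemd host, a unit named `buildkite-agent`, an instance role allowed to call `autoscaling:TerminateInstanceInAutoScalingGroup`, and a group minimum size low enough to allow the decrement. The `INITIAL_DELAY`/`POLL_INTERVAL` variables and the `--run` argument are hypothetical conveniences, not anything Buildkite provides.

```bash
#!/bin/bash
# Hypothetical idle-reaper sketch. Assumptions: systemd, IMDSv2, instance in an ASG,
# and an instance role allowed autoscaling:TerminateInstanceInAutoScalingGroup.
set -euo pipefail

# Fetch a metadata path via IMDSv2: get a session token first, then do the GET.
imds_get() {
  local token
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# Succeeds while the buildkite-agent systemd unit is active.
agent_running() {
  systemctl is-active --quiet buildkite-agent
}

# Terminate this instance and lower the ASG desired capacity in one call,
# so the group does not launch a replacement.
terminate_self() {
  local instance_id region
  instance_id=$(imds_get instance-id)
  region=$(imds_get placement/region)
  aws autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id "$instance_id" \
    --should-decrement-desired-capacity \
    --region "$region"
}

main() {
  # Grace period so the agent has time to start before we watch it.
  sleep "${INITIAL_DELAY:-300}"
  while agent_running; do
    sleep "${POLL_INTERVAL:-60}"
  done
  terminate_self
}

# Only start the watcher when called with --run, so the functions can be
# loaded (e.g. for testing) without sleeping or terminating anything.
if [[ "${1:-}" == "--run" ]]; then
  main
fi
```

From cloud-init you would start it with something like `/usr/local/bin/idle-reaper.sh --run &` (the path is hypothetical). Splitting the steps into functions keeps the polling loop separate from the AWS calls, so you can stub `agent_running` and `terminate_self` to check the loop without touching a real instance.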