AWS Stack upgrade from v5.21.0 to v6.4.0

Agents are terminated immediately after being assigned a job. Please refer to the UI and ASG screenshots.

I also looked into the Lambda autoscaler logs, and it seems to be using the 1.5.0 dev version:
2023/08/31 00:29:58 buildkite-agent-scaler version 1.5.0 dev.

I have the scale-in idle period set to 2700s, with a max size of 2 and a min size of 0.


Hi @farhan ,

Welcome to the Buildkite Support Community! :wave:

We’ve had a similar issue reported before when upgrading a v5.21.0 stack to v6.0. However, creating a new stack with v6.4.0 should work fine. Please let us know if this works for you.

Cheers!

@lizette
Thanks for the reply. I didn’t create a change set; I only deployed a new v6.4.0 stack with the same parameter settings we use for v5.21.0 (with updated names, of course).
Please find the parameter settings below:

[
  {
    "ParameterKey": "AgentsPerInstance",
    "ParameterValue": "1"
  },
  {
    "ParameterKey": "ArtifactsBucket",
    "ParameterValue": "artifacts-bucket"
  },
  {
    "ParameterKey": "AssociatePublicIpAddress",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "BuildkiteAgentRelease",
    "ParameterValue": "stable"
  },
  {
    "ParameterKey": "BuildkiteAgentTags",
    "ParameterValue": "autoscale=true"
  },
  {
    "ParameterKey": "BuildkiteAgentTimestampLines",
    "ParameterValue": "false"
  },
  {
    "ParameterKey": "BuildkiteAgentTokenParameterStorePath",
    "ParameterValue": "token_path"
  },
  {
    "ParameterKey": "BuildkiteQueue",
    "ParameterValue": "test-v6"
  },
  {
    "ParameterKey": "BuildkiteTerminateInstanceAfterJob",
    "ParameterValue": "false"
  },
  {
    "ParameterKey": "BuildkiteAgentEnableGitMirrors",
    "ParameterValue": "false"
  },
  {
    "ParameterKey": "CostAllocationTagName",
    "ParameterValue": "CostCenter"
  },
  {
    "ParameterKey": "CostAllocationTagValue",
    "ParameterValue": "AutoscaleCI"
  },
  {
    "ParameterKey": "ECRAccessPolicy",
    "ParameterValue": "poweruser"
  },
  {
    "ParameterKey": "EnableCostAllocationTags",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "EnableDockerExperimental",
    "ParameterValue": "false"
  },
  {
    "ParameterKey": "EnableDockerLoginPlugin",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "EnableDockerUserNamespaceRemap",
    "ParameterValue": "false"
  },
  {
    "ParameterKey": "EnableECRPlugin",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "EnableSecretsPlugin",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "InstanceCreationTimeout",
    "ParameterValue": "PT5M"
  },
  {
    "ParameterKey": "InstanceTypes",
    "ParameterValue": "m6i.large"
  },
  {
    "ParameterKey": "OnDemandPercentage",
    "ParameterValue": "100"
  },
  {
    "ParameterKey": "MaxSize",
    "ParameterValue": "2"
  },
  {
    "ParameterKey": "MinSize",
    "ParameterValue": "0"
  },
  {
    "ParameterKey": "ScaleOutFactor",
    "ParameterValue": "1.0"
  },
  {
    "ParameterKey": "ScaleInIdlePeriod",
    "ParameterValue": "2700"
  },
  {
    "ParameterKey": "ScaleOutForWaitingJobs",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "RootVolumeName",
    "ParameterValue": "/dev/xvda"
  },
  {
    "ParameterKey": "RootVolumeSize",
    "ParameterValue": "75"
  },
  {
    "ParameterKey": "RootVolumeType",
    "ParameterValue": "gp3"
  },
  {
    "ParameterKey": "RootVolumeEncrypted",
    "ParameterValue": "true"
  },
  {
    "ParameterKey": "SecretsBucket",
    "ParameterValue": "our-bucket"
  },
  {
    "ParameterKey": "AvailabilityZones",
    "ParameterValue": "us-east-1a,us-east-1d"
  },
  {
    "ParameterKey": "ManagedPolicyARNs",
    "ParameterValue": "our policies"
  },
  {
    "ParameterKey": "EnableInstanceStorage",
    "ParameterValue": "true"
  }
]
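To confirm that the v6.4.0 parameters really match the v5.21.0 ones apart from the renamed values, the two parameter lists can be compared mechanically. The list above is in the `ParameterKey`/`ParameterValue` shape that CloudFormation accepts as a parameters file, so a minimal Python sketch (assuming both stacks' parameters are saved in that shape; the sample values in the `__main__` section are hypothetical) only needs to turn each list into a dict and report the keys whose values differ:

```python
import json


def params_to_dict(params):
    """Convert a CloudFormation parameter list into a {key: value} dict."""
    return {p["ParameterKey"]: p["ParameterValue"] for p in params}


def diff_params(old, new):
    """Return {key: (old_value, new_value)} for every parameter that was
    added, removed, or changed between two parameter lists."""
    a, b = params_to_dict(old), params_to_dict(new)
    return {
        key: (a.get(key), b.get(key))  # None marks a key missing on that side
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Hypothetical sample data; in practice use json.load() on each stack's
    # saved parameters file.
    old = json.loads("""[
      {"ParameterKey": "BuildkiteQueue", "ParameterValue": "test"},
      {"ParameterKey": "MaxSize", "ParameterValue": "2"}
    ]""")
    new = json.loads("""[
      {"ParameterKey": "BuildkiteQueue", "ParameterValue": "test-v6"},
      {"ParameterKey": "MaxSize", "ParameterValue": "2"}
    ]""")
    print(diff_params(old, new))  # {'BuildkiteQueue': ('test', 'test-v6')}
```

A key that exists in only one list is reported with `None` on the missing side, which also surfaces parameters set for one stack but not the other.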

Hi @farhan ,

I don’t see any issues with the parameters above. You mentioned that the agents were terminated after being assigned jobs; are you able to provide some agent logs? You can read about how to get the agent logs here. Please send them to support@buildkite.com and we’ll take a closer look at the errors you’re seeing.

@triarius Of course. However, just for your information: even with EnableInstanceStorage=true, the v5.21.0 stack worked fine for us on m6i.large instances. I’m curious what changed in v6.4.0 so that this parameter is no longer ignored when it is set to true on an instance type without NVMe storage.
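A mismatch like this can be caught before deploying. The sketch below is a hypothetical pre-flight check, not part of the Elastic CI Stack: it flags instance types that lack local instance storage while `EnableInstanceStorage` is `true`. The `HAS_INSTANCE_STORAGE` table is an assumption hard-coded for illustration (m6i.large is EBS-only, m6id.large has a local NVMe SSD); a real check would read the `InstanceStorageSupported` field from `aws ec2 describe-instance-types` instead.

```python
import json

# Hypothetical lookup table: does each instance type have local (NVMe)
# instance storage? Replace with data from `aws ec2 describe-instance-types`
# (field: InstanceStorageSupported) for a real check.
HAS_INSTANCE_STORAGE = {
    "m6i.large": False,   # EBS-only
    "m6id.large": True,   # local NVMe SSD
}


def params_to_dict(params):
    """Convert a CloudFormation parameter list into a {key: value} dict."""
    return {p["ParameterKey"]: p["ParameterValue"] for p in params}


def instance_storage_problems(params):
    """Return the instance types that lack instance storage (or are unknown)
    while EnableInstanceStorage is "true"; empty list means no problem."""
    values = params_to_dict(params)
    if values.get("EnableInstanceStorage", "false").lower() != "true":
        return []
    # InstanceTypes is a comma-separated string in the parameter list.
    types = [t.strip() for t in values.get("InstanceTypes", "").split(",") if t.strip()]
    # Unknown types default to False, so they are flagged for manual review.
    return [t for t in types if not HAS_INSTANCE_STORAGE.get(t, False)]


if __name__ == "__main__":
    params = json.loads("""[
      {"ParameterKey": "InstanceTypes", "ParameterValue": "m6i.large"},
      {"ParameterKey": "EnableInstanceStorage", "ParameterValue": "true"}
    ]""")
    print(instance_storage_problems(params))  # ['m6i.large']
```

Unknown instance types are flagged as well, so anything missing from the table gets checked by hand rather than silently passing.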
