Elastic CI Stack EC2 instance restarting and terminating

Hi Team

I have been facing an issue since yesterday.
After creating a new Buildkite Elastic CI stack, the stack creation completes correctly, but when I trigger a job, the instance launches and starts initialising, then terminates automatically without ever starting the job, and keeps recreating itself.
I am not able to find anything specific in the logs.
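
For anyone retracing the debugging, the Auto Scaling activity history is another place to look for the termination reason (a sketch; the ASG name here is hypothetical, yours is listed in the stack's resources):

    # Sketch: list recent Auto Scaling activity and the reason the group
    # gives for each launch/termination. The ASG name is hypothetical.
    aws autoscaling describe-scaling-activities \
      --auto-scaling-group-name buildkite-elastic-ci-AgentAutoScaleGroup \
      --max-records 10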

I have used stack version v5.21.0 (https://s3.amazonaws.com/buildkite-aws-stack/v5.21.0/aws-stack.yml)
and also tried the latest. In both cases, I am getting the same issue.

The only error I could find in the logs when running v5.21.0 is below:

Oct 13 04:52:28 ip-172-31-2-39 cloud-init: Oct 13 04:52:28 cloud-init[2586]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-003 [1]
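
The failing script is written to disk on the instance, so it can be inspected and re-run by hand (before the ASG terminates the instance) to get the real error. Roughly like this (a sketch; these are the standard cloud-init locations, and it assumes SSH or SSM access to the instance):

    # Sketch: inspect and re-run the cloud-init user-data part that failed.
    # See the full cloud-init output, not just the WARNING line:
    sudo less /var/log/cloud-init-output.log

    # Look at the script cloud-init was running when it failed:
    sudo cat /var/lib/cloud/instance/scripts/part-003

    # Re-run it by hand to capture the real error and exit code:
    sudo bash -x /var/lib/cloud/instance/scripts/part-003; echo "exit: $?"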

I have used many parameters, but two of them are below.

    {
        "ParameterKey": "MaxSize",
        "ParameterValue": "1"
    },
    {
        "ParameterKey": "MinSize",
        "ParameterValue": "0"
    },
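
For context, the parameter file is passed to CloudFormation when creating the stack, along these lines (a sketch; the stack name and file path are hypothetical):

    # Sketch: create the Elastic CI stack from the v5.21.0 template
    # using a parameter file. Stack name and file path are hypothetical.
    aws cloudformation create-stack \
      --stack-name buildkite-elastic-ci \
      --template-url https://s3.amazonaws.com/buildkite-aws-stack/v5.21.0/aws-stack.yml \
      --parameters file://params.json \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM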

The same version with the same parameters above was working fine until now. I deleted the stack, tried to recreate it, and started getting this issue.

Any ideas?

Thanks
Regards
Suraj

@surajthakur :wave:

Firstly, I’d advise removing any screenshots that contain any value named token, just in case.

Are you using any custom variables or is this an out-of-the-box install? Are there any errors in the actual CloudFormation stack output, rather than the EC2 logs?
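
For example, something like this will surface any failure reasons from the stack's event history (a sketch; the stack name is hypothetical):

    # Sketch: list CloudFormation events that failed, with their reasons.
    # The stack name is hypothetical.
    aws cloudformation describe-stack-events \
      --stack-name buildkite-elastic-ci \
      --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatusReason]" \
      --output table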

Cheers!

Hi @benmc

I am not using any custom variables.
My list of parameters is the same one I was using before.
I know a set of parameter names changed in v6.x,
but for v5.21.0 I used the same parameter file I had used before.

The strange thing is that this happens with the latest version as well. Both were working two days ago, but yesterday when I tried recreating the stacks, the instance started recreating itself again and again.

I do use some custom variables in the /environment hook. The variables are passed in when a build job is triggered.
They were working as expected before. If they are unset, the job fails, which is the expected behaviour.
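
The check in the /environment hook is roughly this shape (a sketch; the variable name is hypothetical):

    #!/usr/bin/env bash
    # Sketch of a Buildkite agent "environment" hook that fails the job
    # early when a required variable is unset. The variable name is
    # hypothetical.
    set -euo pipefail

    if [[ -z "${MY_REQUIRED_VAR:-}" ]]; then
      echo "MY_REQUIRED_VAR is not set, failing the job" >&2
      exit 1
    fi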

I don't use a custom AMI. The stack uses the AMI specified in the stack template.

Are you enabling any experimental features?

Not that I am aware of. I assume they are disabled by default.

Thanks @surajthakur!

If you could email any logs you have to support@buildkite.com, we'll be able to dive deeper into the issue and see what the cause is. It's not clear from the error snippet, but it looks like a file may be missing.

Cheers!

Thanks, I have sent the logs from /buildkite/elastic-ci/{instance-id} over email.
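
For anyone following along, those instance logs live in CloudWatch Logs and can be pulled with something like this (a sketch; requires AWS CLI v2, and the instance ID is hypothetical):

    # Sketch: tail the instance's log group from CloudWatch Logs.
    # The instance ID is hypothetical.
    aws logs tail "/buildkite/elastic-ci/i-0123456789abcdef0" --since 2h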

With regards to the latest-version stack errors, I have fixed them. A file was missing, which was causing the failures.

But I came across an interesting case which I could not understand.
I have two stacks
Stack1:
    queue: frontend
    tags: build=true

Stack2:
    queue: frontend
    tags: deploy=true

Now, when a job with tags
queue: frontend
build=true
is triggered, Stack1 obviously had another issue that kept its instance restarting, but an instance of Stack2 was also created by elastic-ci. The job was never placed on Stack2's instance (which I understand to be expected, since its tags don't match).

Can having the same queue name on two stacks cause this confusion?

I am still looking into the v5.21.0 stack issue I have been facing.

Thanks

The original issue is solved.
It turned out to be a problem with the bootstrap script. The logs didn't clearly say what the error was, but some non-zero exit code was marking the instance unhealthy, which caused the instances to be terminated. Removing a block of code from the bootstrap script worked for me. I might need to investigate the root cause, but that's a separate issue.
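
For context, the failure mode is roughly this pattern: any step in the boot script exiting non-zero trips an error handler that marks the instance unhealthy, after which the Auto Scaling group terminates and replaces it. A sketch of the mechanism (not the stack's actual script; the handler here is illustrative):

    #!/usr/bin/env bash
    # Illustrative sketch of the failure mode, not the stack's real
    # bootstrap script: a failing command trips the ERR trap, the instance
    # is marked unhealthy, and the ASG terminates and replaces it.
    set -euo pipefail

    mark_instance_unhealthy() {
      local instance_id
      instance_id="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"
      aws autoscaling set-instance-health \
        --instance-id "$instance_id" \
        --health-status Unhealthy
    }
    trap mark_instance_unhealthy ERR

    # ... bootstrap steps; any non-zero exit here triggers the trap ...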
