I have been facing an issue since yesterday.
After creating a new Buildkite Elastic CI stack for elastic builders, the stack itself is created correctly, but when I trigger a job, the instance starts up and begins initialising, then terminates automatically without starting the job, and keeps recreating itself.
I am not able to find anything specific in the logs.
Firstly, I’d advise removing any screenshots that contain any value named token, just in case.
Are you using any custom variables or is this an out-of-the-box install? Are there any errors in the actual CloudFormation stack output, rather than the EC2 logs?
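If you haven't already, something along these lines will show the stack's event history, including any resource failures (swap in your own stack name):

```bash
# List recent events for the stack; failed resources show up with a
# status like CREATE_FAILED and a reason explaining why.
aws cloudformation describe-stack-events \
  --stack-name my-elastic-ci-stack \
  --query 'StackEvents[].[ResourceStatus,LogicalResourceId,ResourceStatusReason]' \
  --output table
```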
I am not using any custom variables.
I have a list of parameters, the same ones I was using before.
I know there is a set of parameter names that were changed in v6.x.
But I used the same parameter file that I previously used with v5.21.0.
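To double-check that none of my parameter names are stale, I can list the parameter keys the template actually accepts and diff them against my old file. A rough sketch (substitute the template URL for the version being deployed):

```bash
# Print the parameter names the template expects, one per line, to
# compare against the old v5.21.0 parameter file.
aws cloudformation get-template-summary \
  --template-url "https://s3.amazonaws.com/buildkite-aws-stack/latest/aws-stack.yml" \
  --query 'Parameters[].ParameterKey' \
  --output text | tr '\t' '\n' | sort
```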
The issue is strange, and it happens with the latest version as well. Everything was working two days ago, but yesterday, when I tried recreating the stack, the instance started recreating again and again.
I use some custom variables in the /environment hook script. The variables are passed in when a build job is triggered.
They were working as expected before. If they are unset, the job should fail, which is the expected behaviour.
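For context, the hook looks roughly like this; the variable name here is a placeholder, not the real one:

```bash
#!/usr/bin/env bash
set -euo pipefail

# MY_REQUIRED_VAR is a placeholder for one of the custom variables.
# If it wasn't passed in when the build was triggered, fail the job.
if [[ -z "${MY_REQUIRED_VAR:-}" ]]; then
  echo "MY_REQUIRED_VAR is not set, failing the job" >&2
  exit 1
fi
```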
I don't use any custom AMI. The stack uses the AMI specified in the stack config file.
If you could email any logs you have to support@buildkite.com, we'll be able to dive deeper into the issue and see what the cause is. It's not clear from the error snippet, but it looks like maybe a file is missing.
With regards to the latest version's stack errors, I have fixed them. A missing file was causing the problem.
But I came across an interesting case which I could not understand.
I have two stacks:

Stack1:
queue: frontend
tags: build=true

Stack2:
queue: frontend
tags: deploy=true
Now, when a job with
queue: frontend
tags: build=true
is triggered, Stack1 obviously had another issue that kept its instance restarting, but an instance of Stack2 was also created by the Elastic CI scaler, even though the job was not placed on Stack2's instance (which I understand is expected).
Can the queue name being the same for two stacks cause confusion?
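To illustrate, if I understand the default tag behaviour correctly, the agents in each stack effectively register like this (a sketch, not the exact commands the stack runs):

```bash
# Stack1's agents:
buildkite-agent start --tags "queue=frontend,build=true"

# Stack2's agents:
buildkite-agent start --tags "queue=frontend,deploy=true"
```

Since both stacks scale on the same queue=frontend metrics, I'd guess a job on that queue can cause both to spin up instances, even though only agents whose tags match the job actually run it.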
Still looking into the issue I have been facing with the v5.21.0 stack.
The original issue is solved.
It seems there was an issue with the bootstrap script. The logs didn't clearly say what the error was, but some non-zero exit code was marking the instance unhealthy and causing the termination of the instances. Removing a block of code from the bootstrap script worked for me. The cause might need investigating, but that's a separate issue.
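For anyone hitting something similar, a quick way to surface which command is failing is an error trap near the top of the bootstrap script. A minimal sketch, not the stack's actual script:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Log the failing command and its exit code before the script dies, so
# the instance logs show exactly what returned non-zero.
trap 'echo "bootstrap failed: \"${BASH_COMMAND}\" exited with $?" >&2' ERR

# ... the rest of the bootstrap steps go here ...
```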