Spin-up ESXi VMs with Buildkite

Hi there!

My current CI setup includes both macOS and Linux VMs on top of Mac Pros with ESXi/vSphere in MacStadium. One of the critical steps in the process is resetting those VMs to a snapshot after every build, and spinning them back up to be ready for the next build.

I am trying to find a way to control/communicate with my vSphere API upon build start and end. I found https://github.com/macstadium/vmkite but it seems like it is no longer maintained.

Do you have any suggestions on how to spin up/down VMware VMs during builds? Any written material you can refer me to?

Hey @rotemmiz :wave:t2: I wrote vmkite, but largely as an experiment. The primary issue there was the maintenance of the heavy-weight VMware images. The packer build process took 45+ mins and was fairly fragile. Beyond that, slow boot times made it harder to spawn VMs on demand. I started investigating using vmfork (copy-on-write) for speeding up boot times, but ran out of time. My work is at https://github.com/lox/govmomi-vmfork.

It would certainly still be possible to do though! More than happy to assist if you are keen to proceed.

If you are able, I’d absolutely suggest going with https://veertu.com for the macOS virtualization side of things. Shopify did a great write-up on it when they switched from ESXi: https://engineering.shopify.com/blogs/engineering/scaling-ios-ci-with-anka

Hey @lox, thanks for this super quick response. I read a bit about Anka a few months ago, and it does seem promising (I wonder if they have Metal support, do you know by any chance?). I also read Shopify’s blog post now (thanks for the link), and it really seems like a heavenly setup :slight_smile:. We will give that a serious look.

Our current setup with Jenkins and ESXi is still pretty fast: we revert a machine to a snapshot after every build, and it takes about 4 seconds for it to be ready for the next one.

Actually, I am looking for something fairly simple here. I just want to have a hook so I can tell vSphere to revert the VM to a snapshot, similar to what the vSphere Cloud Plugin provides.
https://wiki.jenkins.io/display/JENKINS/vSphere+Cloud+Plugin

Not sure about Metal support; it’s well worth dropping them a note to ask. They also have a Slack at https://veertuchat.slack.com.

Cool, let’s see if we can figure out the simplest way to get vSphere to revert a snapshot after a build. Were you imagining that you’d have the buildkite-agent running inside the VMs, or outside them? I can imagine two models:

  • Agent running in a long-lived VM, receives jobs and runs them inside ephemeral VMs. This means you have lifecycle control: you can target a VM, pass metadata to it via guestinfo, and then revert snapshots afterwards.
  • Agent running inside the VM. At the end of a job you could use a pre-exit agent hook to talk to the vSphere API and revert the snapshot. It might be worth running the agent with --disconnect-after-job if you do this, so that there isn’t a chance of accidentally picking up more work before the revert.

Do either of those sound like a fit?
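
For the second model, here is a rough sketch of a helper that a pre-exit hook could call. It is only an illustration under a few assumptions: it shells out to the govc CLI (from the govmomi project) rather than talking to the vSphere API directly, govc is assumed to be installed and configured through its usual GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD environment variables, and the CI_VM_NAME, CI_VM_SNAPSHOT and CI_REVERT_DRY_RUN variables are hypothetical names made up for this example.

```python
import os
import subprocess
from typing import List, Optional


def build_revert_command(vm_name: str, snapshot: Optional[str] = None) -> List[str]:
    """Build a `govc snapshot.revert` invocation for one VM.

    If no snapshot name is given, govc reverts to the VM's current snapshot.
    """
    if not vm_name:
        raise ValueError("vm_name must not be empty")
    cmd = ["govc", "snapshot.revert", "-vm", vm_name]
    if snapshot:
        cmd.append(snapshot)
    return cmd


def revert_vm(vm_name: str, snapshot: Optional[str] = None,
              dry_run: bool = False) -> List[str]:
    """Revert the VM, or only return the command when dry_run is set."""
    cmd = build_revert_command(vm_name, snapshot)
    if not dry_run:
        # govc reads its connection settings from the GOVC_* environment.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical variable names; set them however suits your agent setup.
    vm = os.environ.get("CI_VM_NAME", "")
    snap = os.environ.get("CI_VM_SNAPSHOT") or None
    dry = os.environ.get("CI_REVERT_DRY_RUN") == "1"
    if vm:
        print("running:", " ".join(revert_vm(vm, snap, dry_run=dry)))
    else:
        print("CI_VM_NAME not set, skipping snapshot revert")
```

If the agent runs inside the VM being reverted, the hook process will most likely be killed partway through the call once the revert takes effect, so nothing important should come after it in the hook.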

Thanks for the elaborate reply. I will try to implement the second option, and see how it goes.
I’ll post the results here.

No worries! Happy to help.