Hi there,
I’m wondering if there is some way to inspect a running agent to determine whether it is currently executing a job. My use case: we run a fleet of Macs, and to perform rolling updates I take the agents offline until the updates are complete.
To take an agent offline I currently run launchctl unload <launch-agent-plist>, which sends a SIGTERM to the agent. But that call is non-blocking, so I still don’t know when the agent has finished shutting down gracefully.
I might be missing something obvious.
Cheers, Pete
Hmm, perhaps you could send a TERM via launchctl and then loop until pgrep buildkite-agent returns nothing? Or until launchctl list homebrew.mxcl.buildkite-agent stops reporting a PID (... | grep -q PID)?
It would be lovely if Apple provided something like systemctl stop [--no-block] buildkite-agent, but I’m not aware of anything.
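The two checks above can be combined into a polling loop. Here is a minimal Python sketch: it unloads the launch agent and then blocks until launchd no longer reports a PID for the label and pgrep finds no buildkite-agent process. The label homebrew.mxcl.buildkite-agent comes from the reply above. The function names, the 5-second poll interval, and the assumption that launchctl list prints a "PID" = <n>; entry only while the process is alive are my own, so verify them on your macOS version.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Label taken from the thread (Homebrew install); adjust for your setup.
DEFAULT_LABEL = "homebrew.mxcl.buildkite-agent"

# Assumption: `launchctl list <label>` prints a dictionary that contains
# an entry like `"PID" = 1234;` only while the job has a live process.
_PID_RE = re.compile(r'"PID"\s*=\s*(\d+);')


def parse_launchctl_pid(output: str) -> Optional[int]:
    """Return the PID from `launchctl list <label>` output, or None if absent."""
    match = _PID_RE.search(output)
    return int(match.group(1)) if match else None


def agent_is_running(label: str = DEFAULT_LABEL) -> bool:
    """True if launchd reports a PID for the label or pgrep finds an agent process."""
    listed = subprocess.run(["launchctl", "list", label],
                            capture_output=True, text=True)
    if listed.returncode == 0 and parse_launchctl_pid(listed.stdout) is not None:
        return True
    # pgrep exits with status 0 when at least one process name matches.
    found = subprocess.run(["pgrep", "buildkite-agent"], capture_output=True)
    return found.returncode == 0


def wait_until(done: Callable[[], bool], interval: float = 5.0,
               timeout: Optional[float] = None,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `done` every `interval` seconds; return False if `timeout` elapses first."""
    deadline = None if timeout is None else clock() + timeout
    while not done():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
    return True


def stop_agent_and_wait(plist_path: str, label: str = DEFAULT_LABEL,
                        timeout: Optional[float] = None) -> bool:
    """Unload the launch agent (launchd sends SIGTERM), then block until it exits."""
    subprocess.run(["launchctl", "unload", plist_path], check=True)
    return wait_until(lambda: not agent_is_running(label), timeout=timeout)
```

For example, stop_agent_and_wait(os.path.expanduser("~/Library/LaunchAgents/homebrew.mxcl.buildkite-agent.plist")) should block until the agent, and any job it was running, has exited. That path is a hypothetical Homebrew location. Checking both launchctl and pgrep is deliberately belt-and-braces, because I'm not sure the unloaded job stays listed until its process is gone.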
How do I prevent SIGTERM from killing a job mid-flight? Just by setting a very large cancel-grace-period? Our jobs can sometimes run for more than an hour.
Yeah, a SIGTERM tells the agent to let the current job finish gracefully, with no timeout (docs), but you’d also need to account for launchd’s own timeouts. I don’t have enough experience to suggest how to handle those, sorry! But it sounds like you’re on the right track.
Thank you!!! I would have been pulling my hair out if you hadn’t pointed me toward launchd’s own SIGTERM → SIGKILL behaviour.
For anyone else who comes across this trying to do the same: you’ll want ExitTimeOut set to 0 or a very large value in your launch agent plist:
<key>ExitTimeOut</key>
<integer>0</integer>
From the launchd.plist(5) man page:
ExitTimeOut <integer>
The amount of time launchd waits before sending a SIGKILL signal. The default value is 20 seconds. The value zero is interpreted as infinity.
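If you manage many Macs, a small script can apply the ExitTimeOut change instead of editing each plist by hand. Below is a minimal sketch using Python's standard plistlib module. The function name set_exit_timeout is hypothetical. launchd reads the plist when the job is loaded, so the job needs to be reloaded (unload, then load) before the new value applies. If a tool regenerates the plist for you, this edit may be overwritten.

```python
import plistlib
from pathlib import Path


def set_exit_timeout(plist_path: str, seconds: int = 0) -> dict:
    """Set ExitTimeOut in a launchd plist file and return the updated job dict.

    Per the man page excerpt quoted above, 0 is treated as infinity, i.e.
    launchd never escalates the SIGTERM to a SIGKILL.
    """
    if seconds < 0:
        raise ValueError("ExitTimeOut must be non-negative")
    path = Path(plist_path)
    with path.open("rb") as f:
        job = plistlib.load(f)  # handles both XML and binary plists
    job["ExitTimeOut"] = seconds
    with path.open("wb") as f:
        plistlib.dump(job, f)  # written back in XML format (the default)
    return job
```

Passing a large finite number of seconds instead of 0 gives the "very large value" option mentioned above.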