From the host running the agent, can it be determined if the agent is executing a job?

Hi there,
I’m wondering if there is some way to inspect a running agent to determine whether it is currently executing a job. My use case is that we run a fleet of Macs, and to perform rolling updates I take the agents offline until the updates are complete.
The mechanism I currently use to take an agent offline is to run launchctl unload <launch-agent-plist>, which in turn sends a SIGTERM to the agent. That call is non-blocking, though, so I still don’t know when the agent has finished up gracefully.

I might be missing something obvious.
Cheers, Pete

Hi @petergoldsmith-rea,

Hmm, perhaps you could send a TERM via launchctl and then loop until pgrep buildkite-agent returns nothing? Or until launchctl list homebrew.mxcl.buildkite-agent stops returning a PID (... | grep -q PID)?
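
A rough sketch of that polling loop, in case it helps. The function name wait_for_exit and its timeout and interval arguments are made up for illustration, and it assumes the agent process really is named buildkite-agent on your hosts (pgrep -x matches the process name exactly).

```shell
#!/bin/sh
# Hypothetical helper: block until no process named "$1" is running.
#   $1  process name to look for (e.g. buildkite-agent)
#   $2  optional timeout in seconds; 0 (the default) means wait forever
#   $3  optional polling interval in seconds (default 5)
# Returns 0 once the process is gone, 1 if the timeout is reached first.
wait_for_exit() {
    name="$1"
    timeout="${2:-0}"
    interval="${3:-5}"
    waited=0

    # pgrep exits 0 while at least one matching process exists.
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$timeout" -gt 0 ] && [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

Something like launchctl unload <launch-agent-plist> && wait_for_exit buildkite-agent should then block until the agent has gone away. Note that this only tells you the agent process has exited, not whether it was in the middle of a job.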

It would be lovely if Apple provided something like systemctl stop [--no-block] buildkite-agent, but I’m not aware of anything.

Cheers,
Sam

How do I prevent SIGTERM killing a job mid-flight? Just by having an excessively large cancel-grace-period? The jobs we run can sometimes exceed the hour mark.

Yeah, a SIGTERM will tell the agent to gracefully allow a job to finish without a timeout (docs), but launchd timeouts would also need to be considered. I don’t have enough experience to suggest how to do that, sorry! But it sounds like you’re on the right track. 👍

Thank you!!! I would have been pulling my hair out had you not pointed me in the direction of launchd’s own behaviour for SIGTERM → SIGKILL.

For anyone else who comes across this while trying to do the same, you’ll want ExitTimeOut set to 0 or a very large value in your launch agent plist:

<key>ExitTimeOut</key>
<integer>0</integer>

From the launchd.plist(5) man page:

 ExitTimeOut <integer>
     The amount of time launchd waits before sending a SIGKILL signal. The
     default value is 20 seconds. The value zero is interpreted as infinity.
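
If you would rather script that change than edit the plist by hand, something along these lines might work. This is only a sketch: set_exit_timeout is a made-up helper name, and it assumes plutil (which ships with macOS) is on the PATH and that you have write access to the plist.

```shell
#!/bin/sh
# Hypothetical helper: set ExitTimeOut in a launch agent plist.
#   $1  path to the plist
#   $2  timeout in seconds (default 0, which launchd treats as infinity)
# Returns 2 if the plist does not exist, otherwise plutil's exit status.
set_exit_timeout() {
    plist="$1"
    seconds="${2:-0}"

    if [ ! -f "$plist" ]; then
        echo "plist not found: $plist" >&2
        return 2
    fi

    # Write ExitTimeOut as an integer value at the top level of the plist.
    plutil -replace ExitTimeOut -integer "$seconds" "$plist"
}
```

For example, set_exit_timeout <launch-agent-plist> 0. As far as I know launchd only reads the plist when the job is loaded, so the agent would need a launchctl unload and load before the new value takes effect.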