I’m trying to confirm if this is a bug or not. We’re trying to programmatically update a “notify” step with a bash script with a custom message. However, if the “message” has spaces in it, it seems like the sent response to the BK agent isn’t properly parsed.
The “message” is incomplete (only contains “Hey!”)
The last double-quote is not escaped which is what I think is leading to the error
Also, the error we receive is misleading, which seems to get triggered because the JSON is invalid once parsed. However, I’m not positive since I know there’s a specific JSON error as well.
{
"message": "`notify` must be defined as an array"
}
I’m happy to provide more details if needed. Any help is appreciated!
It looks to me like the script generating the step isn’t working as expected. Can you share the code you’re using to generate the step that’s supposed to send the notification?
I also have a different issue I’ve noticed with the if property that I figured is worth mentioning. If I try to send "if": "build.state == 'failed'" (or some other quote combination: \", ", `) it is unable to parse it correctly. I can start a different thread for that if needed, but figured it was worth mention :)
Looping in with Jeremy on the above - and thanks for all the details so far
Are you setting the originalnotify in the same step as the step update command/the notification that you are trying to append to? The behaviour (as you’ve most likely worked out!) is that you can only append when updating a notify - but the multi-word message should be supported.
Are SLACK_USER and SLACK_MESSAGE both set through upper pipeline level env/passed into the build from an API? For the "notify must be defined as an array error (which I did encounter a few times before parsing the message like so:, have you tried tying in the variables into a single string?:
Can you say more about what you mean by original? From what I’m seeing if notify isn’t explicitly added to a step, it still has a default value of [] (which is why you need append, as I have found out ).
As for where SLACK_USER and SLACK_MESSAGE come from, the former is defined in the bash script and the latter is passed as an argument to the script (e.g. SLACK_MESSAGE="$1"). I have tried a single string, which still breaks due to the multi-word message.
Also, it’s worth noting that (from what I’ve seen) you must surround an interpolated variable in single-quotes (e.g. "@'$SLACK_USER'") so what you have above won’t work. This is the main issue I’m seeing with something like "build.state == 'failed'" where the buildkite-agent tries to interpolate failed as a variable and fails.
So what James is mentioning by original, is that when using buildkite-agent step update usually there is an existing step you’re modifying (and possibly an existing notify step). I think in this case it’s alright as you’re essentially trying to add a notification to an existing step.
Another follow-up question: is there a reason for trying to update an existing step instead of uploading a step dynamically to send the notify? It may also make more sense to use YAML instead of JSON, but if you have a requirement to use JSON for the step, you might consider also using jq to build the JSON string, example to better escape those variables
Hey, Jeremy. All good on the “original” part, that’s what I assumed. We should be in the clear for what we’re trying to do.
As for JSON vs YAML, we don’t have a constraint on JSON. YAML would likely be easier. Do you have an example for how to do that with the step update command based on what I’ve shared above?
Also, just to follow-up, I did attempt this with jq and am encountering the same error as with my original implementation. I think YAML could likely circumvent this, but it seems like there’s an issue parsing the JSON from the buildkite-agent.
James again - also thanks for explaining the above too! Can confirm: the original callout above was for referring to a step-levelnotify that you are trying to append (add) another notification with the step update command - something akin to this (and below):
On the above with the JSON methodology - defining a Slack message/user like below does work on my side - though of course, its more the case that said #general channel gets both of the messages - as depicted - said step update can only append/create a new message in the channel. Note that both are quoted too - this was run on a v3.60.0 agent:
env:
SLACK_USER: '@user' #Insert the specific user you need here
SLACK_MESSAGE: 'your command passed!'
steps:
- label: "Slack noti"
key: slack
notify:
- slack:
channels:
- "#general"
message: "Your command is about to be run."
command: |
echo "Before the update - shell script/command runs here"
buildkite-agent step update "notify" '[
{
"slack": {
"channels": [
"#general"
],
"message": "$SLACK_USER - $SLACK_MESSAGE"
}
}
]' --append --debug --debug-http
Note too - if you are using conditionals in the actual command on a step to deterministically update the notification inside a step (i.e build.state in particular) - the state of the overall build wont be determined - as a job might still be executing/running - and the conditional may not work as we intend.
On the premise on the above (also what Jeremy did mention above with uploading steps dynamically) - one idea I did think of in the interim from my above message was potentially wrapping this up in a triggering-triggered pipeline setup if that works on your part? I’m assuming you’d want users to know when a specific step (job) runs successfully, and in a A triggers B setup, you could potentially utilise a pipeline level notify with the resultant build.state check (which would be able to be run deterministically). Of course, it all depends on how you’d like to setup your pipeline/CI!
Thanks for the detailed response. For the sake of completeness, I’ll post exactly what I’m trying to do. I’m sure there’s a different way of doing this, but this is the solution we’re planning on running with for the time being, but I’m definitely open to future improvements. All to say that we have something that is working, but resolving some of the issues we’re seeing would lead to a better developer experience.
What we’re doing:
Groups within larger pipeline.yml for PR opened and PR merge (e.g. build.branch != main and build.branch == main)
First step in each group calls a slack-message-user.sh which adds a notify to the step in which it’s called
What’s working (sufficient for MVP):
The notify is added to the step w/o issue and sent to the appropriate channel on success/failure
What isn’t working (both non-blocking, but would be nice to have):
Adding an if condition from the step update command
Prevents us from only running on build.state == 'failure'
Adding a message with spaces from the step update command
Prevents us from sending a more complex message
Limitations we’re seeing:
Interpolated env vars from the shell script must be surrounded with single-quotes
'$SLACK_USER' works, but $SLACK_USER does not
Because of the above constraint, we can only use single-quotes in the shell script for interpolation
Something like "build.state == 'failed'" leads to an error along the lines of "Variable 'failed' does not exist"
pipeline.yml
- group: ':open-pull-request: PR Checks'
if: build.branch != "main"
steps:
- label: ':broom: All the things!'
key: 'pr-checks'
commands:
- '.buildkite/scripts/slack-message-user.sh'
# other commands...
slack-message-user.sh
#!/usr/bin/env bash
set -e
# Pull SLACK_USER from BK meta-data (set in get-gh-user.sh)
export SLACK_USER=`buildkite-agent meta-data get "slack-user"`
echo $SLACK_USER
# Update current step with Slack notification
buildkite-agent step update "notify" '[
{
"slack": {
"channels": [
"Company#channel"
],
"message": "@'$SLACK_USER'"
}
}
]' --debug-http --debug --step $BUILDKITE_STEP_ID --append
buildkite-agent step get "notify" --format json
If we were going to send a $SLACK_MESSAGE it would be used in the pipeline.yml as an argument passed to the slack-message-user.sh (aka as $1).That requires us to update the "message" like the below, which creates the initial error from the start of this post.
Really appreciate the write up - definitely sets the entire context which definitely helps with the end-to-end view on the setup.
The quote-on-quote “good” news is that I’m also able to get the same result as you with the single word being interpolated through the script/export variable methodology - albeit interestingly the original way that I posted above does still hold, so I believe its something to do with this being run through said script.
Just for completeness, what agent version are you folks running? Containerised/on the host directly?
After this, I’ll definitely loop our agent experts on this, as I think we’ve come across something that might/can be improved!
Thanks @hugh.mh - also dittoing the same on my end though a containerised v3.44.0 agent too.
Just one final context ask on my side - and for SLACK_MESSAGE in particular: are you folks setting it though a step beforehand akin to the SLACK _USER through a meta-data set and exporting that within the script? Or through an argument to the script (I’m assuming both)? Both methods of course would mean they’re available for the step update to interpolate (and also same result on my end), just assessing there (containers & environment propagation)
Hey, @james.s! I think I outlined this above, but here is is again, just in case:
SLACK_USER: Set as meta-data in a different script to propagate through the whole pipeline
SLACK_MESSAGE: Passed as an argument upon script invocation (e.g. `some-script.sh “Here’s my message”).
We do not set the slack message to meta-data as it seemed better for DX to set the message when calling the script; setting to meta-data and passing feels redundant since messages would be unique for each script invocation (since they live on different steps like linting, tests, serverless deploy, etc.).
Thanks again for all the help and LMK if I can provide any additional info
Thanks @hugh.mh - affirmed: I probably glanced over it in the convo above: but essentially mirrored what I was doing (username through meta-data, message through script’s arg). I also did other combinations (all through metadata, two script variables etc) and same results as you.
In light of this - I’ll get some agent assessment on this, thanks very much for all the context you’ve given too: definitely helps us!
Before running buildkite-agent step update, Bash has to parse the command, and part of that is processing single- and double-quoted strings. Bash is interpreting the single-quoted string as one argument, but Bash considers the string terminated in the middle by the single-quoted string inside the blob of JSON, not the last single-quote.
But luckily we have another way to get the new attribute into step update. Something to try is to provide the new attribute value via standard input. You don’t need a temporary file for this - a Bash heredoc (<< EOF .... EOF) such as this should work:
It’s not possible to use build.state at the Step level as a conditional for Steps due to the nature of it, as the Step would need to run so the Build State isn’t known as the Build is running.
It is possible to use build.state with the notify attribute if it is set at the Pipeline level, here’s an example and note about it from the docs:
But there is other ways to have a conditional notifications sent but this would require making use of dynamic pipelines. The docs have an example of this: