New Jira Cloud Webhook Retry Policy

Hi everyone,

I’m happy to announce that Jira Cloud webhooks have recently become much more resilient. We’ve rolled out a new retry mechanism that repeats failed requests.

Retries are attempted when any of the following are true:

  • the callback server returns any of the following status codes: 408, 409, 425, 429, 5xx.
  • the connection fails or times out.
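
For app developers, the status-code part of the list above boils down to a simple check on what your callback returns. Here is a minimal sketch; the function and constant names are made up for illustration and are not part of any Atlassian API.

```python
# Status codes listed above as retry triggers (5xx is handled as a range below).
RETRYABLE_STATUS_CODES = {408, 409, 425, 429}


def triggers_retry(status_code: int) -> bool:
    """Return True if a callback response with this status would be retried,
    according to the list in this announcement."""
    return status_code in RETRYABLE_STATUS_CODES or 500 <= status_code <= 599
```

A quick way to audit an existing integration is to run its typical response codes through a check like this and see which ones would now cause a redelivery.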

This means that:

  • some webhooks might be delivered more than once (if the delivery acknowledgment fails).
  • webhooks might be delivered later than usual (up to 30 minutes, subject to change).
  • you might need to modify your integrations to take this into account (e.g. check the webhook timestamp or the special retry header, more on which in the next paragraph).

The X-Atlassian-Webhook-Retry header with the current retry count is included with webhooks that have been retried. We recommend monitoring this header and cross-referencing it with the callback server logs to stay on top of any unexpected reliability problems.
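
As a rough illustration of reading the header on the receiving side, here is a small sketch. It assumes the header value is a plain integer and that incoming headers are available as a dict-like mapping; the helper name is hypothetical.

```python
RETRY_HEADER = "X-Atlassian-Webhook-Retry"


def retry_count(headers) -> int:
    """Return the retry count from the request headers.

    Returns 0 when the header is absent (i.e. a first delivery) or when the
    value cannot be parsed as an integer. Header names are matched
    case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == RETRY_HEADER.lower():
            try:
                return int(value)
            except (TypeError, ValueError):
                return 0
    return 0
```

Logging this value next to your own request ID makes it easier to cross-reference retried deliveries with the callback server logs.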

We hope this will allow you to fully rely on webhooks without needing to resort to periodic polling.

See the official documentation here: https://developer.atlassian.com/cloud/jira/platform/webhooks/#retry-policy

Hi, does this mean it is live?

Is my understanding correct: if an app’s server returns any of the mentioned status codes in response to a webhook, Jira will send the webhook again?

Hi @kkercz,
does this apply to post-function/triggered calls too (since they rely on the same mechanism as webhooks)?

Just to be clear: this can have a huge impact both for Atlassian and vendors, because previously vendors may have treated webhooks as a one-off, hit-and-run type of request.

Implementing a retry mechanism with acknowledgement (correct response codes and no timeout) might result in a cascading number of requests for vendors that have not previously handled requests properly.

You might want to be monitoring the number of requests that are being sent to vendors and see if this increases gradually over time. Also, you might have wanted to notify vendors prior to rolling it out :speak_no_evil:

Yes, it’s live.

That’s correct.

Yes.

True. We realise the rollout of this was not perfect. I guess we got carried away by the prospect of potential value and the desire to solve the long-standing pain point as soon as possible, without properly assessing the implications.

However, we do have monitoring and so far the negative impact has been minimal. Fortunately, even if someone was previously not acknowledging webhooks correctly, it is easy to notice and fix.

@kkercz Is this also impacting the initial delivery of webhooks? We are seeing quite a delay (more than 5 minutes) between, for example, a UI action (e.g. updating an issue) and receiving the webhook… This is unfortunate because we are using this for syncing stuff, and the delay might confuse customers…

@tobias.viehweger the initial delivery of webhooks should not be affected - the delay applies only to the retries.

Hi,

We’ve received reports about the delay in Devhelp and have created a public-facing issue for it. Please see [ACJIRA-1908] - Ecosystem Jira

Cheers,
Anne

Hi everyone,

Due to the high load that the retries put on our infrastructure, we’ve temporarily reduced the number of retries to 1.

Sounds like a worthwhile enhancement. Does this apply to webhooks initiated from bulk updates to issues?

This applies to all Jira Cloud webhooks; they are all sent in the same way.

Awesome, now we need full JQL support for dynamic webhooks. :slight_smile:

Could it be that this is because vendors were unaware of the change and have yet to implement the correct request handling with proper response? :thinking:

My apologies for sounding a bit sarcastic, but this strikes me as very basic stakeholder management for which product owners are responsible, especially for changes that require adjustments from third parties. I expect better from a company that wishes to be an advocate for software development best practices.

Great update! Besides all the technicalities, it is a move in the right direction!

Hi,
This is a nice addition to the webhooks!

Are these changes applied to the lifecycle events as well?

Cheers

Yes, this applies to all webhooks in Jira Cloud.

Hi Krzysztof,

thanks a lot for introducing retries! How would you propose to deduplicate requests? Is the request body guaranteed to be the same on the retry? Or is there another header with a correlation ID that we could use (staying the same across retries for one request)?

Hi,

The body is always the same, so if you are afraid of duplicates, just keep an eye on the “X-Atlassian-Webhook-Retry” header and discard retried webhooks that you believe you’ve already processed.
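
A rough sketch of that idea, under a few assumptions: the raw body is available as bytes, the retry header is absent (None) on a first delivery, and an in-memory set is good enough for illustration. The class name is made up, and a real app would want a shared store with expiry rather than an unbounded set.

```python
import hashlib


class RetryDeduplicator:
    """Drops retried webhooks whose body has already been processed."""

    def __init__(self):
        # SHA-256 digests of bodies that have been accepted for processing.
        self._seen = set()

    def should_process(self, body: bytes, retry_header) -> bool:
        digest = hashlib.sha256(body).hexdigest()
        is_retry = retry_header is not None
        if is_retry and digest in self._seen:
            # A retry of something we already handled: discard it.
            return False
        # First deliveries, and retries whose original never reached us,
        # are processed and remembered.
        self._seen.add(digest)
        return True
```

Note that this only compares bodies, so it cannot tell a retry apart from a separate event that happens to carry an identical body.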

Thanks for the confirmation.

Ideally we’d like to have another header that allows us to relate retries to their original requests. In theory, we might have multiple (separate) requests having exactly the same body within a short time period. If some of those are retried it’s impossible for us to know how many unique, non-retried requests there were to begin with.

Example:
00:00 request 1, body “ABC”, retried header = null
00:01 request 2, body “ABC”, retried header = 1

We cannot tell if request 2 is now a retry of request 1 or another, separate request altogether, whose initial request got lost due to connection issues. This could be fixed by introducing a correlation header, e.g.:

00:00 request 1, body “ABC”, retried header = null, correlation header = “def”
00:01 request 2, body “ABC”, retried header = 1, correlation header = “def”

Now we know for sure that request 2 was a retry of request 1.

00:00 request 1, body “ABC”, retried header = null, correlation header = “def”
00:01 request 2, body “ABC”, retried header = 1, correlation header = “ghi”

Now we know for sure that request 2 was a separate request and not a retry of request 1.

It’s very important for our app to properly detect and ignore any duplicate requests, so doing deduplication e.g. with a hash of the body might not work for all of our customers.
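
To make the proposal concrete, here is a rough sketch of how counting unique requests could look if such a correlation header existed. To be clear, nothing in this thread says it does: the header name X-Example-Correlation-Id is invented purely for illustration.

```python
# Hypothetical header name: Jira does not send this as far as this thread shows.
HYPOTHETICAL_CORRELATION_HEADER = "X-Example-Correlation-Id"


def count_unique_requests(deliveries) -> int:
    """Count distinct original requests among received deliveries.

    `deliveries` is a list of header dicts, one per received webhook.
    Deliveries sharing a correlation ID count as one request plus its retries.
    """
    return len({headers[HYPOTHETICAL_CORRELATION_HEADER] for headers in deliveries})


# The two scenarios from the examples above.
same_request = [
    {HYPOTHETICAL_CORRELATION_HEADER: "def"},
    {HYPOTHETICAL_CORRELATION_HEADER: "def", "X-Atlassian-Webhook-Retry": "1"},
]
separate_requests = [
    {HYPOTHETICAL_CORRELATION_HEADER: "def"},
    {HYPOTHETICAL_CORRELATION_HEADER: "ghi", "X-Atlassian-Webhook-Retry": "1"},
]
```

With such a header, the first scenario would count as one request and the second as two, regardless of the bodies being identical.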