New Jira Cloud Webhook Retry Policy

kkercz · September 20, 2019, 9:18am

All Jira webhooks have a timestamp in the body, so you should be able to identify them by that: timestamps should be pretty unique; I don’t expect two different webhooks to be sent at exactly the same time, let alone two webhooks with otherwise the same content.

To make it extra clear: the timestamp is the time the event was generated, not when the webhook was sent, so it will remain unchanged across retries.

If that’s still not enough, could you create an ACJIRA ticket for your header proposal?

jbevan · September 20, 2019, 10:04am

Last time I looked, post-function webhooks did not contain a timestamp. We’d love a correlation ID too, so we don’t have to compare event bodies, hashes, etc.

kkercz · September 20, 2019, 10:30am

Alright, standard Jira webhooks have timestamps, but post-functions and maybe some other webhooks might not. Also, it’s possible to subscribe to webhooks without a body, so you wouldn’t get the timestamp in this case either. I’ll put adding the header into our backlog.

BenRomberg · September 20, 2019, 12:10pm

Thanks a lot! We use both webhooks and post-function triggers and would need the correlation header for the latter.

kkercz · September 27, 2019, 6:47am

I’m happy to say that we’ve just added a new header sent with every webhook: X-Atlassian-Webhook-Identifier that contains a unique webhook ID, the same across retries. Note that each tenant has their own pool of IDs, so to uniquely identify a webhook you actually need a pair of <tenant, webhookId>.
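As a rough illustration of the <tenant, webhookId> pairing described above, here is a minimal Python sketch of an in-memory deduplicator. The class name, the method name, and the way the tenant string is obtained are all assumptions made for the example; only the header name comes from this thread. A real app would presumably keep the seen keys in a shared store rather than in process memory.

```python
from typing import Mapping, Set, Tuple

# Header name as announced in this thread; compared case-insensitively.
IDENTIFIER_HEADER = "x-atlassian-webhook-identifier"


class WebhookDeduplicator:
    """Hypothetical helper: remembers (tenant, webhook ID) pairs already seen."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def first_delivery(self, tenant: str, headers: Mapping[str, str]) -> bool:
        """Return True if this delivery has not been seen before.

        `tenant` is whatever stable identifier the app already uses for the
        Jira site that sent the request (an assumption of this sketch).
        """
        lowered = {name.lower(): value for name, value in headers.items()}
        webhook_id = lowered.get(IDENTIFIER_HEADER)
        if webhook_id is None:
            # No identifier header: nothing to deduplicate on, treat as new.
            return True
        key = (tenant, webhook_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

The same ID arriving from two different tenants is treated as two different webhooks, which is the point of keying on the pair rather than on the ID alone.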

jbevan · September 27, 2019, 10:57am

Awesome, thanks!

BenRomberg · September 27, 2019, 12:38pm

Thanks a lot for those continued improvements!

To bridge the gap towards 100% reliability for webhooks, there’s now a proposal to provide a REST API for the webhook request history. Please vote/watch ACJIRA-1981 if you’d benefit from something like this. Thanks!

david2 · September 27, 2019, 3:59pm

Does it also apply to post-function /triggered calls (which are technically webhooks)?

If so, I have a question: how can we use this to make post-function executions more reliable? I understand that if the /triggered call gets “lost” in transit, Jira will automatically retry the call at a later time (but that’s unrelated to the new header). But if the call does make it to the app, how do we take advantage of the header to handle cases where an error occurs (such as a 403 or 429 error returned by Jira) during the execution of the post-function? I understand that we could return an HTTP error, but unfortunately we can’t, because the actual post-function execution is handled by worker processes. All our web-facing processes do is queue the post-function execution, and that rarely fails. This architecture was the only way to handle the variable flow of post-function calls from Jira and the highly variable processing time of each post-function.

Anything we can do with the X-Atlassian-Webhook-Identifier? Could it be used to identify the “root cause” of the REST calls the app will make during the post-function execution? And eventually be used to create some equivalent of a transaction system (i.e. “eventual consistency”)?

kkercz · September 30, 2019, 6:15am

Yes, the header is included in post-functions as well, and retries work in exactly the same way for post-function webhooks. But I’m afraid there is not much else you can do with it except for identifying retried webhooks that you have already seen. What you are describing has more to do with the infamous DB connection problem than with webhooks. The way we send webhooks is hardly related to that.

david2 · September 30, 2019, 6:40am

Actually, it’s not just about DB connections, which are a hot topic right now, but about data consistency in general, whatever the error is that prevents the post-function from completing its execution. But you’re right, it’s not related to the retries per se. I was just hoping that the ID in the header could be used as the foundation for eventual consistency.

kkercz · November 28, 2019, 7:05am

We’ve just released a new API for retrieving webhooks that failed delivery: Get failed webhooks.

fboucquez · December 11, 2019, 4:07pm

Is this feature on?

I’m seeing the x-atlassian-webhook-identifier but not the X-Atlassian-Webhook-Retry header. I’m also forcing a 500 error but the webhook is not resent.

Do I need to do anything to enable the feature?

/webhook/failed is returning an empty result:

{
  "maxResults" : 100
}

jtrzebiatowski · December 12, 2019, 1:02pm

Yes, this feature is enabled. You can’t enable or disable the feature manually.

The header X-Atlassian-Webhook-Retry is included only in webhooks that have been retried.
The webhook can be retried up to 15 minutes later.

The Failed webhooks API lists a webhook only after all of its retry attempts have failed. Currently, all webhook attempts can take up to 75 minutes.
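To make the two behaviours above concrete, here is a small Python sketch, not an official client. `is_retry` only checks whether the X-Atlassian-Webhook-Retry header is present, since this thread does not say what its value contains. `failed_webhooks` reads one page of a /webhook/failed response; the `values` key is an assumption about the response shape, and a page that carries only `maxResults` (like the one posted earlier in the thread) is treated as having no failed webhooks yet.

```python
from typing import Any, Dict, List, Mapping

# Header name as quoted in this thread; compared case-insensitively.
RETRY_HEADER = "x-atlassian-webhook-retry"


def is_retry(headers: Mapping[str, str]) -> bool:
    """True if the request carries the retry header, i.e. it is a re-delivery."""
    return any(name.lower() == RETRY_HEADER for name in headers)


def failed_webhooks(page: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Return the failed-webhook entries from one response page.

    The "values" key is an assumption of this sketch. If it is absent,
    the page is treated as empty rather than as an error.
    """
    return list(page.get("values", []))
```

An empty page shortly after forcing a 500 is therefore expected: per the explanation above, a webhook shows up there only once every retry attempt has failed.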

sank · August 13, 2020, 1:23pm

Hi @kkercz,

what is the request timeout for webhooks?

We’re interested because when we record the webhook internally, we retry upon failure; but it doesn’t make sense to retry past the time when Jira decides to resend the webhook. So we’d like to place a timeout on the webhook recording process based on Jira’s webhook request timeout.

Thanks
Igor
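The deadline-bounded retry described in this question could look roughly like the Python sketch below. Jira’s actual request timeout is not stated anywhere in this thread, so `deadline_seconds` is a value the caller has to supply; `record_with_deadline`, `record`, and `pause_seconds` are hypothetical names for the example.

```python
import time
from typing import Callable


def record_with_deadline(
    record: Callable[[], None],
    deadline_seconds: float,
    pause_seconds: float = 0.1,
) -> bool:
    """Retry `record` until it succeeds or the deadline would be exceeded.

    `deadline_seconds` is a placeholder for whatever timeout the sender
    applies; this sketch does not know that value.
    """
    give_up_at = time.monotonic() + deadline_seconds
    while True:
        try:
            record()
            return True
        except Exception:
            # Stop if waiting for another attempt would cross the deadline.
            if time.monotonic() + pause_seconds >= give_up_at:
                return False
            time.sleep(pause_seconds)
```

Returning False lets the web-facing handler answer with an error status and leave the re-delivery to Jira’s own retry mechanism.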

Kilian.B · November 20, 2020, 9:08am

@jtrzebiatowski is this only a Jira feature? I cannot find an x-atlassian-webhook-identifier header in Confluence webhooks. Is it on the roadmap to align the behavior of webhooks across different products?