Considering the use case below…
- The user triggers an app realm migration.
- JIRA successfully sends the scheduled migration hook to the app.
- JIRA also successfully sends the start migration hook to the app, and the app starts the migration process on its side.
- During the migration, JIRA constantly sends the status migration hook to check the app’s migration status.
- BUT the app goes down unexpectedly (its server is no longer running).
So my questions are:
- Will JIRA trigger a rollback by sending the rollback migration hook, since it was unable to reach the app via the status hook?
- Let’s say that after step 2 (the scheduled hook) the app also goes down, and then JIRA sends the start hook. Will JIRA retry sending the start hook, or will it treat the app migration as a failure and trigger a rollback?
Thanks.
Reference: https://developer.atlassian.com/cloud/jira/platform/data-residency/#data-residency-and-atlassian-marketplace-apps
I also have some related questions:
Is there a retry policy for /rollback webhooks? More broadly, what happens if /rollback returns a non-2xx response?
Also, what is the expected behaviour if /status returns a non-2xx response and/or a response in a non-compliant format? Will that also cause a rollback?
Could retries be considered for intermittent 502/503 responses?
Hi @StevenPila and @lexek-92, apologies for the delay in responding to this. I’ve answered each of your questions in-line below; hopefully this helps.
This is correct: any non-2xx response to a /status hook request during the migration will result in a rollback. Do you anticipate a high (or significant) frequency of failures for these intermittent requests?
We will retry sending the /start hook three times with a backoff delay. If we are unable to receive a 2xx response within three attempts, that migration will be marked as a rollback and your app will receive a rollback hook.
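To make the /start behaviour described above concrete, here is a minimal Python sketch of a sender that tries a hook up to three times with a backoff delay and falls back to rollback. It is an illustration under stated assumptions, not Atlassian's implementation: it follows the "within three attempts" wording, the actual backoff delays are not given in this thread (so `BASE_DELAY_SECONDS` and the doubling schedule are assumptions), and `deliver_start_hook` and its `send` callable, which returns an HTTP status code, are hypothetical names.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 3          # per the thread: "within three attempts"
BASE_DELAY_SECONDS = 1.0  # assumption: the real backoff delay is not stated


def is_2xx(status_code: int) -> bool:
    return 200 <= status_code < 300


def deliver_start_hook(send: Callable[[], int],
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical sender: "started" on the first 2xx, else "rollback".

    `send` performs one delivery of the /start hook and returns the HTTP
    status code; `sleep` is injectable so the backoff can be tested.
    """
    for attempt in range(MAX_ATTEMPTS):
        if is_2xx(send()):
            return "started"
        if attempt < MAX_ATTEMPTS - 1:
            # Exponential backoff between attempts: 1s, then 2s (assumed policy).
            sleep(BASE_DELAY_SECONDS * 2 ** attempt)
    # Three attempts without a 2xx: the migration is marked for rollback.
    return "rollback"


replies = iter([503, 502, 200])
print(deliver_start_hook(lambda: next(replies), sleep=lambda s: None))  # started
```

Injecting `sleep` keeps the sketch testable without real waits; a production sender would also need timeouts on each delivery attempt.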
This is similar to /start: we will attempt to send the rollback hook to your app two times; however, the product will be brought back online upon the first request.
Any 2xx response is OK, regardless of its format; however, any non-2xx response will result in a rollback.
Good question. This is something we’re open to exploring, since a 502/503 response could reasonably warrant a retry. For now, however, these are treated like any other non-2xx response.
Hi @SeanBourke, thanks for your reply. Just to clarify…
Since you mentioned that /start and /rollback have a retry mechanism, does this mean that:
- The /schedule, /start, /commit and /rollback hooks will be resent for a maximum of 3 attempts if no 2xx response is received?
- And for the /status hook, there’s no retry mechanism at all?
Thank you.
Hey @StevenPila,
Thanks for the reply. We’ve reassessed our endpoints and identified that they do not retry today. That said, we also believe it’s unreasonable for a single failed response to fail the entire migration, particularly given its potential to increase the complexity or cost of migrations for you (if things can fail more easily or frequently) and to increase the likelihood of customers seeing failed migrations.
Given this, we’re assessing implementing retries for hooks where a single failure would otherwise immediately fail the migration. For example, this means that a /status hook would only move to rollback when it:
- Provides an explicit status response of "failed" in a successful request
- Fails to provide a 2xx response over at least three retries
The above would also apply in these circumstances.
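The proposed /status rules can be expressed as a small classifier. This is a hypothetical Python sketch of the behaviour being assessed, not the shipped logic: `classify_status_poll` is an invented name, a poll is modelled as a list of (status code, parsed body) attempts, and using three failed attempts as the threshold is an assumption, since the thread only says "at least three retries".

```python
from typing import Optional, Sequence, Tuple

# Assumption: "at least three retries" is modelled as three failed attempts.
MAX_FAILED_ATTEMPTS = 3

# One attempt: (HTTP status code, parsed JSON body, or None if empty).
Attempt = Tuple[int, Optional[dict]]


def classify_status_poll(attempts: Sequence[Attempt]) -> str:
    """Classify one /status poll under the proposed rules (hypothetical).

    Returns "rollback", "continue" (migration still healthy, keep polling),
    or "retry" (failures so far, but the retry budget is not exhausted).
    """
    failures = 0
    for status_code, body in attempts:
        if 200 <= status_code < 300:
            # Any 2xx is acceptable unless it explicitly reports failure.
            if body is not None and body.get("status") == "failed":
                return "rollback"
            return "continue"
        failures += 1
        if failures >= MAX_FAILED_ATTEMPTS:
            return "rollback"
    return "retry"


print(classify_status_poll([(503, None), (200, None)]))  # continue
```

Under these rules, one transient 503 followed by a healthy 2xx no longer ends the migration.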
Hi @SeanBourke,
Thank you for the confirmation. I agree, that would be really helpful to us.
I would also like to clarify the error codes mentioned here: https://developer.atlassian.com/cloud/jira/platform/data-residency/#error-codes
As stated there, for the /status hook only, we just need to follow the format below to send a predefined error code:
```json
{
  "status": "failed",
  "errorResponseCode": "E0004"
}
```
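On the app side, replying to /status with the failed payload quoted above could look like the following Python sketch. The helper name `failed_status_response` is hypothetical, and returning 200 specifically is an assumption: the thread only requires the explicit "failed" status to arrive in a successful (2xx) response.

```python
import json
from typing import Tuple


def failed_status_response(error_code: str) -> Tuple[int, str]:
    """Hypothetical helper: a 2xx /status reply that reports a failed migration.

    The body mirrors the format quoted above; 200 is an assumed choice of 2xx.
    """
    body = {"status": "failed", "errorResponseCode": error_code}
    return 200, json.dumps(body)


status, payload = failed_status_response("E0004")
print(status, payload)  # 200 {"status": "failed", "errorResponseCode": "E0004"}
```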
But the error codes section also says:
> To help diagnose problems with migrations, we’re adding a set of standardised error codes that your app can report back to us with when you’re reporting back with a non-2xx to the hooks or the status retrieval.
Does this mean that, if we want to send these predefined error codes for the other hooks (e.g., /schedule, /start, etc.), we can use the JSON format above with errorResponseCode along with a 2xx status code? Or should we use a non-2xx status code?
Also, is there a ticket or page tracking your implementation of the hook retries, so we can watch for updates?
Thank you.
Hey @StevenPila,
My apologies for the delay in responding to this. A few quick updates:
- We’ve updated our documentation to be more informative regarding the use of error codes. These can be included in any non-2xx response to provide more details about what happened. Hopefully this helps; please let us know if it isn’t clear and we can explore further updates.
- Retries are coming soon; you can follow progress on AC-2572. Where we receive a non-2xx response that does not include an errorResponseCode, we’ll make a few more attempts before failing the migration. This means transient errors are less likely to result in a full migration failure for customers.
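The retry rule described above (retry a non-2xx response unless it includes an errorResponseCode) can be modelled as a single predicate. This Python sketch is hypothetical and does not reflect how Atlassian implements it; in particular, treating an unparseable or non-object body, such as an HTML error page from a proxy, as carrying no error code is an assumption. The number of extra attempts is not specified in this thread, so the sketch only decides whether to retry.

```python
import json
from typing import Optional


def should_retry(status_code: int, body: Optional[str]) -> bool:
    """Hypothetical model of the announced rule: retry a non-2xx reply
    unless its JSON body carries an errorResponseCode."""
    if 200 <= status_code < 300:
        return False  # success: nothing to retry
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}  # assumption: non-JSON error pages carry no error code
    if not isinstance(payload, dict):
        payload = {}
    # An explicit error code means the app failed deliberately: fail fast.
    return "errorResponseCode" not in payload


print(should_retry(503, "<html>Service Unavailable</html>"))         # True
print(should_retry(500, json.dumps({"errorResponseCode": "E0004"})))  # False
```

In other words, an app that deliberately fails a hook would return a non-2xx status (500 here is just an example) with a body such as {"errorResponseCode": "E0004"}, which this predicate treats as final, whereas a bare 503 from a load balancer would be retried.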
Hi @SeanBourke, no worries, thank you for the clarifications and for addressing our concern. I really appreciate it.
Also, is there already a planned timeline for when the retry mechanism will be implemented?
Hey @StevenPila,
We’ve released some improvements to the retry mechanisms for the data residency migration hooks. The related documentation has been updated to reflect this improved behaviour.