Until now, our process for handling communication during API incidents affecting apps in production has been a bit ad hoc: emails and the like. With more apps in production, we realized that wasn’t going to scale, so we looked at how to improve it. We now have a new solution and, in true Atlassian style, it involves Jira, of course, and Statuspage!
What is an incident?
If your app is working in production one day and stops working the next because of issues with the Stride/Atlassian API, that’s an incident. It could stop working for all users or only a subset of users. Examples:
- Elevated 500 error rates
- Sudden spike in 403 errors for conversations the app is meant to have access to
- Increased latency in API calls
However, if you are trying to implement an app and:
- You find a bug in the API: that’s a bug, not an incident
- You find a missing feature in the API: that’s a feature suggestion, not an incident
How do I report an incident?
First, check out the Atlassian API Statuspage - we might already know about it. If the Statuspage is empty and you think something’s broken in the Stride API:
- Go to https://support.atlassian.com/contact/#/
- Select Technical issues and bugs
- Select Stride
- For incidents, create the ticket as Level 1 and start the summary with “API incident:”
- This will create a Jira ticket for our 24/7 support team
Please provide as many details as you can to help us fast-track this process. And please only use this process after you’ve ruled out issues on your end (e.g. a recent deployment), as someone will end up getting paged.
How can I best help you troubleshoot the issue?
Make sure to log the trace-id that we send back in the HTTP headers of every REST API response. It helps us greatly in tracing the call and diagnosing the issue.
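As a rough illustration, here is a minimal Python sketch of logging the trace ID next to each API call. It assumes the header is literally named "trace-id"; check a real response and adjust the constant if it is not. The logger name and the helpers `extract_trace_id` and `log_api_response` are hypothetical names chosen for this example, not part of any Stride SDK.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("stride_app")  # hypothetical logger name

# Assumption: the response header is named "trace-id". Verify against a real
# response and change this constant if the actual name differs.
TRACE_HEADER = "trace-id"


def extract_trace_id(headers: Mapping[str, str],
                     header_name: str = TRACE_HEADER) -> Optional[str]:
    """Return the trace ID from response headers, or None if it is absent.

    HTTP header names are case-insensitive, so the lookup ignores case.
    """
    wanted = header_name.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


def log_api_response(method: str, url: str, status: int,
                     headers: Mapping[str, str]) -> str:
    """Log one API call together with its trace ID and return the message."""
    trace_id = extract_trace_id(headers) or "missing"
    message = f"{method} {url} -> {status} trace-id={trace_id}"
    # Failed calls are logged at WARNING so they are easy to find later.
    level = logging.WARNING if status >= 400 else logging.INFO
    logger.log(level, message)
    return message
```

Calling `log_api_response` after every request, whichever HTTP client you use, means the trace ID of a failing call is already in your logs when you open a ticket.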
What happens next?
Someone from our support team will get back to you, diagnose the issue, and escalate to the relevant team at Atlassian if need be. You will receive emails for all updates made to the Jira ticket.
One of the first things we’ll do is try to identify the overall severity of the incident:
- Severity 0: Crisis incident with maximum impact
- Severity 1: Critical incident with very high impact
- Severity 2: Major incident with significant impact
- Severity 3: Minor incident with low impact
Depending on the severity, different rules will apply for follow-up communications and related SLAs.
- All incidents with severity 0, 1, and 2 will end up on the Atlassian API Statuspage
- For incidents of severity 3 (the vast majority of incidents), the communication will be done on the ticket
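If your app's alerting needs to know where to look for updates, the rule above can be sketched in a few lines of Python. This assumes the four levels listed earlier are numbered 0 to 3 in the order given; the `Severity` enum and `appears_on_statuspage` are hypothetical names used only for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumption: the levels are numbered 0 to 3 in the order they are listed.
    CRISIS = 0
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3


def appears_on_statuspage(severity: Severity) -> bool:
    """Return True for severities 0, 1, and 2, which end up on the Statuspage.

    Severity 3 incidents are communicated on the ticket instead.
    """
    return severity <= Severity.MAJOR
```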
How can I be notified of an incident?
Make sure to subscribe to the Atlassian API Statuspage via email or SMS.
You can also choose which APIs you care about.
How do I report a bug or ask for a new API feature?
Keep using the community forum and the public issue tracker. In the issue tracker, please use the “Developer API” component.
Thanks again for building apps for Stride!
The Stride API team