Our app had only 87 function invocations but was rate limited (FYI: Forge has a limit of 1,200 invocations per minute, which applies per app, not per instance). I raised a support request.
This is the response I got after 5 days:
Thanks for raising this and for sharing the metrics. You’re right that your app’s own traffic (87 invocations in that minute) is well below the documented limit of 1,200 invocations per minute. The rate‑limited responses you saw were not caused by your app exceeding its per‑app limit.
During a short window on 18 December, there was an issue on our side where some apps were incorrectly grouped into a shared rate‑limit pool. That pool was already heavily used by other apps, so a small number of your requests were returned as rate‑limited (429), even though your own usage was low. This has since been corrected and the normal per‑app limits now apply again. There is no developer status communication about this issue because the issue was internally flagged and acted upon.
Atlassian wrongly enforced rate limits, didn’t bother to inform us or update the status page while we were dealing with tons of support requests from customers and the resulting churn, because the app simply didn’t work. Also, it was not a short window. The issue had been ongoing for a week or so.
I think global rate limits are terrible for this platform. Our app is used by some of the largest Atlassian customers, and it was broken because it happened to be in some weird shared pool with free users? I would be more than happy to pay extra to move to per-instance limits and protect large paying customers.
There is no developer status communication about this issue because the issue was internally flagged and acted upon.
When I see stuff like this, I get pretty concerned about the future of my business. “The issue was internally flagged and acted upon” is a terrible excuse for not communicating about it. We need to know how it happened and why it won’t happen again. Every time. It doesn’t matter in the slightest that you knew what was wrong. We need to be able to tell our own customers why they couldn’t use the product they paid for. It should not require a support case to know when we are affected by a systemic mistake in the platform we are being coerced onto. OCNB, please.
It seems like Atlassian still has to learn how to be a proper service provider.
If you want to run a PaaS, act like one. You can’t have your cake and eat it too.
You are now a Cloudflare. You are now an AWS. You are now a Vercel. If we are to pay for your hosting platform, we are your customers. Treat us as such.
Separate your SaaS business from your PaaS business @AlanBraun / @ChrisHemphill1 and make sure that you follow industry best practices on how to operate one.
We apologise for the delay and the earlier, incorrect explanation.
After further investigation, we confirmed that your customer’s errors were caused by our gradual rollout of new requests‑per‑second (RPS) rate limits for Forge invocations, announced on November 20 (Forge Changelog post). This was a per‑user+installation limit intended to prevent short traffic bursts from blocking customers for up to a minute, as could happen under the previous model. In your case, as we incrementally rolled out the new RPS changes, these new limits caused the rate limiting that your customer experienced.
We’ve now rolled back this change and restored the previous behaviour while we reassess our approach to RPS‑based limiting (or alternatives).
The change was based on partner feedback: per‑second limits allow a retry‑and‑back‑off strategy, so that rate limiting has less impact on customers. We apologise for the disruption and poor experience surrounding both the incident and our communication. We are using your feedback to improve the reliability of Forge rate limiting and our communication of issues that affect your apps.
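For reference, a retry‑and‑back‑off strategy for 429 responses can be as small as the sketch below. This is illustrative only: it assumes a generic fetch‑style client and a Retry‑After header on rate‑limited responses, so check the actual headers your invocations receive before relying on it.

```typescript
// Minimal retry-and-back-off sketch for HTTP 429 responses (illustrative).
// Assumes a fetch-style client; the Retry-After header is an assumption,
// not a documented guarantee of the platform.
async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res; // success, or an error we don't retry

    if (attempt >= maxRetries) {
      throw new Error(`Still rate limited after ${maxRetries} retries: ${url}`);
    }

    // Honour Retry-After when present; otherwise fall back to exponential
    // back-off. A little jitter avoids synchronized retry storms.
    const retryAfterSec = Number(res.headers.get("Retry-After"));
    const baseMs = retryAfterSec > 0 ? retryAfterSec * 1000 : 500 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, baseMs + Math.random() * 250));
  }
}
```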
Whilst it is great to have more insight into the technicalities of what happened, this thread is not about the engineering aspect of the outage.
This thread is about the handling of the incident, and it has a clear ask: how is Atlassian going to improve that handling to restore trust in Forge as a platform? So far, the answer is not good enough.
There are clear indicators of what needs improvement:
- “the response [..] after 5 days”
- “Atlassian wrongly enforced rate limits”
- “didn’t bother to inform us or update the status page while we were dealing with tons of support requests from customers”
Hi @Remie, it is after work hours on Christmas Eve here in Australia, so I do not have all the answers you’re looking for, but I do want to respond and clarify so that I can find answers to your questions over the break.
I cannot remark on the 5-day response time; I will seek a response from the team that originally dealt with this request.
On the rate limits - to be clear, the initial response quoted above was incorrect. We didn’t wrongly enforce rate limits. We were incrementally rolling out the announced request-per-second (RPS) rate limits, and were still many multiples above the existing request-per-minute (RPM) limits, which were still in place. We were also tracking the metrics, and saw no increase in the total number of rate limiting events during the rollout.
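To illustrate the intended difference (a minimal sketch, not our actual implementation; the per-second figure is made up for the example): 1,200 invocations per minute averages out to 20 per second, so a per-second cap set many multiples above that only bites during sustained bursts, whereas a per-minute window can be exhausted by a single burst.

```typescript
// Illustrative fixed-window limiter, showing why one burst behaves very
// differently under per-minute vs. per-second windows. Not Atlassian's
// implementation; the numbers below are made up for the example.
class FixedWindowLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      this.windowStart = nowMs; // a new window starts: reset the budget
      this.count = 0;
    }
    if (this.count >= this.limit) return false; // budget exhausted
    this.count++;
    return true;
  }
}

// 1,200/minute: a burst of 1,200 calls in the first second exhausts the
// window, so every further call is rejected for the remaining ~59 seconds.
const perMinute = new FixedWindowLimiter(1200, 60_000);

// Hypothetical 100/second cap: the same burst is throttled within each
// second but recovers on the next, so nobody is blocked for a minute.
const perSecond = new FixedWindowLimiter(100, 1_000);
```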
I am not aware of any other support tickets referring to rate limiting beyond the original post, and we can only see a limited number of such events for this app. That is why there were no status page updates: this didn’t appear to be a widespread issue.
If you have other examples we are happy to look into them. Do you have an Atlassian support ticket you can share? Have you seen these issues resolve in the past 24 hours?
Apologies that I can’t give more concrete answers right now.
@DanCrammond, I’m not asking you to respond after work hours on Christmas Eve. This is why I always address Atlassian as an entity, not individual Atlassians. You should just close your laptop, go enjoy your time off, celebrate Christmas with your family, and completely ignore whatever happens here on CDAC.
Now with regard to this thread: Atlassian is approaching this as an engineering issue, but it is not. It is a business issue.
Forge is a PaaS, and given that Atlassian will be charging Marketplace Partners for it, we are now Atlassian customers. That means we need to be treated as such: with appropriate support, a concession policy to remediate the commercial damage from outages, and public accountability that allows marketplace partners to tell our customers that the issue was with the service provider, not with the partner.
These threads should be tracked by customer success employees who understand the business side of running a PaaS, not by engineering employees.
Thank you @DanCrammond for the insights, appreciate it.
While I understand the main cause now, it raises more questions. As you said, it’s Christmas Eve, but unfortunately Atlassian is giving me such a fantastic holiday season.
It’s been a week and I still do not have a proper answer to the support request: ECOHELP-103477.
I raised the request as P2-Major and provided all the details that were needed. Why does it take forever to get much-needed support for such critical issues? If I had not raised this thread here on CDAC, I still would not have received a response.
Please tell me again, how do we trust such a platform to build our business?
Hi everyone - Andrew from Ecosystem Support here.
I wanted to address the support related concerns raised in this thread.
First, I want to acknowledge the delays some of you have experienced recently when filing support tickets. Over the past few months, we have at times fallen behind our usual SLA standards for partner tickets more broadly - not just in the case discussed here - and we know this has created challenges for you.
Timely and predictable support is not a “nice to have”, but a critical business need. Meeting our SLAs is something we take seriously, and we are actively working to normalize our operations by reducing backlog and sharpening our focus on issues that impact partners. As Forge matures, we know you need reliable support more than ever. It’s on us to ensure support is a trusted channel and a strong advocate for you internally.
You should begin to see improvements in response times from now onward. It will take some time to fully return to the level of consistency you rightly expect, but the overall trajectory should be clearly improving - especially for higher priority tickets.
Thank you for your patience and for continuing to highlight where our support experience needs to be better. We are committed to making sustained, long‑term improvements to how we support all Marketplace Partners.
If I got a penny for every time Atlassian was “committed to making sustained, long‑term improvements to how we support all Marketplace Partners” over the past 13.5 years since the Marketplace’s inception, I would be very, very, very rich.
As the hour approaches in which Atlassian enters the PaaS market and becomes a hosting company, it is imperative that it prepares itself to offer the kind of support that has become the industry standard.
Vague promises are not good enough from a service provider, especially when the offering is a mere wrapper around AWS that comes with premium pricing and required vendor lock-in.
Once again, I would like to remind Atlassian that it needs customer success / account managers to track these threads as this is a business issue. Not engineering, not support, but business.
I wanted to follow up on this. The rollback has indeed fixed the issue, which means RPS-based rate limiting is not suitable for us.
So I would like to ask: what is the way forward? The rollback is temporary, I assume. We need to find a sustainable solution before we get hit with surprise rate limits again.