Hi @MaheshPopudesi
The hourly quota is intended as an accounting boundary, not as a guarantee that an app will be unavailable for a full hour if a spike occurs.
The model will forgive occasional hourly spikes for an app; however, when an app consistently crosses the Global Pool thresholds, please explore optimizing API usage patterns and apply for the Tier 2 quota. We recognize that apps can experience short, high-intensity bursts, and the model is designed so that short-lived spikes don’t translate into prolonged service disruptions for all other tenants.
A design that results in hour-long lockouts for apps would not be acceptable, and this is one of the key scenarios we’re pressure-testing during this period before enforcement.
I am glad we both agree that an hour-long lockout is not acceptable. If this new design truly cannot lock out apps for an hour, and the hour component is actually just an accounting boundary, then it sounds like the original description was not quite right. Could Atlassian please describe exactly what the time window is (since it’s apparently not one hour), explain which parameters influence any lockout period, and describe worst-case expectations for recovery (the conditions required to get there, the ability or inability to use any APIs, the potential for slowed responses or other impacts, and the maximum time period over which an outage could be expected)?
All of these things need to be understood in order to run our business and plan around this change.
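To make the ask concrete: in a plain token-bucket model (purely an illustration of the kind of parameterization we need documented, not a claim about how Atlassian’s limiter actually works), worst-case recovery falls straight out of two numbers. That is the level of specificity we are asking for:

```ts
// Generic token-bucket sketch. Every number and name here is hypothetical;
// this is NOT Atlassian's implementation, just the shape of answer we need.
interface BucketConfig {
  capacityPoints: number;      // burst headroom
  refillPointsPerSec: number;  // sustained allowed rate
}

// If an app has overdrawn its bucket by `deficitPoints`,
// how long until it can make calls again?
function worstCaseRecoverySeconds(cfg: BucketConfig, deficitPoints: number): number {
  return deficitPoints / cfg.refillPointsPerSec;
}

// Example: a 10,000-point overdraft against a 50 points/sec refill
// recovers in 200 seconds -- nowhere near an hour-long lockout.
console.log(worstCaseRecoverySeconds({ capacityPoints: 5_000, refillPointsPerSec: 50 }, 10_000));
```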
I mentioned this earlier, but it is also very important for us to learn soon whether front-end calls are included in the new rate limits or not.
We understand the question about why this doesn’t follow a longer changeover period. Over the past year, we’ve seen a significant increase in overall API usage, which drives important business outcomes. We have also observed an increase in policy violations that can affect the experience of all other apps in our ecosystem.
I recognize that all of the above is important to Atlassian. It is not clear to me, though, why anything in “driving business outcomes” justifies not following the standard six-month notice period. One-off violators can presumably be dealt with individually, so it feels like this is mostly a question of cost optimization? Or maybe the internal team was hoping to move on to a different project? Neither of these would tend to give warm and fuzzy feelings to vendors, especially given the holiday timing. Although nothing obligates Atlassian to describe its internal thinking, if you are looking for warm and fuzzy, being more transparent about why this has to happen right now (and not in June) would go a long way.
Along with introducing the point-based rate limits, our goal is to support a smooth transition for all apps and minimize impact. Based on our traffic analysis for the past year, ~95% of apps never cross the Global Pool boundaries; of those that did, we proactively moved qualifying apps to Tier 2. We don’t expect the vast majority of apps to experience any impact from this change.
95% of apps do not cross the Global Pool right now. What about planning for growth? The more successful an app is, the more customers it has, and the more usage it generates.
Atlassian is deliberately creating what amounts to a trap that will take apps out of service once they hit a magic threshold (and, ironically, only the successful ones). What happens to this scheme in two years, when traffic has tripled, no one from the original rate-limit team is still in their current role at Atlassian, and chatbots are handling most of the responses to “increase my rate tier” tickets?
To ask a more pointed question, why does Tier 1 even exist? Can all apps not simply be Tier 2? Why do vendors need to be the ones to raise a ticket with a dozen detailed data fields to justify keeping their apps working? If Atlassian insists on having a Tier 1 and a Tier 2, why can Atlassian not transition apps for us automatically (or at least alert us) before they start breaking? Can this not be part of the MVP? Requiring vendors to take proactive action based on response headers is literally impossible for Runs on Atlassian apps. It seems Atlassian is saying that vendors now have to manually log into a dashboard and periodically monitor some metric just to ensure that their app is not inadvertently taken out of service?
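For the record, “proactive action based on response headers” means every vendor running a watchdog like the sketch below. The header names and the threshold are invented for illustration; I have not seen documented header names for the new point-based limits:

```ts
// Hypothetical header-based quota watchdog. The header names and the
// warning threshold are assumptions, not documented Atlassian behavior.
const WARN_BELOW = 0.2; // alert when under 20% of quota remains

async function checkQuota(url: string, token: string): Promise<void> {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  const remaining = Number(res.headers.get('X-RateLimit-Remaining') ?? NaN);
  const limit = Number(res.headers.get('X-RateLimit-Limit') ?? NaN);
  if (!Number.isFinite(remaining) || !Number.isFinite(limit)) return; // headers absent
  if (remaining / limit < WARN_BELOW) {
    // In practice: page someone, shed load, file the Tier 2 ticket, etc.
    console.warn(`Quota nearly exhausted: ${remaining}/${limit} points left`);
  }
}
```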
Vendors are paying Atlassian millions in revenue share in order to work with its products. We all pay the same percentage. It does not seem objectively fair that some apps get X API calls while other apps get 5000X.
Looking at the questions in the “apply for tier 2” support ticket, it seems that Atlassian wants to curtail misuse and sloppy API calls.
To handle high API usage transparently and fairly, there is also the nuclear option: charge vendors for API usage.
You can do this fairly by reducing the existing Marketplace cut so that Atlassian ends up net-neutral on revenue (where, in aggregate, new_vendor_share_dollars + api_usage_dollars = old_vendor_share_dollars). Customers and apps that make more calls will be billed more, and those with lower demands will be billed less. You can then presumably dispense with most arbitrary API cutoffs and tiers, except whatever burst limits you need to keep your infrastructure healthy.
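As a toy illustration of that net-neutral constraint (every number below is invented):

```ts
// Invented figures; only the constraint itself comes from the proposal above.
const marketplaceSales = 500_000_000;    // aggregate vendor sales, $/yr
const oldCut = 0.15;                     // current Marketplace share (hypothetical)
const newCut = 0.12;                     // reduced share (hypothetical)
const totalApiCalls = 2_000_000_000_000; // aggregate ecosystem calls/yr (invented)

// Net-neutral: new_vendor_share_dollars + api_usage_dollars = old_vendor_share_dollars
const apiUsageDollars = marketplaceSales * (oldCut - newCut); // $15M to recover via usage fees
const dollarsPerMillionCalls = (apiUsageDollars / totalApiCalls) * 1_000_000;
console.log(dollarsPerMillionCalls.toFixed(2)); // "7.50" -- $7.50 per million calls
```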
This would still need a few safety controls (let vendors specify their own per-tenant and global limits, with both warning and hard-cutoff thresholds) to provide cost control and visibility, but it is at least transparent, it does not arbitrarily create app failures, and it puts control in the hands of app vendors. There will be winners and losers, but it is arguably fair for everyone, and it drives down Atlassian’s infrastructure costs (because vendors are directly incentivized to optimize their API calls).
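Those controls could be a vendor-supplied policy as small as this sketch (all field names hypothetical; nothing like this exists in any Atlassian API today):

```ts
// Hypothetical vendor-defined usage policy for cost control and visibility.
interface UsagePolicy {
  perTenant: { warnAtPoints: number; hardCapPoints: number };
  global: { warnAtDollars: number; hardCapDollars: number };
  onWarn: 'email' | 'webhook';      // visibility: how the vendor gets alerted
  onHardCap: 'throttle' | 'block';  // cost control: what happens at the cap
}

const examplePolicy: UsagePolicy = {
  perTenant: { warnAtPoints: 80_000, hardCapPoints: 100_000 },
  global: { warnAtDollars: 4_000, hardCapDollars: 5_000 },
  onWarn: 'webhook',
  onHardCap: 'throttle',
};
```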