Rate limit abuse (new attack vector)

I understand the need for a fair rate limit system, and I prefer the new simplified HTTP headers for the point-based system over the previous ones. I also don’t want to encourage anyone to exploit the new limits to compromise other apps. However, there are clear flaws in the point-based rate limit system that were raised multiple times in the 2026 point-based rate limits thread at the end of last year, and Atlassian has not really addressed them yet.

I’ve created a simple Confluence app (Tier 1) that fetches 250 pages when the user clicks a button (one API request). It takes only about 3 minutes for a single user to exceed the global quota by clicking the button repeatedly, rendering the app unusable for up to one hour on all tenants. The fact that the quota always resets at the top of the hour makes this new attack vector even easier to exploit.
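A back-of-the-envelope calculation shows why this is so fast. The point cost per page and the pool size below are hypothetical placeholders chosen for illustration, not Atlassian’s published numbers:

```javascript
// Hypothetical figures for illustration only -- not Atlassian's published numbers.
const POINTS_PER_PAGE = 100;   // assumed point cost of returning one page
const PAGES_PER_CLICK = 250;   // one API request fetching 250 pages
const GLOBAL_POOL = 1_000_000; // assumed Tier 1 hourly pool, shared by ALL tenants

const pointsPerClick = PAGES_PER_CLICK * POINTS_PER_PAGE;       // 25,000 points per click
const clicksToExhaust = Math.ceil(GLOBAL_POOL / pointsPerClick); // 40 clicks

// At one click every few seconds, a single user drains the pool in minutes,
// and every tenant of the app is then locked out until the top of the hour.
const secondsPerClick = 4;
const secondsToExhaust = clicksToExhaust * secondsPerClick; // 160 s, i.e. under 3 minutes
```

With these assumed numbers, 40 button clicks at one click every four seconds exhaust the shared pool in under three minutes, which matches the timescale observed in the demo.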

Here’s a video demonstrating how easy it is to lock out other tenants:

According to a response from Atlassian in the community thread, the rate limit system is supposed to handle this scenario to prevent lockouts. However, it doesn’t seem to be functioning, at least not for my test app.

The hourly quota is intended as an accounting boundary, not as a guarantee that an app will be unavailable for a full hour if a spike occurs. The model will forgive occasional hourly spikes for the app; however, when the app consistently crosses the Global Pool thresholds, please explore optimizing API usage patterns. […] A design that results in hour-long lockouts for apps would not be acceptable, and this is one of the key scenarios we’re pressure-testing during this period before enforcement.

Upgrading the app to Tier 2 would reduce the lockout risk, but even Tier 2 apps can become unusable within a tenant for up to an hour if just a single user repeatedly triggers a somewhat complex action through the app UI.

How are app vendors supposed to handle this risk, given the short timeline for the rollout of the point-based rate limit system?

24 Likes

Thank you for the detailed reproduction.

The Tier 1 Global Pool is a shared quota across tenants and is intended to keep the model simpler for the majority of apps that operate well within those limits. Apps with higher per-request point costs can be more sensitive to concentrated usage from a single tenant.

We take scenarios like this seriously, especially where concentrated usage may affect a shared pool across tenants. If your app’s usage patterns are a better fit for the Per-Tenant (Tier 2) model, please reach out for a Tier 2 review.

Can you please confirm whether your app’s responses show Beta-RateLimit-Policy or RateLimit-Policy (without the Beta- prefix)?

This scenario is one of the reasons we are doing a phased rollout, so we can expand enforcement gradually and minimize disruption.

1 Like

@MaheshPopudesi
Forgive my frankness, but it is not true that Atlassian is seriously considering these scenarios.

There is clear evidence that any malicious actor can carry out a DoS attack on an app extremely easily, causing damage to ALL customers who use it in production and to the vendors who are left holding the bag and having to deal with abuse and unexpected application outages.

I understand the idea of keeping the model simple for most applications, but let’s be honest: from a purely engineering standpoint, the fixed global limit is an absolutely insane decision. The limit should be calculated per tenant and not globally, even on Tier 1. That way, in the event of spikes, abuse, malicious attacks, or anything else, the damage would remain confined to a single tenant.

Atlassian sometimes needs the honesty to admit it made a mistake and fix it, instead of continuing to insist on a model that clearly has several flaws.
If malicious actors were to exploit this mechanism to disrupt apps, this could raise questions about Atlassian’s potential legal responsibility, given that the mechanism may be seen as introducing a significant security weakness despite repeated community feedback and reports.

You cannot demand dozens of compliance and security requirements from us and then be the first to open the door to DoS attacks.

15 Likes

Hi @MaheshPopudesi,

I’m using the Beta-RateLimit header and the r parameter to check if the quota has been exceeded.
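For reference, here is a minimal parser for a RateLimit-style header carrying an `r=` (remaining) parameter. The exact header shape shown is an assumption based on the IETF RateLimit header-fields draft, not confirmed Atlassian output, so check your app’s actual responses:

```javascript
// Minimal parser for a header like:  Beta-RateLimit: "global";r=1200;t=1800
// where r = points remaining in the pool and t = seconds until reset.
// The header shape is an assumption modelled on the IETF RateLimit
// header-fields draft; verify against your app's real responses.
function parseRateLimit(headerValue) {
  const params = {};
  for (const part of headerValue.split(';').slice(1)) {
    const [key, value] = part.trim().split('=');
    params[key] = Number(value);
  }
  return params;
}

const { r, t } = parseRateLimit('"global";r=1200;t=1800');
// r === 1200 (points remaining), t === 1800 (seconds until the pool resets)
```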

That’s a valid point. However, Atlassian expects vendors to handle this at the application level by ensuring their apps don’t generate excessive or redundant requests, for example, when a user repeatedly triggers the same action. In practice, this means implementing your own safeguards (such as caching, debouncing, or internal rate limiting) on top of Atlassian’s platform limits.

While Atlassian can increase rate limits in some cases, that doesn’t eliminate the underlying issue: without proper controls in the app, you can still run into the same constraints.
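As a rough sketch of what “caching, debouncing, or internal rate limiting” on top of the platform limits could look like: the helpers below are illustrative, not a Forge API, and in Forge’s stateless serverless runtime this in-memory state would not survive across invocations, so a real app would need external storage.

```javascript
// Illustrative only: a tiny in-memory TTL cache and a per-user cooldown.
// In Forge's serverless runtime this state does not persist across
// invocations; real apps would need external storage for it.
const cache = new Map();

function cached(key, ttlMs, fetchFn, now = Date.now()) {
  const hit = cache.get(key);
  if (hit && now - hit.at < ttlMs) return hit.value; // serve without spending points
  const value = fetchFn();                            // spend points once per TTL window
  cache.set(key, { value, at: now });
  return value;
}

const lastCall = new Map();

function allowAction(userId, cooldownMs, now = Date.now()) {
  const last = lastCall.get(userId) ?? -Infinity;
  if (now - last < cooldownMs) return false; // drop rapid repeat triggers
  lastCall.set(userId, now);
  return true;
}
```

A repeated click inside the cooldown window is rejected locally instead of turning into another request against the shared pool, and a cache hit inside the TTL window spends no points at all.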

The point you’re missing is that Atlassian is enforcing a global rate limit pool by default.

Meaning the rate limiting you see in the video demo here applies to every customer installation of your app, not only the customer who triggered the limit. One customer triggers the limit = all customers experience one hour of downtime.

There’s no logical way for us developers to handle that. You’d be trying to do some calculation like {fixed_global_pool_limit} / {variable_total_installs}

And the only response Atlassian can parrot is “oh just fill out paperwork to upgrade to per-tenant”. So if you don’t want your entire customer base to be trivially DoS’d by a single free trial user, you have to fill out unnecessary paperwork, wait weeks for review and hope they approve it. And then do that for every app.

Trivial for a human to bring down a single app. Trivial for an OpenClaw agent to bring down the entire Marketplace. Both would look like requests from power users rather than malicious actors so they will be undetectable by any system. Partners shouldn’t need to waste a single second of time trying to explain how poorly designed this is.

2 Likes

I completely agree with your point, and I’m aware that rate limiting operates on a shared global pool. My main point is that we shouldn’t rely on Atlassian to fully solve this, as they are also focused on managing and conserving platform resources. As a result, they expect vendors to design apps that avoid hitting these limits in the first place, even in a global pool.

In practice, this means we’re operating within a framework where responsibility is shared. You can take the longer route, raising support tickets and pushing for platform changes, or take a more immediate approach by implementing internal rate limiting and safeguards within your app (or ideally, both).

From our experience, waiting alone isn’t a sustainable strategy. Over time, you’ll likely encounter the same limitations unless you proactively design around them.

they expect vendors to design apps that avoid hitting these limits in the first place even in a global pool

@PrinceNyeche Can you please share how you are planning to do this? How would you guard yourself against this scenario?

In addition, how are you going to prevent the scenario mentioned by @scott.dudley, where a single permissions check mandated by Atlassian security policy already consumes 1000 points?

Nobody disagrees with the statement that vendors should implement proper controls to avoid rate limiting issues, nor does anyone disagree that rate limiting is required. We all want Atlassian to operate a secure, reliable service.

But we have been consistently telling Atlassian that their current implementation is not an industry best practice, and that it has introduced a security issue as it creates a DDoS attack vector.

If you are saying that other vendors should level-up (even considering the incredible amount of experience this community has), I urge you to bring receipts and show us how you believe it should be done.

6 Likes

I mentioned a practical approach earlier to help mitigate this common scenario:

In practice, this means implementing your own safeguards (such as caching, debouncing, or internal rate limiting) on top of Atlassian’s platform limits.

I will even go as far as disabling that button if the clicks are too rapid. To be clear, these measures aren’t foolproof, but they are a strong starting point, and it’s an approach we’re already applying. I also agree that the concerns raised here are valid and Atlassian should address them.
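A minimal sketch of that button-disabling idea, with illustrative names of my own choosing: keep the button unusable while a request is in flight and for a minimum interval afterwards.

```javascript
// Client-side sketch (illustrative, not a Forge API): block clicks while a
// request is in flight and for a minimum interval after it finishes.
function createClickGuard(minIntervalMs) {
  let busy = false;
  let lastFinished = -Infinity;
  return {
    tryStart(now = Date.now()) {
      if (busy || now - lastFinished < minIntervalMs) return false;
      busy = true; // the caller disables the button at this point
      return true;
    },
    finish(now = Date.now()) {
      busy = false;
      lastFinished = now; // button stays blocked until the interval elapses
    },
  };
}
```

As noted above, this only throttles one well-behaved client; it cannot protect a shared global pool from other users or other tenants.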

My main point is this: while it’s absolutely reasonable to raise this with Atlassian and push for platform improvements, those changes can take time. In the meantime, it’s more effective to implement safeguards within our own apps rather than wait.

That’s essentially why many of us became vendors in the first place, because there were gaps we chose to solve ourselves. The ideal approach is to do both: continue engaging Atlassian for improvements, while also building internal solutions to handle the problem today.

And just to add, I’ve been part of this community for a long time and have had the opportunity to see how Atlassian operates from the inside. So this is simply practical advice based on that experience; it’s there for those who find it useful.

Ok, let’s go over those practical approaches:

Caching
This is only viable for apps that have external egress enabled (so not RoA), as Atlassian has abandoned the Forge Cache initiative. There is currently no Forge-native caching solution.

Debouncing
There are two different ways to debounce: client side (like your suggestion to disable the button) and server-side.

Let’s start with the client side: because Atlassian chose a global pool, the pool is consumed by all users of the app. Our Figma for Confluence LITE app has 3,542 installs. Each install can have multiple macros on the same page, let’s say 3 on average. If every installation loads its pages within a 10-minute timespan, we have 3,542 × 3 API requests. If each macro does a permissions check, this will consume 10,626,000 points. The app will now be disabled for 50 minutes. This is not abuse, but regular use. I cannot prevent this by disabling a button.
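The arithmetic above checks out, using the 1,000-point permission check cost cited earlier in the thread:

```javascript
// Reproducing the arithmetic from the post above.
const installs = 3542;
const macrosPerPage = 3;               // assumed average from the post
const pointsPerPermissionCheck = 1000; // cost cited earlier in the thread

const requests = installs * macrosPerPage;                  // 10,626 API requests
const pointsConsumed = requests * pointsPerPermissionCheck; // 10,626,000 points
```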

With regard to server-side debouncing: because Forge is a serverless architecture, each request will be routed to a different instance of the Forge function. The underlying code is not aware of how many Forge functions are executing at the same time. I cannot group API requests server-side, and I cannot delay them. The only information I have is in the rate limiting headers, but for any app with significant use, that information often arrives too late.

Internal rate limiting
The problem with internal rate limiting is how to apply it. Are you going to divide the total global pool available to your app per hour by the number of installs? For our Figma for Confluence LITE app, that means each tenant gets 65,000 / 3,542 ≈ 18 points per hour. Now take into consideration that a single page can have multiple macros. This means that the app becomes unusable for customers. Please do share the internal rate limiting policy that you deem viable here!
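Spelling out that naive “divide the pool by installs” policy with the numbers quoted in this thread makes the problem concrete:

```javascript
// The naive per-install allocation from the post above, using the pool size
// and the 1,000-point permission check cost quoted in this thread.
const hourlyGlobalPool = 65000;
const installs = 3542;
const pointsPerPermissionCheck = 1000;

const perTenantBudget = Math.floor(hourlyGlobalPool / installs);          // 18 points/hour
const checksPerTenantPerHour = Math.floor(perTenantBudget / pointsPerPermissionCheck); // 0

// An 18-point hourly budget cannot cover even one 1,000-point permission
// check, so under this policy the app is effectively unusable per tenant.
```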

Obviously, any app with significant usage will be put in Tier 2, as @nathanwaters also mentioned, but it does require vendors to make a case and wait for Atlassian to decide at its discretion.

In reality, Tier 1 will mostly impact vendors with small install bases, who just have to hope that their app does not go viral overnight and create abysmal UX issues that will discredit them with their prospective customers.

6 Likes

That assumption relies on all installations being active at roughly the same time or within similar intervals, which is unlikely in practice. Atlassian’s rate-limiting model is designed around aggregated usage patterns across tenants, and it’s reasonable to assume they’ve accounted for varying app behavior, usage intensity, and installation sizes when designing Forge (I might be wrong but knowing how Atlassian did things even for internal apps like micros/slauth, I would say they did).

That’s usually the case for everyone; don’t get me started on the licensing API, which is another headache.

That said, I agree with the concerns raised in this thread. Different apps behave very differently, and larger or more active installations can disproportionately impact shared limits. We saw this ourselves and had to introduce some safeguards to prevent a single installation from consuming a large portion of the available quota. As mentioned earlier, these measures aren’t foolproof, but they help operate within the current constraints while we wait for potential platform improvements.

From my perspective, a per-installation rate limit model would better isolate usage and reduce the risk of one tenant affecting others. It’s something worth advocating for, which I did at AtlasCamp 26 and in calls with the Atlassian Forge team (it has always been the point I reiterate with them), alongside continuing to build internal protections.

One thing I’ve observed is that the people who would typically push for changes like this at Atlassian are no longer there, or are on their way out. So realistically, this becomes a waiting game, and it’s likely to be a long one.

Ah yes, until you realise that I only mentioned the app installs, not the size of the instances. Unfortunately, Atlassian no longer publishes the number of users per app.

Those 3,542 installs include customers like Ubisoft, Woolworths, The Pokemon Company, New Relic & Snyk. These are not single-user instances. For apps with significant installs, this is not a hypothetical issue.

4 Likes