RFC-117: Forge LLMs

Hi,

I like the idea of introducing LLM support into Forge.

First of all, I agree with most of the objections raised above regarding shared responsibility, pricing, throttling, and similar concerns.

Next, I want to extend the debate, as I haven’t really seen remarks on custom tooling, which seems to be an essential capability in the age of MCP servers:

Maybe I’m not familiar enough with external MCP integrations, but I assume that each custom-defined “tool” requires communication with third-party services, to which Atlassian context and customer data may or may not be exposed. Nevertheless, as these services have nothing to do with the LLM providers (and the configurable models/model families), such connections should either be pre-configured in the app manifest or made dynamically configurable by the site/app admin. I haven’t seen such particulars in the example above, and it’s not entirely clear what the concept is here.

Does this mean that custom MCP servers won’t be allowed for use, at least not as part of RoA? Or do you plan to maintain a curated list of MCP servers that are considered to be safe enough to use?

Cheers,
Márton

Hi, are you able to give more information on the following points:

  • Data retention and privacy: what request and response data is retained, for how long, and in which regions, and is there an opt-out for logging?
  • Legal and training data: will prompts or outputs be used to train Atlassian models and will tenants be able to opt out of training data use?
  • When will feature flags for LLMs be available in preview and what specific controls will they expose?
  • What is the timeline for consumption-based Marketplace billing and are there interim recommendations for monetising LLM features?

Many thanks,
Daniel

Hi @AdamMoore,

you stated that

However, the usual billing model would be ‘You have X amount of tokens included, you will have to pay Y more for Z more tokens’. With the Atlassian Marketplace, we cannot implement the second part.

Basically, all we can do is sell customers an app that will then no longer work as soon as they have used up their token limit, and they will have no way of extending it.

They will be vanilla models with an additional moderation check

Hey @marton.kelemen,

The function of tools in this sense is really about tool selection; it doesn’t do the tool invocation (that’s up to your app).

So for a Runs on Atlassian app, it might select a “tool” that is just another function in your resolver, along with any parameters that you need to pass into it.

It lets you:

  1. Detect when a user’s request matches a defined function in your app, e.g. create_issue
  2. Return a JSON schema-based object to call the function
  3. Your app code then executes the function
  4. (Optional) Pass the results of the function back into the LLM to generate a natural-language response to the user (see the sketch below)
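For illustration only, here’s a rough sketch of that loop in a Forge resolver. chat() is the proposed LLM call from the examples in this RFC, while the tools parameter, the toolCalls response field and the model ID are assumptions about an API shape that isn’t final:

import Resolver from '@forge/resolver';

const resolver = new Resolver();

// Step 3 lives here: functions your app (not the LLM) executes.
const toolImplementations = {
  create_issue: async ({ summary }) => {
    // ...call the Jira REST API here and return the new issue's key...
    return { key: 'PROJ-123', summary };
  },
};

resolver.define('ask', async ({ payload }) => {
  // Steps 1 and 2: the model matches the user's request to a defined
  // function and returns a JSON schema-based call object.
  // (chat() import omitted; the final module path isn't published.)
  const first = await chat({
    model: 'some-model-id', // placeholder
    messages: [{ role: 'user', content: payload.question }],
    tools: [{
      name: 'create_issue',
      description: 'Create a Jira issue from a short summary',
      parameters: {
        type: 'object',
        properties: { summary: { type: 'string' } },
        required: ['summary'],
      },
    }],
  });

  const call = first.toolCalls && first.toolCalls[0];
  if (!call) {
    return first.content; // no tool matched; return the plain answer
  }

  // Step 3: the app code, not the LLM, executes the selected function.
  const result = await toolImplementations[call.name](call.arguments);

  // Step 4 (optional): feed the result back for a natural-language reply.
  const second = await chat({
    model: 'some-model-id',
    messages: [
      { role: 'user', content: payload.question },
      { role: 'assistant', content: `Result of ${call.name}: ${JSON.stringify(result)}` },
    ],
  });
  return second.content;
});

export const handler = resolver.getDefinitions();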

Of course, you could also use it to create calls to third-party/external APIs, but you would need to add the egress permissions and have them approved by the admin.

There’s definitely more we can do to enable Forge apps to be MCP clients, both for internal Atlassian tools and third party tools but that’s not part of this initial release.

Hey @DanielFrench,

The exact specifics of logging etc. are still being finalised, but broadly speaking I can say that prompts and outputs won’t be used by Atlassian or our sub-processor (AWS in this case) for model training.

Feature flags will just be a generic capability that you can use in your app, so you could use them like you would LaunchDarkly, Statsig, etc.
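The exact flag API is still to be confirmed, but conceptually gating an LLM feature could look like this rough sketch, where featureFlags.isEnabled() and legacyAnswer() are placeholders rather than real Forge imports, and chat() is the proposed LLM call from this RFC:

// Hypothetical sketch only: featureFlags.isEnabled() and legacyAnswer()
// are placeholders, not real Forge APIs.
async function answerQuestion(question) {
  // Gate the LLM path behind a flag, as you would with LaunchDarkly or
  // Statsig: roll out gradually, and keep a kill switch for cost control.
  if (await featureFlags.isEnabled('llm-answers')) {
    return chat({
      model: 'some-model-id', // placeholder
      messages: [{ role: 'user', content: question }],
    });
  }
  // Fall back to the existing non-LLM behaviour.
  return legacyAnswer(question);
}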

I don’t have a timeline for consumption-based billing yet; next year’s Marketplace roadmap is still being finalised.

Hey @Anja,

I totally understand that. Not having consumption-based billing does limit your options.

That said, there are hundreds of apps on Marketplace that already leverage LLMs (albeit external ones) and have managed to work that cost into their current business model. Many did so before we even supported app editions.

Hey @AdamMoore,

absolutely correct, there are already a lot of Marketplace apps using LLMs, where the user then often brings their own key, e.g. an OpenAI API key!

This means that the current business model of most LLM apps (including our own) is that the customer pays for the LLM usage themselves.

The proposed business model leads to very weird incentives concerning integrating LLMs into our apps:

  • As an app developer looking at the customer, I want to sell them an app that is useful (otherwise, why buy it?), meaning an app they can and will use frequently. The more the better!
  • But as an app developer looking at the Marketplace business model, I do not want them to use the LLM feature at all, because it will generate costs for me that I cannot recoup directly from the customer who causes them. I can only recoup them by having all customers pay more, which will likely lead to a decrease in sales!

Do not misunderstand me: I really like and want to use your proposed LLM models! I think this is an awesome feature. But until we can bill customers for their usage, I do not see us using it.


Really excited to see LLMs coming to Forge :smiling_face_with_sunglasses: A few things came to mind:

1. Streaming
Others have already mentioned streaming, and I just want to reinforce how important it will be. Streaming responses make a huge difference for long completions and perceived latency.

2. AWS Bedrock Features
Since this runs on Bedrock, any chance we’ll get access to other Bedrock features in the future, like Knowledge Bases (for built-in RAG) or Guardrails (for safety controls)?

3. Data Residency
How will regional routing and data residency be handled for customers with strict locality requirements? Will cross-region inference be used for the models?


Will the Forge-hosted LLMs support actual fine-tuning (i.e., updating model weights) using vendor-provided training datasets?

For example, after the fine-tuning step, we would obtain a new fine_tuned_model_id and call it like:

const response = await chat({
  // chat() is the proposed Forge LLM API from the examples in this RFC;
  // 'fine_tuned_model_id' is the hypothetical ID a fine-tuning step
  // would return.
  model: 'fine_tuned_model_id',
  maxCompletionTokens: 50,
  temperature: 0.7,
  topP: 1, // a top-p of 0 would be degenerate; 1 effectively disables nucleus-sampling truncation
  messages: [
    { role: 'system', content: 'You are a text editing agent' },
    { role: 'user', content: 'Find any typo in the following text' },
    // The typos below are deliberate; they are the input to be checked.
    { role: 'user', content: 'Tis is an intereating artivle!' }
  ]
});

Thanks @MiikaMkiniemi

Streaming - yes, that feedback has come through loud and clear. It may not be available in the EAP, but we’ll see how quickly we can add support for it.

Bedrock features - this is something we’re looking into. We’ll definitely do more to support RAG and different capabilities, but whether that’s via Bedrock or something else is still TBC.

Data residency - cross-region inference will be used, but no data will be stored out of region, so it meets Atlassian’s definition of data residency.


@PabloBeltran - no immediate plans to support fine-tuning, but it’s something to consider for the future roadmap. Thanks.

Hey,

This is exciting, and we see a lot of product potential here.

Mostly echoing what others have already said. For any meaningful AI use case, context is everything. The value of an AI feature doesn’t come from the model itself, but from its ability to use information across:

  • Confluence pages, whiteboards, and databases
  • Jira issues, workflows, and custom fields

If an app can’t access and inject that context efficiently and securely, then whether the model runs “inside Atlassian” or not becomes irrelevant, because the output will be too generic to deliver real customer value.

Atlassian has made it clear that Marketplace apps should move toward vertical and industry-specific solutions. To make that possible, we’ll need:

  • The ability to inject domain-specific context dynamically, not just through prompt text (see the sketch after this list)
  • Some form of fine-tuning or persistent instruction layer
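On the first point: with the API proposed so far, the only obvious way to inject dynamic context seems to be serialising it into the prompt, roughly as in the sketch below. api.asApp().requestJira() is the real Forge API; chat() is the proposed LLM call from this RFC, and the model ID is a placeholder.

import api, { route } from '@forge/api';

// Fetch live Jira context at request time and inject it as prompt text.
async function summariseIssue(issueKey) {
  const res = await api.asApp().requestJira(
    route`/rest/api/3/issue/${issueKey}?fields=summary,description`,
  );
  const issue = await res.json();

  return chat({
    model: 'some-model-id', // placeholder
    messages: [
      { role: 'system', content: 'You are a domain expert summarising Jira issues.' },
      { role: 'user', content: `Summarise this issue:\n${JSON.stringify(issue.fields)}` },
    ],
  });
}

This works, but it burns tokens and caps how much context fits into a single request, which is exactly why a retrieval layer or persistent instructions would matter for vertical solutions.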

On pricing, ideally we’d be able to offload costs to customers, but we can work around that and build guardrails ourselves as well.

I do think it’s important that the initial release of Forge LLMs offers enough capability to build real value. If the first version feels too limited or generic, adoption will likely be low, which shouldn’t be read as a lack of demand. It’s more about ensuring that partners have enough to work with from the start, similar to the early Forge rollout, where adoption didn’t take off because the platform wasn’t mature enough.
