RFC-117: Forge LLMs

Hi,

I like the idea of introducing LLM support into Forge.

First of all, I agree with most of the objections raised above regarding shared responsibility, pricing, throttling, and similar concerns.

Next, I want to extend the debate, as I haven’t really seen remarks on custom tooling, which seems to be an essential capability in the age of MCP servers:

Maybe I’m not familiar enough with external MCP integrations, but I assume that each custom-defined “tool” requires communication with third-party services, to which Atlassian context and customer data may or may not be exposed. Nevertheless, as these services have nothing to do with the LLM providers (and the configurable models/model families), such connections should either be pre-configured in the app manifest or made dynamically configurable by the site/app admin. I haven’t seen such particulars in the example above, and it’s not entirely clear what the concept is here.

Does this mean that custom MCP servers won’t be allowed for use, at least not as part of RoA? Or do you plan to maintain a curated list of MCP servers that are considered to be safe enough to use?

Cheers,
Márton

Hi, are you able to give more information on the following points:

  • Data retention and privacy: what request and response data is retained, for how long, and in which regions, and is there an opt-out for logging?
  • Legal and training data: will prompts or outputs be used to train Atlassian models and will tenants be able to opt out of training data use?
  • When will feature flags for LLMs be available in preview and what specific controls will they expose?
  • What is the timeline for consumption-based Marketplace billing and are there interim recommendations for monetising LLM features?

Many thanks,
Daniel

Hi @AdamMoore,

you stated that

However, the usual billing model would be ‘You have X amount of tokens included, you will have to pay Y more for Z more tokens’. With the Atlassian Marketplace, we cannot implement the second part.

Basically, all we can do is sell customers an app that will then no longer work as soon as they have used up their token limit, and they will have no way of extending it.

They will be vanilla models with an additional moderation check

Hey @marton.kelemen,

The function of tools in this sense is really about tool selection; it doesn’t do the tool invocation (that’s up to your app).

So for a Runs on Atlassian app, it might select a “tool” that is just another function in your resolver, along with any parameters that you need to pass into it.

It lets you:

  1. Detect when a user’s request matches a defined function in your app, e.g. create_issue
  2. Return a JSON schema-based object to call the function
  3. Your app code then executes the function
  4. (Optional) Pass the results of the function back into the LLM to generate a natural-language response to the user (see the sketch below)
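For illustration only, here’s a rough sketch of that loop in a Forge resolver. chat() is the proposed LLM call from the examples in this RFC, while the tools parameter, the toolCalls response field and the model ID are assumptions about an API shape that isn’t final:

import Resolver from '@forge/resolver';

const resolver = new Resolver();

// Step 3 lives here: functions your app (not the LLM) executes.
const toolImplementations = {
  create_issue: async ({ summary }) => {
    // ...call the Jira REST API here and return the new issue's key...
    return { key: 'PROJ-123', summary };
  },
};

resolver.define('ask', async ({ payload }) => {
  // Steps 1 and 2: the model matches the user's request to a defined
  // function and returns a JSON schema-based call object.
  // (chat() import omitted; the final module path isn't published.)
  const first = await chat({
    model: 'some-model-id', // placeholder
    messages: [{ role: 'user', content: payload.question }],
    tools: [{
      name: 'create_issue',
      description: 'Create a Jira issue from a short summary',
      parameters: {
        type: 'object',
        properties: { summary: { type: 'string' } },
        required: ['summary'],
      },
    }],
  });

  const call = first.toolCalls && first.toolCalls[0];
  if (!call) {
    return first.content; // no tool matched; return the plain answer
  }

  // Step 3: the app code, not the LLM, executes the selected function.
  const result = await toolImplementations[call.name](call.arguments);

  // Step 4 (optional): feed the result back for a natural-language reply.
  const second = await chat({
    model: 'some-model-id',
    messages: [
      { role: 'user', content: payload.question },
      { role: 'assistant', content: `Result of ${call.name}: ${JSON.stringify(result)}` },
    ],
  });
  return second.content;
});

export const handler = resolver.getDefinitions();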

Of course, you could also use it to create calls to third-party/external APIs, but you would need to add the egress permissions and have them approved by the admin.

There’s definitely more we can do to enable Forge apps to be MCP clients, both for internal Atlassian tools and third party tools but that’s not part of this initial release.

Hey @DanielFrench,

The exact specifics of logging etc. are still being finalised, but broadly speaking I can say that prompts and outputs won’t be used by Atlassian or our sub-processor (AWS in this case) for model training.

Feature flags will just be a generic capability that you can use in your app, so you could use them like you would LaunchDarkly, Statsig, etc.
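The exact flag API is still to be confirmed, but conceptually gating an LLM feature could look like this rough sketch, where featureFlags.isEnabled() and legacyAnswer() are placeholders rather than real Forge imports, and chat() is the proposed LLM call from this RFC:

// Hypothetical sketch only: featureFlags.isEnabled() and legacyAnswer()
// are placeholders, not real Forge APIs.
async function answerQuestion(question) {
  // Gate the LLM path behind a flag, as you would with LaunchDarkly or
  // Statsig: roll out gradually, and keep a kill switch for cost control.
  if (await featureFlags.isEnabled('llm-answers')) {
    return chat({
      model: 'some-model-id', // placeholder
      messages: [{ role: 'user', content: question }],
    });
  }
  // Fall back to the existing non-LLM behaviour.
  return legacyAnswer(question);
}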

I don’t have a timeline for consumption-based billing yet; next year’s Marketplace roadmap is still being finalised.

Hey @Anja,

I totally understand that. Not having consumption-based billing does limit your options.

That said, there are hundreds of apps on Marketplace that already leverage LLMs (albeit external ones) and have managed to work that cost into their current business model. Many did so before we even supported app editions.

Hey @AdamMoore,

absolutely correct, there are already a lot of Marketplace apps using LLMs, where the user then often brings their own key, e.g. an OpenAI API key!

This means that the current business model of most LLM apps (including our own) is that the customer pays for the LLM usage themselves.

The proposed business model leads to very weird incentives concerning integrating LLMs into our apps:

  • As an app developer looking at the customer, I want to sell them an app that is useful (otherwise, why buy it?), meaning an app they can and will use frequently. The more the better!
  • But as an app developer looking at the Marketplace business model, I do not want them to use the LLM feature at all, because it will generate costs for me that I cannot recoup directly from the customer who causes them. I can only recoup them by having all customers pay more, which will likely lead to a decrease in sales!

Do not misunderstand me: I really like and want to use your proposed LLM models! I think this is an awesome feature. But until we can bill customers for their usage, I do not see us using it.


Really excited to see LLMs coming to Forge :smiling_face_with_sunglasses: A few things came to mind:

1. Streaming
Others have already mentioned streaming, and I just want to reinforce how important it will be. Streaming responses make a huge difference for long completions and perceived latency.

2. AWS Bedrock Features
Since this runs on Bedrock, any chance we’ll get access to other Bedrock features in the future, like Knowledge Bases (for built-in RAG) or Guardrails (for safety controls)?

3. Data Residency
How will regional routing and data residency be handled for customers with strict locality requirements? Will cross-region inference be used for the models?


Will the Forge-hosted LLMs support actual fine-tuning (i.e., updating model weights) using vendor-provided training datasets?

For example, after the fine-tuning step, we would obtain a new fine_tuned_model_id and call it like:

const response = await chat({
  // chat() is the proposed Forge LLM API from the examples in this RFC;
  // 'fine_tuned_model_id' is the hypothetical ID a fine-tuning step
  // would return.
  model: 'fine_tuned_model_id',
  maxCompletionTokens: 50,
  temperature: 0.7,
  topP: 1, // a top-p of 0 would be degenerate; 1 effectively disables nucleus-sampling truncation
  messages: [
    { role: 'system', content: 'You are a text editing agent' },
    { role: 'user', content: 'Find any typo in the following text' },
    // The typos below are deliberate; they are the input to be checked.
    { role: 'user', content: 'Tis is an intereating artivle!' }
  ]
});

Thanks @MiikaMkiniemi

Streaming - yes, that feedback has come through loud and clear. It may not be available in the EAP, but we’ll see how quickly we can add support for it.

Bedrock features - this is something we’re looking into. We’ll definitely do more to support RAG and different capabilities, but whether that’s via Bedrock or something else is still TBC.

Data residency - cross-region inference will be used, but no data will be stored out of region, so it meets Atlassian’s definition of data residency.


@PabloBeltran - no immediate plans to support fine-tuning, but it’s something to consider for the future roadmap. Thanks.

Hey,

This is exciting, and we see a lot of product potential here.

Mostly echoing what others have already said. For any meaningful AI use case, context is everything. The value of an AI feature doesn’t come from the model itself, but from its ability to use information across:

  • Confluence pages, whiteboards, and databases
  • Jira issues, workflows, and custom fields

If an app can’t access and inject that context efficiently and securely, then whether the model runs “inside Atlassian” or not becomes irrelevant, because the output will be too generic to deliver real customer value.

Atlassian has made it clear that Marketplace apps should move toward vertical and industry-specific solutions. To make that possible, we’ll need:

  • The ability to inject domain-specific context dynamically, not just through prompt text (see the sketch after this list)
  • Some form of fine-tuning or persistent instruction layer
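On the first point: with the API proposed so far, the only obvious way to inject dynamic context seems to be serialising it into the prompt, roughly as in the sketch below. api.asApp().requestJira() is the real Forge API; chat() is the proposed LLM call from this RFC, and the model ID is a placeholder.

import api, { route } from '@forge/api';

// Fetch live Jira context at request time and inject it as prompt text.
async function summariseIssue(issueKey) {
  const res = await api.asApp().requestJira(
    route`/rest/api/3/issue/${issueKey}?fields=summary,description`,
  );
  const issue = await res.json();

  return chat({
    model: 'some-model-id', // placeholder
    messages: [
      { role: 'system', content: 'You are a domain expert summarising Jira issues.' },
      { role: 'user', content: `Summarise this issue:\n${JSON.stringify(issue.fields)}` },
    ],
  });
}

This works, but it burns tokens and caps how much context fits into a single request, which is exactly why a retrieval layer or persistent instructions would matter for vertical solutions.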

On pricing, ideally we’d be able to offload costs to customers, but we can work around that and build guardrails ourselves as well.

I do think it’s important that the initial release of Forge LLMs offers enough capability to build real value. If the first version feels too limited or generic, adoption will likely be low, which shouldn’t be read as a lack of demand. It’s more about ensuring that partners have enough to work with from the start, similar to the early Forge rollout, where adoption didn’t take off because the platform wasn’t mature enough.
