RFC-117: clarification on Forge LLMs compute usage and pricing

Hi & Happy New Year 2026 everyone :tada:

I know RFC-117 is now closed, but I still had a question related to the Pricing section.

While looking into Forge LLMs, we noticed a very significant increase in Forge Functions compute usage (GB-seconds) when invoking LLMs. This seems logical given the current execution model, but it raised a question for us.

With the upcoming token-based pricing for Forge LLMs mentioned in RFC-117, should we expect this current Forge Functions compute impact to be absorbed by the dedicated LLM pricing, or will GB-seconds remain a significant part of the cost when using LLMs?

Thanks in advance for any clarification :slightly_smiling_face:

Sorry to ping you, @AdamMoore, but do you have any idea?

Hi @FabienPenchenat, I’ve raised this question internally to try to help expedite an answer for you.

Thank you very much for raising this internally @DanielleLarregui.
I really appreciate your help.

@FabienPenchenat This is the answer that I received from the Forge LLM team:

Compute usage (GB-seconds) is a separate cost element from LLM usage (which is token-based).
Since LLM calls take longer to complete, the running costs are therefore twofold:

  • compute usage (longer running times)

  • LLM usage itself

Thanks a lot for taking the time to raise this topic and for the very quick response.

I have to say I’m quite surprised by the potential impact on pricing, which could be both significant and hard to control, as it will strongly depend on how much the LLM-based features are actually used. In that context, I’m not sure this is the most effective approach to encourage vendors to adopt LLM usage.

In any case, thanks again for the clarifications and the transparency on this topic.

You’re welcome @FabienPenchenat. I’m not sure that this can be changed at this time, but I will surface this feedback to the Product Manager and the LLM team.

Hey Fabien,

I understand the concern, and agree that it will be important to keep usage in check, because there are two aspects to the cost you need to consider.

It’s the nature of Forge’s serverless architecture that we need to account for the cost of the underlying Lambda, which waits for the response, as well as the tokens used by the model.
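To make the two cost components concrete, here is a back-of-the-envelope sketch. The memory size, durations, and rates are illustrative assumptions only, not published Forge figures; the point is simply that compute is billed as memory times wall-clock time, so a function that sits waiting on a model still accrues GB-seconds.

```python
def gb_seconds(memory_mb: float, duration_s: float) -> float:
    """Compute usage billed as memory (GB) x wall-clock duration (s)."""
    return (memory_mb / 1024) * duration_s

# A typical non-LLM invocation: 256 MB running for 0.5 s (assumed values)
baseline = gb_seconds(256, 0.5)   # 0.125 GB-s

# The same function blocked ~20 s waiting on an LLM response (assumed)
llm_call = gb_seconds(256, 20.0)  # 5.0 GB-s

print(llm_call / baseline)  # the wait alone inflates compute usage 40x
```

On top of this compute usage, the token-based LLM charge applies separately.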

When Forge Containers are available, they will also support Forge LLMs, but whether that turns out to be more cost-effective will depend on a number of factors.

Did you have any other thoughts on how we could make things more efficient cost-wise?

Hi @AdamMoore

Thanks for the explanation and for taking the time to clarify.

To be fully transparent, I haven’t had the chance yet to dig into Forge Containers, so it’s hard for me at this stage to assess how interesting or cost-effective they might be in our case.

That said, in the current state, even when reducing memory to the minimum, a single LLM request significantly increases compute usage, by a factor of 15x to 27x compared to what our apps usually consume.

Initially, I expected LLM-related execution to be handled differently from classic function compute, to reflect the fact that on the Forge side there isn’t much actual processing happening during that time: the function is mostly waiting for the model response.

It might be worth considering a more LLM-specific architecture rather than the current synchronous model: for example, an asynchronous approach where the LLM request is triggered and the response is processed in the background, without keeping a function active for the entire waiting period. This would better reflect the nature of these workloads and help make costs more predictable and manageable for vendors.
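To illustrate why the asynchronous idea could matter for cost, here is a simple comparison of billed compute under the two models. All durations, the memory size, and the billing formula are assumptions for illustration only, not how Forge actually meters compute.

```python
MEMORY_GB = 0.25   # 256 MB function (assumed)
LLM_WAIT_S = 20.0  # time the model takes to respond (assumed)
DISPATCH_S = 0.2   # time to send the request (assumed)
HANDLE_S = 0.3     # time for a callback to process the response (assumed)

# Synchronous model: one invocation stays alive for the whole wait.
sync_gb_s = MEMORY_GB * (DISPATCH_S + LLM_WAIT_S + HANDLE_S)

# Asynchronous model: two short invocations (dispatch, then a callback);
# nothing is billed while the model is thinking.
async_gb_s = MEMORY_GB * DISPATCH_S + MEMORY_GB * HANDLE_S

print(f"sync:  {sync_gb_s:.3f} GB-s")
print(f"async: {async_gb_s:.3f} GB-s")
```

Under these assumed numbers, nearly all of the synchronous cost comes from the idle wait, which is exactly the part an asynchronous design would avoid billing for.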
