RFC-107: Forge FIFO Queues

RFCs are a way for Atlassian to share what we’re working on with our valued developer community.

It’s a document for building shared understanding of a topic. It expresses a technical solution, but can also communicate how it should be built or even document standards. The most important aspect of an RFC is that a written specification facilitates feedback and drives consensus. It is not a tool for approving or committing to ideas, but more so a collaborative practice to shape an idea and to find serious flaws early.

Please respect our community guidelines: keep it welcoming and safe by commenting on the idea not the people (especially the author); keep it tidy by keeping on topic; empower the community by keeping comments constructive. Thanks!

Project summary

We’re developing the capability of First-in, First-out (FIFO) processing of events in Forge queues. This will include a new Forge module, fifoConsumer. This capability aims to address the limitations of current async events by providing FIFO sequential event processing.

Timeline:

  • Publish: 16 September, 2025

  • Discuss: 7 October, 2025

  • Resolve: 28 October, 2025

Problem

Currently, the Async Events API allows apps to push events to a queue for later processing. This, however, doesn’t guarantee the order of events, making it unusable for workflows or integrations that require strict ordering. If events are not processed in the required strict order, this can lead to inconsistent states, race conditions, and failed operations.

Proposal

To address this, we’re proposing the addition of FIFO event queues.

With FIFO event queues, an app can declare a queue in the manifest with a single consumer function.

Along with a queue, there will be a groupId parameter, and the ordering of events will be guaranteed within a unique combination of installationId, queue, and groupId. This implementation may change depending on the feedback gathered.

Producers

Apps can push events using the push API, which will allow apps to push multiple events at a time.

Sending events from a Forge app

import { FifoQueue } from '@forge/events';

const fifoQueue = new FifoQueue({ queue: 'my-queue' });

export async function sendEvent() {
    // Push a single event; ordering is guaranteed per groupId within this queue.
    const result: PushResult = await fifoQueue.push([{
        groupId: "fifo-mvp-group",
        eventId: "b462857a-1426-4ef7-8332-1747080fb461",
        payload: {
            hello: "world"
        }
    }]);
    if (result.status === 'rejected') {
      console.error(`Events were rejected with error code ${result.errorCode} and error message ${result.errorMessage}`);
    }
}

export interface PushResult {
  status: "success" | "rejected";
  errorCode?: string;
  errorMessage?: string;
}
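
Since push accepts an array, an app can also send several events in a single call. The sketch below assumes the same proposed push API shown above; the queue name, group IDs, and payloads are illustrative, and the expectation that events sharing a groupId are processed in the order they are pushed follows from the ordering guarantee described earlier.

import { randomUUID } from 'crypto';
import { FifoQueue } from '@forge/events';

const fifoQueue = new FifoQueue({ queue: 'my-queue' });

export async function sendEvents() {
  // The two 'issue-123' events share a groupId, so they are processed in this
  // relative order; the 'issue-456' event is ordered independently.
  const result = await fifoQueue.push([
    { groupId: 'issue-123', eventId: randomUUID(), payload: { step: 'create' } },
    { groupId: 'issue-123', eventId: randomUUID(), payload: { step: 'update' } },
    { groupId: 'issue-456', eventId: randomUUID(), payload: { step: 'create' } },
  ]);
  if (result.status === 'rejected') {
    console.error(`Push rejected: ${result.errorCode} - ${result.errorMessage}`);
  }
}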

Event retention

Events pushed to a FIFO event queue have a base maximum retention period, beyond which they will be dropped. This period may be extended during outages caused by platform errors or unexpected platform performance degradation.

Consumers

The consumer for each queue will receive a batch of events, where events belonging to the same groupId appear in the same order as they were sent by the producer for that groupId.

Proposed consumer configuration in manifest

modules:
  fifoConsumer:
    - key: consumer1
      queue: queue-1
      function: consumer-function
    - key: consumer2
      queue: queue-2
      function: consumer-function-2
  function:
    - key: consumer-function
      handler: consumer.handler
    - key: consumer-function-2
      handler: consumer.handler2

Event processing

// Consumer for queue-1: events within each groupId (e.g. fifo-mvp-group) arrive in send order
export const handler = async (eventContext: EventContext, context) => {
  const {events} = eventContext;
  for (const event of events) {
    await processEvent(event.payload);
  }
};

// Consumer for queue-2: events within each groupId (e.g. fifo-mvp-group) arrive in send order
export const handler2 = async (eventContext: EventContext, context) => {
  const {events} = eventContext;
  for (const event of events) {
    await processEvent(event.payload);
  }
};

export interface EventContext {
  installationContext: string;
  groupId: string;
  queueName: string;
  events: Event[];
  environmentId: string;
  environmentType: string;
  environmentKey: string;
}

export interface Event {
  eventId: string;
  receiptHandle: string;
  payload: Record<string,any>;
  ingestionTime: Timestamp;
}

Limitations

The following limitations are subject to change, depending on feedback gathered.

  1. There would be limits on the number of queues, the number of groupIds, and the number of events delivered to the consumer in a single invocation.

  2. Each queue can have only a single consumer.

  3. Consumers will have a maximum timeout of 55 seconds.

  4. Invocation of consumer functions is subject to the same rate limits that apply to other Forge function invocations.

  5. The payload size of a single function invocation (a batch of events) cannot be more than 200 KB.
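
As an illustration of limitation 5, a producer can guard against oversized payloads before pushing. This is only a rough app-side sketch, assuming the limit applies to the serialized JSON payload; MAX_PAYLOAD_BYTES, payloadSizeInBytes, and pushIfWithinLimit are illustrative names, not part of the proposed API.

import { randomUUID } from 'crypto';
import { FifoQueue } from '@forge/events';

const fifoQueue = new FifoQueue({ queue: 'my-queue' });
const MAX_PAYLOAD_BYTES = 200 * 1024; // proposed per-invocation limit

function payloadSizeInBytes(payload: Record<string, any>): number {
  // Approximate the size the platform would see by measuring the JSON encoding.
  return Buffer.byteLength(JSON.stringify(payload), 'utf8');
}

export async function pushIfWithinLimit(groupId: string, payload: Record<string, any>) {
  if (payloadSizeInBytes(payload) > MAX_PAYLOAD_BYTES) {
    // Too large for a single invocation; split the work into smaller events before pushing.
    throw new Error('Payload exceeds the proposed 200 KB limit; split it into smaller events');
  }
  return fifoQueue.push([{ groupId, eventId: randomUUID(), payload }]);
}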

Feedback

Thank you for taking the time to read this RFC. We’d love to know what you think about this proposal, especially around the following points:

  1. What is your use case for Forge FIFO queues? How are you solving for this currently?

  2. Would you have a use case for different kinds of invocations or time limits, as compared to the ones being offered in this RFC?

  3. In case of partial event processing failures on the app side, would you prefer to delete the processed events, or would you be willing to handle the logic to reprocess the same event?

  4. What kind of monitoring or observability capabilities would be helpful for this feature?

  5. Would you need to use dead-letter queues (DLQs), which handle and store messages that cannot be processed? If yes, what would be your use case for this functionality?

Hi, thank you for posting this RFC.

I have some design feedback on how the whole manifest/module is structured. Why hard-code fifoConsumer as a unique module? There is a hard limit on the number of modules a Forge app can have, so this approach will scale badly when consumers for other kinds of scheduling schemes are added later. Forge apps may already be running into the module limit, so there should be no reason to have more than one module to handle events.

Surely a better approach would be to add a generic EventConsumer module and capture the queueing scheme in a field like “type” or “kind”. In its first implementation it may only support FIFO and async to begin with (equivalent to this suggestion), but later it would also be possible to support other relevant queueing schemes. In such a case, a manifest could look something like this:

modules:
  consumer:
    - key: consumer1
      queue: queue-1
      function: consumer-function
      kind: fifo
    - key: consumer2
      queue: queue-2
      function: consumer-function-2
      kind: async #(or async is just the default when the field is omitted?)
  function:
    - key: consumer-function
      handler: consumer.handler
    - key: consumer-function-2
      handler: consumer.handler2

Hi @EliasBrattliSorensen

Thank you for the suggestion.
Every key in the manifest contributes towards the module count of the Forge app. So, having different modules for FIFO and async, or the same module with additional fields, should not make a difference to the module count.
While the design you proposed is certainly very clean, it would require reworking our current async module solution, which is not something we are planning to do right now. We will keep this in mind for future iterations, though.
Thanks again!

Hi @RashiChandola thank you for your answer.

Forgive me for pushing back on this a bit more.

Every key in the manifest contributes towards the module count of the Forge app. So, having different modules for FIFO & async, or the same module with additional fields should not make a difference in the module count.

This could still end up being different in the future if the underlying implementation changes. Giving the Consumer module more capabilities, rather than adding X special cases as separate modules, creates much less mental overhead and also limits complexity. What if key counts and module counts suddenly do have a different impact? If Forge is indeed intended to be a scalable development tool for enterprise solutions, then something will have to change about the module (or instance) limit eventually.

While the design you proposed is certainly very clean, it would need reworking of our current async module solution, which is not something we are planning to do as of now. We will keep this in mind in the future iterations though.

I’d assume that, following engineering best practices, Atlassian would refactor the existing Consumer module to allow for more capabilities rather than bolting an unnecessarily specialized module on the side. What will you do the next time the need arises for a different event queueing scheme and you enter another iteration? Yet another module? What will you do in the iteration when you eventually decide to collapse it all into the Consumer module, and all the manifests out there have FifoConsumer modules in them that you must still support? Then you will have to deal with the consequences of today’s design decision. I urge you to instead spend a bit more time on this and first rework the existing Consumer so it can capture the FIFO case. I assure you that this would most likely be less confusing for all involved parties, and in the long term it will cause a lot less headache for Atlassian developers as well as Marketplace Partners.

I will give the rest of your questions some consideration and come back about that in a separate post.
All the best!

Elias


Thank you @EliasBrattliSorensen. We have noted your concerns about the design and will take them into consideration. :+1:
Looking forward to hearing more about your use case!

I was confused when I used the current Queue, because the queue processes items at the same time. This caused some issues when we worked to implement the Forge migration assistant, as larger migrations (with lots of split uploads) caused 429 errors, which we could not manage.

This queue might be able to solve some of the 429 issues, but I’m not sure yet if this use case can be solved with the proposed solution.

As far as I can see, we are not able to update the ‘migration-app-data’ trigger to be FIFO-based. So we would need to add the FIFO queues within the app migration trigger and process them within the consumer.

Questions:

  • You mentioned that “queue” is part of the unique combination, but what happens when multiple separate processes write into a group ID? If a separate queue were created, we would not be able to use this to improve our app migration process.
  • As far as I can see, the consumer gets multiple queue items, but what happens when I’m unable to process all of them within the 55 seconds? For our use case, we might have a queue of 1+ million issues where custom fields need to be updated.
  • What happens when I put multiple 50 KB payloads into the queue? Is the payload size limit per event or for the queue as a whole?
  • The migration assistant trigger is called again after a few minutes when migration.messageProcessed is not called within a time window. This could cause issues when we are not able to identify whether an item is already within the queue or has already been processed.

The current delivery guarantee of the Async Events API is “at least once”. Would it be possible for the new FIFO Queues to guarantee “exactly once”, so that developers don’t have to build their own checks against things unintentionally happening multiple times?

Hello @m.herrmann,

  • You mentioned that “queue” is part of the unique combination, but what happens when multiple separate processes write into a group ID? If a separate queue were created, we would not be able to use this to improve our app migration process.

For an installation, the combination of groupId and queue ensures FIFO ordering. Multiple processes writing within this combination will have the sequence maintained for the combination as a whole. For example, if process P1 writes two events (E11, E12) and process P2 writes two events (E21, E22) to groupId G1 and queue Q1, such that the overall send order, from first to last event, is E11->E21->E12->E22, the messages will be processed in that same order.
Would this be an issue for you?
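
For illustration only (a rough sketch, not part of the proposal), two producer functions pushing to the same queue and groupId could look like the following; whichever pushes are accepted first are processed first, regardless of which process sent them.

import { FifoQueue } from '@forge/events';

const fifoQueue = new FifoQueue({ queue: 'Q1' });

// Process P1 pushes E11 and then E12 to groupId G1.
export async function producerP1() {
  await fifoQueue.push([{ groupId: 'G1', eventId: 'E11', payload: { from: 'P1', seq: 1 } }]);
  await fifoQueue.push([{ groupId: 'G1', eventId: 'E12', payload: { from: 'P1', seq: 2 } }]);
}

// Process P2 pushes E21 and then E22 to the same groupId G1.
export async function producerP2() {
  await fifoQueue.push([{ groupId: 'G1', eventId: 'E21', payload: { from: 'P2', seq: 1 } }]);
  await fifoQueue.push([{ groupId: 'G1', eventId: 'E22', payload: { from: 'P2', seq: 2 } }]);
}

// If the pushes land in the order E11 -> E21 -> E12 -> E22, the consumer for
// queue Q1 sees them in exactly that order for groupId G1.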

  • As far as I can see, the consumer gets multiple queue items, but what happens when I’m unable to process all of them within the 55 seconds? For our use case, we might have a queue of 1+ million issues where custom fields need to be updated.

The timeout of 55 seconds is not for the entire queue, but for one batch of events.
Does your use case include the possibility that one single event would handle updates to 1+ million issues?

  • What happens when I put multiple 50 KB payloads into the queue? Is the payload size limit per event or for the queue as a whole?

The payload size of the function is 200 KB. This refers to the payload size of a batch of events, so a batch with a payload size of 50 KB can be processed.

  • The migration assistant trigger is called again after a few minutes when migration.messageProcessed is not called within a time window. This could cause issues when we are not able to identify whether an item is already within the queue or has already been processed.

Could you talk about this in more detail? Are you referring to processing issues due to function timeouts? Or, do you want to be able to identify if a particular event coming to the consumer has been processed before?

Additionally, could you tell us a little more about your use case for needing FIFO? Do you need this only for Connect to Forge migration, and specifically only for those 429 issues?

Hello @JakobRenner,

Would it be possible for the new FIFO Queues to guarantee “exactly once”, so that developers don’t have to build their own checks against things unintentionally happening multiple times?

A batch of events is marked as processed only when all events in the batch have been processed. If even one event is not processed successfully, the entire batch will be retried. Hence, in such scenarios, the “exactly once” guarantee will not hold.
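
Given this at-least-once behaviour, one app-side mitigation is to deduplicate on eventId before processing. The following is a rough sketch using Forge storage; the key scheme, FifoEvent type, and processEvent are illustrative, not part of the proposal.

import { storage } from '@forge/api';

type FifoEvent = { eventId: string; payload: Record<string, any> };

async function processEvent(payload: Record<string, any>) {
  // app-specific work goes here
}

export const handler = async (eventContext: { events: FifoEvent[] }) => {
  for (const event of eventContext.events) {
    const dedupKey = `processed:${event.eventId}`;
    if (await storage.get(dedupKey)) {
      continue; // already handled in an earlier delivery of this batch
    }
    await processEvent(event.payload);
    await storage.set(dedupKey, true);
  }
};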

Could you tell us about your use case for needing FIFO?

The order of events sounds good to me. So even when I create a new FifoQueue in a different process (same Forge function but called a few ms later), it will write into the same queue and will not create a separate one.

The timeout of 55 seconds is not for the entire queue, but for one batch of events.
Does your use case include the possibility that one single event would handle updates to 1+ million issues?

Not a single event, but I don’t see how the example code is able to identify whether one of the batched events was processed, as there is no visible callback.

So let’s say I have 25 events in the eventContext and I get a 55-second timeout while processing the 5th event. How is the queue able to identify that, in the next batch, it should start with the 5th event again?

const {events} = eventContext;
  for (const event of events) {
    await processEvent(event.payload);
  }

Additionally, could you tell us a little more about your use case for needing FIFO? Do you need this only for Connect to Forge migration, and specifically only for those 429 issues?

It would improve our DC to Forge migration, where we need to split our customField migration part. Let’s take a larger instance where we could have 1 million issues with 3 of our custom fields. We use this API and split it into 3 × 1,000,000 / 1,000 (1,000 updates per API request) = 3,000 files. These files are currently handled by Atlassian using something similar to the current queue and processed in parallel (not sure how many in parallel). During our internal tests, we ran into 429 issues because a few APIs are called in parallel, which we currently cannot influence ourselves. When the migration assistant runs into a 429 (or any error), the same file will be processed at a later point. If there is no update for 15 minutes, the app migration will be marked as timed out. The FIFO queue may allow us to have a steady flow of API requests.

Hi @m.herrmann

Thanks for the reply.
Regarding the 429 issues, have you tried setting a concurrency key and limits to ensure all the events don’t execute at the same time?

Also, would it be possible for you to attend a call with the team so that we can understand your need for FIFO in greater detail? I have DMed you about this as well :smiley:

Thanks for the concurrency hint. I just checked the Async Events and can see some good changes compared to the time I worked on the Forge migration assistant.

I think the new version of them is everything we need to improve our migration assistant process as we don’t actually need the correct order of events.

To contribute something with regard to this RFC because I could not find it:

Is something similar to the Async Event “Retry Request” planned?


Yes, Marcel. We are thinking of using “retryContext” for FIFO as well.