RFCs are a way for Atlassian to share what we’re working on with our valued developer community.
An RFC is a document for building shared understanding of a topic. It expresses a technical solution, but can also communicate how something should be built or document standards. The most important aspect of an RFC is that a written specification facilitates feedback and drives consensus. It is not a tool for approving or committing to ideas, but rather a collaborative practice for shaping an idea and finding serious flaws early.
Please respect our community guidelines: keep it welcoming and safe by commenting on the idea not the people (especially the author); keep it tidy by keeping on topic; empower the community by keeping comments constructive. Thanks!
Summary of Project:
This project aims to better support long-running compute processes on Forge by extending the timeout for Forge functions invoked by async event consumers.
- Publish: 9 August 2024
- Discuss: 23 August 2024
- Resolve: 30 August 2024
Problem
Forge functions provide the hosted compute for Forge apps. Currently, Forge functions have a timeout of 25 seconds for most modules and 55 seconds for async events. This limits the amount of processing that can be done in a single invocation, which adds complexity for apps with long-running processes.
These timeouts can be difficult to work with if your app is:
- processing large amounts of data (e.g. when setting up a new installation),
- making a high volume of API calls (e.g. paginating through an endpoint), or
- calling APIs that can take a long time to respond (e.g. LLMs).
While it is often possible to work around these limitations by batching up large processes with async events, breaking up big workloads into such small batches adds complexity to your integration and can create overhead that leads to other limitations.
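To illustrate the kind of bookkeeping the batching workaround involves, here is a minimal sketch in TypeScript. It is not taken from a real app: `runBatch`, `BatchResult`, the page cursor, and the safety margin are all hypothetical names and choices, and the page processor and clock are injected so the snippet runs outside Forge.

```typescript
// Hypothetical result of one invocation: how far it got and where to resume.
interface BatchResult {
  processed: number;
  nextCursor: number | null; // null means the whole job is finished
}

// Processes pages until the job is done or the time budget is nearly spent.
// `processPage` and `now` are injected (assumptions) so this runs anywhere.
async function runBatch(
  startCursor: number,
  totalPages: number,
  processPage: (page: number) => Promise<void>,
  budgetMs: number,
  safetyMarginMs: number,
  now: () => number = Date.now,
): Promise<BatchResult> {
  // Stop early enough to leave time for handing the rest of the job on.
  const deadline = now() + budgetMs - safetyMarginMs;
  let cursor = startCursor;
  let processed = 0;
  while (cursor < totalPages && now() < deadline) {
    await processPage(cursor);
    cursor += 1;
    processed += 1;
  }
  return { processed, nextCursor: cursor < totalPages ? cursor : null };
}
```

In a real consumer, a non-null `nextCursor` would have to be pushed back onto the queue as a new async event, and the app would need to track progress and failures across those hops. That extra state handling is the complexity a longer timeout is meant to reduce.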
This has meant that developers building apps with long-running processes have had to invest in complex workarounds or have decided not to run their compute natively on Forge.
We see this as an important problem to solve so that more developers can build their apps natively on Forge. This is increasingly important to our enterprise customers as they migrate to the cloud.
Proposed Solution
To solve this problem we intend to increase the timeout of Forge functions on async event consumers to 15 minutes, which is the maximum supported by AWS Lambda. To achieve this in a performant and scalable way, we will introduce the ability to run these functions asynchronously.
For example:
consumer:
  - key: big-job-queue-consumer
    queue: big-job-queue
    resolver:
      function: processBigJobQueue
      method: submit-big-job-queue-listener
function:
  - key: processBigJobQueue
    handler: index.processQueue
    async: true # opt in to long-running functions
    timeout: 900 # specify the timeout, up to a max of 900 secs
Adopting async functions on async event consumers won’t cause a change in behaviour, and the developer experience remains much the same (except for updating your manifest). Logs and metrics in the developer console will allow you to monitor invocation times, errors, etc. in much the same way as you do currently.
If you need to start a long-running process from a different module (e.g., UI resolvers), you can use async events from the short-lived function.
Note: We acknowledge that the ability to send notifications from the backend to the front end is still a gap in Forge’s capabilities. For now, you will need to poll from your front end to detect when a long-running job is completed. Options for this will be discussed in a future RFC but feel free to share any ideas or concerns.
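As a rough illustration of the polling approach mentioned in the note, the sketch below polls a status check with exponential backoff. Everything here is an assumption for illustration: `JobStatus`, `nextDelayMs`, `pollUntilSettled`, and the delay values are hypothetical, and `checkStatus` stands in for whatever call your front end makes to read the job's state (for example, a resolver that looks up a status your consumer has stored).

```typescript
type JobStatus = "pending" | "done" | "failed";

// Delay before the next poll: exponential backoff capped at maxMs.
function nextDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Polls until the job leaves the "pending" state or attempts run out.
// `checkStatus` and `sleep` are injected so the sketch is testable.
async function pollUntilSettled(
  checkStatus: () => Promise<JobStatus>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status !== "pending") return status;
    await sleep(nextDelayMs(attempt));
  }
  return "pending"; // caller decides whether to keep waiting or give up
}
```

Backing off between polls keeps the number of invocations low for jobs that may run for many minutes, which matters given the invocation rate limits discussed in the next section.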
Limits for async functions
As part of the project we are assessing our limits around function invocations. In particular:
- Invocation rate limit: 500 per 25-second sliding window
- Log lines per invocation: 100
- Log size per invocation: 200 KB
- Network requests: 100
We would be really interested in your feedback on how these limits would affect your app once 15-minute functions are available. Which would be a blocker for your use case and what increase in limits might you need?
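When thinking about this question, a quick estimate can show which limit a job would hit first. The sketch below is only a back-of-the-envelope aid: `LIMITS`, `JobEstimate`, and `invocationsNeeded` are hypothetical names, the numbers are the ones quoted above (which are under review), and it assumes the network request limit applies per invocation, which this RFC does not state explicitly.

```typescript
// Limits as quoted in this RFC; they are being assessed and may change.
// Assumption: the network request limit is counted per invocation.
const LIMITS = {
  logLinesPerInvocation: 100,
  networkRequestsPerInvocation: 100,
};

// Hypothetical description of a long-running job.
interface JobEstimate {
  apiCalls: number;           // total outbound requests the job makes
  logLinesPerApiCall: number; // average log lines written per request
}

// Minimum number of invocations needed to stay within both limits,
// regardless of how long each invocation is allowed to run.
function invocationsNeeded(job: JobEstimate): number {
  const byNetwork = Math.ceil(job.apiCalls / LIMITS.networkRequestsPerInvocation);
  const byLogs = Math.ceil(
    (job.apiCalls * job.logLinesPerApiCall) / LIMITS.logLinesPerInvocation,
  );
  return Math.max(1, byNetwork, byLogs);
}
```

Under these assumptions, a job that paginates through 2,500 API calls and logs one line per call would still need at least 25 invocations, even with a 15-minute timeout. Estimates like this are the kind of detail that would help us understand which limits need to change.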
Asks
While we’re happy to get any feedback on this RFC, we’re particularly hoping to get insights on:
- What use cases do you see this enabling?
- Do you think this solution is sufficient to support your long-running tasks?
- If you’d like to have long-running async functions on other modules, which modules would they be?
- Would you like to see any changes to the existing invocation limits to ensure this will work for your use case?