Improving support for long-running tasks in Forge

Hi everyone,

I’m a Product Manager working on Forge, and my team is responsible for the runtime layer and function invocation flow for the platform.

As part of planning our future roadmap, I would love to understand in detail where Forge’s existing architecture, quotas and limits prevent you from supporting long-running tasks within your app.

If you’ve experienced this problem, I want to hear from you! I am especially keen to understand:

  1. What use case or customer problem were you trying to solve?
  2. When in your app creation journey did you get blocked by this limitation: early on in prototyping, much later in development, or during customer testing?
  3. What workarounds or alternatives did you try?
  4. What would you like to see changed in Forge to improve support for long-running tasks?

Please feel free to reply directly in this community topic, or else you can set up some time with me to talk about it in more detail: Calendly - Joe Clark

Thank you in advance for your time and feedback!

Joe Clark
Atlassian

@HeyJoe, I was wondering if this would qualify for an RFC?

CC: @ibuchanan

Hey @remie - yes, I think any proposed solution would be a great candidate for an RFC!

I’m still in the information gathering phase right now, so I think an RFC would be premature. I’m not recommending any specific solution yet and, in fact, I want to avoid having a solution in mind before researching the problem :slight_smile:

If you have strong opinions about how you’d like to see Forge evolve with regard to compute capacity and supporting long-running/batch tasks, I’m eager to hear your input in whatever format is convenient for you.

Two issues we sometimes struggle with are the 25s load time limit and the GraphQL size limit of 5242880 bytes (5 MiB).

We typically run into the 25s load time limit when dealing with paginated API calls, both to Atlassian APIs and to third-party APIs. For example, it happens when we call the API to perform a JQL search that fetches issues for a report.

We run into the size limit when generating Confluence macros with large tables or multiple tabs of information. We end up trimming the content down or splitting it into separate macros.
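For the splitting workaround, here is a rough sketch of how table rows might be grouped so that each group stays under a byte budget. It assumes the payload is measured as UTF-8 encoded JSON, which is a guess and not something confirmed about how the limit is enforced. `chunkByBytes` and `MAX_BYTES` are hypothetical names; only the 5242880 figure comes from the limit mentioned above.

```typescript
// Hypothetical budget: the 5242880-byte figure quoted in this thread.
const MAX_BYTES = 5242880;

const encoder = new TextEncoder();

// Split rows into groups whose JSON array encoding stays within maxBytes.
// Assumption: size is counted as UTF-8 bytes of compact JSON.
function chunkByBytes<T>(rows: readonly T[], maxBytes: number): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let currentBytes = 2; // the "[" and "]" of the JSON array

  for (const row of rows) {
    const rowBytes = encoder.encode(JSON.stringify(row)).length;
    const separator = current.length > 0 ? 1 : 0; // the "," between elements

    // Close the current group if adding this row would exceed the budget.
    if (current.length > 0 && currentBytes + separator + rowBytes > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 2;
    }

    currentBytes += (current.length > 0 ? 1 : 0) + rowBytes;
    current.push(row);
  }

  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

A single row that is larger than the budget on its own still ends up in a group by itself, so it would need to be trimmed separately. Calling `chunkByBytes(rows, MAX_BYTES)` would give one group per macro in the splitting approach described above.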

Any recommendations for tackling either of these would be appreciated.

Cross-posting a use case:

Hello, when I was testing Forge I ran into really slow loading times. When I investigated, I found that calls to ‘bridge’ took at least 2 seconds each, which doesn’t seem like a long time, but it adds up when there are more calls.

I found those links

According to this ticket, it was supposed to be fixed:
https://ecosystem.atlassian.net/browse/FRGE-232
But the issue was still there (I had the most recent versions of all packages and Forge).
Since it was slow even when I requested static resources via ‘bridge’, it looked like it was being slowed down on purpose.

So the biggest trouble I had with Forge was performance, and because of that I never got past the research and testing phase.

I’ve had a couple of product ideas that would require iterating over all of the content in a Confluence instance and grabbing certain fields or metadata from each page. Think specialized search and organization tools. I haven’t implemented it because the 25s limit could cause issues with doing this on larger instances. There might be workarounds using queues but they have their own limits and I didn’t feel like going down a rabbit hole to find a dead end.

The thing is I would only need to do this once when the app was installed. After that, I could just listen for page_created, page_updated events to keep the index up to date.

Thank you for gathering feedback on this topic, @HeyJoe!

We have a Forge app that scans Confluence spaces to find outdated and archivable content. The app is processing a lot of pages in the background to generate reports for our users. We started working on it shortly after the async events API was released and had to find a few workarounds for its limitations:

  • When making lots of requests to the Confluence API, it can be difficult to predict how many requests can be made within 25 seconds and how to split up a large amount of work into small 25 second batches. After every batch, we are pushing a new event to the queue until all batches are processed.
  • Since the 200 KB async event payload limit is too small for our use case, we had to find a way to store intermediate results in Forge storage after each invocation and to aggregate them into a report at the end.
  • It’s hard to control parallelism with async events because we don’t know in advance how many batches we have to process. We can’t run everything in parallel because of REST API and Forge storage rate limits.

I think we could avoid most of this complexity if we could run background jobs without time limits.
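As an illustration of the parallelism point, here is a minimal TypeScript sketch of a bounded-concurrency helper. It only caps the number of requests in flight inside a single invocation and does not coordinate across separate async events, so it covers just part of the problem described above. `mapWithConcurrency` and `worker` are hypothetical names, and no Forge API is used.

```typescript
// Run `worker` over all items with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");

  const results = new Array<R>(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each lane repeatedly claims the next unclaimed item until none are left.
  // Claiming is safe because `next++` runs synchronously between awaits.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) lanes.push(runLane());
  await Promise.all(lanes);
  return results;
}
```

With a limit chosen below the relevant rate limit, a batch can fan out its requests without running everything in parallel. If any worker rejects, the whole call rejects, so retries would need to live inside `worker`.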

@DavidPezet1 - how big of an increase to the 25s invocation timeout limit would you need to no longer worry about this limit?

In terms of recommendations for right now, the best approach is to use the Async Events API to process the paginated results in batches, if you are not already doing that. Even then, there are still limits that put a ceiling on how much you can process in this way.
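A minimal TypeScript sketch of that batching idea, under stated assumptions: the loop works through pages until a time budget runs out, then hands back a cursor so a follow-up invocation can resume. It deliberately does not call any Forge API. `fetchPage`, `handle`, and `processWithinBudget` are hypothetical placeholders, and re-queuing the returned cursor as a new async event is left to the caller.

```typescript
interface Page<T> {
  items: T[];
  nextCursor?: string; // undefined when there are no more pages
}

interface BatchResult {
  processed: number;
  done: boolean;
  nextCursor?: string; // where the next invocation should resume
}

// Process pages until the budget is spent or the data runs out.
// At least one page is always processed so every invocation makes progress.
async function processWithinBudget<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  handle: (item: T) => Promise<void> | void,
  startCursor: string | undefined,
  budgetMs: number,
  now: () => number = Date.now,
): Promise<BatchResult> {
  const deadline = now() + budgetMs;
  let cursor = startCursor;
  let processed = 0;

  do {
    const page = await fetchPage(cursor);
    for (const item of page.items) {
      await handle(item);
      processed++;
    }
    cursor = page.nextCursor;
    if (cursor === undefined) return { processed, done: true };
  } while (now() < deadline);

  return { processed, done: false, nextCursor: cursor };
}
```

The budget is only checked between pages, so it should sit well below the 25s timeout (for example 15 to 20 seconds, an arbitrary choice) to leave room for the slowest page.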

Another option is to utilise external resources to offload some of the compute from your app, but this isn’t a feasible option in all cases, and erodes some of the value of trying to build your app as wholly contained within Atlassian’s cloud in the first place.

I had personally not come across the GraphQL size limit before! Thanks for flagging it with me - I will pass this feedback on to the Confluence team and see if they have any recommendations.

Hi @LukasKotyza1

When I investigated, I found that calls to ‘bridge’ took at least 2 seconds each, which doesn’t seem like a long time, but it adds up when there are more calls.

Thanks for flagging this. Where are you based in the world? Currently all Forge functions run out of US-West, so there can be a pretty substantial delay caused by geographic latency - we are planning to address this and make Forge invocations multi-region later in the year.

Thanks @ryan for calling this out. In my conversations so far this is appearing as a very common use case!

Thanks for this detailed feedback, @klaussner!

Some follow-up questions:

  • By storing intermediate results in Forge storage, have you run into any read/write API limits on the storage API?

  • We may not be able to support background jobs with no time limits easily (due to our current reliance on Lambda for invocations, which can run for up to 15 minutes). What kind of increase in the timeout limit would substantially simplify things for you? Does 1 minute make a difference? 5 minutes?

Hey @HeyJoe, that could explain it. I am based in Europe and my instance of Jira was also in Europe.