Forge app restarts without any errors thrown

Hi, we have a Forge app that acts as a dynamic pipeline provider for our Bitbucket repository.

We ran into an issue where the app stops making progress and restarts over and over until the Bitbucket pipeline times out. Running it in the local environment is totally fine.

We narrowed it down to a single string concatenation operation, added in an attempt to correct the logic, that triggers the restarts:

// This will cause the issue
stringA + '/' + stringB

// This will not
stringA + stringB
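For context on why a single concatenation might matter: `stringA + '/' + stringB` produces a brand-new string holding every character of both inputs, so when the operands are large, materialising it is a significant extra allocation in a memory-capped runtime. A minimal sketch (the sizes here are illustrative, not from our app):

```javascript
// Sketch: a concatenated string needs its own allocation of the combined
// size once it is materialised. Sizes below are illustrative only.
const stringA = 'a'.repeat(1024 * 1024); // ~1M characters
const stringB = 'b'.repeat(1024 * 1024);

const joined = stringA + '/' + stringB;

// The result holds every character of both inputs plus the separator.
console.log(joined.length); // 2097153
```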

We assumed it was a performance issue and that the Forge app had run out of memory, so we moved the whole function to AWS Lambda and have the Forge app call it.

FYI: the run in Lambda uses ~1400 MB of memory.

But the issue still happens. The Forge app calls the Lambda, restarts in the middle of the call, then calls the Lambda a second time, which succeeds. This leaves us paying for 2x the billed duration in Lambda.
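One idea to at least avoid paying twice (a sketch, not our actual code; the in-memory Map is a stand-in for a shared store such as DynamoDB, and keying by commit hash is an assumption): make the Lambda idempotent, so a retried call reuses the first result instead of recomputing it.

```javascript
// Sketch: de-duplicating retried invocations with an idempotency key.
// A warm Lambda container keeps module-level state, so this Map can
// survive between the first (timed-out) and second (retried) call;
// a shared store would make it reliable across cold starts too.
const resultCache = new Map();
let computeCount = 0;

async function analysePipeline(commitHash) {
  if (resultCache.has(commitHash)) {
    return resultCache.get(commitHash); // retried call: reuse the result
  }
  computeCount += 1;
  const result = `pipeline-for-${commitHash}`; // placeholder for the real work
  resultCache.set(commitHash, result);
  return result;
}

async function demo() {
  const first = await analysePipeline('abc123');
  const second = await analysePipeline('abc123'); // simulated retry
  console.log(first === second, computeCount); // true 1
}
demo();
```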

Any help would be appreciated!

:wave: @BrianPham,

Would you be able to share some of the traceIds of the failed invocations with us?
We would like to make sure that the memory limit is what’s causing the issue.

FYI: each Forge invocation is limited to 512 MB of available memory (see Invocation limits).

This might indeed be the cause but we would like to be able to tell for sure by looking at the logs on our end.

Thank you,
Caterina

Best to raise a support ticket with us so we can investigate what error you are hitting on our end: https://developer.atlassian.com/support
In that ticket, please provide your AppID and a traceID of it failing.

It could be several things: hitting the 25s Lambda timeout or running out of Lambda memory.

Hi, thanks for jumping into this. My app id is 9cea5b71-26f0-442e-8646-4ff63a4b655f.
These are the IDs of the invocation that was stopped:

Invocation ID: 887806ca-a1e6-413e-afb1-35c0c12304bf
Trace ID: a090df368be7464aab01cd69c1af6eee-e26448a2203cc103

And these are the IDs of the subsequent one that ran successfully (it took 9 seconds):

Invocation ID: c8ca6005-b76f-4213-9c28-81e6370905a8
Trace ID: a090df368be7464aab01cd69c1af6eee-321dc4c126c50476

The Lambda function takes ~27s on average, with 0.7s of initialisation, and it does run twice for each CloudWatch log stream.
Hope they’re helpful.

Before I raise a support ticket, I'd like to see if there's a quick and simple resolution I can apply on my end :smile:, and this way other people can see it too :muscle:

Thanks

After checking the logs, I found the first invocation of your Forge app timed out after 25 seconds and the second one succeeded after around 9 seconds. I wonder if there's some complex logic in your code and the Lambda retains the result from the first run, so the second call succeeds. That would also explain why it works locally, where there is no timeout.

Maybe review your code and see if you can speed it up with some optimisation, so that it executes within the 25-second limit (https://developer.atlassian.com/platform/forge/platform-quotas-and-limits/#invocation-limits)
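If the work cannot always finish in time, one defensive pattern (a sketch; the 20-second budget, `callLambda`, and `defaultResult` names are assumptions, not from your code) is to race the slow call against a deadline below the 25s limit, so your function returns a usable fallback instead of being killed mid-invocation:

```javascript
// Sketch: resolve with a fallback if the real work exceeds a time budget
// chosen to stay under Forge's 25-second invocation limit.
function withDeadline(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage inside the handler:
//   const result = await withDeadline(callLambda(), 20000, defaultResult);
```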

Thank you for taking a look. The fact that it succeeds after only 9 seconds is very weird. I've moved all of the main logic into the Lambda, so the logic in the Forge app is as simple as this:

  logger.info('Start running!!!');
  logger.debug({ request, context });

  if (commitHash !== 'abc') {
    logger.info('Not the testing commit, Stopping!');
    return defaultResult;
  }

  const lambdaResult = await getFromApi(logger, lambdaUrl, 'POST', { request, context }).catch(
    (e) => {
      logger.error('Lambda failed to return a result');
      logger.debug(e);

      return defaultResult;
    },
  );

  logger.info('Finish running!!!');
  return lambdaResult;

As for the Lambda, now that we have a time constraint, we will do some optimisation to make it work. But the thing is, our Lambda is very complicated. Long story short, we analyse our big monorepo to identify which projects are affected by the changes and generate the pipeline accordingly. As our repo grows, it is inevitable that this will take more than 25 seconds to complete.
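To illustrate the kind of analysis involved (a simplified sketch; the `packages/<name>` layout and the helper name are hypothetical, and our real logic also walks the dependency graph): map the changed files from the diff to the projects that own them.

```javascript
// Sketch: derive the set of affected projects from a list of changed
// file paths, assuming a top-level "packages/<name>/..." layout.
function affectedProjects(changedFiles) {
  const projects = new Set();
  for (const file of changedFiles) {
    const match = /^packages\/([^/]+)\//.exec(file);
    if (match) projects.add(match[1]); // first path segment under packages/
  }
  return [...projects];
}

console.log(affectedProjects([
  'packages/api/src/index.ts',
  'packages/api/README.md',
  'packages/web/app.tsx',
])); // [ 'api', 'web' ]
```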

So I wonder if there is any way to increase the 25-second limit.

Hi @BrianPham,

Sorry, there is currently no way to extend the 25s timeout for synchronous execution. Forge asynchronous execution (using async events) has a longer timeout of 55s, and in the near future long-running functions (also asynchronous execution) will become available, which can run for up to 15 minutes.

However, Bitbucket dynamic pipelines only support synchronous execution, so unfortunately there is no workaround at this stage.
