Let's chat about Connect webhooks & Forge events

HeyJoe · March 17, 2022, 7:10am

Hi everyone!

My team is working on some new enhancements to the Forge events module. We want to make Forge events capable of highly scalable, high-throughput event processing - perhaps even better than what you can achieve yourself in Atlassian Connect!

I am eager to meet with existing Connect app developers who have built complex webhook processing infrastructure to understand how we could offer you a superior experience for event processing in Forge.

I would love to hear about:

How your event processing is set up today
What your biggest pain points are
Whether or not some of the enhancements we are prototyping would help to simplify your app

I’m happy to chat either here in this thread on the Developer Community, or you can schedule some time with me to talk face-to-face in either US-friendly or EU-friendly working hours: Calendly - Joe Clark - Let’s chat about webhooks

Looking forward to hearing from you! And of course, my door is always open if there are other hot topics you want to ~~complain about~~ discuss with me!

Cheers,
Joe @ Atlassian

MateKavaleczMidori · March 17, 2022, 2:22pm

Hello Joe!

First of all, thank you, I feel like posts like these are a real lifeline for Forge development.
I’d like to share one of my biggest pain points about web-triggers: The URL is created through the Forge CLI.
This might be great for development use-cases, but this feature would also be very good for handling production callbacks between different parts of a micro-service system.
For that to be realistically feasible, I’m pretty sure that we would need a well-known static URL or, at the very least, a way to retrieve it programmatically.
As far as I know, these URLs stay constant as long as these parameters all stay the same:

module key
app
site
product
Forge environment

While most of these are static and well-known, the site is what throws off our workflow. We would need to be able to get these URLs at runtime for every customer, which is not currently possible, as far as I know.
Just my 2 cents. My teammates might come back with more ideas later, I just wanted to share this opinion before I forget to do so.

danielwester · March 17, 2022, 2:48pm

@MateKavaleczMidori You can retrieve a web trigger URL through the API - see https://developer.atlassian.com/platform/forge/runtime-reference/web-trigger-api/

danielwester · March 17, 2022, 2:53pm

For us - one of the biggest challenges is to dynamically update all issues using a logical statement (i.e. not just set these fields to have X as a value). We’re currently solving it by using the async event functionality to walk across the issues in a search. But that can be quite chatty (and I’m not sure how scalable that is).

If we had some type of thing where we could supply a list of objects (or a descriptor of a list of objects - like a search configuration) and a function to process each item in that list - it would be really awesome.
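
To make the request above concrete, here is a minimal sketch of what "a search descriptor plus a per-item function" could look like. Everything in it is hypothetical: `forEachMatch`, `FetchPage` and the page shape are illustration names, not an existing Forge or Jira API, and a real version would be asynchronous and would probably hand each page to a separate invocation rather than loop in one.

```typescript
// Hypothetical shape only: none of these names exist in Forge or Jira.
interface Page<T> {
  items: T[];
  // Offset of the next page, or null when the search is exhausted.
  nextStartAt: number | null;
}

// Stand-in for "run this search and give me one page of results".
type FetchPage<T> = (query: string, startAt: number) => Page<T>;

// Walks every page of the search and calls `handle` once per item.
// Returns the number of items processed.
function forEachMatch<T>(
  query: string,
  fetchPage: FetchPage<T>,
  handle: (item: T) => void
): number {
  let startAt: number | null = 0;
  let processed = 0;
  while (startAt !== null) {
    const page: Page<T> = fetchPage(query, startAt);
    page.items.forEach((item) => {
      handle(item);
      processed += 1;
    });
    startAt = page.nextStartAt;
  }
  return processed;
}
```

The app would then only supply the query and the handler, and the platform would own paging, retries and fan-out.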

paul · March 17, 2022, 6:09pm

It would be good to be able to specify the REST parameters of the issue that is passed to the webhook. Currently, if the issue details don’t have everything we need, we have to retrieve the issue again. Being able to specify the issue query parameters in the manifest would mean that the issue passed has exactly what we need, which would remove the need to re-query the same issue. It could also speed things up if we only needed the basic issue details.

I’m not sure if there are any limits on how long a webhook can run, but if there is a limit then the ability to run an async webhook would be great (not sure if this exists at the moment). Three of our Connect apps currently create a new task that can run for as long as it needs. Not sure if you can do this in Forge at the moment.

Lastly, we want the events to pass enough information. The last time I checked Forge, a delete comment event didn’t provide you with the text of the deleted comment or its id. So all you know is that a comment was deleted from a particular issue. The only way around this is to keep a list of current comments and re-query all comments to see what was removed.
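
The workaround described above boils down to a set difference between the comment ids the app has stored and the ids returned by a fresh query. A minimal sketch, assuming both lists are already available as arrays of id strings (`findRemovedCommentIds` is an illustration name, not part of any Atlassian API):

```typescript
// Returns the ids that were known before but are missing from the fresh query.
function findRemovedCommentIds(knownIds: string[], currentIds: string[]): string[] {
  const current = new Set(currentIds);
  return knownIds.filter((id) => !current.has(id));
}

// Example: comment "10002" was deleted since the last sync.
const removed = findRemovedCommentIds(["10001", "10002", "10003"], ["10001", "10003"]);
```

Note that this only recovers the id; the text of the deleted comment is available only if the app stored it earlier.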

EDIT: It would also be nice to specify that I only want the webhook to fire if an issue was transitioned. At the moment in Connect, you just get an issue updated event.
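
Until such a filter exists, the check has to happen in the app. A minimal sketch, assuming the `jira:issue_updated` payload carries a `changelog.items` array whose entries have a `field` name (the interface below is a trimmed-down assumption about the payload, not a complete type):

```typescript
// Assumed, heavily trimmed shape of an issue updated webhook payload.
interface IssueUpdatedPayload {
  webhookEvent: string;
  changelog?: { items: { field: string }[] };
}

// True only when the update changed the issue's status, i.e. a transition.
function isStatusTransition(payload: IssueUpdatedPayload): boolean {
  if (payload.webhookEvent !== "jira:issue_updated") {
    return false;
  }
  const items = payload.changelog ? payload.changelog.items : [];
  return items.some((item) => item.field === "status");
}
```

The app still receives and pays for every update event; this only decides which ones are worth processing.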

Thanks
Paul

remie · March 17, 2022, 9:17pm

Our apps heavily rely on events, and more specifically message queueing or Pub/Sub. We are currently hosting on GCP Firebase, making optimal use of the Firebase integration with GCP PubSub. It allows us to register function handlers as either http/https, topic-based message queues or scheduled topic-based message queues.

Some common use cases in our apps:

Making task processing asynchronous. A request from the frontend is received by the backend and an event is fired to move the processing to the background.
Splitting up task execution to avoid compute timeouts and compartmentalise logic / responsibility. If function A finishes, it commits to the database and triggers function B with an event.
Scheduled tasks that run over multiple tenants (for instance, GDPR-related clean-up tasks on inactivity). These scheduled tasks often also use the second pattern above, looping over the instances in the scheduled task and firing events for further processing.

Now with regard to web hooks

There are several issues with the current implementation of web hooks in Connect:

There is no delivery guarantee, meaning that all vendors have to implement polling of the failed web hooks endpoint, or find other ways to ensure that they didn’t miss an event.
Filtering is only possible using dynamic modules, which are limited to 100. In most cases, this means just accepting all web hooks and filtering them in your backend. This is a huge part of why a scalable architecture is a must for any web hook filtering (which we achieve with the PubSub set-up mentioned earlier).
Web hook payloads are confusing. As @Paul also mentioned, there is no control over the data in the payload, and most of the time we just ignore it completely and only use the web hook to indicate that we should run our own processing. The web hook is merely a trigger for processing, not a useful entity on its own.
A lot of events are missing (which is also true for Server/DC, BTW).
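
When the web hook is only a trigger, as in the third point above, a burst of events for the same issue can be collapsed before any processing is queued. A minimal sketch of that idea, assuming each incoming web hook has already been reduced to a tenant id and an issue key (`WebhookTrigger` and `coalesceTriggers` are illustration names, and this is not a description of how any particular vendor does it):

```typescript
// Assumed minimal view of an incoming web hook: who it is for and which issue changed.
interface WebhookTrigger {
  tenantId: string;
  issueKey: string;
}

// Keeps the first trigger per (tenant, issue) pair and drops the rest,
// so a burst of updates leads to one processing run per issue.
function coalesceTriggers(triggers: WebhookTrigger[]): WebhookTrigger[] {
  const seen = new Set<string>();
  const unique: WebhookTrigger[] = [];
  for (const trigger of triggers) {
    const id = `${trigger.tenantId}:${trigger.issueKey}`;
    if (!seen.has(id)) {
      seen.add(id);
      unique.push(trigger);
    }
  }
  return unique;
}
```

Each remaining trigger would then be published to a queue or topic, and the handler would fetch the current issue state itself rather than trusting the payload.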

In order for us to be able to use Forge, the platform must support a proper implementation of PubSub, allowing us to create a scalable environment that can also deal with compute timeouts.

MateKavaleczMidori · March 18, 2022, 7:01am

Yes, sorry, you’re right. I worked on this a bit earlier and my memory was hazy when I wrote my comment, but I wasn’t entirely wrong, let me elaborate:

Now that you mention it, I did find that API, but I don’t see how it can be called outside Forge?
I mean, it’s great that we can get the url programmatically inside Forge, but if we have a scheduled job starting up that runs outside Forge, and we want to call a Forge function, it doesn’t really help that there’s a Forge API, which gives us the url that can be used to call Forge functions. It’s a chicken-and-egg problem.
We would need to have that URL readily accessible somewhere outside Forge. If there were a REST API or something we could call, that would work. Or do I misunderstand? Can this API be called from outside Forge with something like Basic Auth?

The last time I worked on this, I couldn’t resolve this issue at all. Now that the install event is available in Forge, I could hypothetically subscribe to the install event inside Forge, call the webTrigger.getUrl API when the event occurs, and send the result to our backend to store it somewhere outside. I think we could make this work, but it’s quite roundabout for something so simple.
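
The roundabout flow described above can be sketched as follows. This is a sketch under assumptions, not a tested Forge app: the two dependencies are injected so the example runs without Forge, `getWebTriggerUrl` stands in for `webTrigger.getUrl` from `@forge/api`, `sendToBackend` is a hypothetical call to the vendor's own service, and how the site identifier is read from the install event is left out.

```typescript
// What the external backend needs to store per installation.
interface Registration {
  site: string;
  moduleKey: string;
  url: string;
}

interface InstallDeps {
  // In a Forge app this would wrap webTrigger.getUrl from @forge/api.
  getWebTriggerUrl: (moduleKey: string) => Promise<string>;
  // Hypothetical: sends the registration to your own backend.
  sendToBackend: (registration: Registration) => Promise<void>;
}

// Intended to be called from the handler subscribed to the install event.
async function onAppInstalled(
  site: string,
  moduleKey: string,
  deps: InstallDeps
): Promise<Registration> {
  const url = await deps.getWebTriggerUrl(moduleKey);
  const registration: Registration = { site, moduleKey, url };
  await deps.sendToBackend(registration);
  return registration;
}
```

The external scheduled job can then look up the stored URL per site. Since web triggers are not authenticated, the backend would presumably also need to send some shared secret that the Forge function verifies.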

Also, another issue with using web triggers in production is that they are not authenticated. That’s great for development, but not very easy to trust for production.

JanPschko · March 18, 2022, 9:18am

Great to talk about improving Connect webhooks!

The main problem we’re facing right now is that webhooks are not reliably fired. Even in a new Jira Cloud environment with no other traffic, the jira:issue_created and jira:issue_updated webhook URLs are not always called when creating a new ticket or transitioning its status. I understand webhook delivery is not guaranteed, but (more often than not) it seems Jira isn’t even making any attempt to call the URL (nothing in our web server logs, and neither Jira nor our server is busy at the time).

Some more details here: Jira connect app webhook not being recieved for so...
and here: How to query Jira webhooks defined in descriptor of Atlassian Connect app? - Stack Overflow

Any ways to improve, mitigate or work around this (other than regular polling for all Jira data) would be appreciated.

HeyJoe · April 1, 2022, 8:37am

Thank you everyone for all your suggestions. I am feeding this into an internal document that will help inform our future roadmap for event processing in Forge.

BrunoMarotta · December 6, 2022, 10:35am

Not sure if this subject is still open, but you should stop sending webhooks for instances where the app has no valid license. This really bloats our server with unnecessary webhooks.

Example:

User installs the app for evaluation
Evaluation expires but the user never uninstalls the app

We will keep receiving webhooks for this user indefinitely.

remie · December 6, 2022, 10:44am

This is why we almost always use Dynamic Modules to configure web hooks. This allows us to disable them as soon as we believe the application is no longer active.

We consider an installation no longer active if any of the following is true:

There has not been any server request for the last 30 days
The app no longer has access to the /rest/api/3/myself endpoint
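
The two rules above translate into a simple predicate. A minimal sketch, assuming the app records the time of the last server request per installation and has a recent result of probing /rest/api/3/myself (`InstallationState` and `isInactive` are illustration names):

```typescript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Assumed per-installation bookkeeping kept by the app.
interface InstallationState {
  lastRequestAt: Date;
  // Result of the most recent call to /rest/api/3/myself for this installation.
  canAccessMyself: boolean;
}

// Inactive if either rule holds: 30 days without a request, or no API access.
function isInactive(state: InstallationState, now: Date): boolean {
  const idleMs = now.getTime() - state.lastRequestAt.getTime();
  return idleMs >= THIRTY_DAYS_MS || !state.canAccessMyself;
}
```

An installation that turns inactive would then have its dynamically registered web hooks removed.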

In addition, we also remove web hooks if the user has not yet configured the app, or has removed any of the features that require the web hook to be enabled.

BrunoMarotta · December 6, 2022, 11:03am

That’s not a bad idea. But it is arbitrary. Both rules tell you that the user is currently not using your app. But they don’t tell you that the user will no longer use your app.

What happens if the user starts using your app after 30 days? Do you go back and process all the changes that happened on their instance?

For an evaluation user, probably this is not a super important thing. But if I’m a paid user, I don’t want the app deciding by itself when it should process or not webhooks.

Now, if I’m a user that didn’t pay for one or two months, I think it is legitimate that webhooks are not processed during this time.

A solution would be to call the check license API on every webhook call. But it is not practically possible due to the amount of webhooks an app receives.

remie · December 6, 2022, 11:35am

The 30 days is only arbitrary to the extent that it is what we consider to be the interpretation of undue delay within the context of GDPR data deletion. It is also part of our EULA. We try to notify end users of pending deletion, but due to the lack of means to contact customers apart from billing & technical contacts (which are not always the end users) this is not always possible.

But yeah, if the user comes back after those 30 days, their data will be gone.