Ensuring your Atlassian Connect app handles customer site imports

HeyJoe · September 29, 2020, 6:11am

Hi everyone!

I’m writing here in the developer community to provide details on our proposed solution to AC-1528 - Add-ons break after customer data imports. We’ve had lots of feedback from customers, partners and account managers that this problem is causing pain on a regular basis, especially as the frequency of server to cloud migrations increases.

We’ve been able to implement a fix for this problem that also requires a change to the Connect app. In this post, I’ll explain the problem in a bit of detail and then describe the solution and what change is required.

Why do site imports break apps?

Firstly we need to distinguish a site import from a site restore from backup. Restore from backup is an operation at the database level within a single site. When a site is restored, its clientKey does not change. Import, on the other hand, is the process of taking the content of a site, potentially a different site, and bringing it into an existing site. It will result in a new clientKey value.

When an import occurs, installed Connect apps are uninstalled in the target site. Connect apps installed in the source site are not carried over either. This ensures that install information from the original site is not included, so that two distinct sites do not end up with colliding installation records from the app’s perspective.

Although the site URL stays the same, e.g. foo.atlassian.net, an import effectively creates a new logical site. For this reason the clientKey of the site changes. This change of clientKey indicates that this is effectively a new site. The old version of the site, including any app install records associated with it, is removed at this point in time. Due to some underlying architectural limitations, apps are not notified when this occurs.

Loss of secret synchronization

The import process causes a loss of app install state and secret synchronization. This happens in the following scenario:

1. The app is installed in a site, with the original clientKey.
2. The site undergoes an import. App install records are removed and the site’s clientKey changes.
3. Re-installation of the app is attempted in the site after the import is complete. As Connect has no install records, the install request is unsigned.
4. The app sees an installation request for the site but already has an existing installation record for a different clientKey. The app (correctly) rejects the install as unauthorized (401), as it is expecting the request to be signed with the original clientKey.

Until recently, we could only resolve this desynchronisation manually, in co-ordination with the app developer (e.g. via a customer support request).

Our solution to the problem

When Atlassian encounters an HTTP 401 (Unauthorized) response to an install request, we will retry the installation by sending a follow-up installation request, this time signed with the app’s current secret.

How should apps handle this situation?

Your app should continue to validate that requests from Atlassian are validly signed. Your app should continue to reject unsigned installation requests as normal (excepting the first, unsigned install for a site).

When you receive a signed install request for a site that has an existing installation record under a different clientKey, your app should associate the site with the new clientKey. Older clientKeys for that site should no longer be used for signing outbound requests to the site.
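As a rough illustration of the re-association described above, here is a minimal sketch in Python. The Installation record, its active flag and the accept_signed_install function are hypothetical names invented for this example and are not part of any Atlassian library. The sketch assumes the request's signature has already been verified before the function is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Installation:
    # Hypothetical record shape; a real app will store more fields.
    client_key: str
    base_url: str
    active: bool = True


def accept_signed_install(records: List[Installation],
                          client_key: str,
                          base_url: str) -> Installation:
    """Associate base_url with client_key.

    Assumption: the caller has already verified the request signature.
    """
    # Older clientKeys for this site must no longer be used for outbound calls.
    for record in records:
        if record.base_url == base_url and record.client_key != client_key:
            record.active = False

    # Refresh the record if this clientKey is already known.
    for record in records:
        if record.client_key == client_key:
            record.base_url = base_url
            record.active = True
            return record

    # Otherwise create a new record for the new clientKey.
    new_record = Installation(client_key, base_url)
    records.append(new_record)
    return new_record
```

The old record is kept and only marked inactive, so it can be cleaned up later rather than being deleted at install time.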

Orphaned installation records should be retained for 30 days before being cleaned up.
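One possible shape for that clean-up is sketched below in Python. The Record type, the orphaned_at timestamp and purge_expired are assumptions made for this example; how your app detects and timestamps an orphaned record is up to you.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple, Optional

# Retention period taken from the guidance above.
RETENTION = timedelta(days=30)


class Record(NamedTuple):
    # Hypothetical record shape for this sketch.
    client_key: str
    orphaned_at: Optional[datetime]  # None while the installation is active


def purge_expired(records: List[Record], now: datetime) -> List[Record]:
    """Return the records to keep: active ones, plus orphans younger than 30 days."""
    return [
        r for r in records
        if r.orphaned_at is None or now - r.orphaned_at < RETENTION
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [Record("ck-active", None),
              Record("ck-stale", now - timedelta(days=45))]
    print([r.client_key for r in purge_expired(sample, now)])
```

A job like this could run on a schedule. Passing now in as a parameter keeps the function easy to test.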

Recommendations for atlassian-connect-express apps

atlassian-connect-express always uses the clientKey as the unique identifier for an installation. So, an install following an import will automatically behave as a new install. The existing install with the old clientKey remains, but is effectively inactive. If your app ever searches for an installation record by its base URL, you should ensure you are finding the latest clientKey entry associated with the baseURL in the database.
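To illustrate the "latest entry for a base URL" idea, here is a small language-neutral sketch written in Python. It does not use the atlassian-connect-express API. The Installation type, its installed_at field and latest_for_base_url are hypothetical, and the sketch assumes your store records when each install was received.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Installation:
    # Hypothetical record shape; installed_at is when the install was received.
    client_key: str
    base_url: str
    installed_at: datetime


def latest_for_base_url(records: Iterable[Installation],
                        base_url: str) -> Optional[Installation]:
    """Return the most recently received installation for base_url, if any."""
    matches = [r for r in records if r.base_url == base_url]
    return max(matches, key=lambda r: r.installed_at, default=None)
```

In a database-backed store the same effect would usually come from ordering by the install timestamp and taking the first row, rather than loading every record.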

Recommendations for atlassian-connect-spring-boot apps

atlassian-connect-spring-boot will also treat an install following an import as a new install and create a new entry keyed on the new clientKey. For most operations, atlassian-connect-spring-boot uses the request context to determine the clientKey to use in making any API calls.

atlassian-connect-spring-boot provides a method to look up an installation based on the base URL. This can be used to initiate an API call when there is no incoming request context. Until recently, the returned record (and therefore its underlying clientKey) was non-deterministic (see ACSPRING-117: Host lookup by base URL is not deterministic). This is fixed in release 2.1.0 of atlassian-connect-spring-boot so that this method always returns the most recent clientKey for the site.

Recommendations for custom apps

If your app is a custom development or uses a separate framework from the above, you should observe the following guidelines:

Installs which use an existing clientKey

If Atlassian has no record of the installation, we will send an unsigned install request. Your app must reject this with an HTTP 401 response. After the unsigned install fails, Atlassian will fall back to an install request signed with the app’s standard key. If this signature is valid, you may proceed to update the install record. If the install request is not signed, your app must never replace an existing install.
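The decision for an existing clientKey could look something like the Python sketch below. All names here are hypothetical. Signature verification is out of scope, so its result is passed in as signature_valid (None for an unsigned request, False for a bad signature, True for a verified one). The 204 success status is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Installation:
    # Hypothetical record shape for this sketch.
    client_key: str
    base_url: str
    shared_secret: str


def handle_install(store: Dict[str, Installation],
                   payload: Installation,
                   signature_valid: Optional[bool]) -> int:
    """Return the HTTP status the install endpoint should respond with.

    signature_valid: None = unsigned, False = signed but invalid,
    True = signed and verified (verification itself is not shown).
    """
    if signature_valid is False:
        return 401  # a bad signature is never acceptable

    existing = store.get(payload.client_key)
    if existing is not None and signature_valid is None:
        # An unsigned request must never replace an existing install.
        return 401

    # Either a first install for this clientKey, or a verified signed update.
    store[payload.client_key] = payload
    return 204
```

After the 401, the signed fall-back request would reach the same handler with signature_valid set to True and be allowed to update the record.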

Installs which use a new clientKey

If your app only makes API calls in the context of an incoming request, e.g. on receiving a webhook call-back, you may simply create a new record for the new clientKey. Any existing install records for the same site will become inactive.
If your app does host base URL lookups and your app has an existing install for this base URL, you should reject the unsigned install request. Connect will retry with a signed install. If this is valid you should create a new record. Any base URL lookups should use the latest received install information.
If your app iterates across all install records, you should operate as though you perform base URL lookups (described above) and you should ensure that you only accept signed requests to add an entry for an existing site. Also, the iteration should only use records which are active - i.e. the latest received for a given site.
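For apps that do base URL lookups or iterate over all installs, the guidance above might be sketched in Python as follows. The record shape, the sequence counter used to order installs, and both function names are assumptions made for this example. As before, signature_valid stands in for the result of verification that is not shown here (None means unsigned).

```python
from typing import Dict, List, NamedTuple, Optional


class Installation(NamedTuple):
    # Hypothetical record shape; sequence orders installs as they were received.
    client_key: str
    base_url: str
    sequence: int


def handle_install(records: List[Installation],
                   client_key: str,
                   base_url: str,
                   signature_valid: Optional[bool]) -> int:
    """Return the HTTP status for an install that arrives with a new clientKey."""
    if signature_valid is False:
        return 401
    site_known = any(r.base_url == base_url for r in records)
    if site_known and signature_valid is None:
        # Existing site: only a signed request may add another entry.
        return 401
    records.append(Installation(client_key, base_url, len(records)))
    return 204


def active_installations(records: List[Installation]) -> List[Installation]:
    """Keep only the latest received installation for each base URL."""
    latest: Dict[str, Installation] = {}
    for record in records:
        current = latest.get(record.base_url)
        if current is None or record.sequence > current.sequence:
            latest[record.base_url] = record
    return list(latest.values())
```

Any batch job would then loop over active_installations(records) instead of the raw list, so inactive records for an imported site are skipped.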

In all cases, please observe the golden rule:

Your app must never overwrite or remove an installation unless the installation request has a valid signature from Atlassian.

jack · September 29, 2020, 6:29am

Hi,

Do I understand correctly, that if we use atlassian-connect-express and we don’t search/access the records by base URL (but clientKey only), then the site import works fine and we don’t have to do anything?

Update: we still need to delete the orphan data after some time.

Thanks,
Jack

james.dellow · September 29, 2020, 7:09am

Does this mean that you never re-issue site URLs to different entities?

cmacneill · September 29, 2020, 7:32am

This topic is about dealing with site imports and what they mean for a site’s addons. It does not really address the question you are asking, which touches on legal, trademark and policy realms.

I’m not really able to give insights in those areas. There are definitely circumstances where a site URL must be assigned to a different entity. Such a site would, of course, have a different clientKey.

angel · September 29, 2020, 7:42am

Is there a due date for this new mechanism to come into effect?

james.dellow · September 29, 2020, 8:45am

Maybe I should have asked, will the shared secret remain the same?

angel · September 29, 2020, 9:39am

Could not this be handled by ACE or atlassian-connect-spring-boot themselves?

aragot · September 29, 2020, 9:45am

Do I understand correctly:

The clientKey is generally the primary key, except when it isn’t,
No /installation request should be accepted if the clientKey OR the baseURL already exist, unless it is signed with the (old) secret,
A request to change the baseUrl can be sent to /installation, signed by the secret,
A request to change the secret can be sent to /installation, signed by the old secret,
A request to change the clientKey can be sent to /installation, signed by the old secret, and the secret doesn’t change.

Concerning AC SpringBoot 2.1.0, it will fix this but…

… the table AcHost has no ID column, clientKey is the primary key. Therefore any other data in our apps have a “clientKey” column.

How should our app be notified that the clientKey has changed, so we can update the clientKey in every record of the app?
Should the table AcHost have an ID in SpringBoot, so we can reliably rely on an immutable primary key?

Thank you,
Adrien Ragot, for Requirement Yogi

andy · September 29, 2020, 11:41am

About that. In case you didn’t know, purging data is not quite enough. As cloud app uninstall is two-phase, most customers only unlicense, leaving the app installed (MC-235). That means after we’ve purged the data, the client still has the app installed. Some time later, this instance is rebooted for maintenance or some other reason, and we get an install hook again, and this repeats forever. In the meantime we continue to receive all their webhook traffic, forever. Every cloud vendor is in the same boat; they may just not realize it.

To topic. Please clarify if site-imports remove unlicensed apps, or is the problem of ghost-installs going to keep happening indefinitely?

BobBergman · September 29, 2020, 2:59pm

Does the oauthClientId remain unchanged or does it also get rotated?

HeyJoe · September 30, 2020, 1:18am

@jack yes, if your app is based on ACE then you just need to make sure you are periodically cleaning up orphan data.

HeyJoe · September 30, 2020, 1:19am

Yes, the secret would remain the same, since it is keyed to the app key.

HeyJoe · September 30, 2020, 1:28am

Adding a mega-reply to all the questions rather than replying individually.

Via @angel

Is there a due date for this new mechanism to come into effect?

The new behaviour on the Atlassian side is already in effect. The fall-back installation requests are in production.

Could not this be handled by ACE or atlassian-connect-spring-boot themselves?

We’ve ensured that ACE and atlassian-connect-spring-boot have the correct behaviour by default. However, it would still be possible for app developers to implement logic that queries for installations by base URL instead of by client key, which would cause problems.

Via @andy

To topic. Please clarify if site-imports remove unlicensed apps, or is the problem of ghost-installs going to keep happening indefinitely?

Apps are not re-installed as part of a site import, they need to be installed via user intervention. So, there would not be a situation where an import causes previously uninstalled apps to re-appear.

techtime · October 5, 2020, 5:24am

What is the actual “business” expectation from the user’s point of view for this process – should the data that existed on the app vendor side be automaGically re-linked after this reset?

Is this purely trying to address the failure that used to happen on re-subscribe or actually the fact that a sequence of normal actions as described would cause a complete disconnect of this customer from their data on the vendor’s side?

Speaking from experience as Solution Partner currently involved in a complex Cloud to Cloud merge project – where this exact use case happens across approximately 12 vendors.

If I am able to re-subscribe but then still have to go through 12 vendors trying to recover my client’s data before it gets purged as “orphan” – that’s not really an improvement.

I do note the comment from @cmacneill that NOT connecting to the data automatically is a valid use case. In this case how are we to distinguish the site re-import from such an event? The way I am reading this – the site URL is re-assigned to a different entity. Same URL, different clientKey, same secret (apparently it is “keyed to the app key”?). How is this different from re-import?
If the automatic reconnect is expected, why would anyone (using a custom platform, not ACE or springboot) create new records and keep “orphans” around in the first place? Why not update the existing record (by URL) after all security/signature checks and not have to deal with the concept of “latest”?
If the automatic reconnect is NOT expected under any circumstances until ownership of data is verified – can this be explicitly clarified?

BobBergman · October 5, 2020, 3:10pm

Thanks for raising these questions, I had it on my todo list to ask these same things today.

james.dellow · October 5, 2020, 9:07pm

The way I am reading this – the site URL is re-assigned to a different entity. Same URL, different clientKey, same secret (apparently it is “keyed to the app key”?). How is this different from re-import?

That’s the point I was trying to get at earlier in the thread. I’m operating under the assumption that if it is signed, it must be the same (legal) entity and they want to stay connected to their original install. But I would like to see that clarified as well.

james.dellow · October 5, 2020, 9:09pm

@HeyJoe for marketplace apps, how do we test this process, without creating a temporary site and going through the site import process?

danielwester · October 5, 2020, 9:29pm

Url is a secondary attribute. The only thing you can use the secret against to confirm validity is the instance key. If you try to identify instances through the url - you’re asking for trouble. At least from my understanding.

Hoping that somebody from atlassian will speak up.

@mhart @rwhitbeck @mpaisley @HeyJoe

techtime · October 5, 2020, 9:44pm

Well, hence the question… what EXACTLY is this trying to solve? If we are not supposed to automatically marry the new clientKey to the data previously linked to the old key – then it seems the scope of this “improvement” is merely to remove the failure to re-subscribe i.e. this makes Atlassian system look better, but the very next step for most cases will be “wait a minute, but where is my data?”

cmacneill · October 5, 2020, 10:22pm

The clientKey is the primary key.

As you note, keying off the site URL is problematic. It will mean your app will not handle situations like a site rename, where the clientKey does not change but the site URL does.