Ensuring your Atlassian Connect app handles customer site imports

cmacneill · October 5, 2020, 10:34pm

@aragot, to your bullet points:

The clientKey is the primary key for a site
If you use the baseUrl to look up sites and an installation updates that relationship, then you should verify such installs. If you only ever use the clientKey when interacting with APIs, then you only need to validate an install with the same clientKey. In general, we recommend relying only on the clientKey.
Yes, baseUrls can be changed in the /installation callback. This mechanism is used for site rename operations.
Yes, currently a secret can be changed using the /installation callback.
No, a clientKey cannot be changed, as such. A new clientKey can be associated with a baseUrl that was previously associated with a different clientKey but this is a completely separate installation.
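To make the first point concrete, here is a minimal sketch of a store keyed only by clientKey, with baseUrl and sharedSecret treated as mutable attributes. It assumes the install payload has already been verified and uses the payload field names from this thread (`clientKey`, `baseUrl`, `sharedSecret`); `InstallationStore` and `Installation` are hypothetical names, not part of any Atlassian library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Installation:
    client_key: str     # primary key: never changes for a given site instance
    base_url: str       # mutable attribute: can change on a site rename
    shared_secret: str  # mutable attribute: can be changed via the install callback


class InstallationStore:
    """Hypothetical in-memory store that only ever looks sites up by clientKey."""

    def __init__(self) -> None:
        self._by_client_key: dict[str, Installation] = {}

    def get(self, client_key: str) -> Installation | None:
        return self._by_client_key.get(client_key)

    def upsert(self, payload: dict) -> Installation:
        """Apply an already-verified install payload.

        A known clientKey keeps its identity; only baseUrl and sharedSecret are
        refreshed. An unknown clientKey is a separate installation, even if its
        baseUrl matches an existing record.
        """
        inst = Installation(
            client_key=payload["clientKey"],
            base_url=payload["baseUrl"],
            shared_secret=payload["sharedSecret"],
        )
        self._by_client_key[inst.client_key] = inst
        return inst
```

A repeat install with the same clientKey (for example, after a site rename) updates the record in place, while a different clientKey on the same baseUrl simply gets its own record.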

cmacneill · October 5, 2020, 10:36pm

The oauthClientId will be changed.

james.dellow · October 5, 2020, 10:38pm

Yes, I’m ignoring the baseUrl as an identifier - well, now, I am

But would like to get this clear:

unsigned install: treat as brand new (but the clientKey shouldn’t already exist at my end)
signed install based on the old clientKey: but use the clientKey (+ sharedSecret & baseUrl) from the body

Is that right?

james.dellow · October 5, 2020, 10:43pm

Except when it changes because of an import?

cmacneill · October 5, 2020, 10:52pm

This depends on the customer’s intent when doing an import. If they are importing a completely different site, with different content, and then install your addon, they will not want it to be associated with the old data.

If they are importing to correct some error, say to remove an erroneous project, they may want to reconnect to their old data.

As a site import creates a new site (potentially living on an existing url), you should not associate any existing data with the site. There will be cases, however, where customers may request that you do make that association. They will need to explicitly request that to happen.

Automatic reconnection is not expected and should not be performed. Orphans should be kept around because they may help out a customer who needs that data.
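The rules above can be sketched as: a new clientKey always starts with an empty record, older records on the same baseUrl are kept untouched, and data is copied across only when the customer explicitly asks. This is a hedged illustration; `SiteData`, `on_new_client_key`, `reassociate` and the `customer_confirmed` flag are hypothetical names, and how the customer's request is verified is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SiteData:
    client_key: str
    base_url: str
    app_data: dict = field(default_factory=dict)  # whatever the app stores per site


def on_new_client_key(sites: dict[str, SiteData], client_key: str, base_url: str) -> SiteData:
    """Create an empty record for a clientKey we have never seen.

    Records for older clientKeys on the same baseUrl are left as they are:
    not deleted, not merged, not reconnected.
    """
    if client_key in sites:
        raise ValueError(f"clientKey {client_key!r} is already known")
    site = SiteData(client_key, base_url)
    sites[client_key] = site
    return site


def reassociate(sites: dict[str, SiteData], old_key: str, new_key: str,
                customer_confirmed: bool) -> None:
    """Copy old data onto the new clientKey, but only on explicit customer request."""
    if not customer_confirmed:
        raise PermissionError("reconnecting old data needs an explicit customer request")
    # Shallow copy; the old record stays in place as an orphan.
    sites[new_key].app_data = dict(sites[old_key].app_data)
```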

cmacneill · October 5, 2020, 10:56pm

If you think of a “site” by the baseUrl, this statement seems reasonable, but in reality a site is defined by the clientKey. The site associated with a baseUrl can change. A new clientKey with the same baseUrl as some other clientKey is a new instance of that site. The clientKey associated with that instance will not change.

cmacneill · October 5, 2020, 11:06pm

The intent of this post is to answer the question raised in AC-1528: Why do addons break after a customer import.

If addons are keyed off the baseUrl, then they will break. In addition, they will not be compatible with site-rename operations. We are providing these guidelines to reduce the number of 401s customers are experiencing when attempting to install addons after an import. These 401s are due to loss of secret sync between the product and the addons.

We are not trying to make Atlassian systems look better at the expense of addons, we are trying to improve the overall experience for a lot of our mutual customers. A lot of addons do key off the baseUrl and that should be changed.

I think there will be instances where a customer will ask “… but where is my data” but it is not safe to make that association automatically.

aragot · October 5, 2020, 11:29pm

Cmacneill, @HeyJoe literally explained the opposite of what you are saying. He said “A reimport will result in a new clientKey value.” and “For this reason the clientKey changes.”

Maybe you should gather with Joe and check what was coded, because if no-one can spell it out to the ecosystem, I wonder what the programmer understood of the solution.

As a side note, making sure we don’t use the baseUrl as a primary key is not the central point here, although it was worth emphasizing what the primary key was after Joe said clientKeys could change.

james.dellow · October 5, 2020, 11:31pm

Exactly. And I’d like to know how we go about testing all these cases (easily).

cmacneill · October 6, 2020, 12:10am

@aragot, @HeyJoe and I are aligned, and I don’t think what I said was really the opposite of what Joe said. It comes down to the perspective taken.

Say we have foo.atlassian.net and it exists with clientKey 1. The customer decides to import a different site’s content.

foo.atlassian.net now has clientKey 2.

So from the perspective of the baseUrl, the clientKey associated with the baseUrl has changed. This is, however, because this is effectively a new site; it just lives on the same URL where an old site used to live. A new clientKey is a new instance of the site. The clientKey associated with the data that represents a site will never change; the clientKey associated with a baseUrl will.

People do tend to associate the site with the baseUrl but from a data instance perspective, a site is associated with the clientKey.

I hope that clears it up. Let me know if you still have questions.

james.dellow · October 6, 2020, 2:17am

@cmacneill so what I’ve done is put in place the following for the install lifecycle:

If there is no authorisation token, treat as a new install
If there is a valid authorisation token, I update my install document with whatever is in the body (baseUrl, sharedSecret, clientKey, etc). But if the clientKey is different, I now start to use the new clientKey as the primary key.

But:

If there is no authorisation token but the clientKey exists, I reject it
If the authorisation token isn’t valid, I reject it

In all cases the clientKey is the primary key. The baseUrl is just a property I associate with that clientKey for the purpose of interacting with the REST API.
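James's rules can be sketched as a single handler over an in-memory store keyed by clientKey. This is an illustration of the logic as he describes it, not a reference implementation: `handle_install` is a hypothetical name, and JWT verification is assumed to have happened already (the caller passes in which clientKey the token was issued for and whether it verified against that record's sharedSecret).

```python
from __future__ import annotations


def handle_install(store: dict[str, dict], body: dict,
                   token_client_key: str | None, token_valid: bool) -> str:
    """Apply the install rules above and return "created", "updated" or "reject".

    token_client_key: clientKey the JWT was issued for, or None if unsigned.
    token_valid: whether the caller verified the JWT against the sharedSecret
    stored for token_client_key.
    """
    key = body["clientKey"]
    if token_client_key is None:
        if key in store:
            return "reject"      # unsigned install for a clientKey we already hold
        store[key] = dict(body)  # unsigned and unknown: brand-new install
        return "created"
    if not token_valid or token_client_key not in store:
        return "reject"          # bad signature, or no record to verify against
    record = store.pop(token_client_key)
    record.update(body)          # baseUrl, sharedSecret, clientKey, ...
    store[key] = record          # the body's clientKey is now the primary key
    return "updated"
```

Note that the re-keying step carries the old record's data over to the new clientKey automatically, which is the kind of reconnection cmacneill advises against after an import; keeping the old record and creating a fresh one would follow that guidance instead.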

Is that what you would expect to happen?

Just to check - if the baseUrl changes, does that trigger a reinstall?

techtime · October 6, 2020, 3:07am

So what I am not entirely comfortable with here:

the secret key arrives to us for the first time in the unsigned /install payload (together with the URL, and the client key). This is what we use to create a “subscription” record on our side.

As per this communication, when the site is reset/reimported we will get two /install callbacks, both with the new client key: one again unsigned (why this is being sent at all in this case is unclear), which we are supposed to reject based on the fact that we already have another client key associated with this URL, then another one, signed, which we are supposed to trust based on the secret key stored in some other record, with a different clientKey, that just happens to have the same URL, and once it is verified, create a new subscription record that should supersede the one we just used to verify the request, and never use that old one again.

I keep getting a nagging feeling that the “chain of trust” is prone to be broken here unless something else is a MUST.

Considering the fact that any hacker can flood my app backend with unsigned /install requests with any content, being “latest received install information” is not enough, I think.

For the above to really work the record we use to verify the signature has to be trusted in the first place, so the app vendors must take some verification action (perform a callback to the instance the /install is claiming to be from) before any subscription record can be considered trusted?

In the case that another entity takes possession of the URL, is there any stand-down period? Does this somehow correspond to the 30-day deadline for “orphans”?

If one registers a site, subscribes to the app, then resets the site/reimports the site so the app doesn’t get /uninstall, and then doesn’t re-install the app, then the URL somehow (?) goes into possession of another organisation, who subscribes to the same app – it seems we will get an /install request technically signed by a secret key that used to not only “belong” to a different clientKey but also to a completely different organisation.

The way the “orphans” are defined in the above relies on a valid /install request arriving with a new clientKey. It seems to be implied that a record is never an orphan until this happens. Is this “unless you actively verify that an active Atlassian site actually exists on the other side and find that it doesn’t”?
It’s unclear what methods to use to verify the ownership of data in the case of “site reset, app re-subscribed, oops where is my data?” and how (except for the length of the time period that has passed since the last known trusted interaction with the app) to distinguish a situation where the current owner of the data is asking to get their data back after site re-import from a situation where some other entity that somehow took possession of the URL is asking to get access to the data previously associated with this URL (via old clientKey)
The user experience – on one hand to make it less stressful in the case “oops where is my data after reimport” it would be good to tell the customer that they can get it back if they reach out to our support. However, if the entity that just took possession of the URL is unrelated telling them “hey we happen to have somebody else’s data here do you want it?” sounds like a security hole.
Possible mis-interpretations of “secret is keyed to the app key”. Can someone re-confirm what this really means?

Since presumably every distinct site gets a different secret, and apparently the secret survives the site reimport and clientKey change, does this mean that behind the scenes the secret is keyed to the app + URL? I am also aware (from experience) that the subscription to the app on the Atlassian side (i.e. billing matters) somehow survives the re-import/site reset too.

james.dellow · October 6, 2020, 5:45am

No, I think the second is signed:

I understood that to mean that the token will authorise using the old details but the payload will have the new clientKey?

HeyJoe · October 7, 2020, 10:30am

@cmacneill and I did collaborate on the original announcement here, so if anyone is seeing misalignment in what we’re saying, please do let us know so that we can clarify any mis-understanding.

We don’t really have a good way to test this except by manually triggering the import process as you described. There are a lot of moving parts in the site import process that are hard to replicate in an automated fashion.

As per this communication, when the site is reset/reimported we will get two /install callbacks, both with the new client key: one again unsigned (why this is being sent at all in this case is unclear)

It’s sent because the plugin on the Atlassian side exists within the imported site and has no knowledge of the prior installation that existed, so it can’t tell if the plugin being installed was previously installed. Therefore, the initial unsigned install request is sent as if the app is being installed for the first time.

then another one, signed, which we are supposed to trust based on the secret key stored in some other record, with a different clientKey, that just happens to have the same URL, and once it is verified, create a new subscription record that should supersede the one we just used to verify the request, and never use that old one again.

The chain of trust that’s kept intact is the shared secret, which persists across the installations, and should only be known by Atlassian and by the app. An attacker could flood your app with /install requests, but they would not have the correct shared secret, so you would be able to verify that the request did not originate from Atlassian.
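The property HeyJoe relies on is that a forged /install cannot carry a valid signature without the shared secret. Below is a minimal stdlib-only sketch of that check, assuming the install JWT is an HS256 token signed with the sharedSecret and carrying an `exp` claim. It deliberately skips the `qsh` (query string hash) claim, which a real Connect app must also validate, and `sign_hs256` exists only to produce tokens for testing.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(part: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    """Build an HS256 JWT (test helper only)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret.encode(), f"{header}.{payload}".encode("ascii"),
                   hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: str, now: float | None = None) -> dict | None:
    """Return the claims if signature and expiry check out, else None (qsh not checked)."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:  # covers bad segment count, bad base64 and bad JSON
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # refuse "none" and any other algorithm
    expected = hmac.new(secret.encode(), f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None  # forged or signed with a different secret
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp < current:
        return None  # missing or expired
    return claims
```

An attacker who only controls the request body can send any clientKey and sharedSecret they like, but without the secret Atlassian and the app share, `verify_hs256` returns None.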

For the above to really work the record we use to verify the signature has to be trusted in the first place, so the app vendors must take some verification action (perform a callback to the instance the /install is claiming to be from) before any subscription record can be considered trusted?

So you’re saying that a malicious attacker could construct a fake initial install, with a bogus shared secret, and then send a follow-up request to override that install, because they still have that original secret? As long as the install data is isolated correctly, the attacker hasn’t gained any information that they didn’t already send to you in the first place, right? Let me know if I’m misunderstanding.

In the case that another entity takes possession of the URL, is there any stand-down period? Does this somehow correspond to the 30-day deadline for “orphans”?

The process by which a previously claimed base URL could become available for re-use by a different customer or user is not a defined part of our API, but yes, it is substantially longer than the 30-day period for handling orphans (correct me if I’m wrong, @cmacneill?)

The way the “orphans” are defined in the above relies on a valid /install request arriving with a new clientKey. It seems to be implied that a record is never an orphan until this happens. Is this “unless you actively verify that an active Atlassian site actually exists on the other side and find that it doesn’t”?

Yes, orphan records could also already be created in your database through other mechanisms. For example, if a customer’s Atlassian site is deactivated and the uninstall lifecycle events are not transmitted correctly.

It’s unclear what methods to use to verify the ownership of data in the case of “site reset, app re-subscribed, oops where is my data?” and how (except for the length of the time period that has passed since the last known trusted interaction with the app) to distinguish a situation where the current owner of the data is asking to get their data back after site re-import from a situation where some other entity that somehow took possession of the URL is asking to get access to the data previously associated with this URL (via old clientKey)

Is it possible for you to use billing records or license information to verify the customer’s identity? The customer’s license data should uniquely identify them?

The user experience – on one hand to make it less stressful in the case “oops where is my data after reimport” it would be good to tell the customer that they can get it back if they reach out to our support. However, if the entity that just took possession of the URL is unrelated telling them “hey we happen to have somebody else’s data here do you want it?” sounds like a security hole.

Yes, I agree that there’s a trade-off on data privacy vs. usability here. We’ve erred on the side of caution by recommending that the customer ask for the data to be re-associated with their site before you take any action. However, I’m open to feedback if you discover that this process is causing some serious user friction.

Possible mis-interpretations of “secret is keyed to the app key”. Can someone re-confirm what this really means?

It means that the generation of the secret is associated with the unique key of the app, not the client key of the site. If the client key changes, the secret remains the same.

Since presumably every distinct site gets a different secret, and apparently the secret survives the site reimport and clientKey change – does this mean that behind the scenes the secret is keyed to the app + URL?

Currently there is a single secret for the app shared across all installations/all sites. There are some downsides to this approach, so I’d recommend avoiding relying on this mechanic where possible (i.e. assuming that the secrets can be different per site would be a more future-proof assumption).

I am also aware (from experience) that the subscription to the app on the Atlassian side (i.e. billing matters) somehow survives the re-import/site reset too.

Yes, the installed state of the app and the licensed state of the app are two independent variables. So, it’s possible to have 4 states (installed & licensed, installed & unlicensed, un-installed but licensed, un-installed and un-licensed). Licensing data is stored in our central billing system and is re-associated with the customer’s imported site when they perform the re-installation of the app.

Hope this helps! I’m definitely getting out of my depth on the technicalities, so when in doubt, trust what Conor says

techtime · October 7, 2020, 11:43pm

if anyone is seeing misalignment in what we’re saying, please do let us know so that we can clarify any mis-understanding.

The confusion re: Joe vs Conor was, I think, indeed down to perspective. The clientKey doesn’t change within a given instance of a site (as per Conor), i.e. every re-import gets a new one. But from the point of view of a subscription that we (some of us?) as vendors need to hold for the customer (in fact, for the URL that the customer owns), under the suggested process the clientKey effectively does change with every re-import.

Currently there is a single secret for the app shared across all installations/all sites. There are some downsides to this approach, so I’d recommend avoiding relying on this mechanic where possible (i.e. assuming that the secrets can be different per site would be a more future-proof assumption).

OK, so the fact that the secret key is currently shared across all sites was news to me. TIL. I appreciate the note about not relying on that, though I have to point out that it is a bit cheeky to then rely on exactly that in the retry process being suggested.

I understand why the 1st request is unsigned: at the time of a re-install caused by a reimport, the Atlassian side doesn’t know whether the vendor already/still holds anything on their side to verify a signed request.

Yes, orphan records could also already be created in your database through other mechanisms. For example, if a customer’s Atlassian site is deactivated and the uninstall lifecycle events are not transmitted correctly.

So it seems to me vendors must have a way to actively verify the “is this client key still active” status. An orphan is only an orphan when this verification fails and then the 30 days start ticking.
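A minimal sketch of that rule, assuming the app records an `orphaned_at` timestamp only once it has positively established that a clientKey is no longer active (how that check is done is left open here, as it is in this thread). `can_purge` and `ORPHAN_RETENTION` are hypothetical names; the 30 days is the window discussed above.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

ORPHAN_RETENTION = timedelta(days=30)  # the 30-day orphan window discussed in this thread


def can_purge(orphaned_at: datetime | None, now: datetime | None = None) -> bool:
    """True only once a record has been a confirmed orphan for the full window.

    orphaned_at stays None until a verification shows the clientKey is no longer
    active; a record that has merely gone quiet is never purged.
    """
    if orphaned_at is None:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return current - orphaned_at >= ORPHAN_RETENTION
```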

Same applies to the “latest” record and possibly attackers faking installs (we already had this):

What I meant – imagine this:

a real customer subscribes to the plugin, there is only one record in the system
a hacker submits a bogus request, different client key, different secret
and another hundred of these to boot, just to make it more interesting
now the customer does re-import + re-subscribe
we reject the 1st /install since it has no signature and we already have other records for this URL
we get the 2nd signed /install - as per your process, since we shouldn’t rely on the secret being shared, we need to lookup the “last” record – but in our case the last one is the attacker’s

Hence the statement: “latest received install information” is not enough, it has to be trusted i.e. verified.
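The scenario above can be sketched as a lookup that only considers records already marked as trusted, instead of whichever record for the URL arrived last. This is a hedged illustration: the `trusted` flag stands for some out-of-band verification (for example, the callback to the instance suggested earlier in this thread), and `Candidate`, `pick_verifying_record` and the `verify` callable are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    client_key: str
    base_url: str
    shared_secret: str
    trusted: bool       # hypothetical: set only after an out-of-band check succeeded
    received_at: float  # when the install that created this record arrived


def pick_verifying_record(candidates: list[Candidate], base_url: str, token: str,
                          verify: Callable[[str, str], bool]) -> Candidate | None:
    """Return the newest trusted record for base_url whose secret verifies token.

    Untrusted records are skipped entirely, so a flood of forged unsigned
    installs cannot push the genuine record out of the "latest" position.
    """
    for cand in sorted(candidates, key=lambda c: c.received_at, reverse=True):
        if cand.base_url == base_url and cand.trusted and verify(token, cand.shared_secret):
            return cand
    return None
```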

I also think this is dangerous even without re-import/re-install. A human doing support can easily mix up records and leak something to the hacker for example.

An attacker could flood your app with /install requests, but they would not have the correct shared secret, so you would be able to verify that the request did not originate from Atlassian.

Correct, what I am highlighting here is that this need to verify the request is a MUST, otherwise the chain of trust cannot be established reliably. To my knowledge, nowhere in Connect doco this is mandated at the moment.

Is it possible for you to use billing records or license information to verify the customer’s identity? The customer’s license data should uniquely identify them?

OK, so in the reset/reimport case, when the customer is coming to us distressed looking for their data, we really have to resort to “give me a screenshot of my.atlassian.com that lists the site, the URL, the SEN and your email address to prove that you own it”. And somehow prove to me that this is done recently and not a week ago? Should I be asking for a photo with a today’s newspaper in the background? Can this be run by Atlassian Security team please?

We’ve erred on the side of caution by recommending that the customer ask for the data to be re-associated with their site before you take any action. However, I’m open to feedback if you discover that this process is causing some serious user friction.

Well, as a Solution Partner I deal with real business users who only blow away their site for a reason, and in every case the disappearance of data from the apps is a major thing, and so far it has always been unexpected. Blow away, re-subscribe, don’t care about the data and continue as if nothing happened sounds like a test/evaluation scenario to me. Also, about this 30-day period before the orphan is deleted: it will only take one high-profile customer being disconnected from something important for 30 days, without noticing, for this to become a big stink.

At least a notice needs to be added to the document on how to reset the site AND preferably in the UI from where the reimport is triggered.

james.dellow · October 13, 2020, 11:53pm

@HeyJoe & @cmacneill I just had a resubmitted app rejected by the marketplace team because (I think) they tried to reinstall with a signed install webhook. I had to delete their install record to address the reason it was rejected originally (an issue with setting up a dynamic module at install time). Suggestions for dealing with this?

david.pinn · October 23, 2020, 2:40am

There’s a tonne of information in this thread. Can we make sure that it gets published as part of the Jira Cloud developer documentation? I can imagine that our dear Atlassian brethren are a little busy right now (LOL), but yeah, this stuff is too important to be left in an aging discussion thread. When you get 'round to writing up the doco, dear Atlassionies, please fold in as much of the meat of the other 36 comments as seem relevant. Peace to all.

james.dellow · October 23, 2020, 5:11am

Or in all the product pages that talk about the lifecycle!
What would be even better:

Document the lifecycle in a diagram
Create something like the Connect inspector https://connect-inspector.services.atlassian.com/ so we can test different scenarios

david.pinn · October 24, 2020, 12:58am

I’m confused. Can someone draw up a flow chart to illustrate the logic path for handling calls to the installation end-point?

david.pinn · October 24, 2020, 2:57am

But that would result in a customer being reconnected to their earlier data (but with a new clientKey), wouldn’t it? I understand from @cmacneill’s comments that we should not automatically do that reconnection: