When is it safe to clean up the atlassian_host table for Connect apps?

How can a Connect app determine that an Atlassian Cloud instance no longer exists? And can the record for such an instance be deleted from the atlassian_host table to clean it up?

Currently I have 1400+ records in the atlassian_host table for which I’m not able to query the cloud ID using the https://[instance].atlassian.net/_edge/tenant_info endpoint. For all of these tenants it returns this error:

404 Not Found
{
  "errorMessage": "Site temporarily unavailable",
  "errorCode": "OTHER"
}

Is this 404 error a valid way to check that the Cloud instance no longer exists, so that its data can be cleaned up by the app?

If the Cloud instance no longer exists then there is no point in keeping the customer data, and even the atlassian_host record with the client_key and shared_secret can be dropped since the app cannot be reinstalled on the cloud instance.
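For reference, the check I’m describing can be sketched roughly as below. This is only a minimal sketch: the class and method names (TenantInfoProbe, classify, probe) are made up for illustration, it assumes the base_url stored in atlassian_host points at the site (a context path such as /wiki is dropped when building the URL), and it treats a 404 only as a hint that the site may be gone, not as proof.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class TenantInfoProbe {

    /** Outcome of one probe. UNREACHABLE is a hint, not proof that the site was deleted. */
    public enum Result { ALIVE, UNREACHABLE, UNKNOWN }

    /** Builds the tenant_info URL at the host root of the stored base_url. */
    public static URI tenantInfoUri(String baseUrl) {
        // Resolving an absolute path replaces any existing path on the base URL.
        return URI.create(baseUrl).resolve("/_edge/tenant_info");
    }

    /** Maps an HTTP status to a probe result (hypothetical mapping for this sketch). */
    public static Result classify(int statusCode) {
        if (statusCode == 200) return Result.ALIVE;
        if (statusCode == 404) return Result.UNREACHABLE;
        return Result.UNKNOWN; // e.g. rate limiting or server errors: draw no conclusion
    }

    /** Performs one probe; network failures are reported as UNKNOWN. */
    public static Result probe(HttpClient client, String baseUrl) {
        HttpRequest request = HttpRequest.newBuilder(tenantInfoUri(baseUrl))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return classify(response.statusCode());
        } catch (IOException e) {
            return Result.UNKNOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Result.UNKNOWN;
        }
    }
}
```

Calling TenantInfoProbe.probe(HttpClient.newHttpClient(), baseUrl) for each row gives one data point per tenant; whether an UNREACHABLE result is enough to delete anything is exactly the open question.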


Good question @markrekveld.

I think we also still have some trash data lying around from the time before we implemented our automated mechanism to clean this up when someone properly unsubscribes and uninstalls the app.

I don’t think it’s enough to rely on a single occurrence of this HTTP 404 error, though. I’ve seen this error message before when a site was actually temporarily unavailable (due to maintenance or similar). So I’d recommend checking multiple times, on different days, before you delete that data.

If someone has a better approach, I’m also interested to hear it. :ear:

Yeah, it’s not a one-day check, as I have seen sites come back online as well.

Currently it’s a daily check: if the cloudId and license cannot be found, the tenant is marked for deletion after 60 days. At that point, before deleting the data, the app checks whether the tenant is using the app again.
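Roughly, the decision made on each daily run looks like the sketch below. It is a simplified illustration rather than my actual code: TenantRetentionPolicy, Action and the markedAt timestamp are hypothetical names, and foundToday stands for "the cloudId and license could be found on this run".

```java
import java.time.Duration;
import java.time.Instant;

public class TenantRetentionPolicy {

    /** Time a tenant stays marked before its app data is deleted. */
    public static final Duration GRACE_PERIOD = Duration.ofDays(60);

    public enum Action { KEEP, MARK_FOR_DELETION, WAIT, DELETE_APP_DATA }

    /**
     * Decides what to do with a tenant on one daily run.
     *
     * @param foundToday whether cloudId and license were found on this run
     * @param markedAt   when the tenant was marked for deletion, or null if not marked
     * @param now        the time of this run
     */
    public static Action decide(boolean foundToday, Instant markedAt, Instant now) {
        if (foundToday) {
            return Action.KEEP; // tenant is (back) in use; the caller clears any mark
        }
        if (markedAt == null) {
            return Action.MARK_FOR_DELETION; // first failed check starts the clock
        }
        if (Duration.between(markedAt, now).compareTo(GRACE_PERIOD) < 0) {
            return Action.WAIT; // still inside the grace period
        }
        // Grace period is over and the tenant is still not found on this final check.
        return Action.DELETE_APP_DATA;
    }
}
```

Note that DELETE_APP_DATA here only covers the app’s own customer data; the atlassian_host row is deliberately left alone.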

This works great, but the deletion process doesn’t clean up the atlassian_host table, because this led to issues in the past. Previously I also deleted the record for the tenant on uninstall events, but if the tenant installed the app again in the future, this resulted in installation failures. So as long as the Cloud instance is alive/online/able to install apps, this record needs to be kept.

Now I’m looking for the safe moment in time at which this table can also be cleaned up.

This is an interesting question.

What are the implications if we just retain the records?
Is there a need to do the cleanup?

What are the implications if we just retain the records?

The table will just keep on growing and could result in performance issues.

Is there a need to do the cleanup?

I would say common sense. Why keep data that will never be used again?

Let’s say cloud instance A is created and installs your app. Later that cloud instance is deleted and its data is cleaned up by Atlassian. After a while, the same cloud instance base URL can be used again, by a different or the same user, and now they want to install the app again.
This could lead to non-unique errors based on the cloud instance base URL.

I have not seen this case yet, but I think it can happen.

I asked the question because you mentioned

Previously I also deleted the record for the tenant on uninstall events, but if the tenant installed the app again in the future, this resulted in installation failures. So as long as the Cloud instance is alive/online/able to install apps, this record needs to be kept.

That’s why I am wondering what will happen if we do not delete the records.
I agree it is a best practice to clean up if there are no risks.

Let’s say cloud instance A is created and installs your app. Later that cloud instance is deleted and its data is cleaned up by Atlassian. After a while, the same cloud instance base URL can be used again, by a different or the same user, and now they want to install the app again.
This could lead to non-unique errors based on the cloud instance base URL.

I would think that Atlassian will generate a different client_key even if the base_url is recycled. Otherwise that will be a potential security issue.

They do that, and there is no unique constraint on the base_url column. However, finding a tenant via its base_url currently relies on the last_modified_date. This leaves room for issues down the line, because the last_modified_date is not set by the connect library (Atlassian Connect Spring Boot starter in my case) but by the Data Access auditing metadata implementation, which can be disabled.
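To illustrate the concern, here is a simplified, in-memory sketch of a "newest row for this base_url" lookup. It is not the library’s code: HostRow and findLatestByBaseUrl are hypothetical stand-ins, and sorting missing dates last is an assumption of the sketch (a real database may order NULLs differently).

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class BaseUrlLookup {

    /** Simplified stand-in for an atlassian_host row (hypothetical type). */
    public record HostRow(String clientKey, String baseUrl, Instant lastModifiedDate) {}

    /**
     * Picks the row with the newest last_modified_date for a base URL.
     * Rows without a date rank below every dated row in this sketch.
     */
    public static Optional<HostRow> findLatestByBaseUrl(List<HostRow> rows, String baseUrl) {
        return rows.stream()
                .filter(row -> row.baseUrl().equals(baseUrl))
                .max(Comparator.comparing(
                        HostRow::lastModifiedDate,
                        Comparator.nullsFirst(Comparator.<Instant>naturalOrder())));
    }
}
```

With auditing enabled, the newest installation for a recycled base URL wins. If auditing is disabled and every row has a null date, nothing distinguishes the stale row from the current one, so the lookup could return the old client_key and shared_secret.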

Thanks for sharing. Now I understand why you want to housekeep proactively.

It is a security risk which may lead to information disclosure.
I think Atlassian should fix the logic since there are many other app vendors who are unaware of the implications.

Hope you don’t mind me looping you in, @HanjooSong and @ragrawal.
You have reviewed PRs from me in the past for the Atlassian Connect Spring Boot libraries, and I was hoping you could chime in, or loop in someone who may know when we can clean up data in Connect apps.