Confluence Cloud migration endpoint called locally, but not in QA-environment

Hi there,

Our QA is trying to test the final implementation of our server → cloud migration for Confluence, which I was finally able to finish after the fix of MIG-905. Locally (our server running on my dev computer, accessed via ngrok) it works after setting the dark features: I get a proper callback to my migration webhook.

However, when our QA tries it in their environment, the migration looks like it’s running on the server, but there is never any callback on the cloud side. They don’t use any dark features as I do locally.

I’m suspecting that there is something in the Marketplace REST API that we should set to make the migration callbacks happen and that is now missing and preventing the migration from making callbacks to our server. Currently we have defined these keys there:

{
  "vendorId": 99,
  "addonKey": "com.gliffy.integration.confluence",
  "addonName": "Gliffy Diagrams for Confluence",
  "cloudAddonKey": "com.gliffy.integration.confluence",
  "cloudVersionAvailability": "PUBLIC",
  "migrationDocumentation": "https://help.gliffy.com/confluence/Content/GliffyConfluence/Migrating-server-cloud-Confluence.htm"
}

The add-on key values are correct and match those defined on the server side.

Should those be enough to allow public (or QA) migration to have proper callbacks or am I missing something?

Hi @PetriJuhaniRiipinen,

You need to include the minimum version as well. This needs to be a semantic version number, so x.y.z. You can make it higher than what’s publicly published in Marketplace.

James.

Hi @jrichards

Oh right, I missed that completely. I’ll add that and ask QA to retry testing!

Thanks!

  • Petri

Hi @jrichards

The callback still isn’t working after I added this entry to the Marketplace data:

"cloudMigrationAssistantCompatibility": "9.3.2"

And our QA is actually testing with version 9.7.1 so should be compatible but we still don’t get any migration callback.

Our QA guy did a migration run yesterday (26th of April) with the following site being the target cloud site:

https://gcc-qa1-01.atlassian.net/wiki/home

and in the server side the planId for the migration was:

19a1138f-172e-491e-90fb-f5fff2f9b9ce

Based on that information, are you able to investigate why that migration didn’t get the callback on the cloud side? In our logs we see that the migration callback was registered without errors but we just don’t get any callback to it.

Regards,

  • Petri

Hi @PetriJuhaniRiipinen,

Thanks for the info. I can see the PUT /webhook endpoint was most recently called on 19/Apr/2022. And I can see the transfer initiated 26/Apr/2022 9:47 (UTC+0 - I think, I don’t like timezones). Just after this, I can see the event that puts listener-triggered into the queue, and it has the webhook URL https://gcc-stage-2.gliffy.net/migration/listener. I can also see several app-data-uploaded messages after this.

I can also see successful messages for processing the message, and then we get this error from your webhook endpoint:

java.lang.RuntimeException: Failed webhook call: 401 status 401.

As a note, the migration webhook comes with a JWT, so you need to make sure your endpoint is set up for that.

Please have a look at how you’ve configured your endpoint. For example, in my Atlassian Connect Spring Boot example I have (note there’s no @IgnoreJwt annotation)

    // No @IgnoreJwt here, so the JWT on the incoming webhook call is checked.
    @PostMapping(value = "/webhook", consumes = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<Void> webhook(@RequestBody WebhookEvent webhookEvent) {

        log.info("Received webhook={}", webhookEvent);

        switch (webhookEvent.eventType) {
            case "listener-triggered":
            case "app-data-uploaded":
                migrationService.doAppMigration(ProductType.CONFLUENCE, webhookEvent.migrationDetails.confluenceClientKey, webhookEvent.transferId);
                break;
        }

        return ResponseEntity.noContent().build();
    }

Hope this helps.
James.

Hi @PetriJuhaniRiipinen,

Something I forgot to mention: a good practice is to record every transferId and its progress status. Then set a job to run every hour or so to poll

And check for new transfers that you don’t have a current record of. These are transfers that have started but not settled (settled meaning a progress of SUCCESS, FAILED or INCOMPLETE). If you find any that haven’t been processed, you can start processing them.

The same goes for

You can look for data exports that need to be picked up and worked on.

These are the ones that you’re likely to have missed from a webhook call.
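As a rough illustration of that bookkeeping, here is a minimal Java sketch. The class `TransferTracker`, its method names, the in-memory map and the `IN_PROGRESS` value are all assumptions made for the example; only the settled values SUCCESS, FAILED and INCOMPLETE come from this thread. The list of polled transferIds is passed in directly, so no HTTP call is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: remembers transfers we have seen and finds the ones
// that a poll returned but that never reached us through the webhook.
class TransferTracker {
    // transferId -> last known progress. A real app would use persistent storage.
    private final Map<String, String> known = new HashMap<>();

    // Record a transfer, e.g. from the webhook handler or after processing it.
    void record(String transferId, String progress) {
        known.put(transferId, progress);
    }

    // A transfer is settled once its progress is one of these three values.
    static boolean isSettled(String progress) {
        return "SUCCESS".equals(progress)
                || "FAILED".equals(progress)
                || "INCOMPLETE".equals(progress);
    }

    // Return the polled transferIds for which we have no record at all.
    List<String> findMissed(List<String> polledTransferIds) {
        List<String> missed = new ArrayList<>();
        for (String id : polledTransferIds) {
            if (!known.containsKey(id)) {
                missed.add(id);
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        TransferTracker tracker = new TransferTracker();
        // "IN_PROGRESS" is an assumed placeholder for a non-settled state.
        tracker.record("transfer-a", "IN_PROGRESS");
        List<String> polled = Arrays.asList("transfer-a", "transfer-b");
        System.out.println(tracker.findMissed(polled)); // prints [transfer-b]
    }
}
```

Anything returned by `findMissed` is a transfer whose webhook call was probably lost, so it can be handed to the normal migration processing.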

Regards,
James.

Hi @jrichards

Thanks for the good comments. Just started doubting myself and want to confirm one thing: This value

"cloudMigrationAssistantCompatibility": "9.3.2"

refers to the server-side version of the application, right? So as our QA is testing with version 9.7.1, it should meet this criterion? Ok, confirmed from the API docs that it is indeed supposed to refer to the server/DC version.

The JWT handling should be ok there, as in my local testing everything eventually seemed to work fine (once I put in the dark feature I managed to run several successful migrations on my local development environment and petridev5-dev site), and there certainly weren’t any issues with JWT verification. I can’t really imagine why a 401 would be returned from the QA installation.
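To make the version reasoning concrete, here is a small Java sketch of a numeric x.y.z comparison. It assumes the minimum version is compared component by component as integers; how the migration platform actually performs the check is not stated in this thread, and `VersionCheck` and its methods are names made up for the example.

```java
// Hypothetical helper for comparing x.y.z version strings numerically.
class VersionCheck {
    // Returns a negative number, zero, or a positive number, like Comparator.
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int diff = Integer.compare(Integer.parseInt(pa[i]), Integer.parseInt(pb[i]));
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // True when the installed server/DC version is at least the minimum.
    static boolean meetsMinimum(String installed, String minimum) {
        return compare(installed, minimum) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(meetsMinimum("9.7.1", "9.3.2")); // prints true
        System.out.println(meetsMinimum("9.3.1", "9.3.2")); // prints false
    }
}
```

Comparing the parts as integers rather than as strings matters for versions such as 9.10.0, which would otherwise sort before 9.3.2.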

Ok, if something was successfully processed, that’s even more weird as I have logging in place just when any migration event callback is received and we had no such log entries in our log.

Ok, the polling approach sounds reasonable, that way we wouldn’t be totally dependent on the callback arriving properly as that seems now very unreliable for one reason or another.

Regards,

  • Petri

Hi @PetriJuhaniRiipinen,

So, my work here is done?

James.

Hi @jrichards

Well, for now at least. I think the next step for us is to implement the polling solution, and that should finally fix this problem.

  • Petri

Hey @jrichards

I’ve got new questions after checking out the request for fetching in-progress transfers:

curl --request GET \
  --url 'https://your-site.atlassian.net/rest/atlassian-connect/1/migration/transfer/recent' \
  --header 'Accept: */*'

And it seems that we would need to execute this request for every site that has installed our app. There are quite a lot of those, so this would mean that we need to execute thousands of requests per poll cycle to check whether any of those sites happens to have a migration in progress.

You mention that it is a good practice to record every transferId and progress status, but how are we able to do that when we don’t get any callback on the cloud side? I can’t think of any other way for the cloud side to learn which migrations are potentially in progress.

Edit: I just noticed this in the App migration platform REST API documentation:

GET /rest/atlassian-connect/1/migration/transfer/recent

Returns a list of latest active transfers (upto 100) with migration
details available for the provided cloudAppKey

Which cloudAppKey is it referring to? Our app’s cloudAppKey? Where is that passed in the request? And what should the “your-site” be, then?

  • Petri

Hi @PetriJuhaniRiipinen,

I see your issue. I’d like to point out that the webhooks are being sent; it’s just that your service is returning an HTTP 401. So you need to look at why you aren’t receiving them. But yes, you can determine how often you need to make those calls. You could build some heuristics to determine which sites you need to call, for example whether the customer has a server licence or has previously done a migration … even once every 12 hours might be fine. Production migrations on a large site can take a day or more.
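One way such a heuristic could look, as a minimal Java sketch: a site is polled only if it looks like a likely migrator and its last poll is at least 12 hours old. The class `PollCandidate`, the parameter names and the idea of tracking a last-polled timestamp per site are assumptions for illustration, not part of any Atlassian API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical filter deciding whether a given cloud site is worth polling now.
class PollCandidate {
    // Interval taken from the "once every 12 hours" suggestion above.
    static final Duration POLL_INTERVAL = Duration.ofHours(12);

    static boolean shouldPoll(boolean hasServerLicence,
                              boolean hasMigratedBefore,
                              Instant lastPolled,
                              Instant now) {
        // Only sites that plausibly migrate from server are candidates.
        boolean likelyMigrator = hasServerLicence || hasMigratedBefore;
        // A site never polled before (null) is always due.
        boolean due = lastPolled == null
                || !now.isBefore(lastPolled.plus(POLL_INTERVAL));
        return likelyMigrator && due;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-04-27T00:00:00Z");
        // Server licence, never polled: poll it.
        System.out.println(shouldPoll(true, false, null, now)); // prints true
        // No licence and no earlier migration: skip it.
        System.out.println(shouldPoll(false, false, null, now)); // prints false
    }
}
```

Filtering like this would keep the number of requests per poll cycle well below one per installed site.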

The documentation is a bit misleading, as the cloud app key is included when the Connect call is made, you don’t need to add it to the request. I’ll get the documentation updated, but it’s the last 100 non-settled transfers for that atlassian.net cloud site for your Connect app call. If you’re doing testing and not sending a SUCCESS, etc. message that list can get quite large. But in production it should be quite small.

Hope this helps.

Regards,
James.

Hi @jrichards

Ok, so the 401 means the JWT used by the webhook doesn’t validate properly on our side. I think I’ll add some (temporary) logging about this to figure out why that is, as it might have other implications as well.

The puzzling thing to me is that JWT validates fine with my local environment, when I use my petridev5.atlassian.net site but it fails on the QA-site. There has to be some difference there that causes the failure and I think it is good to identify what that is and resolve it.

Regards,

  • Petri

Hi @PetriJuhaniRiipinen,

Make sure you fully uninstall and re-install the app, not just re-upload the link to the descriptor. It may be that it’s got the old key stored and hasn’t received a new /installed event to update the data.
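The effect of a stale key can be sketched in plain Java. The example assumes the webhook JWT is signed with HS256 using the shared secret delivered in the /installed event; the class `JwtSignatureCheck`, the secrets and the payload are made up for illustration. It only checks the signature, whereas a real Connect framework also validates claims such as expiry.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signature-only check for an HS256-signed JWT.
class JwtSignatureCheck {
    // Computes the base64url (unpadded) HMAC-SHA256 of "header.payload".
    static String sign(String signingInput, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] sig = mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the token's signature was produced with the given secret.
    static boolean signatureMatches(String jwt, String secret) {
        int lastDot = jwt.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        String expected = sign(jwt.substring(0, lastDot), secret);
        String actual = jwt.substring(lastDot + 1);
        // Constant-time comparison of the two signatures.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.US_ASCII),
                actual.getBytes(StandardCharsets.US_ASCII));
    }

    static String b64url(String s) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String input = b64url("{\"alg\":\"HS256\",\"typ\":\"JWT\"}")
                + "." + b64url("{\"iss\":\"example-client-key\"}");
        // The sender signs with the secret from the latest installation.
        String jwt = input + "." + sign(input, "new-shared-secret");
        System.out.println(signatureMatches(jwt, "new-shared-secret")); // prints true
        // An app still holding the old secret rejects the same token.
        System.out.println(signatureMatches(jwt, "old-shared-secret")); // prints false
    }
}
```

If the QA installation still holds a secret from an earlier install, every webhook call would fail this kind of check and be answered with a 401, even though the same code works against a freshly installed site.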

Regards,
James.