For the second time this autumn, our releases have been stuck in Atlassian approval automation limbo for 3 days… Last time was in September, with the exact same problem: some internal issue that Atlassian won’t share any details about. This is no way to do modern software engineering and CI/CD.
Atlassian really needs to rethink this whole approval process. Approval made sense for new app submissions, sure, but it does not make sense for new releases of an existing app. Right now, a broken version can be the one customers can and will update to, while the fix (the working version) is stuck in approval limbo. Yes, most of the time it works, but when it degrades (which so far has happened roughly every 2 months), several days can pass before a fix is made available.
Please do something like this instead:
Allow us to release a version right away, making it immediately available as it used to be when submitting releases for an existing app. We need actual CI/CD, like we were able to do for a decade.
Run vulnerability scanning at your leisure and notify us if something is off, as the process already in place with AMS does.
If the scans find a violation of some sort in the delta between the latest release and the second-to-latest, Atlassian can surely make the latest version private and revert to the second-to-latest. During normal operation, the vulnerable version would then have been available for at most a few minutes while the scans run. Of course, when a degradation like the one above happens, Atlassian’s automation will be unable to remove the app (and the vendor will likewise be unable to, because that option was also removed).
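To make the proposal concrete, here is a minimal sketch of the "publish first, scan later, revert on violation" flow. It assumes a hypothetical `AppListing` model with made-up `publish` and `on_scan_result` methods; none of this is a real Atlassian Marketplace API, it only illustrates the ordering of the steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Version:
    number: str
    public: bool = True  # visible to and installable by customers


@dataclass
class AppListing:
    """Hypothetical model of an app's release history (not a real Atlassian API)."""
    versions: list[Version] = field(default_factory=list)
    notifications: list[str] = field(default_factory=list)  # messages to the vendor

    def latest_public(self) -> Version | None:
        # The newest version customers would install or update to.
        for v in reversed(self.versions):
            if v.public:
                return v
        return None

    def publish(self, number: str) -> Version:
        # Step 1: the release is available immediately, with no blocking approval.
        v = Version(number)
        self.versions.append(v)
        return v

    def on_scan_result(self, number: str, violation: bool) -> None:
        # Step 2: scanning runs asynchronously and reports back later.
        if not violation:
            return
        # Step 3: make the offending version private, so customers fall back
        # to the previous public release, and notify the vendor (as with AMS).
        for v in self.versions:
            if v.number == number:
                v.public = False
        self.notifications.append(f"Scan violation in {number}; version made private")
```

For example, publishing `1.0.0` and `1.0.1` and then reporting a violation for `1.0.1` leaves `1.0.0` as `latest_public()`. If the scanning pipeline itself degrades, `on_scan_result` is simply never called and the new release stays available, which is exactly the trade-off argued for above.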
What makes the least sense about this whole process is that the version en route for release will, in the worst case, likely be as vulnerable as the previous one, and more likely less vulnerable than the older version already available for download. In other words, releases should be sped up; if anything, a security scan should push vendors to release a patch to the latest version even faster.
Being stuck for days is unacceptable, especially when releasing a critical fix. The release we currently have stuck in limbo patches a vulnerability reported in an AMS ticket, and that ticket is now running out of its SLA because Atlassian’s own release automation has failed and the fix won’t actually release. Surely Atlassian can imagine what it would be like if its own releases were stuck in approval for 50+ (now 96+) hours when shipping an urgent fix, leaving it unable to release anything at all for that duration?
We were also hit by this issue a few days ago. Luckily, in our case the new version was a feature release and not a critical fix.
Ever since this approval step was introduced back in June/July, it has killed our release velocity. This was compounded, in our case (a Confluence app), by the breaking changes in Confluence 10, which meant that for the first time in our app’s more-than-10-year history, we now have to publish two releases (one for Confluence <= 9.x and one for Confluence 10+). That means two approvals to wait on every time we release.
Even if, as is usually the case, these approvals are automated, they still add ~10-30 minutes before we can finish our release flow, which is not enough time for an engineer to context-switch to something else, but too much time to spend idly waiting.
Follow-up: we are now at 80 hours and counting without any change. Our currently submitted releases (containing important fixes) have been held up by Atlassian’s failed automation for over 3 whole workdays. Last time (September) we were similarly held up for several days. That such a degradation has happened twice in three months is very concerning.
Edit: As of Friday morning, after 4 whole days of being unable to release anything while two releases were stuck in the queue, they seem to have finally been approved and the error (hopefully) sorted out. I would love an explanation from Atlassian of how they plan to prevent these degradations in the future. Alternatively: is there anything we vendors can do to mitigate it?
This architecture is dragging down Atlassian’s business results.
It does not comply with customer satisfaction requirements, because customers value fast service,
It does not comply with security requirements, because preventing a vulnerability patch from being published is unacceptable,
It requires more manpower from vendors, which makes the service more expensive for customers. Even if Atlassian is in a “milking the cow” phase, it’s really sad that this translates into losses instead of leaving our developers time to develop Cloud apps.
I have to thank the Atlassian teams for everything they did to speed up this process for me, but the whole idea of blocking releases is still wrong.
This new system is non-compliant with business goals.
I am not a proper developer, but it seems that the current process does not take the nature of the release into account?
If I have this wrong, please forgive me, as I do not work as a developer for the Atlassian Marketplace.
If Atlassian is not yet following standard ITIL change management processes, would it make sense to have multiple flows depending on the type of release?
An emergency release should obviously not be stopped for any reason, as it is a response to an incident that requires fixing. A standard change, such as a bugfix or small adjustment release, also does not normally require any major review, but a security/code review as part of the release pipeline should be there, of course.
The only kind of change that should require approval or a more in-depth analysis would be a normal change, where you push out new functionality or bigger corrections?
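As a rough illustration, the three change types could map to release flows like this. The step names and the `release_steps` function are hypothetical and not part of any Atlassian pipeline; running a scan after an emergency release is an assumption borrowed from the "release first, scan later" suggestion earlier in the thread.

```python
from __future__ import annotations

from enum import Enum


class ChangeType(Enum):
    # Change categories loosely following ITIL change management.
    EMERGENCY = "emergency"  # fix for an active incident
    STANDARD = "standard"    # bugfix or small adjustment
    NORMAL = "normal"        # new functionality or bigger corrections


def release_steps(change: ChangeType) -> list[str]:
    """Return hypothetical pipeline steps for a given change type."""
    if change is ChangeType.EMERGENCY:
        # Never blocked: publish first; scanning afterwards is an assumption.
        return ["publish", "post_release_scan"]
    if change is ChangeType.STANDARD:
        # Automated security/code review inside the pipeline, no manual gate.
        return ["automated_security_scan", "publish"]
    # Normal changes get the in-depth analysis and an approval gate.
    return ["automated_security_scan", "in_depth_review", "approval", "publish"]
```

Only the normal change path contains an approval step, so routine bugfixes and urgent patches would never wait on a manual gate.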
Would that make sense, and would that speed up the release process for you while still satisfying Atlassian’s need to ensure safe code in their marketplace?