Marketplace had ~10 outages in the last 46 days. What is being done to address this?

Hey Marketplace Team,

First and foremost, kudos on finally having some monitoring for the marketplace transactions. It seems that the last 3-4 outages were posted to Statuspage before someone from the vendor community had to nudge you. This is a big leap forward, and I hope we continue to see improvements on this.

That being said, can someone shed some light on why there were 10 (or more?) outages in the last 46 days (since June 30), and what is being done to address this?

August alone had 5 incidents, and we are barely halfway into the month. Furthermore, none of these incidents have any details as to cause, what changes are being made to prevent this in the future, etc.

Cheers,
-Anthony

22 Likes

Actually, the last one was reported by me :wink:

1 Like

Hi @ademoss ,

The current issues faced by the Marketplace are due to a legacy data pipeline system which is in the process of being deprecated. We are working to update the systems, identify any gaps in processes, monitoring and maintaining the older system simultaneously. Our intent is to improve the outcomes for our partners. You can expect further communications nearer the end of the month on the plans going forward to address these issues.

Sincerely,

Chuck Talk
Sr. Technical Partner Manager
Atlassian

1 Like

Hey @ctalk - I’m not quite sure I follow.

The current system has been around for a long time, and while it has had many outages over the years, the last month and a half seems absolutely excessive.

I get that legacy systems can deteriorate, but the marketplace team has been talking about improving this process for many years, and in fact, if I go spelunking in this very forum, I’m fairly certain I can find lots of “promises” to improve things just over the last few months. So, what has changed to make things so much worse in recent times?

And please realize, I’m being very realistic here. I don’t expect a fix overnight. I don’t expect an announcement that the new system is ready tomorrow and we’re all riding into the sunset together. But at the very least, you could give us more details on each of these outages, what is causing things to be so much worse, and what you are doing in the short term about it.

We are working to update the systems, identify any gaps in processes, monitoring and maintaining the older system simultaneously.

Ok, that sounds interesting. But your entire statement fails to communicate any kind of details as to what we can expect. If you’re working on monitoring for example, why is it that the community still has to notify you about outages every time, instead of you guys catching it?

If you’re maintaining the older system, why has it failed on 5 of the past 16 days?
And furthermore, if we’re waiting for some kind of announcement at the end of the month, does that mean we should expect outages to continue at this pace?

Our intent is to improve the outcomes for our partners.

That’s a pretty statement. But how? You’re not really telling us anything.
In fact, since the announcement of a new system a few months back, things have gotten worse.

Again, not expecting miracles, but I think we can do better here with regards to communications and setting expectations with the vendor community.

I am fairly certain that if any vendor app had as many issues as the marketplace, Atlassian would come knocking and ask what’s going on. And if that vendor provided such vague answers, I am also fairly certain Atlassian would be unhappy about that.

4 Likes

Oh Anthony, you’re expecting too much from the company that produces white papers on operations, team collaboration and stakeholder management. This is just not Atlassian’s core business.

2 Likes

@ademoss the marketplace has grown since the current system was created, to the point where its size exceeds the system’s capacity to keep pace. We’ve done more transactions this year than ever before, and the system cannot handle the volume in its current configuration. This has been coming for a while, and the need to address it is what is driving the current project to resolve the known issues.

As for the details, I don’t have all of the facts, but I am trying to help and work to communicate what I can within the realm of what is available to me. As for each and every outage, I do not have every RCA, but the last few have been because of the current data pipeline being overwhelmed and not completing its jobs on time. This has meant no data loss, but it has meant delayed reporting and throughput. This is not a preferred outcome, but it is the current problem we are working to solve as we modernize the systems.

The details of the work being done must come from the team actually performing that work. My statement is meant solely as acknowledgement of the issues, that we are working on it, and that we do not want this outcome for the partners. I do not promise rainbows and unicorns, but I am not intentionally deflecting here; it is my intent to let you know that the issue is concerning to us, that we do care about this, and that we sincerely want to resolve the problems.

No one comes to work wanting to have a bad outcome for the marketplace. We are growing the support and systems necessary, and it will take time to resolve. Nothing built by humans will ever be perfect, and we don’t want to spend our days and nights with partners upset. There is change coming, but as with all changes, they take time. Yes, if I could, I’d snap my fingers and say “make it so” - but that isn’t reality. It is a work in progress.

1 Like

Hi all, again not a happy day as one job completed and another has failed. We are working to refresh the data for you. The team are working to implement a fix for the failure to allow the data refresh to complete.

See: https://developer.status.atlassian.com

I will continue to update as more information is available. Your frustrations are justified; I am aware that this does not help you with your marketing or other timed activities. Please accept my apology, and know that this is a matter I do not take lightly.

4 Likes

It’s down again: 503 Service Unavailable. I gave it 10, and there is now a Statuspage entry referring to it.

1 Like