RFC-15: Ratings Enhancement

RFCs are a way for Atlassian to share what we’re working on with our valued developer community.

It’s a document for building shared understanding of a topic. It expresses a technical solution, but can also communicate how it should be built or even document standards. The most important aspect of an RFC is that a written specification facilitates feedback and drives consensus. It is not a tool for approving or committing to ideas, but more so a collaborative practice to shape an idea and to find serious flaws early.

Please respect our community guidelines: keep it welcoming and safe by commenting on the idea, not the people (especially the author); keep it tidy by staying on topic; empower the community by keeping comments constructive. Thanks!

Summary of Project:

We’re enhancing the Atlassian Marketplace rating system to streamline the customer evaluation process and empower customers in the Marketplace.

  • Publish: Jun 5, 2023
  • Discuss: Jun 19, 2023
  • Resolve: Jul 3, 2023


Atlassian’s marketplace provides a review mechanism that enables customers to rate and share their experiences with apps, offering insight to prospective buyers and actionable feedback to partners. However, the current approach lacks the granularity to offer comprehensive evaluations, which can prompt users to seek alternative resources for further information. Atlassian has already implemented measures to counteract fraudulent reviews; this Request for Comment (RFC) aims to improve the calculation logic and display of ratings within the marketplace.

What is the problem?

The ratings in the Atlassian Marketplace play a crucial role in helping customers evaluate an app. However, our current rating system has several limitations that may hinder customer decision-making. For instance, the absence of average rating numbers in our Marketplace’s sections makes it difficult for customers to determine an app’s usefulness accurately. Additionally, the visual representation of stars as a means of evaluating quality may not be inclusive since the color contrast ratio does not meet accessibility guidelines. Our star rating system’s visual representation also utilizes increments of 0.5, which may obscure distinctions between comparable apps with marginally different ratings. Furthermore, our 4-star rating system may deviate from some customers’ mental models of conventional rating systems, leading to ambiguity and difficulty in understanding reviewer sentiment. Finally, combining ratings across cloud, Data Center, and server versions can lead to imprecise assessments of an app’s performance, particularly when legacy ratings are given equal weightage as more recent ratings. To address these limitations, we are exploring alternative ways of presenting app ratings that provide customers with more precise and reliable feedback.

How do we benefit by improving ratings on Marketplace?

Establishing trust in the Atlassian Marketplace is of utmost importance to us. As a primary initiative, we have implemented Privacy and Security measures to bolster customer confidence. The evaluation process, including ratings and reviews, serves as a critical trust barometer for customers. Therefore, we are concentrating our efforts on enhancing the review system.

Our objective is to enable customers to provide more comprehensive and insightful feedback, and the rating enhancement will aid them in assessing the efficacy of apps more effectively. Enhancing the review system is conditional upon improving the rating system. At present, the ratings lack the necessary granularity to be truly beneficial for customers. To address this issue, we aim to shift towards a standardized 5-star rating system. This alteration will allow customers to provide more detailed feedback and facilitate better differentiation between apps based on their ratings. With the imminent conclusion of server hosting, it is even more crucial to segment ratings by hosting type. This ensures that customers can concentrate on hosting-specific ratings while providing cloud-oriented partners with an equitable appraisal.

In essence, our goal is to overhaul the current rating system by transitioning to a standardized 5-star rating system, introducing recency bias, and enabling hosting-specific ratings. These endeavors collectively strive to enhance customer trust and provide a more valuable and informative experience in the Atlassian Marketplace.

Proposed Solution and Change in user experience

Accurate representation of ratings: Improving the precision of rating representation involves two significant components: precise numeric ratings and fractional representation. Customers will be able to view the exact numerical rating values, which will facilitate clear visibility and ease of understanding. The rating system will incorporate fractional values in its visual representation, ensuring a higher degree of accuracy than the traditional 0.5-star increments.


Disclaimer: The design presented is tentative and subject to further refinement. It does not represent the final version and is provided for illustrative purposes only.

Displaying ratings in numeric form and utilizing fractional representation can greatly improve user experience. This provides clear visibility, allowing customers to quickly assess an app’s rating and make informed decisions. Fractional representation offers increased precision, enabling customers to distinguish between marginal performance variations and make more accurate evaluations.

Separating ratings by hosting type: It is important to note that each app will receive different ratings for its cloud, Data Center, and server versions. This differentiation acknowledges the inherent disparities and unique attributes of every app with respect to the hosting environment.

Disclaimer: The design presented is tentative and subject to further refinement. It does not represent the final version and is provided for illustrative purposes only.

Separating ratings by hosting type improves the user experience, as it allows customers to evaluate and compare each offering independently. This differentiation acknowledges that performance and features may vary across different hosting environments, which enables customers to make more informed choices based on their specific needs. By having separate ratings for cloud, Data Center, and server versions of an app, customers are provided with specific insights into each offering.

Adding recency bias to the ratings: At present, the average rating is a basic mean of all ratings. By introducing recency bias, greater significance will be attributed to more recent ratings.

Incorporating recency bias into ratings enhances the user experience by assigning greater importance to recent ratings. This approach offers a more precise reflection of an app’s present performance, which boosts customer trust in the ratings. Furthermore, recency bias addresses past problems and ensures a fair evaluation based on current user experiences. It also provides Marketplace partners with the opportunity to improve their app ratings by enhancing their app and support without being hindered by previous performances.
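The RFC does not specify how recency weighting would be computed, but a common approach is an exponentially decaying weight by rating age. The sketch below is purely illustrative: the function name, the one-year half-life, and the decay curve are all assumptions, not Atlassian’s actual algorithm.

```python
from datetime import datetime, timedelta

def recency_weighted_average(ratings, half_life_days=365.0, now=None):
    """Recency-weighted mean of (stars, submitted_at) pairs.

    Each rating's weight halves every `half_life_days` days, so newer
    ratings count more. All parameters here are illustrative assumptions.
    """
    now = now or datetime.now()
    numerator = denominator = 0.0
    for stars, submitted_at in ratings:
        age_days = (now - submitted_at).days
        weight = 0.5 ** (age_days / half_life_days)  # exponential decay
        numerator += stars * weight
        denominator += weight
    return numerator / denominator if denominator else 0.0

# Example: a year-old 1-star review carries half the weight of today's
# 5-star review, pulling the average above the simple mean of 3.0.
now = datetime(2023, 7, 3)
ratings = [(5, now), (1, now - timedelta(days=365))]
print(recency_weighted_average(ratings, now=now))  # (5*1.0 + 1*0.5) / 1.5
```

A scheme like this also makes the trade-off raised later in the thread concrete: with few ratings, a single recent review dominates the weighted mean, which is why some minimum-volume threshold would likely be needed.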

Moving from a 4-star system to a 5-star system: Adjustments are needed to transition from a 4-star rating system to a 5-star rating system, including Historical Rating Conversion to transform existing ratings. Extrapolation is one way to maintain consistency and accuracy while updating the framework. We are also exploring alternative methods to accomplish this. Additionally, we plan to implement changes that will allow customers to rate apps on Marketplace on a 5-star scale in the future.

The shift from a 4-star rating system to a 5-star rating system provides several advantages for the user experience. This includes an increase in granularity, which allows for more detailed feedback and evaluation of an app’s performance. Additionally, adopting the industry-standard 5-star system promotes consistency and familiarity, making it easier for customers to interpret ratings across different platforms and websites.

By implementing these changes, the app marketplace will provide customers with an improved experience by offering accurate, detailed, and hosting-specific ratings. It will also enhance precision, compatibility, and consideration of recency in evaluating app performance.


While we would appreciate any reactions you have to this RFC (even if it’s simply giving it a supportive “Agree, no serious flaws”), we’re especially interested in learning more about:

  1. Overall Rating and Hosting-Specific Ratings: Should we continue to maintain an overall rating in addition to hosting-specific ratings? Alternatively, considering the end of life for server hosting, would it be more appropriate to focus solely on cloud ratings and Data Center ratings, excluding the overall rating?
  2. Recency Bias in Ratings: We would appreciate gaining insight into our partners’ perspective on the perceived benefits of incorporating recency bias into the rating system. This modification involves assigning greater importance to recent ratings. Your viewpoint regarding the advantages and potential challenges of implementing recency bias would be highly valuable.
  3. Transition from a 4-Star System to a 5-Star System: We are currently evaluating the possibility of transitioning from our present 4-star rating system to a standardized 5-star system. We would greatly appreciate your professional opinion regarding the potential benefits of this transition, as well as any concerns you may have regarding the use of a simple extrapolation method for converting legacy ratings to the new 5-star system.

We appreciate your time and expertise in providing feedback on these specific points outlined in the RFC. Your insights will play a significant role in shaping our decision-making process.


Hi @ChoppaAditya ,

Thanks for the proposal. I always found the 4 :star: rating strange compared to the standard 5 :star:, so the change would be welcome for me.

However, I have a few concerns.

  1. Since Server and DC are essentially the same app, the rating from Server should be incorporated into the DC rating. Some apps have a long presence on the Marketplace, and getting rid of the Server rating would not do justice to those apps.

  2. I’m against recency bias. You see, even if we provide exceptional service to customers and ask them to rate the apps, 99% of people will not do it. This is even more pronounced with large customers: employees don’t have the right to speak on behalf of their employers, so they simply don’t leave reviews. However, a customer who feels they were not properly served will immediately jump in and leave a somewhat negative review. So it takes a long, long time to grow positive reviews, and recency bias will simply bring that down. I’m completely against it.

  3. I don’t have a big opinion on the transition. As long as it’s fair, everybody will be on the same boat anyway.




When using separate ratings per hosting method, Atlassian needs to consider that the Marketplace has a default landing page for each app that may differ from what the customer is actually looking for. For example, I could land on the page for the Xyz Widget, showing the Cloud default, when I am in fact looking for the DC version. If the cloud version is 1-star but the DC is 5-star, I might not get around to clicking on the dropdown because it looks like a bad app overall. Perhaps consider showing all of the applicable ratings, regardless of the specific hosting page being shown.

Ditto for search results. In this scenario, there seem to be a few possible solutions: (a) show context-sensitive ratings in the search results if the user is, say, performing a DC-only search; (b) calculate an overall average for all hosting types and display that (if no context clues are present); or (c) display all ratings within the search tile.

Also, in addition to separating the star ratings, what is planned for the existing written reviews, or for future written reviews? It seems that previously-written reviews get allocated somewhat randomly to server, DC and cloud hostings, even if said reviews were written long before the existence of that deployment method.

Has Atlassian considered how to integrate the ratings process with earlier customer-installed versions of the UPM that support in-app rating? Since nothing forces the customer to upgrade UPM, I trust that the relevant APIs will handle the necessary star-rating conversion from old UPM instances that are still using 1-4 stars?


Can we please have a way to deep-link a user to the ‘leave a review’ functionality? It is hard to navigate there today.


I appreciate the effort to improve the rating system. A couple of thoughts…

Our experience as a small vendor is that customers/users do not naturally leave app reviews. To illustrate, we have a small Confluence app that after 2 years has 301 active installs, 10,000+ total users, and yet has only a single review (from 2021).

My perception is that most unsolicited reviews are the customer sharing their support experience rather than reviewing the app itself. 90% of our customers never raise a support ticket and are (presumably) satisfied with the app, because they continue to use it. But those customers never write reviews.

Does the star rating add enough value/accuracy to keep promoting it as one of the primary filters? Or is the number of installs a better indicator of the quality of an app?
e.g. If I need a diagram app for Confluence I would evaluate draw.io because it has 70K installs, not because it has N stars.

Digital Rose


What I miss in the current review system:

  • there is no ability to quickly look at the bad reviews only (as a user, I want to see what other users complain about - this way I can see if a crucial feature is missing, and whether the vendor replied and maybe fixed the problem)
  • users usually do not update reviews if the problem was fixed or a missing feature was added
  • there is still unfair behaviour, with some apps getting reviews from employees of the company that created the app - this way larger companies can quickly get dozens of positive reviews for a new app (which is not possible for small companies)
  • the majority of lower-rated reviews for our apps are caused by limitations of the Atlassian frameworks (Connect, API, etc.) or problems otherwise caused by Atlassian (outages, the billing system, etc.). Those reviews should not count, but Atlassian does not care much about flagged reviews.

As for the RFC

  1. No opinion
  2. It might be problematic for the reason Yves described in his answer, but also because of what I described in the second bullet above (users usually do not update reviews if the problem was fixed or a missing feature was added). For those reasons, old reviews are just as important as new ones.
  3. :+1:

I have a lot of thoughts and feelings about the review system, but none of those are about having the average number on the listing page, switching to a 5-star system and/or including recency bias.

I know that we’re not supposed to comment on Atlassian priorities in RFC’s, but if you really want to fix the review system, perhaps you should add a new RFC to solicit the Marketplace Partner community on what the real problem is of the review system. I’m sure you will get a lot of response.


I agree with:

  • Changing reviews from 4 to 5 stars (finally!)
  • More accuracy when showing the average number of stars
  • Hosting specific reviews

Like @yvesriel I don’t agree with recency bias, as it’s probably difficult to implement fairly. At the moment, one bad review is already a big issue for us, since it lowers the review average considerably due to the low number of reviews given in the Marketplace in general. If newly added reviews now also get a higher weight than the already existing ones, the review average will drop even more.

If recency bias is to be implemented, you would also need to make sure that the average doesn’t change without a review being given. That would feel wrong, but depending on the algorithm used, could easily be the case.

An alternative to recency bias could be to give more weight to reviews that have a higher “Was this Review Helpful” count/share, or even to exclude/heavily penalize reviews that receive a 0/x helpful share. But then you need the same rules for the “helpful” votes that are already in place for the reviews themselves (no votes by members of the vendor or by the competition, only logged-in users that have actually used the app, …).

I would also suggest disallowing anonymous reviews. We vendors should know who has given a rating and be able to respond to the reviewer. It’s always frustrating to get an anonymous 1-star rating with no way to know why it was given and no chance to respond. Even though the Marketplace team has taken steps to reduce fake reviews, we can never be sure with anonymous reviews.


Thanks for bringing this initiative to RFC @ChoppaAditya. Here’s the feedback you were after:

  1. Overall vs platform ratings - I lean more towards overall ratings, but I acknowledge that there are arguments both ways. For customers it might be helpful to see individual platform ratings; however, it presumes that all potential customers know what platform they’re on (which isn’t a given - especially for apps that target personas other than admins). For app vendors, the disincentive is focus: given the platforms are so different and are invested in so differently by Atlassian, we often have to follow your focus, which will likely result in radically different listings/reviews from customers.
  2. Recency bias - Disagree with this, like others on this RFC. Until there are more stringent rules around who can leave reviews and a method by which paid-for reviews can be monitored, introducing this method would only benefit apps that take advantage of the system.
  3. 5-star rating system - I’m not sure of the value of this after so many years. Yes, it standardises things with other systems, but given the historic issues with reviews, becoming more precise with the star system will disadvantage those that haven’t engaged in dicey review practices.

As others have mentioned, there are a number of underlying issues that would improve the level of trust in the reviews that are not covered by the RFC. Also, happy to share views on those if desired.


I think all of the changes suggested make sense and have a number of positive second order effects such as allowing bad apps to be acquired and cleaned up to improve ratings.

However, it is also important for the team implementing these changes to acknowledge that this isn’t mission accomplished when it comes to reviews. This is only the first step, and vendors expect to see continued improvements in the space along the lines of the feedback provided up-thread.


Great initiative! Here is our feedback:

  • Overall Rating and Hosting-Specific Ratings: Keeping an overall rating remains pertinent to us (when navigating the Marketplace), but hosting-specific ratings would also be a very good improvement for the customer, given that some apps can be quite different (UX/UI, features, etc.) depending on the platform. An appropriate solution for us would be to combine the Server and Data Center ratings.

  • Recency Bias in Ratings: Yes, a recency bias would be a positive change as many older negative reviews relate to missing features or bugs that have long since been addressed.

  • Transition from a 4-Star System to a 5-Star System: From our experience, a 4-star system generally forces people to give a positive or negative evaluation (black or white) with no possibility of a neutral score. A 5-star system provides several advantages (standardization, precision, etc.), BUT we have many concerns about how this transition is going to be done. How are you going to recalculate the average score on a scale of 5?


I’ve compiled some feedback from a group of colleagues across our organisation:

Overall Rating and Hosting-Specific Ratings
Here we think it makes sense to split the ratings between Cloud and On-Prem as they are fundamentally different - rather than a single overall rating. However, like previous responses, we strongly believe the DC rating should consist of the Server and DC ratings combined. The Server and DC experiences are close to identical and losing the Server ratings and reviews would be a significant loss of context and guidance to future customers, as well as an injustice to longstanding marketplace vendors.

Recency Bias in Ratings
Our group has mixed opinions on this. In general we agree with the principle but we have some significant concerns around implementation and implications in practice, most of which are covered in previous responses. It’s also likely to make the app rating less intuitive and therefore ultimately less trustworthy and useful. What does this rating really mean when I can’t clearly understand how it was generated?

Transition from a 4-Star System to a 5-Star System
Again in principle we agree with the proposal, but the way this is implemented could be problematic. What does a simple extrapolation method to convert legacy ratings mean in practice? Would all 4 star ratings become 5 star, 3 stars would become 3.75, etc? Intuitively, 3 / 4 “feels” better than 3.75 / 5. Given the established 4-star system, does the value of changing it really outweigh the pain and confusion of changing it at this stage?

As others have highlighted, there are other aspects of the current review system which are of greater importance and concern to us, for example authenticity and trustworthiness of reviews or anonymous in-app reviews that contribute to the overall score but don’t have a comment or a username for us to respond to.


Thank you, @yvesriel , for your detailed feedback. We will take into account your suggestion of merging server and DC ratings.

We understand your concern about recency bias. We can address it by making sure that the most recent rating doesn’t negatively affect all the good work done by partners. For apps with fewer ratings, the impact of recency bias would be much larger; hence, we won’t apply recency bias until the number of ratings crosses a reasonable threshold.

Dear @scott.dudley ,

Thank you for your detailed feedback. We will address the issue of customers being unaware of the hosting type by revisiting the designs on our app listing page. This will ensure that the customers are made aware of the hosting type-specific widget. Additionally, the overall rating will be visible, allowing the customer to be aware of the performance for the DC version.

We will display overall ratings in search results; hosting-specific ratings will be displayed only on the app listing page.

We have considered in-app ratings via UPM and will ensure consistency with the Marketplace rating system.

Thanks @nick , we have added this feedback to our backlog. Will update this thread as we make progress.

Hi @Chris_at_DigitalRose ,

We concur with your assertion that the number of installs serves as a dependable indicator of an application’s quality. This has been substantiated by customer interviews conducted in the past. The significance of ratings decreases as the number of installs increases. Thank you for bringing up these points.

Hi @jack ,

Thanks for your suggestions on improving the current review system. We appreciate your input and will take them into consideration as we continue to make enhancements. Additionally, in the near future, we will be removing any reviews from employees of the company that developed the app. This will help to prevent potential bias and ensure that the review system is fair and accurate. We will also work on addressing the issue of recency bias so that one negative review does not unfairly outweigh all the positive work done in the recent past.

Hi @remie ,

We acknowledge that there are several areas that require attention to improve the review system. The initial measures to improve ratings and detect fake reviews serve as a starting point, with further enhancements for reviews planned. We highly appreciate and welcome your valuable feedback on this matter.

Thank you, @BenRomberg, for providing us with your detailed feedback. As mentioned in previous comments, we will make sure that one bad rating doesn’t affect the good work done by you. Moreover, we would like to inform you that we have decided to disallow anonymous ratings soon.


Thanks, @dlindsay, for sharing your perspective on the RFC. I understand your concerns regarding fake reviews from customers, and I want to assure you that we are working on a system to remove them. We will address the issue of recency bias only after solving for fake reviews. The changes proposed are just the beginning, and we will continue to make improvements to our review system to make it more transparent and trustworthy for our customers.