Upcoming changes to modernize search REST APIs

rwhitbeck · February 3, 2020, 5:00am

On July 24, 2020, Confluence will update how we return paginated results for our REST APIs from an index-based system to a cursor-based system. See full notice on changes.

raimonds.simanovskis · May 12, 2020, 7:42am

Currently we use the start and limit parameters to issue parallel requests, which lets us paginate through all Confluence CQL results faster. We will no longer be able to paginate in parallel, as the next page URL can only be obtained from the previous page's response. I’m wondering how this will affect pagination time through large CQL results.

Therefore I am concerned about these questions:

What was the reason for this change? I assume that the reason was the bad performance of repeated CQL searches with large start parameter values. Will the new solution improve individual response times of each next page query? Are you keeping open in-memory these cursors on the server side to improve the pagination response time and avoid repeated CQL queries?
How will this new cursor-based pagination behave if pages are created and updated during the pagination calls? For example, I have ORDER BY lastmodified DESC in CQL. Let’s assume that a page was not in the first search results page but then it was updated before the next page request (and has the most recent lastmodified timestamp). Will it be returned in one of the next pages using cursor pagination, or will it be missed because it jumped to the front of the CQL search results?
Do I understand correctly that currently there is no way to test queries with this cursor parameter? We just need to change our implementation to use the _links.next URL and hope that it will work correctly after the change from start to cursor parameter?
What will happen if someone still uses the start parameter after July 15? Will the request fail or will the start parameter be ignored? (The latter might cause endless pagination for those who do not update their pagination implementation.)

Kind regards,
Raimonds

BobBergman · May 12, 2020, 5:29pm

I share all these questions and concerns.

At the very least I would expect a detailed explanation from Atlassian as to why a breaking change is being made to an API, and how doing so was completely unavoidable, rather than simply introducing a new version of the same API to support a different pagination scheme.

It’s also disturbing to be changing the pagination style of just some of the REST APIs. Confluence REST API docs define a specific style of pagination to be employed across the entire suite of APIs. As a result, some developers have undoubtedly coded common abstractions to handle pagination for multiple endpoints. Due to inconsistent implementations on Atlassian’s side (e.g. not always including _links), however, those abstractions have supported index-based pagination rather than opaque link-based ones. Now by moving only some of the paginated REST APIs to a new mechanism, you’re forcing a reimplementation of these common abstractions. I think the impact of such a breaking change may be greater than you realize.

candid · May 19, 2020, 2:15pm

Are there any updates on this?

On the one hand, we are hoping that this will fix some shortcomings of the current paging:

As mentioned by Raimonds, queries with a high start parameter are slow. Request speeds seem to indicate that when I request results 800–1000, the endpoint actually fetches pages 0–1000 to then return the last 200 of those.
Paging is sometimes unreliable. Sometimes the same object is returned multiple times on different pages, so we have to filter out duplicates by ID. Sometimes a page contains fewer results than the limit, even though there are more pages. If I remember correctly, sometimes it even returns 0 results even though there are more pages.
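
The duplicate filtering described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: it assumes each result is a dict carrying a unique `id` field, and the name `dedupe_by_id` is hypothetical.

```python
from typing import Dict, Iterable, Iterator


def dedupe_by_id(results: Iterable[Dict], key: str = "id") -> Iterator[Dict]:
    """Yield each result once, keeping the first occurrence of every ID.

    Assumption: every result dict has a unique identifier under `key`.
    """
    seen = set()
    for item in results:
        ident = item[key]
        if ident in seen:
            # Same object already returned on an earlier page; skip it.
            continue
        seen.add(ident)
        yield item


# Example: the object with id "1" shows up again on a later page.
combined = [{"id": "1"}, {"id": "2"}, {"id": "1"}, {"id": "3"}]
unique = list(dedupe_by_id(combined))
```

Because it is a generator, it can wrap whatever loop produces the pages without first collecting them all in memory.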

However, we are also concerned about when this will be deployed and how much time we will have to adapt our code.

In our app, we only ever fetch the first page, or all pages in one go, so in theory we do not require backwards compatibility for using the start parameter. There is one case however where we do rely on it, and we are wondering how this will be with the new API: When expansions are paged (for example when fetching Confluence pages and expanding the labels), we currently use the specific endpoint (for example the labels endpoint) to fetch the missing pages. Will expansions provide a link with a cursor to fetch the missing pages?

DavidRizzuto · May 21, 2020, 8:14pm

Hello everyone,
I’m an Engineering Manager on the Confluence Cloud team and I can provide a bit of context behind these changes.

As some of you have pointed out, paging through results today with high start values can indeed get very slow. The problem only gets worse as you have more and more data to page through. This change was made to address this issue.

As part of an ongoing effort to improve Confluence’s search experience (both in-product and via API), we’re rebuilding our search backend to be more scalable and performant. In order to meet those scalability and performance requirements, it was deemed that our offset/limit pagination would no longer be suitable, and we would have to migrate to cursor based pagination instead.

I would also like to try to clarify implementation details that have been mentioned here in this thread:

While our APIs will technically support a cursor parameter, we don’t expect clients to ever need to handle it directly. Each API response will be returned with a _links block that contains pre-generated next and previous URLs. These can be used as-is and will fetch the next and previous pages respectively.
The next and previous URLs today don’t contain the cursor parameter, but will be switched over to do so in the future. If you integrate with Confluence using the next and previous URLs directly, then you will be future-proof as these URLs are part of the API contract and will continue to function no matter which parameters are actually present in them. The idea is that you won’t need to be aware of the cursor parameter at all.
Once the new pagination method fully rolls out across production, you should no longer see a particular object be returned multiple times across pages. This problem usually occurred when there were updates or new pages published while a client was paging through results.
If you supply the start parameter after July 15th, it will be ignored.
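
A client that follows the advice above might look roughly like this sketch. It treats the next URL as opaque and never builds start or cursor parameters itself. Several details are assumptions for illustration rather than confirmed by this thread: that each response body is a JSON object with a `results` list, that the last page simply has no `next` entry in `_links`, and the names `iter_results` and `fetch_page` (a caller-supplied function that performs the authenticated HTTP request and resolves the URL against the site's base address).

```python
from typing import Callable, Dict, Iterator, Optional


def iter_results(fetch_page: Callable[[str], Dict], first_url: str) -> Iterator[Dict]:
    """Yield every result by following `_links.next` until it is absent.

    `fetch_page` is a hypothetical helper: it takes a URL and returns the
    parsed JSON response as a dict.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        # Assumption: results live under a "results" key.
        yield from page.get("results", [])
        # Use the server-provided link as-is; do not inspect its parameters.
        url = page.get("_links", {}).get("next")


# In-memory stand-in for the server, to show the control flow only.
fake_pages = {
    "/rest/api/search?cql=type=page": {
        "results": [{"id": "1"}, {"id": "2"}],
        "_links": {"next": "/opaque-next-link"},
    },
    "/opaque-next-link": {"results": [{"id": "3"}], "_links": {}},
}
all_ids = [r["id"] for r in iter_results(fake_pages.__getitem__, "/rest/api/search?cql=type=page")]
```

Since the loop only ever reads the link it is handed, it should keep working whether that link carries start or cursor.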

I hope this serves to clarify some of the reasoning and details regarding this change. Please let us know if there is anything else we can clarify or assist with.

David Rizzuto

candid · May 26, 2020, 11:56am

Thanks for the info, for me that clarifies it a lot.

I just had a look at how paged expansions work, and they already contain a next property to retrieve the next page of results for the expansions, so I assume we will be able to use that in the future as well.

pete · June 11, 2020, 6:51pm

Hey @rwhitbeck and @DavidRizzuto,

Thank you for the notification and for the clarification.

I am still a bit confused about the use. Let’s say I have a list of 1000 results and I want to display them in 20 pages, 50 results on each page. A user wants to see page 10 directly. How do I get the results 451 to 500? Will there be a way to skip result 1 to 450? Or do I have to actually fetch results 1 to 450 and discard them just to get the cursor to be at result 451?

I think this is a common use case when paginating results and we definitely have it. What is the recommended approach?

Best
Peter

BobBergman · June 12, 2020, 3:17pm

This is a great question. I also feel that Raimonds’ point about wanting to query pages in parallel hasn’t been addressed yet. We seem to be losing random access to result sets with this change, which seems like a serious design flaw.

maximilian · July 1, 2020, 7:37am

The Confluence REST API does not consistently include next links in its responses. For example, when listing users:

/rest/api/group/confluence-users/member?limit=1

The response does not include a next link even though there is additional data. When you add &start=1 to the URL, you can see the second result. With the start parameter still supported, you can at least fetch one additional page to see if it’s empty.

The same problem exists when sending a request like this:

/rest/api/search?cql=user.accountId+%3D+"123"+or+user.accountId+%3D+"456"&limit=1

In that case the totalSize property is also wrong.

How are we supposed to work around problems like these without the start parameter?

On this page about the change the /rest/api/group/{groupName}/member resource is not explicitly listed (/rest/api/search, however, is). Does that mean that /rest/api/group/{groupName}/member and any resource that is not explicitly listed on that page and that currently uses the start parameter will continue to support the parameter?

candid · July 13, 2020, 11:48am

Hey @rwhitbeck and @DavidRizzuto,

These changes are supposed to go live this Wednesday. Apart from the fact that there are still some open valid questions about how to use the new paging API in certain scenarios (such as the one raised by pete), the bug raised by maximilian is still open.

When querying for users, next links are missing, for example when calling /rest/api/search?cql=type=user&limit=1.

Internally, we have already changed our code to use the next links instead of manually setting the start parameter. However, releasing that change would break our app because of the bug, and we are worried that not releasing the change would break our app this Wednesday.

Could you give us an update whether the new pagination API will really be released this Wednesday as announced, and what is the status of fixing the user query bug?

marc · July 13, 2020, 12:03pm

@DavidRizzuto @rwhitbeck If I understand correctly, [CONFCLOUD-66469] Get group members REST API is missing "next" link used for pagination is still open.

That means we could not list all group members in Confluence Cloud?

candid · July 13, 2020, 2:16pm

@marc We have been assured by Atlassian that only the 3 REST endpoints explicitly listed in the article will be changed to the new format.

candid · July 13, 2020, 2:23pm

There is also another bug: When accessing the REST API through AP.request, a new _r parameter is added to each next link. For example, when doing AP.request('/rest/api/search?cql=type=page'), the next link returned is /rest/api/search?next=true&_r=1594649848128&limit=25&start=25&cql=type=page. When requesting that, the next link is /rest/api/search?next=true&_r=1594649848128&_r=1594649908722&limit=25&start=50&cql=type=page. As you can see, an additional _r parameter is added on each page. Strangely, this only happens when using AP.request, but not when accessing the REST API in any other way.

This means that the URLs will get longer and longer for each page. As a significant percentage of the webservers running Confluence Cloud instances seem to be configured wrongly and crash HTTP requests that are too long, this will cause this paging to crash after a certain amount of pages have been fetched.

A workaround is to manually remove the _r parameter from the URL each time.
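
A rough sketch of that workaround, written in Python for illustration (inside a Connect app the same string handling would be done in JavaScript before calling AP.request). The function name `strip_r_param` is hypothetical. It filters the raw query segments rather than re-encoding them, so the rest of the link is passed through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_r_param(url: str) -> str:
    """Return `url` with every `_r` query parameter removed.

    Other parameters keep their original order and encoding.
    """
    parts = urlsplit(url)
    kept = [
        segment
        for segment in parts.query.split("&")
        # Compare only the parameter name (text before the first "=").
        if segment and segment.split("=", 1)[0] != "_r"
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


# The second next link quoted above, with two accumulated _r parameters.
cleaned = strip_r_param(
    "/rest/api/search?next=true&_r=1594649848128&_r=1594649908722"
    "&limit=25&start=50&cql=type=page"
)
```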

BobBergman · July 14, 2020, 11:13pm

So is this shipping tomorrow? I don’t see this change yet on my EAP, so it makes me nervous that the switch will be flipped without any way to actually test our apps.

rwhitbeck · July 15, 2020, 11:23pm

A few developers have raised concerns about the change in how the search REST API pagination works. We’re preparing examples that show how to use the new cursor-based method and we’ll post them by tomorrow at the latest. In the meantime, to give you more time to update your code, we are extending the deprecation date to July 24, 2020.

To make it easier to test your code against the API in its final form, we will roll out the change to the Confluence Cloud: Developer-First Rollouts for Ecosystem App Developers (formerly known as Ecosystem Beta Group) on July 20, 2020.

BobBergman · July 15, 2020, 11:47pm

Thanks for going to EAP first! I checked today and noticed the new change had not been released yet, so I very much appreciate the updated timeline and opportunity to test in advance.

DavidRizzuto · July 16, 2020, 9:10pm

Hey all, let me clarify some of the concerns raised in this thread:

Regarding which APIs are changing

The only APIs that are subject to this change are:

/rest/api/search
/rest/api/content/search

(NOTE: a previous version of the deprecation notice included /rest/api/content in this list also - this was a publishing error)

No other API routes are subject to this change. Notably, that means the following APIs are not changing and will continue to support the start parameter:

/rest/api/search/user
/rest/api/group/member

Regarding queries for users

It was raised that when issuing a query to /rest/api/search with the cql clause type=user or user.*, the next links are missing. This is indeed the case. However, it should be noted that querying user objects via this API is no longer supported, as per Changes to Confluence Cloud Search APIs and deprecation notice.

Clients should instead use /rest/api/search/user to search for user objects. This API is not subject to these pagination changes and will continue to function as it has been.

Regarding random page access and queries in parallel

It’s been raised that with this change, it will no longer be possible to ‘jump’ to a particular page of results, and also no longer be possible to fetch multiple pages in parallel. This is correct - from June 24th onwards, when using the two APIs in question, clients will only be able to navigate from the current page to the next or previous page.

We understand that this is potentially disruptive, but it was a deliberate tradeoff made in order to achieve greater performance and scalability for large data sets.

Regarding AP.request and “_r” parameter

It was raised above that accessing these APIs using AP.request appears to be erroneously appending extra _r parameters to the response. This is indeed a bug in the interaction with AP.request, but it does not affect the REST APIs directly. We will prioritize this bug and fix it accordingly.

DavidRizzuto · July 16, 2020, 9:14pm

Hi @maximilian, @marc - thanks for raising these concerns.

Indeed the next links are missing from responses from /rest/api/group/confluence-users/member and /rest/api/group/member. This is an open bug: [CONFCLOUD-66469] Get group members REST API is missing "next" link used for pagination.

However this API is not changing and the start parameter will continue to function as it is today.

BobBergman · July 16, 2020, 11:50pm

Thanks for the detail. In the future it would be greatly appreciated if these things could be shared before the previously stated release date. We had a stressful week trying to make sure we were ready for the July 15th release. Also, I think you have a date typo where you say “June 24th.”

anon77654700 · July 17, 2020, 9:07am

The /rest/api/search/user API has the same problems as the /rest/api/search API with type=user. As mentioned by @maximilian, the totalSize property is wrong and there is no next link if the results are spread across multiple pages.

So the only way to work around this problem at the moment is to make an additional request to check whether there is a following result page.
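
That probing approach could be sketched along these lines for endpoints that still accept the start parameter. Treat it as an illustration under assumptions: `fetch_page` is a hypothetical caller-supplied function that requests one page for a given start and limit and returns its list of results, and the offset is advanced by the number of results actually received, which may not be the right choice if the server returns short pages.

```python
from typing import Callable, Dict, Iterator, List


def iter_with_probe(
    fetch_page: Callable[[int, int], List[Dict]], limit: int = 25
) -> Iterator[Dict]:
    """Page with start/limit, ignoring next links and totalSize.

    Keeps requesting until a page comes back empty, which costs one
    extra request at the end.
    """
    start = 0
    while True:
        results = fetch_page(start, limit)
        if not results:
            # The probe request found nothing further; stop.
            return
        yield from results
        # Assumption: the next offset follows the results received so far.
        start += len(results)


# In-memory stand-in for an endpoint holding five members.
members = [{"id": i} for i in range(5)]
fetched = list(iter_with_probe(lambda start, limit: members[start:start + limit], limit=2))
```

Note that an empty page returned while more data exists, as reported earlier in the thread, would end this loop early.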