Release of v2 Confluence REST API for Pages and Blogposts (Experimental)

PreethiK · October 3, 2022, 5:24pm

We are excited to announce our initial release of our version 2 Confluence REST API, with endpoints for retrieving pages and blogposts.

These granular endpoints correlate to more specific functions and allow you to be intentional about how you interact with Confluence data. Content is broken down into specific types - so you can work with pages, blog posts, and more as discrete entities.
They also offer (up to 30x) faster speeds when retrieving bulk content. Our new cursor-based pagination (instead of offset-based pagination) allows quick iteration through content.

The following endpoints are now available as experimental:

Pages
- Get all pages
- Get page by id
Blogposts
- Get all blogposts
- Get blogpost by id

Note: As these endpoints are currently experimental, they can change without any prior deprecation notice. Please be aware of this before using them in business critical applications.

Please have a look at the changelog for more information or our docs to explore the API.

The release includes support for Connect, OAuth 2.0 (3LO), and Forge apps.

We’re eager to get your feedback at this time. Please join the discussion in the Developer Community by tagging your community questions with rest-api-v2 to surface them to our team. Let us know what use cases you’re building, as well as what feature requests and fixes you’d like to see. We’re excited to work closely with you to shape the future of this API.

marc · October 3, 2022, 6:18pm

Thanks for the update!

nathanwaters · October 3, 2022, 11:19pm

fyi @PreethiK this change has broken all previous links to the API docs. You might want to set up some kind of forwarding so that old URLs redirect to /v1/

eg: https://developer.atlassian.com/cloud/confluence/rest/api-group-content/#api-wiki-rest-api-content-id-put

PreethiK · October 4, 2022, 4:07am

Hi @nathanwaters, thanks for the great catch! Our team is going to look into this ASAP.

anon87782987 · October 4, 2022, 9:41am

Very cool to see you working on a v2 REST API for Confluence!

Unfortunately, in its current state this won’t be all too useful for us as we need to expand labels, content properties, and other metadata when bulk fetching content. But I’m sure you’re also gonna add expansions and/or other endpoints to allow us to do this in the future?

Also, are there already plans on what this may mean for the v1 REST API going forward?

Cheers,
Sven

PreethiK · October 5, 2022, 12:27am

Hi @anon87782987, excited to see your interest in the new API!

Yes, we are in the early stages, but we will continue to build out the API and offer additional endpoints to provide needed functionality.

No plans as of yet for changes to the v1 REST API, but we will be sure to keep everyone updated!

DhiralPandya · October 7, 2022, 6:45pm

Hi @anon87782987, thanks for the details. Due to various reasons, we are planning to move away from the expansions and create separate endpoints to get those details.

Currently, most of the expansions have separate V1 APIs as well. You can use those until the v2 versions are available. We have already started working on the content properties v2 APIs and have a plan in place for the labels API.

anon87782987 · October 10, 2022, 12:11pm

Sounds great! Just want to let you know that in our case the expansions allow us to use the REST API very efficiently (in terms of amount of requests that is).

In our apps we need to e.g. fetch all content in a space, expanding things like labels, attachments, content body, and so on. Currently, the expansions allow us to do this with a single type of request, simply iterating through all the pages. If the v2 API splits this up into many separate endpoints, this may multiply the number of requests we need to perform to get the data we need.

But I suppose you already have thought about this a lot. Looking forward to seeing this evolve!

Cheers,
Sven

marc · October 10, 2022, 12:26pm

Want to second @anon87782987's response. We also make use of expand, and if we could not use it anymore, we’d need around 5-10 times more API requests.

james.dellow · October 19, 2022, 5:36pm

Likewise, removing expansion points would increase the number of API calls and processing would become more complex.

TylerBrown · October 25, 2022, 4:21pm

Hi all,

Just want to start out by saying we really appreciate all your feedback. As we build the V2 APIs this is exactly the kind of feedback we are looking for so that we can build better APIs for everyone.

As you all mention, removing expansions will cause the number of network requests to increase. This is a problem we are aware of, but we are okay with it for a number of reasons.

Expansions cause scopes to have to be extremely broad. As you’ll know, when using the content endpoint an app has to require an abundance of scopes because scopes can only be enforced at the API path level, not taking into account query params. Moving to only having individual resources means we can better separate the scopes per endpoint and apps can then specify exactly what they need, letting users feel more comfortable when installing an app.

The other aspect we are actually going to improve by this move is performance. Expansions are not set up in a way where they can be performant. They can easily lead to N+1 problems, very deeply nested calls, and all sorts of other performance issues. With the switch to individual resources, we can design the REST APIs in a way that will allow much more efficient loading. Not to mention we can more easily monitor performance, since it’s difficult to track the performance of the content API when each request could be drastically different from the others.
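
For readers less familiar with the term, an N+1 problem can be sketched roughly as follows. `FakeStore` is a hypothetical in-memory stand-in that only counts queries; it does not reflect how Confluence loads data internally.

```python
from typing import Dict, List


class FakeStore:
    """In-memory stand-in for a data store that counts every query it serves."""

    def __init__(self, labels: Dict[str, List[str]]) -> None:
        self.labels = labels
        self.queries = 0

    def list_page_ids(self) -> List[str]:
        self.queries += 1
        return sorted(self.labels)

    def labels_for(self, page_id: str) -> List[str]:
        self.queries += 1
        return self.labels[page_id]

    def labels_for_many(self, page_ids: List[str]) -> Dict[str, List[str]]:
        self.queries += 1
        return {pid: self.labels[pid] for pid in page_ids}


def expand_one_by_one(store: FakeStore) -> Dict[str, List[str]]:
    # One query for the list, then one more per page: N+1 queries in total.
    return {pid: store.labels_for(pid) for pid in store.list_page_ids()}


def expand_batched(store: FakeStore) -> Dict[str, List[str]]:
    # One query for the list, one for all labels: 2 queries regardless of N.
    return store.labels_for_many(store.list_page_ids())


DATA = {"1": ["docs"], "2": ["docs", "draft"], "3": []}
naive_store, batched_store = FakeStore(DATA), FakeStore(DATA)
naive_result = expand_one_by_one(naive_store)
batched_result = expand_batched(batched_store)
```

Both functions return the same data, but the first issues one query per page on top of the listing query, which is what makes nested expansions expensive at scale.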

While we work towards moving away from expansions where possible, we would love to hear more of your feedback on how you are using expansions, the pain points removing them would cause, and anything we can do to make the transition easier. Once we know these we can work to make sure our V2 APIs are providing the appropriate solution.

Thanks!

Confluence Ecosystem Team

TylerBrown · October 25, 2022, 4:21pm

As an aside, we do offer new GraphQL APIs which DO still have all sorts of these relationships as part of the language itself. But we are only able to offer this because GraphQL itself is a paradigm designed to handle this problem, and offers first-class support for functionality like data loaders which let us avoid N+1 problems. See: Bringing you new Confluence GraphQL APIs in Beta (Atlassian Developer Blog)

christoffer · October 26, 2022, 8:45am

hi @TylerBrown - as far as I know the GraphQL API is not available to Connect Apps. Is that still correct?
And - what is the timeframe for getting rid of REST API v1? I would guess that would be a major killer for all Connect apps feature and migration roadmaps if this is not synchronized with the whole Connect to Forge migration roadmap.
Cheers, Chris

riku · October 27, 2022, 11:34am

Hi @TylerBrown !

We (I’m working with @anon87782987 and @christoffer ) have some concerns about how our app would perform if expansions are not supported. We have use cases where having to fetch all the information that is needed by our app without using expansions could potentially lead to tens of thousands of requests on Confluence instances of some of our customers.

Currently all the needed information can be fetched with a single REST call that has several expansions. It is true though, that the response will have several pages, and in some cases we might need to do other REST calls to fill in information that did not fit into eg expanded children (this seems to be required if a page has more than 25 children). Getting all the information that we need can currently take several minutes using the old REST API and expansions, so we’re definitely excited to hear that the performance might get better.

In any case here’s a real-life use case that might be interesting for you to consider while designing the V2 API:

- our app supports a concept called ‘variant’ where a page is part of a variant if it and all its ancestors have certain labels
- at the beginning of a publishing process, we calculate which pages in a page tree belong to each of the variants that are chosen to be published
- after that we always filter out all the pages that do not belong to currently processed variant
- doing the variant calculations always when needed would be very inefficient, and keeping the information in memory does not need too much memory even if there is a big amount of content
- we additionally cache some other information that is fetched on the initial REST call by adding additional expansions: eg to which space content belongs, and some history information (when content was created or last modified)

The use case requires an efficient way to get all the pages in a page tree, including the labels they have, and all the parent-child relations. We have customers that have page trees containing thousands of pages.
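
As a rough illustration of the variant calculation described above, here is a minimal sketch. It assumes the parent relations and labels have already been fetched into two dictionaries, and that "certain labels" means a page must carry all of the required labels; the function and variable names are made up for this example.

```python
from typing import Dict, Optional, Set


def pages_in_variant(parent_of: Dict[str, Optional[str]],
                     labels_of: Dict[str, Set[str]],
                     required: Set[str]) -> Set[str]:
    """Return ids of pages where the page and all its ancestors carry `required`."""
    cache: Dict[str, bool] = {}

    def belongs(page_id: str) -> bool:
        if page_id in cache:
            return cache[page_id]
        has_labels = required <= labels_of.get(page_id, set())
        parent = parent_of.get(page_id)
        # A root page (parent None) only needs its own labels.
        result = has_labels and (parent is None or belongs(parent))
        cache[page_id] = result
        return result

    return {pid for pid in parent_of if belongs(pid)}


# Illustrative tree: root -> a, root -> b -> c
parent_of = {"root": None, "a": "root", "b": "root", "c": "b"}
labels_of = {"root": {"v1"}, "a": {"v1", "draft"}, "b": set(), "c": {"v1"}}
variant_pages = pages_in_variant(parent_of, labels_of, {"v1"})
```

Page `c` has the label but is excluded because its parent `b` does not, which is why the labels and the parent-child relations of the whole tree are both needed up front. The cache means each page is evaluated once, at the cost of recursion depth equal to the tree depth.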

I think there might be several ways to support our use case efficiently also with a solution that is more restrictive than current expansions. And we’re happy to change our code if the end result might perform better.

Would be great to hear some information about how our use case could be efficiently solved with the V2 APIs, and I would also be happy to share other cases where we’re using expansions if you’re interested

Cheers,
Riku

marc · October 27, 2022, 12:02pm

@TylerBrown I understand that you are saying expansions make dealing with scopes difficult. But at the same time you support graphQL, which seems to have the same issues with scopes. If we were to switch from using expansions to graphQL, how are scopes handled? And are you going to support graphQL for connect?

anon87782987 · October 27, 2022, 1:13pm

@TylerBrown There is also one more question that came to my mind which is: how would this affect the CQL endpoint which also currently supports expansions? Could we expect a v2 CQL endpoint to only return IDs, possibly accompanied by a small set of metadata?

Or would you also think of GraphQL as essentially a replacement for CQL?

TylerBrown · November 1, 2022, 4:39pm

Hi,

Appreciate all the comments. I saw there were a few similar questions, so I'm answering them all here.

GraphQL and Forge/Connect:

GraphQL APIs are indeed currently only available via Forge. The reason for this is purely a technical one with how Connect auth/scopes are handled. Support for Connect is being looked into, but we do not have any timeline related information to share.

GraphQL and Scopes:

Great question about this. This is where the power of GraphQL comes into play. With the REST APIs, the scopes can only be set at the path level. Thus anything that can be fetched with that path must be part of the scopes for that path, regardless of whether it’s actually fetched.

GraphQL via the Atlassian GraphQL Gateway has been configured in a way where scopes can be applied at the type level. This is extremely powerful because it means scopes can be tailored to exactly the query an app is trying to run - not just some top level generic ones based on the possible data being fetched. Trying to fetch a page and just get its title? Use just the page scope. Want to then also fetch its space? You’ll also then need the space scope - it wasn’t required before this.
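
The difference between path-level and type-level scopes can be sketched like this. The scope names, the path, and both lookup tables are invented for illustration and are not the real Confluence or GraphQL Gateway scope definitions.

```python
from typing import Dict, Iterable, Set

# Hypothetical: a path must declare every scope for anything it could return.
PATH_SCOPES: Dict[str, Set[str]] = {
    "/content": {"scope:page", "scope:space", "scope:label"},
}

# Hypothetical: each type declares only its own scope.
TYPE_SCOPES: Dict[str, Set[str]] = {
    "Page": {"scope:page"},
    "Space": {"scope:space"},
    "Label": {"scope:label"},
}


def scopes_for_path(path: str) -> Set[str]:
    """Path-level: the same broad set no matter what is actually fetched."""
    return PATH_SCOPES[path]


def scopes_for_query(types_touched: Iterable[str]) -> Set[str]:
    """Type-level: only the scopes of the types the query touches."""
    return set().union(*(TYPE_SCOPES[t] for t in types_touched))


title_only = scopes_for_query(["Page"])
title_and_space = scopes_for_query(["Page", "Space"])
```

Fetching just a page title needs one scope under the type-level model, while the path-level model always requires the full set attached to the path.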

Even with V2 REST APIs scopes will continue to be at the path level only.

Expansions:

@riku thanks for showing us your use case in such detail.

Our current thought process on labels is this:

Labels are a great example of something that is currently an expansion, that does not necessarily need to be an expansion in the V2 APIs. We understand that people often use labels as filters, so we will make sure to support this use case. Meaning, we will not require you to get every piece of content, and then fetch the labels in a separate call. We will make sure the list returning endpoints appropriately accept labels as a filter, and most importantly, we will work to optimize and make sure these can run a lot faster than the current queries.

You bring up another point we are looking to address, which is createdBy. While currently this is on the history / version which requires an expansion, we believe a field like this is core to the data itself. So for these types of items, we’d like to bring these up to being top level fields next to the other fields like title, space id, and so on. Therefore no expansion would be needed for these.

Page tree is a bit trickier and depends on whether the page tree is actually needed or just all of the content under a page / within a space. The latter being much faster.

Our goal is to make it so that we find a healthy balance between resources being properly scoped to their respective endpoints, and not unnecessarily increasing the number of calls required.

CQL:
We have not yet investigated the search returning endpoints and how they will work in V2 APIs. You raise a good point which we would have to consider once we tackle them.

The GraphQL APIs are more of an alternative to the REST APIs. Both will offer search functionality that should accept CQL.

Thank you!

riku · November 17, 2022, 3:27pm

Hi @TylerBrown!
Thank you for the reply and sorry that it took me so long to reply. In general I really appreciate it that you’re asking for feedback on planned new features in the developer community.

More in “core data” instead of expansions:
I think this sounds great in many cases!
I agree that many things like ‘lastModified’ and ‘created’ timestamps would be great to have in the “core data”.

Page tree:
For our purposes it would be enough to have also ‘parentId’ and ‘position’ (relative to other children of the parent) in the “core data”. In cases where we fetch all the content in a space or under a page, that information would allow us to build all the required parent-child relations (this is what I mean when talking about “page tree”). I know this data would have been easily available in Confluence Server.
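
A minimal sketch of how a flat list could be turned into parent-child relations, assuming each item carried the ‘parentId’ and ‘position’ fields suggested above (these fields are the proposal, not something the API is confirmed to return):

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_children_index(pages: List[dict]) -> Dict[Optional[str], List[str]]:
    """Map each parent id to its child ids, ordered by 'position'."""
    children = defaultdict(list)
    for page in pages:
        children[page["parentId"]].append(page)
    return {
        parent: [p["id"] for p in sorted(kids, key=lambda p: p["position"])]
        for parent, kids in children.items()
    }


# Illustrative flat result, in arbitrary order; root pages have parentId None.
flat_pages = [
    {"id": "2", "parentId": "1", "position": 1},
    {"id": "1", "parentId": None, "position": 0},
    {"id": "3", "parentId": "1", "position": 0},
    {"id": "4", "parentId": "3", "position": 0},
]
tree = build_children_index(flat_pages)
```

With those two fields the whole tree can be rebuilt client-side in a single pass over the results, without any per-page requests for children.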

Labels:
Being able to filter by labels is an important use case for us. Being able to do that without relying on CQL would also be great (the index is not always up-to-date which is not acceptable in all use cases). I think we would want to be able to also get all the labels of content in addition to being able to filter by the labels. Maybe the labels could also be in the “core data”… Another alternative that could work great for us (for labels and eg attachments) would be if there could be an endpoint like ‘/content//descendant/page/labels’ that would fetch all the labels of content under a parent page (or in a whole space). We could then call endpoints like that concurrently with fetching the “core data”.

Praise:
Having cursor-based pagination sounds great! In addition to speed improvements, it would be great if the cursor-based pagination could work on a consistent snapshot even if content items are added or deleted while pagination is in progress - though I don’t think we’ve in practice seen race conditions like that very often.

Bulk fetching by set of ids (feature request):

In some of our use cases it would be great to have a bulk fetch endpoint that takes a set of content ids (or title and space key pairs). The content to fetch would not be connected in any clear way, the content items would for example not be children of the same parent.
We store references to content in this way in certain use cases, and need to fetch lots of content (potentially hundreds or thousands of content items) to build certain reports.
We’re currently using CQL queries based on a list of content ids to fetch eg 50 Confluence pages per request, and falling back to fetching possibly missing items one by one if needed - but this is not ideal.
Another option would be to make a lot of requests concurrently, if you think it would be fine to do (very) many concurrent fetches of single content ids. Some guidance on what kind of patterns are desired for fetching a lot of content would be helpful.
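
The current workaround described above (batched CQL queries with a one-by-one fallback) looks roughly like the sketch below. The `search` and `fetch_one` callables are hypothetical wrappers around the real requests, and the fake implementations exist only so the example runs offline.

```python
from typing import Callable, Dict, Iterator, List, Optional


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def build_cql(ids: List[str]) -> str:
    return "id in ({})".format(", ".join(ids))


def fetch_many(ids: List[str],
               search: Callable[[str], List[dict]],
               fetch_one: Callable[[str], Optional[dict]],
               batch_size: int = 50) -> Dict[str, dict]:
    """Fetch by CQL in batches, then retry ids the search did not return."""
    found: Dict[str, dict] = {}
    for batch in chunked(ids, batch_size):
        for item in search(build_cql(batch)):
            found[item["id"]] = item
        for missing_id in batch:
            if missing_id not in found:
                item = fetch_one(missing_id)  # fallback: one request per id
                if item is not None:
                    found[missing_id] = item
    return found


# Fakes: the search index is stale and does not know about page "3" yet.
STORE = {i: {"id": i} for i in ("1", "2", "3")}
INDEX = {i: STORE[i] for i in ("1", "2")}


def fake_search(cql: str) -> List[dict]:
    ids = cql[cql.index("(") + 1:cql.rindex(")")].split(", ")
    return [INDEX[i] for i in ids if i in INDEX]


def fake_fetch_one(content_id: str) -> Optional[dict]:
    return STORE.get(content_id)


result = fetch_many(["1", "2", "3", "4"], fake_search, fake_fetch_one, batch_size=2)
```

A dedicated bulk endpoint that accepts a set of ids would replace both the CQL step and the fallback loop.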

riku · November 17, 2022, 3:56pm

Some more things that would be useful:

- would be cool to have an endpoint that filters content by ‘lastModified’ or ‘created’ timestamps in addition to filtering by labels
- would be cool if possible that all paginated responses would have a field telling the total size of the result set on the first page - this could be used for example to show progress information in some cases, or maybe to just get an up-to-date value of how many pages there are in a space (without needing to actually fetch all of them)

PreethiK · November 17, 2022, 9:30pm

@riku Love these suggestions and appreciate you sharing them with our team! Will definitely keep these in mind as we continue to develop.