Get Space page hierarchy using the REST api

mathieu.yargeau · March 13, 2018, 4:46pm

There was a question asked a while ago, but with no helpful answer.

In a webpage (a form in an HTML page in a Jira plugin), I want to build a tree view of the pages of a space, and maybe put them in a dropdown so the user can select one.

I am able to do a REST call to /rest/api/space/{spaceKey}/content to get the pages, but they are all in a list, with no hierarchy, and the ancestor field is always empty.

I am able to get only the root page with the same API by setting the depth to “root”. From there, I could do a REST call to get the children of that page, then do a REST call for every child to get their children and so on…

This would mean a number of REST calls equal to the number of pages, which is not great. Is there a better way to do it?
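
For reference, the page-by-page approach described above could be sketched roughly like this in Python. It is only an illustration: `build_tree`, `make_rest_fetcher` and the `session` argument are hypothetical names, the `/child/page` endpoint is the one that comes up later in this thread, and paging of the child lists is left out. The in-memory `fake_fetch` stands in for the REST calls so the sketch runs without a server.

```python
from typing import Callable, List

# Hypothetical type: takes a page id, returns that page's direct child pages
# as dicts with at least "id" and "title".
ChildFetcher = Callable[[str], List[dict]]


def build_tree(page_id: str, title: str, fetch_children: ChildFetcher) -> dict:
    """Recursively build {"id", "title", "children"}, one fetch per page."""
    children = [
        build_tree(child["id"], child["title"], fetch_children)
        for child in fetch_children(page_id)
    ]
    return {"id": page_id, "title": title, "children": children}


def make_rest_fetcher(base_url: str, session) -> ChildFetcher:
    """Fetcher backed by GET /rest/api/content/{id}/child/page.

    Assumption: `session` behaves like a requests.Session; paging is omitted.
    """
    def fetch(page_id: str) -> List[dict]:
        response = session.get(
            f"{base_url}/rest/api/content/{page_id}/child/page", timeout=10
        )
        response.raise_for_status()
        return response.json()["results"]
    return fetch


# In-memory stand-in for the REST calls (made-up ids and titles).
FAKE_CHILDREN = {
    "1": [{"id": "2", "title": "A"}, {"id": "3", "title": "B"}],
    "2": [{"id": "4", "title": "A1"}],
}
calls = []


def fake_fetch(page_id: str) -> List[dict]:
    calls.append(page_id)  # record every simulated request
    return FAKE_CHILDREN.get(page_id, [])


tree = build_tree("1", "Home", fake_fetch)
print(len(calls), "requests for 4 pages")
```

The request count matches the page count, which is exactly the cost the question is trying to avoid.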

pvandevoorde · March 14, 2018, 9:23am

Hi @mathieu.yargeau,

I’m going to check this with our Confluence team.

cheers,
Peter

pvandevoorde · March 14, 2018, 7:34pm

@mathieu.yargeau

I’m back with some news straight from our Confluence Cloud team:

The following API endpoints:

Should return all children, grandchildren, and so on… for a certain piece of content in Confluence.

I hope this helps.

mathieu.yargeau · March 14, 2018, 8:30pm

Thank you.

This is for Cloud? I am on Confluence Server 6.4.2.

I tried

/rest/api/space/{spaceKey}/content?depth=root

to get the root page, got the id from the root page and then called that api you suggested.

/rest/api/content/1310767/descendant/page

is giving me

{"statusCode":501,"data":{"authorized":false,"valid":true,"errors":,"successful":false},"message":"Page children is currently only supported for direct children"}

When I know there are pages under the home page.

/rest/api/content/1310767/descendant

is just giving me generic information

{"_links":{"base":"http://localhost:2050","context":"","self":"http://localhost:2050/rest/api/content/1310767/descendant"},"_expandable":{"attachment":"/rest/api/content/1310767/descendant/attachment","comment":"/rest/api/content/1310767/descendant/comment"}}

I tried several things with the expand parameters, but it didn’t seem to change anything.

pvandevoorde · March 14, 2018, 9:12pm

D’oh…

I read “confluence cloud page” in that post from the user community you posted and didn’t look at the category anymore…

Let me see what I can find out for Confluence Server

pvandevoorde · March 14, 2018, 9:20pm

Can you change this

/rest/api/content/1310767/descendant/page

into this

/rest/api/content/1310767/child/page

And tell me if that gives you the result you need?

mathieu.yargeau · March 14, 2018, 9:47pm

It does give me results, but only what I described in my first post: the pages directly below the page whose id was given in the URL, not the pages further down the hierarchy tree. That is the case even if I add “?expand=descendants.page”, which according to the documentation “returns pages that are descendants at any level below the content.”

Maybe this is not implemented in the Server 6.4.2 version.

pvandevoorde · March 14, 2018, 9:59pm

One more thing can you try this:

/rest/api/content/1310767/child/page?expand=children.page

mathieu.yargeau · March 15, 2018, 2:11pm

This one almost works. I didn’t try it before because it was saying “returns pages that are descendants at the level immediately below the content”.

And it seems it does exactly that. “1310767” is my home page. The result gives me all pages directly under the home page, plus the children of those pages, nested in a hierarchy. However, it does not go any deeper than that. To go deeper, I will have to make the same REST call on each grandchild.

I’ll have to check if it can satisfy the requirements. Thank you for your help.
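
A rough Python sketch of reading that two-level result, for anyone following along. The nested shape (`results[].children.page.results[]`) is an assumption about how the expanded response is laid out, and `two_level_tree` / `ids_to_query_next` are hypothetical helper names; the sample data is made up.

```python
def two_level_tree(response_json: dict) -> list:
    """Turn a child/page?expand=children.page response into nested dicts."""
    tree = []
    for page in response_json.get("results", []):
        # A missing or unexpanded "children" entry simply yields no grandchildren.
        grandchildren = (
            page.get("children", {}).get("page", {}).get("results", [])
        )
        tree.append({
            "id": page["id"],
            "title": page["title"],
            "children": [
                {"id": g["id"], "title": g["title"]} for g in grandchildren
            ],
        })
    return tree


def ids_to_query_next(tree: list) -> list:
    """Ids of the grandchildren: the pages that need another request to go deeper."""
    return [g["id"] for page in tree for g in page["children"]]


# Made-up sample in the assumed response shape.
sample = {
    "results": [
        {
            "id": "10",
            "title": "Child A",
            "children": {"page": {"results": [{"id": "11", "title": "Grandchild A1"}]}},
        },
        {"id": "20", "title": "Child B", "children": {"page": {"results": []}}},
    ]
}

two_levels = two_level_tree(sample)
next_ids = ids_to_query_next(two_levels)
print(next_ids)
```

Repeating the same call for each id in `next_ids` walks the tree two levels at a time instead of one.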

jawa9000 · February 14, 2019, 6:49pm

@mathieu.yargeau, did you ever get a solution for generating a list of all descendants (children, grand children, etc.)? I too face a similar issue and aside from doing a recursive “dive down the hierarchy”, I hope you found and can share a simple solution.

gbrunning · October 17, 2019, 2:57am

I created a python script to get the page hierarchy levels and create a CSV with a column for each level and list pages in the appropriate column under their relevant parent. A CSV is obviously just one possible output but good as a demo. I’m sure there are people out there with better solutions but this gets the job done.

(This is for Confluence Server)

import csv

import requests

# Assumption: the Base64-encoded "username:password" string that Basic auth expects
token = 'your_super_secret_token'

site = 'https://yoursite.atlassian.com'

class Page:
    
    def __init__(self, title, url, page_id):
        self.url = url
        self.id = page_id
        self.title = title
        self.level = 0
        self.ancestors = []
        
    def __str__(self):
        return self.title+' - '+self.url

def get_pages(space, *args):

    headers = {
        "Authorization": "Basic " + token,
        "Content-Type": "application/json"
    }

    pages = []
    cql = f'space = {space} AND type = page'  # or blogpost
    url = site+f'/rest/api/content/search?cql={cql}&start=0&limit=50&expand=ancestors'

    # GET request that passes in the space and expands the ancestors
    r = requests.get(url, headers=headers, timeout=10)
    
    page_list = r.json()['results']
    for page in page_list:
        
        pages.append(page)
    
    is_next_page = True
    
    while is_next_page:
        try:
            next_page = r.json()['_links']['next']
            url = site+next_page
            r = requests.get(url, headers=headers, timeout=10)
            
            page_list = r.json()['results']
            for page in page_list:
                pages.append(page)
        except KeyError:
            is_next_page = False
            
    return pages


def create_page_obj(page):
    title = page['title']
    url = page['_links']['webui']
    page_id = page["id"]
    p = Page(title, url, page_id)
    
    return p


def sort_pages(page_objs):
    # Add pages to a list based on their hierarchy and parent
    sorted_pages = []
    page_levels = max(page.level for page in page_objs)
    for level in range(page_levels + 1):
        if level == 0:
            # First add pages at the root level of the space
            sorted_pages.extend([page for page in page_objs if page.level == 0])

        else:
            # Create list of pages at the current level
            children = [page for page in page_objs if page.level == level]
            # Create a list of parent pages for the children
            parents = [page for page in sorted_pages if page.level == level - 1]
            for page in children:
                for pg in parents:
                    # Check whether the parent ID is in the child's ancestors and put the child after the parent if so.
                    if pg.id in page.ancestors:
                        try:
                            sorted_pages.insert(sorted_pages.index(pg) + 1, page)
                            continue
                        except ValueError:
                            print(pg.title + ' caused an error')
                    else:
                        continue
    for page in page_objs:
        if page not in sorted_pages:
            sorted_pages.append(page)
                        
    return sorted_pages


def create_csv(space, pages):
    page_levels = max(page.level for page in pages)
    with open(f'./{space}_hierarchy.csv', mode='w+', newline='') as levels:
        fieldnames = [f'Tree depth {level}' for level in range(page_levels+1)]

        fieldnames.append('URL')

        writer = csv.DictWriter(levels, fieldnames=fieldnames)

        writer.writeheader()
        
        for page in pages:
            link = site + page.url
            row_dict = {f'Tree depth {page.level}': page.title, 'URL': link}
            writer.writerow(row_dict)

def create_hierarchy_audit_csv(space, *args):
    pages = get_pages(space)
    page_objs = []
    for page in pages:
        pg = create_page_obj(page)
        page_objs.append(pg)
        pg.level = len(page['ancestors'])
        ancestors = page['ancestors']
        pg.ancestors = [ancestor['id'] for ancestor in ancestors]

    sorted_pages = sort_pages(page_objs)
    create_csv(space, sorted_pages)
    return None
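
As a possible alternative to the insert-after-parent sorting in the script above, here is a small sketch that orders pages depth-first by sorting on each page's ancestor path. It assumes each page's `ancestors` list holds ids ordered from the root down, and it orders siblings by id rather than by their position in Confluence. `depth_first_order` and `render_outline` are hypothetical names and the sample data is made up.

```python
def depth_first_order(pages: list) -> list:
    """Order pages so every page follows its parent and stays with its subtree.

    Each page is a dict with "id", "title" and "ancestors" (ids, root first).
    A parent's path is a prefix of its descendants' paths, so sorting the
    paths puts each subtree directly after its parent.
    """
    return sorted(pages, key=lambda page: tuple(page["ancestors"]) + (page["id"],))


def render_outline(pages: list) -> str:
    """Indent each title by its depth (the number of ancestors)."""
    lines = []
    for page in depth_first_order(pages):
        lines.append("  " * len(page["ancestors"]) + page["title"])
    return "\n".join(lines)


# Made-up sample, deliberately out of order.
sample_pages = [
    {"id": "4", "title": "A1", "ancestors": ["1", "2"]},
    {"id": "3", "title": "B", "ancestors": ["1"]},
    {"id": "1", "title": "Home", "ancestors": []},
    {"id": "2", "title": "A", "ancestors": ["1"]},
]

outline = render_outline(sample_pages)
print(outline)
```

The depth used for indentation is the same value the script stores in `level`, so the ordered list could feed the CSV writer unchanged.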

HelenSetchell · March 24, 2020, 1:54pm

I find I can get multiple levels in one request like this
/rest/api/content/{pageId}/child?expand=page.children.page.children.page

In order to know how many ‘children.page’ parts to add to the end of that request I guess I’d need a way of finding the page with the max depth. Haven’t tried it yet but I’m thinking do this first
/rest/api/space/{spaceKey}/content?expand=ancestors
Loop through all results and set the max depth based on max number of ancestors, then create the first query based on that.

You’d also have to deal with paging potentially.

I’m also looking at how I do this from the site’s homepage, which I can get from
/rest/api/space/{spaceKey}?expand=homepage
but then my max depth from above might be a bit too big (but not sure that matters if I put early exits on loops)
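
The idea above (find the maximum depth from the ancestors, then build the expand string to match) might be sketched like this in Python. It is untested against a real server, just like the idea itself; `max_depth`, `build_child_expand` and `child_url` are hypothetical helper names, and the sample pages are made up.

```python
def max_depth(pages: list) -> int:
    """Deepest level in the space, taken as the largest number of ancestors."""
    return max((len(page.get("ancestors", [])) for page in pages), default=0)


def build_child_expand(levels: int) -> str:
    """Expand value covering `levels` levels below the starting page.

    levels=1 gives "page", levels=3 gives "page.children.page.children.page".
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return "page" + ".children.page" * (levels - 1)


def child_url(page_id: str, levels: int) -> str:
    """Relative URL for the /child request with the matching expand value."""
    return f"/rest/api/content/{page_id}/child?expand={build_child_expand(levels)}"


# Made-up results in the shape of .../content?expand=ancestors
sample_pages = [
    {"id": "1", "ancestors": []},
    {"id": "2", "ancestors": [{"id": "1"}]},
    {"id": "4", "ancestors": [{"id": "1"}, {"id": "2"}]},
]

depth = max_depth(sample_pages)
url = child_url("1", max(depth, 1))
print(url)
```

Starting from the homepage, the number of levels below it equals the largest ancestor count, so no extra adjustment is needed in that case.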

HelenSetchell · March 26, 2020, 9:03am

Another thing I’ve tried to keep requests to a minimum is calling /rest/api/space/{spacekey}/content/page?expand=ancestors (add any paging you need) and creating this array of objects (I’m using javascript/jquery)

var tree = [];
$.each(results,function(r,result) {
	var parentId = (typeof result.ancestors[0] == 'undefined' ? '0' : result.ancestors[(result.ancestors.length-1)].id);
	var childPosition = (result.extensions.position=='none' ? 0 : result.extensions.position);
	tree.push({
		'parentId':parentId,
		'childPosition':childPosition,
		'childId':result.id,
		'childTitle':result.title,
		'childUrl':result._links.webui
	});
});

Then using a recursive function (or whatever) to build your tree from that.

The one caveat is the immediate parentId bit, which I get from the last item in the array of returned ancestors:

result.ancestors[(result.ancestors.length-1)].id

Every now and then the immediate parentId was not the last item in the array of ancestors, and the pages where they were in the ‘wrong’ order then appeared at the root of the tree. However, if I visited those pages in the browser and then re-ran the script, the order of the ancestors was reliable again.

I found some mention of this elsewhere https://community.atlassian.com/t5/Answers-Developer-Questions/Get-a-page-s-immediate-ancestor-parent-using-Confluence-REST-API/qaq-p/542273
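
The "recursive function (or whatever)" step could look roughly like this, sketched in Python rather than JavaScript. It takes flat records with the same keys as the array built above (`parentId`, `childPosition`, `childId`, `childTitle`) and uses '0' as the parent id of root pages, as in that snippet. `build_tree_from_records` is a hypothetical name, and the sketch assumes the parent ids are already correct, so it does not address the ancestor-order caveat.

```python
from collections import defaultdict


def build_tree_from_records(records: list, root_parent_id: str = "0") -> list:
    """Nest flat parent/child records into a list of {"id", "title", "children"}."""
    by_parent = defaultdict(list)
    for record in records:
        by_parent[record["parentId"]].append(record)

    def attach(parent_id: str) -> list:
        # Siblings are ordered by their stored position.
        siblings = sorted(
            by_parent.get(parent_id, []), key=lambda r: int(r["childPosition"])
        )
        return [
            {
                "id": r["childId"],
                "title": r["childTitle"],
                "children": attach(r["childId"]),
            }
            for r in siblings
        ]

    return attach(root_parent_id)


# Made-up records, deliberately out of order.
records = [
    {"parentId": "1", "childPosition": 2, "childId": "3", "childTitle": "B"},
    {"parentId": "0", "childPosition": 0, "childId": "1", "childTitle": "Home"},
    {"parentId": "2", "childPosition": 0, "childId": "4", "childTitle": "A1"},
    {"parentId": "1", "childPosition": 1, "childId": "2", "childTitle": "A"},
]

nested = build_tree_from_records(records)
print(nested[0]["title"], [c["title"] for c in nested[0]["children"]])
```

Grouping by parent first means each record is visited once, whatever order the REST results arrive in.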

HelenSetchell · March 26, 2020, 9:18am

One last possible/hacky solution (incomplete - see note below) if you have the pagetree macro available and you want to dynamically add a pagetree to a Confluence page:

Some variables needed (javascript styley):

var thisPageId = AJS.params.pageId;
var rootPageId = '';  //the root of the tree you want to show, e.g. the space homepage id
var spaceKey = '';  //as it appears in the URL
var thisPageDescendentIds = [];  //populate this array before running the code (see note below)

string to create:

<div class="plugin_pagetree conf-macro output-inline" data-hasbody="false" data-macro-name="pagetree">
	<ul class="plugin_pagetree_children_list plugin_pagetree_children_list_noleftspace">
		<div class="plugin_pagetree_children">
		</div>
	</ul>
	<fieldset class="hidden">
		<input type="hidden" name="treeId" value="" />
		<input type="hidden" name="treeRequestId" value="/plugins/pagetree/naturalchildren.action?decorator=none&amp;excerpt=false&amp;sort=position&amp;reverse=false&amp;disableLinks=false&amp;expandCurrent=false&amp;placement=" />
		<input type="hidden" name="treePageId" value="' + thisPageId + '" />

		<input type="hidden" name="noRoot" value="false" />
		<input type="hidden" name="rootPageId" value="' + rootPageId + '" />

		<input type="hidden" name="rootPage" value="" />
		<input type="hidden" name="startDepth" value="0" />
		<input type="hidden" name="spaceKey" value="' + spaceKey + '" />

		<input type="hidden" name="i18n-pagetree.loading" value="Loading..." />
		<input type="hidden" name="i18n-pagetree.error.permission" value="Unable to load page tree. It seems that you do not have permission to view the root page." />
		<input type="hidden" name="i18n-pagetree.eeror.general" value="There was a problem retrieving the page tree. Please check the server log file for more information." />
		<input type="hidden" name="loginUrl" value="/login.action?os_destination=%2Fpages%2Fviewpage.action%3FpageId%3D' + thisPageId + '&amp;permissionViolation=true" />
		<input type="hidden" name="mobile" value="false" />
		<input type="hidden" name="placement" value="" />

		<fieldset class="hidden">
			<input type="hidden" name="ancestorId" value="' + thisPageId > ancestors[0].id + '" />
			<input type="hidden" name="ancestorId" value="' + thisPageId > ancestors[1].id + '" />
			<input type="hidden" name="ancestorId" value="' + thisPageId > ancestors[2].id + '" />
			<input type="hidden" name="ancestorId" value="' + thisPageId > ancestors[...].id + '" />
		</fieldset>
	</fieldset>
</div>

Notes:

Most of the time this works fine but occasionally I’ve noticed it doesn’t initialise (i.e. nothing is shown), so I still need to work out order of events on the page and see what I can do; any help appreciated
I’ve found I don’t need to include the inner fieldset of ancestor Ids, even though they do appear in the code when the pagetree macro is in a page

PS - if you’re wondering why I would want to do this it’s because I’m building a tool to help me easily review and restructure content across various spaces so that we can re-use the content more reliably elsewhere, e.g. in a virtual web assistant and in within-webapp help popups.

PPS - I’m vaguely aware that there might be a pagetree rest api I should be using instead of what’s above, but all the examples were for something more complex than I needed and I was getting tired (!) but I’m open to suggestion.

UPDATE: Instead of working out how to initialise whatever adds nodes to the above code, I’ve realised I can populate the contents myself, i.e. the bit in between

		<div class="plugin_pagetree_children">
		</div>

from the code above by calling

url = '/plugins/pagetree/naturalchildren.action?decorator=none&excerpt=false&sort=position&reverse=false&disableLinks=false&expandCurrent=true&placement=sidebar&hasRoot=true&pageId=' + homepageId + '&treeId=0&startDepth=0&mobile=false'

then loop through thisPageId’s ancestorIds to add as many of these bits to the query string as needed

url = url + '&ancestors=' + ancestor.id

then finish by adding

url = url + '&treePageId=' + thisPageId

the plugins/pagetree call returns html not json btw.
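
Put together, the URL-building steps from the update could be sketched like this in Python. The parameter names and values are copied from the snippets above; `build_pagetree_url` is a hypothetical helper name, the ids in the usage line are made up, and whether the action accepts this exact query on a given Confluence version is not something this sketch verifies.

```python
from urllib.parse import urlencode


def build_pagetree_url(homepage_id: str, this_page_id: str, ancestor_ids: list) -> str:
    """Assemble the naturalchildren.action URL in the order described above."""
    params = [
        ("decorator", "none"),
        ("excerpt", "false"),
        ("sort", "position"),
        ("reverse", "false"),
        ("disableLinks", "false"),
        ("expandCurrent", "true"),
        ("placement", "sidebar"),
        ("hasRoot", "true"),
        ("pageId", homepage_id),
        ("treeId", "0"),
        ("startDepth", "0"),
        ("mobile", "false"),
    ]
    # One "ancestors" parameter per ancestor of the current page.
    params += [("ancestors", ancestor_id) for ancestor_id in ancestor_ids]
    # The current page id goes last.
    params.append(("treePageId", this_page_id))
    return "/plugins/pagetree/naturalchildren.action?" + urlencode(params)


pagetree_url = build_pagetree_url("100", "300", ["100", "200"])
print(pagetree_url)
```

Using a list of pairs instead of a dict keeps the repeated `ancestors` key and preserves the parameter order.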

ArnaudBoyer · January 2, 2024, 9:52am

Hello fellow community members, the newer API seems to offer the solution one would expect (I haven’t tested it yet): The Confluence Cloud REST API (atlassian.com)

vijayendra · September 13, 2024, 5:34am

I’m encountering a Confluence server API error when attempting to fetch children of a page. This API generally works but fails for specific server instances. The error only occurs when I try to expand “children.page”. Removing this parameter resolves the issue.

The API call in question is:
rest/api/space/{spaceID}/content?expand=ancestors,children.page&start=0&limit=100