Get table data from Confluence page with 'Page Properties Report' macro - Python

TomLarsen · November 8, 2023, 8:40am

Hi
I am trying to get table data from a Confluence cloud page to Python. The table on the page is generated by a ‘Page Properties Report’ macro.

I have tried using the Python code below. However, the received HTML code does not contain the table data. Do I have to use another expansion than body.view?

from atlassian import Confluence

username = 'zzzzz'
API_token = 'qqqqqq'

confluenceBaseUrl = 'https://xxxxxxx.atlassian.net/wiki/'
page_id = 'yyyyyyy' 

conf = Confluence(url=confluenceBaseUrl, username=username, password=API_token)

page = conf.get_page_by_id(page_id, expand='body.view')
page_content = page['body']['view']['value']

ibuchanan · November 8, 2023, 11:56am

Welcome to the Atlassian developer community @TomLarsen,

For context, can you link to the library you are using?

TomLarsen · November 10, 2023, 9:05am

GregKowalczyk · December 12, 2023, 11:30pm

Hello @TomLarsen,
If you just had simple html table added to the page you could use:
confluence.get_page_by_id(page, expand="body.storage")["body"]["storage"]["value"]

and then use bs4 library to scrap the html tables and save them as python list of lists.

However for page properties macro is stored in confluence pages as:

"<table data-table-width=\"760\" data-layout=\"default\" ac:local-id=\"38f83295-c9e8-4528-87a1-aa5ee197051f\">

I don’t see any easy way to extract content of page properties report table from API, at least in Confluence Cloud

MorganBeale · October 8, 2024, 5:51am

Hey, not sure if you found an answer yet, but for Confluence (not Cloud, but hopefully interchangeable) I have found that using the cql method of the api successfully retrieves the pages matching the label used for the page properties report, and then the get_tables_from_page method for each page matched can return the table data itself.

using your connection above:

import json  # to read the output of the tables func
import pandas as pd  # to wrap it up in a dataframe
from tqdm import tqdm  # just so we have a nice progress tracker

reports = conf.cql('label = "your_label" and space = "your_space_key"', limit=1000)

report_tables = []

for i, rpt in enumerate(tqdm(reports['results'])):
    id_ = int(rpt['content']['id'])
    report_table = json.loads(confl.get_tables_from_page(id_))
    table_content = {x[0]: x[1] for x in report_table['tables_content'][0]}
    report_tables.append(pd.DataFrame(table_content, index=[i]))

df = pd.concat(report_tables)
df.to_csv("report_register.csv")