Hi
I am trying to get table data from a Confluence cloud page to Python. The table on the page is generated by a ‘Page Properties Report’ macro.
I have tried using the Python code below. However, the received HTML code does not contain the table data. Do I have to use another expansion than body.view?
from atlassian import Confluence
username = 'zzzzz'
API_token = 'qqqqqq'
confluenceBaseUrl = 'https://xxxxxxx.atlassian.net/wiki/'
page_id = 'yyyyyyy'
conf = Confluence(url=confluenceBaseUrl, username=username, password=API_token)
page = conf.get_page_by_id(page_id, expand='body.view')
page_content = page['body']['view']['value']
1 Like
Welcome to the Atlassian developer community @TomLarsen,
For context, can you link to the library you are using?
Hello @TomLarsen,
If you just had simple html table added to the page you could use:
confluence.get_page_by_id(page, expand="body.storage")["body"]["storage"]["value"]
and then use bs4 library to scrap the html tables and save them as python list of lists.
However for page properties macro is stored in confluence pages as:
"<table data-table-width=\"760\" data-layout=\"default\" ac:local-id=\"38f83295-c9e8-4528-87a1-aa5ee197051f\">
I don’t see any easy way to extract content of page properties report table from API, at least in Confluence Cloud
1 Like
Hey, not sure if you found an answer yet, but for Confluence (not Cloud, but hopefully interchangeable) I have found that using the cql
method of the api successfully retrieves the pages matching the label used for the page properties report, and then the get_tables_from_page
method for each page matched can return the table data itself.
using your connection above:
import json # to read the output of the tables func
import pandas as pd # to wrap it up in a dataframe
from tqdm import tqdm # just so we have a nice progress tracker
reports = conf.cql('label = "your_label" and space = "your_space_key"', limit=1000)
report_tables = []
for i, rpt in enumerate(tqdm(reports['results'])):
id_ = int(rpt['content']['id'])
report_table = json.loads(confl.get_tables_from_page(id_))
table_content = {x[0]: x[1] for x in report_table['tables_content'][0]}
report_tables.append(pd.DataFrame(table_content, index=[i]))
df = pd.concat(report_tables)
df.to_csv("report_register.csv")