Convert Confluence XML export to PDF

Hi all,

We currently run Confluence Server 7.5.2.
We have a Jenkins job that runs every night to export all our spaces to PDF (it is an internal requirement).
However, this export consumes a lot of CPU and memory, and pollutes our logs with many warnings such as:

15-Jan-2021 08:14:02.542 WARNING [Catalina-utility-1] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadCompleted Thread [http-nio-8090-exec-5] (id=[203]) was previously reported to be stuck but has completed. It was active for approximately [24,348,599] milliseconds. There is/are still [1] thread(s) that are monitored by this Valve and may be stuck.
com.atlassian.confluence.content.render.xhtml.XhtmlException: RuntimeException occurred while performing an XHTML storage transformation (null)

We therefore contacted Atlassian Support for help and advice, and they advised us to export our spaces as XML and then convert that XML to PDF.

Do you have any documentation or examples to help us?
We have not found any documentation page describing how to do this.

Additional information: we have written a Python script for the XML export.

Kind regards

Hi @jira3,

The recommendation from our support team is most likely to import that XML export into a secondary instance of Confluence that is dedicated solely to the purpose of performing PDF exports. This is what we sometimes do for PDF exports on our own https://confluence.atlassian.com instance.

Using this method, the resource and performance pressures of the PDF export do not impact the day-to-day operations of your primary server.

You can generate and attach a developer license to your secondary Confluence instance for this purpose (see How to get a Confluence Developer license | Confluence | Atlassian Documentation) so that you don’t have to pay for an additional license.

If this is not helpful, I’m happy to look at the support issue (if you provide the case number) and verify what their recommendation was.

Regards,
Joe Clark Atlassian

Hi,

We opened this case: CSP-286941
Hemant (from Atlassian) told us:

Hello Team,
Good day! I hope you are doing well.
Apologies for the delay in response and thanks for your patience on this ticket.
I still prefer to export each space in XML format instead of HTML because the XML export is easy and less resource-consuming. After exporting all spaces to XML, you can use any tool that can convert it to PDF.
We are not experts in this and we don’t have much information, but you can check in the Atlassian Community for assistance.
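
For anyone trying the "convert the XML yourself" route, here is a minimal sketch of what the first step might look like: pulling page titles and bodies out of the export before handing them to an HTML-to-PDF tool. It assumes the export's entities.xml stores pages as <object class="Page"> elements and their content as <object class="BodyContent"> elements linked through a "content" property. That layout, the sample data and the function name extract_pages are assumptions for illustration, so check them against a real export first.

```python
import xml.etree.ElementTree as ET


def extract_pages(xml_text):
    """Return {page title: storage-format body} from an entities.xml string.

    Assumes (not verified against every Confluence version) that pages are
    <object class="Page"> and bodies are <object class="BodyContent"> whose
    "content" property holds the id of the owning page.
    """
    root = ET.fromstring(xml_text)
    titles = {}  # page id -> title
    bodies = {}  # page id -> body markup
    for obj in root.iter("object"):
        cls = obj.get("class")
        if cls == "Page":
            page_id = obj.findtext("id")
            title = obj.findtext("property[@name='title']")
            if page_id and title:
                titles[page_id] = title
        elif cls == "BodyContent":
            body = obj.findtext("property[@name='body']")
            owner = obj.find("property[@name='content']/id")
            if owner is not None and body is not None:
                bodies[owner.text] = body
    # Keep only pages for which a body was found.
    return {titles[i]: bodies[i] for i in titles if i in bodies}


# Hypothetical, heavily trimmed sample in the assumed layout.
SAMPLE = """<hibernate-generic>
  <object class="Page" package="com.atlassian.confluence.pages">
    <id name="id">1</id>
    <property name="title"><![CDATA[Home]]></property>
  </object>
  <object class="BodyContent" package="com.atlassian.confluence.core">
    <id name="id">2</id>
    <property name="body"><![CDATA[<p>Welcome</p>]]></property>
    <property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">1</id></property>
  </object>
</hibernate-generic>"""

if __name__ == "__main__":
    for title, body in extract_pages(SAMPLE).items():
        print(title, "->", body)
```

This is deliberately simplified. A real export can contain historical page versions, pages with duplicate titles, attachments and macros in storage format, all of which would need handling before the markup is usable as input for a PDF converter.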