Good Morning Everyone,
When handling links and heading IDs in page content after converting a page to the view format, we have observed that, for pages whose title starts with a special character or a digit, the IDs of the headings on that page (and therefore the links targeting them) are prefixed with “id-” before the page-title part, e.g. id="id-$pecialPage-Heading1". This happens for digits as well and can be observed for any diacritic character.
The issue arises because, when we handle links in our code base, we have to discern which of them fall into this “special” handling scenario. As there was no explicit documentation on the subject, this broke our link handling when processing sources originating from Confluence. The inconsistency requires a workaround, which is nonetheless an ad hoc solution with uncertain coverage.
A practical example which can be reproduced:
- Page title “Æcharacter”
- View format of the heading in that page:
<h1 id="id-Æcharacter-head">head</h1>
- When processing the anchors in the content, we prefix them with the page title, but when we access the page title, it only returns Æcharacter instead of id-Æcharacter
Although I am unaware of the reasons behind this behaviour for page titles and their content, it would be ideal to either get a more uniform way of handling these scenarios or at least have some transparency in the form of documentation that specifically states which cases fall into this category. As of today, we consider any character that doesn’t match a lowercase or uppercase character to be in scope:
!Pattern.matches("[\\p{Lower}\\p{Upper}]"
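For illustration, here is a minimal sketch of how that check could be applied when building the expected heading ID. It is not Confluence's actual algorithm: the class and method names are hypothetical, and it assumes that only the first character of the page title decides whether the “id-” prefix is added and that titles and headings contain no whitespace, as in the examples above. Note that in java.util.regex, \p{Lower} and \p{Upper} cover only ASCII letters unless UNICODE_CHARACTER_CLASS is enabled, which is why Æ is treated as a special character here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, a sketch of the workaround and not Confluence's implementation.
class AnchorPrefixSketch {

    // Same character class as in the check above. Without the
    // UNICODE_CHARACTER_CLASS flag it matches only ASCII a-z and A-Z.
    private static final Pattern ASCII_LETTER =
            Pattern.compile("[\\p{Lower}\\p{Upper}]");

    // Assumption: only the first character of the title triggers the prefix.
    public static boolean needsIdPrefix(String pageTitle) {
        if (pageTitle.isEmpty()) {
            return false;
        }
        String first = pageTitle.substring(0, 1);
        return !ASCII_LETTER.matcher(first).matches();
    }

    // Builds the heading ID we expect to find in the view format.
    // Assumption: title and heading are used verbatim (no whitespace handling).
    public static String headingAnchor(String pageTitle, String heading) {
        String prefix = needsIdPrefix(pageTitle) ? "id-" : "";
        return prefix + pageTitle + "-" + heading;
    }

    public static void main(String[] args) {
        System.out.println(headingAnchor("Æcharacter", "head"));
        System.out.println(headingAnchor("$pecialPage", "Heading1"));
        System.out.println(headingAnchor("Overview", "head"));
    }
}
```

With this heuristic, the page title “Æcharacter” and the heading “head” produce id-Æcharacter-head, which matches the view format shown in the example, while a title starting with a plain ASCII letter is left without the prefix.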
Thank you for your time!
Regards
Miguel Machado @ K15t, Scroll Exporters