Links targeting page(s) with titles starting with special characters

MiguelMachado · November 18, 2022, 10:07am

Good Morning Everyone,

When handling links and heading IDs in page content after converting a page to the view format, we have observed that when a page title starts with a special character or a digit, the IDs of the headings on that page are prefixed with “id-” before the page title part, e.g. id="id-$pecialPage-Heading1". Links targeting such a page or its headings are affected in the same way. The same behavior is observable for titles starting with any diacritic character.

The issue arises because, when we handle links in our code base, we have to work out which of them fall into this “special” handling scenario. There was no explicit documentation on the subject, which broke our link handling when processing sources originating from Confluence. Handling this inconsistency requires a workaround, and that workaround remains an ad-hoc solution with uncertain coverage.

A practical example which can be reproduced:

Page title “Æcharacter”
- View format of the heading in that page: <h1 id="id-Æcharacter-head">head</h1>
  - When processing the anchors in the content, we prefix them with the page title, but the page title we retrieve is Æcharacter rather than id-Æcharacter, so the anchor we build does not match the heading ID

Although I am unaware of the reasons behind this behavior for page titles and their content, it would be ideal to either get a more uniform way of handling these scenarios, or at least have some transparency in the form of documentation that specifically states which cases fall into this category. As of today, we treat any title whose first character is not a lower or upper case letter as being in scope:

!Pattern.matches("[\\p{Lower}\\p{Upper}]"
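For readers who want to try the check, here is a minimal sketch of how it might look as a complete method. The class name `TitlePrefixCheck` and the method `needsIdPrefix` are hypothetical and not part of any Confluence API, and applying the pattern to the first character only is an assumption based on the description above. Note that in `java.util.regex`, `\p{Lower}` and `\p{Upper}` match only ASCII letters by default, so a title starting with Æ is treated as needing the prefix.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Confluence API.
class TitlePrefixCheck {

    // Without the UNICODE_CHARACTER_CLASS flag, \p{Lower} and \p{Upper}
    // are POSIX classes that match only ASCII a-z and A-Z.
    private static final Pattern ASCII_LETTER =
            Pattern.compile("[\\p{Lower}\\p{Upper}]");

    // Returns true when the heading IDs of a page with this title are
    // expected to carry the "id-" prefix.
    static boolean needsIdPrefix(String pageTitle) {
        if (pageTitle.isEmpty()) {
            // Assumption: an empty title has no leading letter.
            return true;
        }
        // Assumption: only the first character of the title is tested.
        String firstCharacter = pageTitle.substring(0, 1);
        return !ASCII_LETTER.matcher(firstCharacter).matches();
    }
}
```

With this sketch, `needsIdPrefix("Æcharacter")` and `needsIdPrefix("$pecialPage")` both return true, while a title such as "Overview" returns false.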

Thank you for your time!
Regards

Miguel Machado @ K15t, Scroll Exporters

nmansilla · December 5, 2022, 8:20pm

Hi @MiguelMachado, interesting find. I’m reaching out to some internal teams for more details about this behavior (which I have reproduced on my own instance as well). As soon as I have more information, I’ll share here.

nmansilla · December 6, 2022, 6:20am

@MiguelMachado, here is what we found out in our investigation. If the first character of the page title is anything other than a-z or A-Z, then the id attribute value will be prepended with the id- prefix.

Therefore, digits as well as special characters in the first character position of the page title will trigger this prepending of id-. While I don’t have the exact date when this was enabled, it’s been in place for at least ten years.
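As a rough illustration of the rule, the sketch below reconstructs the heading ID a consumer might expect for a given page title and heading. The class `HeadingAnchor` and its methods are hypothetical names, not a Confluence API. The title-dash-heading layout is taken from the example earlier in this thread, and any further normalization Confluence may apply to titles or headings is not covered here.

```java
// Hypothetical helper for illustration; not part of any Confluence API.
class HeadingAnchor {

    private static final String ID_PREFIX = "id-";

    // True only when the title starts with an ASCII letter (a-z or A-Z).
    static boolean startsWithAsciiLetter(String pageTitle) {
        if (pageTitle.isEmpty()) {
            // Assumption: an empty title is treated as not starting with a letter.
            return false;
        }
        char first = pageTitle.charAt(0);
        return (first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z');
    }

    // Builds the expected heading ID as "<title>-<heading>", adding "id-"
    // in front when the title does not start with an ASCII letter.
    static String expectedHeadingId(String pageTitle, String headingText) {
        String base = pageTitle + "-" + headingText;
        return startsWithAsciiLetter(pageTitle) ? base : ID_PREFIX + base;
    }
}
```

For the page from the original post, `expectedHeadingId("Æcharacter", "head")` yields `id-Æcharacter-head`, which matches the view-format heading reported there.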

I’ll be following up with the content team about including this information in the documentation. Thanks!

MiguelMachado · December 6, 2022, 1:52pm

Good Afternoon
Correct, those findings are in accordance with OP.
Thank you for the update. I’ll await further news.