What is the maximum size of an ADF document? i.e. Should we use a streaming parser?

aragot · June 12, 2020, 3:10pm

For our Confluence Cloud add-on, we have to implement ADF parsers. We’re worried about loading whole ADF documents into memory up front when we want to transform them. Is there a maximum size for an ADF document?

We have difficulty writing a parser that uses Java streams (I’m talking about Collections streams, not IO streams), because ADF does not specify the position of the “type” property within a node. Example:

{
  "type": "table",
  "attrs": {...},
  "content": [...all_my_rows...]
}

In the above excerpt, I know the type is “table” up front, so I can reinitialize all my variables and parse each row of the table without having to wait for the closing brace of the table node.

But in the following excerpt, I have to keep everything in memory, and wait until the end to know that it was, in fact, a table:

{
  "attrs": {...},
  "content": [...all_my_rows...],
  "type": "table"
}

The order is not specified, and the same situation happens for every element, so I need to load the whole DOM in memory before I can work. I can only load the full DOM if the maximum document size is constrained.
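To make the difference concrete, here is a minimal sketch in plain Java. It does not use a real JSON library: a `LinkedHashMap` stands in for the fields of one node in the order a streaming parser would deliver them, and `TypeOrderSketch`, `rowsBufferedBeforeType` and `handleRow` are hypothetical names made up for illustration. It counts how many rows have to be held back before “type” is known.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TypeOrderSketch {

    /**
     * Walks the fields of one node in iteration order (standing in for
     * document order) and returns how many rows had to be buffered
     * because "type" had not been seen yet.
     */
    static int rowsBufferedBeforeType(Map<String, Object> node) {
        String type = null;
        List<Object> pending = new ArrayList<>();
        int buffered = 0;
        for (Map.Entry<String, Object> field : node.entrySet()) {
            if ("type".equals(field.getKey())) {
                type = (String) field.getValue();
                // The type is finally known: flush whatever was held back.
                for (Object row : pending) {
                    handleRow(type, row);
                }
                pending.clear();
            } else if ("content".equals(field.getKey())) {
                for (Object row : (List<?>) field.getValue()) {
                    if (type == null) {
                        pending.add(row);   // cannot process yet
                        buffered++;
                    } else {
                        handleRow(type, row); // process immediately
                    }
                }
            }
        }
        return buffered;
    }

    /** Hypothetical placeholder for the real per-row transformation. */
    static void handleRow(String type, Object row) {
        // no-op in this sketch
    }

    public static void main(String[] args) {
        Map<String, Object> typeFirst = new LinkedHashMap<>();
        typeFirst.put("type", "table");
        typeFirst.put("content", List.of("row1", "row2", "row3"));

        Map<String, Object> typeLast = new LinkedHashMap<>();
        typeLast.put("content", List.of("row1", "row2", "row3"));
        typeLast.put("type", "table");

        System.out.println(rowsBufferedBeforeType(typeFirst)); // prints 0
        System.out.println(rowsBufferedBeforeType(typeLast));  // prints 3
    }
}
```

With “type” first nothing is buffered; with “type” last every row is, and the same applies recursively to each nested node.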

Thank you,
Adrien

dmorrow · June 12, 2020, 11:19pm

Hi @aragot,

I see your conundrum. I’ll reach out to the editor team to see if there is a size limit and how they parse ADF.

Regards,
Dugald

dmorrow · June 15, 2020, 3:25am

Hi @aragot,

Confluence utilises a platform service to store ADF. This service specifies a limit on the size of the ADF, but the Confluence API doesn’t specify this limitation, so I recognise that it is difficult for you to rely on it. Because of this, I’ve created [CONFCLOUD-70224] The existing ADF contract is inadequate to rigorously parse.

Between you and me, the ADF size limit specified by the internal service is 25 MB.

Regards,
Dugald

aragot · June 15, 2020, 11:51am

Hi Dugald,

Thank you very much for your research. Do you know whether Confluence uses a streaming API to parse ADF? Or does it build an in-memory model of the entire document?

If we are close to their implementation, at least we can respond to changes more easily.

Best regards,
Adrien

dmorrow · June 15, 2020, 12:13pm

Hi @aragot,

I didn’t actually investigate which method is used by Confluence to parse ADF. I’ll get to this shortly.

Regards,
Dugald

dmorrow · June 16, 2020, 3:56am

Hi @aragot,

According to the development team, we load the entire ADF document into memory and then parse it.
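For an add-on that takes the same load-everything approach, one option is to cap how much is read before handing the bytes to a tree parser. The following is only a minimal sketch, not Confluence’s implementation: `BoundedAdfReader` and `readBounded` are hypothetical names, and the constant assumes the 25 MB figure mentioned earlier in this thread means 25 × 1024 × 1024 bytes. That figure is not part of the public API contract, so treat it as a tunable default. `InputStream.readNBytes(int)` requires Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BoundedAdfReader {

    // Assumption: the unofficial 25 MB limit, read as 25 * 1024 * 1024 bytes.
    static final int ASSUMED_MAX_ADF_BYTES = 25 * 1024 * 1024;

    /**
     * Reads the whole stream into memory, but refuses to go past maxBytes.
     * maxBytes must be non-negative and smaller than Integer.MAX_VALUE.
     */
    static byte[] readBounded(InputStream in, int maxBytes) {
        if (maxBytes < 0 || maxBytes == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxBytes out of range: " + maxBytes);
        }
        try {
            // Ask for one byte more than allowed so an oversized document is detectable.
            byte[] data = in.readNBytes(maxBytes + 1);
            if (data.length > maxBytes) {
                throw new IllegalStateException(
                        "ADF document exceeds " + maxBytes + " bytes");
            }
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass the response body stream and `ASSUMED_MAX_ADF_BYTES`, then give the returned bytes to whichever JSON tree parser the add-on already uses.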

Regards,
Dugald