How should we handle large data in Confluence Server or Jira Server?

Hi,

General question about Server programming (and scalability for Data Center versions): How are we supposed to handle large batches of data?

  • We support importing requirements (that’s our data) from Excel. How much data should we accept in a single Excel file?
  • Are we supposed to handle extremely large files? That may require a lot of work with the streaming APIs, and not everything can be done that way.
  • Are we supposed to let an OutOfMemoryError (OOME) happen, or are we supposed to set an artificial limit to prevent one? If so, how do we choose the magic number? Should it be proportional to the Xmx parameter of the JVM or to the RAM of the machine, based on measurements from our own computers? And why impose a limit at all, given that some servers can handle more and we risk blocking a feature that would have worked?
  • We have a report that takes a long time to build, and we’ve already optimized/cached what we could. How can we limit the impact? Do administrators have a way of limiting expensive requests to a certain % of the CPU?

All of this is one single question, as you’ve guessed: how are we supposed to limit the impact of a few users managing a lot of data, so that other users can still use Confluence/Jira properly?

Best regards,
Adrien


Hi @aragot - check out these guidelines for Data Center app development. Most of them are relevant for non-DC apps, too.

https://developer.atlassian.com/platform/marketplace/guidelines-for-data-center-app-development/

I’m going to ask my fellow Atlassian @ayakovlev to chime in on some of your questions. Andriy is a premier support engineer who specializes in performance.

I can’t speak to what Atlassian will say, but I’ll relay my own thoughts from 7 years ago: don’t mess with my instance, let me break it myself. I would recommend having a max of some type (and streaming when possible). If you want users to be able to override it, let them do it, but put up dialogs and screens that let them know they’re heading into bad territory.

As far as a max on memory is concerned - be aware that available memory really differs from system to system, and it also varies with the time of day.


Hey @aragot,
To add to Neil’s and Daniel’s comments:

All of this is one single question, as you’ve guessed

Thanks for asking :) This is a very valid question, and it’s important for application stability. To give you a short and quick answer: add limits for everything and stream as much as possible.

Long version with comments:

How much data should we accept in a single Excel file?

It depends on your data and domain; in any case, I would advise having soft and hard limits (both configurable), along the lines of the sketch below.
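A minimal sketch of what that could look like (the class and thresholds are illustrative, not part of any Atlassian API): the soft limit warns, the hard limit rejects.

```java
/** Illustrative import guard: warns at the soft limit, refuses at the hard limit. */
public class ImportLimits {

    // In a real app, both values would come from admin-configurable settings.
    private final int softLimit; // e.g. 10,000 rows: warn, but continue
    private final int hardLimit; // e.g. 50,000 rows: refuse the import

    public ImportLimits(int softLimit, int hardLimit) {
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    /** Returns a warning message, or null if the row count is safely below the soft limit. */
    public String check(int rowCount) {
        if (rowCount > hardLimit) {
            throw new IllegalArgumentException("Import of " + rowCount
                    + " rows exceeds the hard limit of " + hardLimit);
        }
        if (rowCount > softLimit) {
            return "Importing " + rowCount + " rows may be slow (soft limit: " + softLimit + ")";
        }
        return null;
    }
}
```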

Are we supposed to handle extremely large files?

Same as above, but in any case streaming is highly recommended.
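For .xlsx files, for instance, Apache POI’s event API can process one row at a time instead of materializing the whole workbook. A rough sketch of the row-handler side (the limit and exception are illustrative, and the handler still has to be wired up through XSSFReader/XSSFSheetXMLHandler):

```java
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

/** Streams rows one at a time, so memory use stays flat regardless of file size. */
public class LimitedRowHandler implements SheetContentsHandler {

    private final int maxRows; // hard limit, e.g. from admin settings
    private int rowCount;

    public LimitedRowHandler(int maxRows) {
        this.maxRows = maxRows;
    }

    @Override
    public void startRow(int rowNum) {
        if (++rowCount > maxRows) {
            // Abort early: with the event API, only the current row is ever in memory.
            throw new IllegalStateException(
                    "Sheet exceeds the configured limit of " + maxRows + " rows");
        }
    }

    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        // Map the cell onto your requirement model here.
    }

    @Override
    public void endRow(int rowNum) {
        // Persist or batch the completed row here.
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Not relevant to the import.
    }
}
```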

Are we supposed to let an OutOfMemoryError happen

Memory usage definitely should be controlled by the app, and you shouldn’t let an OOM happen. There will be many other users and actions running on the same node, so a JVM crash might cause data loss.

… proportional to the Xmx parameter …

It’s quite unreliable, since used heap changes over time (peak vs. non-peak) and depends on the other installed apps and running actions. As an option, I would suggest keeping a fixed memory buffer and documenting the expected JVM heap size.
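For example, a pre-flight check against a fixed headroom before starting a big job (the 256 MB figure is arbitrary; you would pick it from your own testing and state the JVM size it assumes in the docs):

```java
/** Refuses to start a heavy job unless a fixed amount of heap headroom is available. */
public class MemoryGuard {

    private static final long REQUIRED_HEADROOM = 256L * 1024 * 1024; // example: 256 MB

    public static void assertHeadroom() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;
        // Heuristic only: a GC may free more than this, so treat it as a soft signal.
        if (available < REQUIRED_HEADROOM) {
            throw new IllegalStateException("Refusing to start the import: only "
                    + (available / (1024 * 1024)) + " MB of heap headroom left");
        }
    }
}
```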

we’ve already optimized/cached what we could.

This is really good to hear :)

How can we limit the impact? Do administrators have a way of limiting expensive requests to a certain % of the CPU?

In general, there is no easy way. Since the JVM runs as a single process, it is hard to limit CPU usage for specific threads inside it.

A couple of ideas (used by other vendors):

  • Throttle execution in batches (see the sketch after this list) - this could be hard in some cases.
  • Run the execution in a separate JVM - requires JVM process management.
  • Run a separate node for background tasks - requires Data Center and extra hardware.
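A minimal sketch of the first idea - chunking a long job and pausing between chunks so other requests get scheduled (batch size and pause are illustrative):

```java
import java.util.List;
import java.util.function.Consumer;

/** Processes items in small batches, yielding between batches so other work gets CPU. */
public class BatchThrottler {

    private static final int BATCH_SIZE = 100;    // illustrative
    private static final long PAUSE_MILLIS = 200; // illustrative

    public <T> void process(List<T> items, Consumer<T> action) throws InterruptedException {
        for (int start = 0; start < items.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, items.size());
            for (T item : items.subList(start, end)) {
                action.accept(item);
            }
            // Give the rest of the node a chance to run between batches.
            if (end < items.size()) {
                Thread.sleep(PAUSE_MILLIS);
            }
        }
    }
}
```

A Semaphore that caps how many expensive reports run concurrently is another cheap variant of the same idea.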

Hope this helps.
Cheers.


Thank you. Here’s what we’ve done:

  • We’ve added a “global limit” in the settings (plus default limits for the UI, imports, etc.) that admins can modify,
  • We’ve used pagination, obviously.

Therefore:

  • Customers can’t import more than X requirements at a time, they can’t view more than X, and we never have a batch or back-end job that manages more than X requirements at a time.
  • If a customer is unhappy and really wants to squeeze in Y requirements, they can increase their limit.

The drawback is that they may be blocked from importing a 100,000-requirement Excel sheet that might have gone through fine, but they generally get used to the limit (and rearrange their imports around it), and they have much more trust in the stability of their Confluence instance. In other words, they are ready for scale.
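In case it helps anyone implementing something similar, the configurable limit is essentially a read through SAL’s PluginSettingsFactory, roughly like this (the settings key and default value below are just examples):

```java
import com.atlassian.sal.api.pluginsettings.PluginSettings;
import com.atlassian.sal.api.pluginsettings.PluginSettingsFactory;

/** Reads the admin-configurable global limit, falling back to a safe default. */
public class GlobalLimitService {

    private static final String KEY = "com.example.requirements.globalLimit"; // example key
    private static final int DEFAULT_LIMIT = 5000;                            // example default

    private final PluginSettingsFactory settingsFactory; // injected by the host product

    public GlobalLimitService(PluginSettingsFactory settingsFactory) {
        this.settingsFactory = settingsFactory;
    }

    public int getGlobalLimit() {
        PluginSettings settings = settingsFactory.createGlobalSettings();
        Object value = settings.get(KEY); // SAL returns the stored String, or null
        return value != null ? Integer.parseInt(value.toString()) : DEFAULT_LIMIT;
    }

    public void setGlobalLimit(int limit) {
        settingsFactory.createGlobalSettings().put(KEY, String.valueOf(limit));
    }
}
```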

Thanks to you and to @danielwester for both answers.
