How to correctly replicate our plugin's own Lucene index to newly added Jira DC nodes

denis · April 15, 2021, 10:20pm

Hello All,

Our plugin creates and maintains its own Lucene index (located in the “$JIRA-HOME/caches/indexesV1/plugins/our-idx-name/” folder).
When a new node is added to a Jira Data Center cluster, Jira automatically replicates its Lucene indexes to it. But the index of our plugin is not replicated, so we have to run the ‘full reindex’ operation in our plugin to re-create the index on the new node. This operation can be very time-consuming, especially compared with simply copying the index.

What is the best way for our plugin to behave in that case? Is there a way to ‘plug into’ the Jira index replication mechanisms? Are there any APIs to perform such a replication by a plugin?
What is the best way to achieve automatic index replication for our plugin?

While searching, I found only the following article, Jira Data Center search indexing | Atlassian Support | Atlassian Documentation, which only briefly describes the index replication algorithm used by Jira (in the ‘Replicating the indexes’ section).

Thank you.

Denis.

AnkitTiwari · May 6, 2021, 7:54am

Hi Denis,

Thanks for sharing your concern with us.
We are looking into this and will get back to you soon.

AnkitTiwari · May 13, 2021, 12:05pm

Hi Denis,

We have looked into this and here are our findings:

Index replication is complex

There’s much more to the index replication than just copying files from one node to another.

In Jira DC, indexes mutate based on events happening in the system. The issue index is updated on any change to an issue. All of these changes need to be reflected on every other node in order to maintain consistency across the cluster. Jira DC uses the Document Based Replication technique for this.

Since an interaction happens on one node and is then replicated to the other nodes, the changes to the indexes happen asynchronously, and at different times on different nodes. As a result, indexes are updated in a different order on individual nodes in the cluster.

Let’s unpack that. Imagine two user requests reaching two different nodes to edit two different issues. These two nodes change their local index immediately, so each now has a new, unique state of its index. But they are ahead of all the other nodes in the cluster, which are unaware of the changes. As the information reaches the other nodes, those nodes reflect the changes in their own indexes. This may happen in a different sequence on each node due to the asynchronous nature of the communication.

It affects index replication to new nodes

The first problem is that if Apps bring in their own indexes, they need to manage the index state on individual nodes. Even after copying the index, the App is still responsible for figuring out which state the index is in, and that information is not replicated by just copying the files.

The second problem is that, given any state of the index, the App needs to understand how to catch up in order to bring the out-of-date index to an up-to-date state.

There are various steps we have to take to manage this in Jira:

Information on mutations to the issue index is kept in the database
The index can be queried for the freshest update made to it, per issue
Once the index is replicated, regular updates flowing from other nodes are blocked
The fresh node scans that history and catches up from the latest included operation
Once that is done, depending on how long it took, the previous step may need to happen again
Once we have caught up and can start updating the index in real time, we do, and the replication is done.
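
The catch-up part of the steps above can be sketched roughly as follows. This is not Jira's implementation, only a minimal, self-contained Java illustration. All names (IndexOp, OpLog, LocalIndex, CatchUp) are hypothetical. It assumes every index mutation is recorded with a monotonically increasing id, and that a copied index contains every operation up to the latest id it reports. Blocking of regular updates during the catch-up is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One recorded index mutation (hypothetical shape). A null document means "remove from the index". */
final class IndexOp {
    final long id;
    final String entityKey;
    final String document;

    IndexOp(long id, String entityKey, String document) {
        this.id = id;
        this.entityKey = entityKey;
        this.document = document;
    }
}

/** Stand-in for a database table of index mutations that every node can read. */
final class OpLog {
    private final List<IndexOp> ops = new ArrayList<>();

    /** Records a mutation and assigns it the next id (1, 2, 3, ...). */
    synchronized IndexOp append(String entityKey, String document) {
        IndexOp op = new IndexOp(ops.size() + 1L, entityKey, document);
        ops.add(op);
        return op;
    }

    /** Returns all operations with an id greater than lastId, in id order. */
    synchronized List<IndexOp> since(long lastId) {
        List<IndexOp> result = new ArrayList<>();
        for (IndexOp op : ops) {
            if (op.id > lastId) {
                result.add(op);
            }
        }
        return result;
    }
}

/** Stand-in for a node-local index that remembers the freshest operation applied per entity. */
final class LocalIndex {
    private final Map<String, String> docs = new HashMap<>();
    private final Map<String, Long> lastOpPerEntity = new HashMap<>();

    /** Applies an operation unless the same or a newer one was already applied to that entity. */
    void apply(IndexOp op) {
        Long seen = lastOpPerEntity.get(op.entityKey);
        if (seen != null && seen >= op.id) {
            return; // stale or duplicate, so replaying is harmless
        }
        if (op.document == null) {
            docs.remove(op.entityKey);
        } else {
            docs.put(op.entityKey, op.document);
        }
        lastOpPerEntity.put(op.entityKey, op.id);
    }

    /** The freshest operation id contained in this index, or 0 if it is empty. */
    long latestAppliedOpId() {
        long max = 0;
        for (long id : lastOpPerEntity.values()) {
            max = Math.max(max, id);
        }
        return max;
    }

    String get(String entityKey) {
        return docs.get(entityKey);
    }

    /** Simulates copying the index files to a new node. */
    LocalIndex copy() {
        LocalIndex c = new LocalIndex();
        c.docs.putAll(docs);
        c.lastOpPerEntity.putAll(lastOpPerEntity);
        return c;
    }
}

final class CatchUp {
    private CatchUp() {}

    /** Replays missing operations until none are left; returns the number of replay rounds. */
    static int run(LocalIndex index, OpLog log) {
        int rounds = 0;
        while (true) {
            List<IndexOp> missing = log.since(index.latestAppliedOpId());
            if (missing.isEmpty()) {
                return rounds; // caught up, real-time updates could resume here
            }
            for (IndexOp op : missing) {
                index.apply(op);
            }
            rounds++; // more operations may have arrived meanwhile, so check again
        }
    }
}
```

The loop mirrors the "may need to happen again" step: if new operations were logged while a round was replaying, another round picks them up. Because apply() skips anything already seen for an entity, replaying from an earlier, safer point than the latest id would also work in this sketch.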

To support index replication at a basic, filesystem level, we could simply copy the files without handling any of the other effects. This might give the false promise that copying is enough to handle the scenario, and it would leave a huge gap in the support for it. Because of these complexities, for now it will not be possible for us to support index replication for Apps.

But, here are some quick fixes for you

If your App handles data connected to issues, consider using Jira Fields to tap into the existing mechanism.

If you want to handle the index replication on your own, use the shared home to store snapshots of your index, together with any additional information you need to catch up later. You can also use cluster locks to provide synchronisation between nodes to build the behaviours needed to store the snapshot and to catch up with the index state later.
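
As a rough illustration of the snapshot idea, here is a minimal Java sketch. The class name IndexSnapshots, the "files" subfolder and the "watermark.txt" file are made up for this example. The lock is passed in as a plain java.util.concurrent.locks.Lock; in a plugin it would presumably be a cluster-wide lock (for example one obtained from Jira's ClusterLockService), and the snapshot directory would be a folder under the shared home. The sketch also assumes nothing is writing to the local index folder while it is being copied.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that stores and restores index snapshots in a shared directory. */
final class IndexSnapshots {
    private final Path sharedSnapshotDir; // assumed to live under the shared home
    private final Lock clusterLock;       // assumed to be a cluster-wide lock in a real plugin

    IndexSnapshots(Path sharedSnapshotDir, Lock clusterLock) {
        this.sharedSnapshotDir = sharedSnapshotDir;
        this.clusterLock = clusterLock;
    }

    /** Copies the local index files to the shared directory and records how fresh they are. */
    void publish(Path localIndexDir, long lastAppliedOpId) throws IOException {
        clusterLock.lock();
        try {
            Path watermark = sharedSnapshotDir.resolve("watermark.txt");
            Path files = sharedSnapshotDir.resolve("files");
            // Remove the old watermark first so a half-written snapshot is never treated as valid.
            Files.deleteIfExists(watermark);
            deleteRecursively(files);
            copyTree(localIndexDir, files);
            Files.write(watermark, Long.toString(lastAppliedOpId).getBytes(StandardCharsets.UTF_8));
        } finally {
            clusterLock.unlock();
        }
    }

    /** Restores the snapshot into localIndexDir; returns its watermark, or -1 if no snapshot exists. */
    long restore(Path localIndexDir) throws IOException {
        clusterLock.lock();
        try {
            Path watermark = sharedSnapshotDir.resolve("watermark.txt");
            if (!Files.exists(watermark)) {
                return -1; // nothing to restore, the caller has to fall back to a full reindex
            }
            deleteRecursively(localIndexDir);
            copyTree(sharedSnapshotDir.resolve("files"), localIndexDir);
            String text = new String(Files.readAllBytes(watermark), StandardCharsets.UTF_8);
            return Long.parseLong(text.trim());
        } finally {
            clusterLock.unlock();
        }
    }

    private static void copyTree(Path from, Path to) throws IOException {
        try (Stream<Path> paths = Files.walk(from)) {
            // Files.walk lists a directory before its contents, so targets are created in order.
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = to.resolve(from.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
    }
}
```

A new node would call restore() on startup and, if it gets a watermark back, replay only the changes made after that point instead of running a full reindex. How those later changes are recorded and replayed is up to the App.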